
OP mentioned that Elixir is better at concurrency and parallelism, while Python is better at processing (more libraries and toolsets available).

As someone who wants to use Elixir more and see its community flourish, what libraries does Elixir need so this project could be done without Python?

I'm guessing some sort of Beautiful Soup or Nokogiri equivalent?




No, actually, there is nothing lacking in the Elixir ecosystem if you want to write a web scraper! That's the point. I used HTTPoison for sane (and efficient) HTTP requests, and I could have used Floki (https://github.com/philss/floki), as I have in some other projects, for HTML parsing and querying.

However, there are things like generating PDFs with graphs based on the tabular data on the pages, or running some more involved Pandas time series transformations, which are simply not available in Elixir. Nor should they be, I think: ReportLab and Pandas are already written and do a good job at what they are meant to do. This is the idea: we write a crawler in Elixir and delegate processing to something else. Anything else, in fact: Python was chosen because of ErlPort and how easy it is to integrate, but your workers can in practice be written in anything that understands JSON.
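
To make the "anything that understands JSON" part concrete, here is a rough sketch of what such a worker could look like in Python. This is not the code from the project, and it does not use ErlPort: it assumes, purely for illustration, that the crawler sends one JSON object per line on stdin and reads one JSON reply per line from stdout. The message fields (`url`, `rows`) and the `process` step are hypothetical placeholders for whatever processing (Pandas, ReportLab, ...) you actually need.

```python
import json
import sys


def process(page):
    """Hypothetical processing step: count the rows scraped from a page."""
    rows = page.get("rows", [])
    return {"url": page.get("url"), "row_count": len(rows)}


def handle_line(line):
    """Decode one JSON message, process it, and return the reply as JSON text."""
    try:
        page = json.loads(line)
    except json.JSONDecodeError as exc:
        # Report bad input back to the crawler instead of crashing the worker.
        return json.dumps({"error": str(exc)})
    if not isinstance(page, dict):
        return json.dumps({"error": "expected a JSON object"})
    return json.dumps(process(page))


def main(stdin=sys.stdin, stdout=sys.stdout):
    """Read newline-delimited JSON messages and write one reply per message."""
    for line in stdin:
        line = line.strip()
        if line:
            stdout.write(handle_line(line) + "\n")
            # Flush so the crawler sees each reply as soon as it is ready.
            stdout.flush()
```

Call `main()` at the bottom of the script to start the loop. Since the worker only speaks JSON over stdin/stdout, the crawler side does not care what language it is written in, which is the whole point of the split.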



