
    (>= 6 months of 1 gig of data per day)
simdjson (https://github.com/simdjson/simdjson) can parse JSON at several GB/s. You could scale that by one or two orders of magnitude with thread-based parallelism on recent AMD Epyc or Intel Xeon CPUs, so parsing alone should not pose a problem (maybe even sub-second for 6 months of data, which is roughly 180 GB at 1 GB per day). We would need a more precise problem statement to judge whether horizontal scaling is needed.
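
A rough back-of-envelope version of that estimate, as a sketch rather than a benchmark. The 3 GB/s per-core rate, the 64-core count, and perfect scaling are all illustrative assumptions, not measurements, and disk or memory bandwidth is ignored:

```python
def parse_seconds(total_gb: float, gb_per_s_per_core: float, cores: int) -> float:
    """Idealized parse time: total data divided by aggregate throughput.

    Assumes perfect scaling across cores and ignores I/O and memory
    bandwidth limits, so treat the result as a lower bound.
    """
    return total_gb / (gb_per_s_per_core * cores)


DAYS = 183                 # about 6 months
GB_PER_DAY = 1.0           # from the problem statement quoted above
ASSUMED_GB_PER_S = 3.0     # assumption standing in for "several GB/s" per core
ASSUMED_CORES = 64         # assumption: one large Epyc/Xeon-class machine

total_gb = DAYS * GB_PER_DAY
single = parse_seconds(total_gb, ASSUMED_GB_PER_S, 1)
parallel = parse_seconds(total_gb, ASSUMED_GB_PER_S, ASSUMED_CORES)

print(f"total data:  {total_gb:.0f} GB")
print(f"1 core:      {single:.1f} s")
print(f"{ASSUMED_CORES} cores:    {parallel:.2f} s")
```

Under these assumptions a single core needs about a minute and 64 cores land just under a second. In practice, getting the data off disk is likely to dominate.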


> https://github.com/simdjson/simdjson

Was not aware of this. It doesn't seem to be available natively in Python, but it looks cool. Will try it out in the future.



