
Any chance of something similar for CSV? (full RFC-4180 including quotes, escaping etc).

Terabytes of "big data" get passed around as CSV.




CSV is on our list; this is a simpler task than JSON due to the absence of arbitrary nesting.


I doubt someone using CSV for big data is going to follow that rule...


What do you mean? It's not a rule, it's just not possible in the CSV format to have arbitrary nesting.


It's probably relevant to mention https://github.com/BurntSushi/rust-csv. It uses a state machine (which seems to be the author's expertise) to parse CSVs really fast. Based on some other work, you can do better if you use some of the new SIMD instructions.
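For anyone curious what the state-machine approach looks like, here is a minimal sketch in Python. It is not rust-csv's implementation; the state names and the `parse_csv` function are made up for illustration. It handles RFC 4180 quoting (embedded commas, newlines, and doubled quotes) and is deliberately lenient about stray quotes rather than rejecting them.

```python
from enum import Enum, auto


class State(Enum):
    FIELD_START = auto()      # at the beginning of a field
    UNQUOTED = auto()         # inside an unquoted field
    QUOTED = auto()           # inside a quoted field
    QUOTE_IN_QUOTED = auto()  # just saw '"' inside a quoted field


def parse_csv(text):
    """Parse CSV text into a list of rows (hypothetical helper, not a library API)."""
    rows, row, field = [], [], []
    state = State.FIELD_START

    def end_field():
        row.append("".join(field))
        field.clear()

    def end_row():
        end_field()
        rows.append(list(row))
        row.clear()

    for ch in text:
        if state is State.QUOTED:
            # Inside quotes everything is literal except the quote itself.
            if ch == '"':
                state = State.QUOTE_IN_QUOTED
            else:
                field.append(ch)
            continue
        if state is State.QUOTE_IN_QUOTED and ch == '"':
            # A doubled quote is an escaped literal quote.
            field.append('"')
            state = State.QUOTED
            continue
        # From here on we are outside quotes (or just closed a quoted field).
        if ch == ",":
            end_field()
            state = State.FIELD_START
        elif ch == "\n":
            end_row()
            state = State.FIELD_START
        elif ch == "\r":
            pass  # assumption: a bare CR is only ever part of CRLF, so skip it
        elif ch == '"' and state is State.FIELD_START:
            state = State.QUOTED
        else:
            # Lenient: stray characters (including quotes) are kept as-is.
            field.append(ch)
            state = State.UNQUOTED

    # Flush a final record that has no trailing newline.
    if field or row or state is not State.FIELD_START:
        end_row()
    return rows


if __name__ == "__main__":
    sample = 'name,quote\r\n"Doe, Jane","She said ""hi"""\r\n'
    for record in parse_csv(sample):
        print(record)
```

There is no stack anywhere, which is the "no arbitrary nesting" point made above: four states are enough. A SIMD version would classify many bytes at once instead of looping per character, but the states it has to track are the same.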


I've developed a fully RFC-compliant CSV parser with Python bindings that supports the SSE4 through AVX-512 instruction sets; however, I'm struggling to get my management's approval to open-source it at the moment.

But the goal of my message is not to tease you with unavailable code. It's just to say that it is a lot simpler to write a CSV parser than a JSON parser.

So do not hesitate to write one yourself! It's easy and a nice way to introduce yourself to SIMD instructions.



