My main takeaway is that if you care about performance so much, just ETL the data into Avro, Thrift, Protobuf, HDF5, NetCDF, Parquet, Arrow: anything but plain text.
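
A one-time conversion pays for itself on every subsequent read. A minimal sketch with pyarrow (the file names and the CSV input are hypothetical):

    # Hypothetical one-time ETL step: parse the text once, read binary forever.
    import pyarrow.csv as pv
    import pyarrow.feather as ft

    table = pv.read_csv("events.csv")           # the only text parse you ever do
    ft.write_feather(table, "events.arrow")     # Arrow IPC: typed, columnar

    # Subsequent reads skip text parsing entirely:
    table = ft.read_table("events.arrow")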

When I was grinding away on the 128B-edge graph on my laptop, literally 85% of the time was spent parsing integers. Get your data into native binary as soon as you can (edit: or out of plain text at least, per parent).
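
You can see the gap with a toy comparison of a text parse against a bulk binary read (a sketch; the sizes and file names are made up):

    # Toy comparison: text integer parsing vs. raw binary read with numpy.
    import numpy as np

    edges = np.random.randint(0, 2**31, size=(1_000_000, 2), dtype=np.int64)
    np.savetxt("edges.txt", edges, fmt="%d")   # plain-text edge list
    edges.tofile("edges.bin")                  # native binary dump

    # Text path: every integer goes through a string-to-int parser.
    from_text = np.loadtxt("edges.txt", dtype=np.int64)

    # Binary path: one bulk read straight into an array, no parsing at all.
    from_bin = np.fromfile("edges.bin", dtype=np.int64).reshape(-1, 2)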

Yep. After working with JSON logs from 50GB up, converting them to Parquet (plus Snappy compression) was really liberating. It also led me to Apache Drill, one of the best on-prem data exploration tools IMO.
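
The conversion itself is a few lines with pyarrow (a sketch; "logs.json" is hypothetical and assumed to be newline-delimited JSON, one object per line):

    # Hypothetical conversion of newline-delimited JSON logs to Parquet.
    import pyarrow.json as pj
    import pyarrow.parquet as pq

    table = pj.read_json("logs.json")   # one record per line
    pq.write_table(table, "logs.parquet", compression="snappy")

Drill, or any Parquet-aware engine, can then query logs.parquet directly with SQL and read only the columns a query touches.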