
If both parties implement RFC 4180 and use a consistent character set encoding then I don't think there are actually any undefined behaviors. But in practice a lot of implementations are simply broken, including those from major tech companies that ought to know better.
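For what it's worth, here is a minimal sketch of the well-behaved case using Python's standard `csv` module, whose default dialect is close to (but not formally the same as) RFC 4180. It is only an illustration, not a conformance test, and `round_trip` is a made-up helper name.

```python
import csv
import io

def round_trip(fields):
    """Write one record with csv.writer, then read it back with csv.reader."""
    buf = io.StringIO(newline="")
    # Default dialect: comma delimiter, doubled quotes for escaping, CRLF line ending.
    csv.writer(buf).writerow(fields)
    encoded = buf.getvalue()
    decoded = next(csv.reader(io.StringIO(encoded, newline="")))
    return encoded, decoded

# Fields containing the delimiter, the quote character, and a line break.
fields = ["plain", "has,comma", 'has "quote"', "two\nlines"]
encoded, decoded = round_trip(fields)
print(repr(encoded))
print(decoded == fields)
```

When both sides follow the same quoting rules, the awkward fields come back unchanged; the breakage tends to show up when one side skips the quoting or splits naively on commas and newlines.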



I don't think RFC 4180 differentiates between an empty string and a null value. As long as you add a check that all string columns are free of empty values before writing you should be good.

I think in polars it's

    df.filter(pl.any_horizontal(pl.col(pl.Utf8).str.len_bytes() == 0)).shape[0] == 0

although there's probably a better way to write this.
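To show why the check matters, a rough sketch with only the standard library (not polars): Python's `csv.writer` emits both `None` and `""` as an empty field, so the two are indistinguishable after reading back. The helper names `write_rows` and `has_empty_string` are invented for this example.

```python
import csv
import io

def write_rows(rows):
    """Serialize rows with the default csv dialect and return the text."""
    buf = io.StringIO(newline="")
    csv.writer(buf).writerows(rows)
    return buf.getvalue()

def has_empty_string(rows):
    """True if any cell is exactly the empty string (None does not count)."""
    return any(cell == "" for row in rows for cell in row)

rows = [["id", "name"], ["1", ""], ["2", None]]
text = write_rows(rows)
parsed = list(csv.reader(io.StringIO(text, newline="")))

print(repr(text))              # both data rows end in an empty field
print(parsed)                  # the None and the "" come back as the same value
print(has_empty_string(rows))  # the pre-write check flags this data
```

If the pre-write check passes, every empty field on the reading side can safely be treated as null.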


Well, I would consider differentiating between an empty string and null to be simply out of scope for CSV rather than undefined behavior. It was never intended as a complete database dump format.


And the application doesn't try to convert the cells into non-string data types like numbers, dates, etc.


Converting strings into other data types is out of scope for CSV, not really undefined behavior. The type conversions happen at a later stage of the import process.
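A small sketch of that separation, assuming a hypothetical three-column file and a made-up `SCHEMA` mapping: the CSV layer only ever yields strings, and the conversion is an explicit second step that the importer controls.

```python
import csv
import io
from datetime import date

# Hypothetical schema: column name -> converter. Not part of CSV itself.
SCHEMA = {"zip": str, "qty": int, "shipped": date.fromisoformat}

def parse(text):
    """Stage 1: CSV parsing. Every value comes back as a string."""
    return list(csv.DictReader(io.StringIO(text, newline="")))

def convert(record, schema):
    """Stage 2: type conversion, driven by an explicit schema."""
    return {key: schema[key](value) for key, value in record.items()}

text = "zip,qty,shipped\r\n01234,7,2024-02-01\r\n"
raw = parse(text)
typed = [convert(record, SCHEMA) for record in raw]

print(raw)    # all strings, including the leading zero in "01234"
print(typed)  # qty is an int, shipped is a date, zip is still a string
```

Keeping the schema explicit is what prevents the leading zero in `zip` from being lost to an automatic number guess.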


It's out of scope for the RFC, but it could still be undefined behavior for the import/export process.



