
Armin has backed off of this stance since then. And for good reason.

As someone who works with Python text processing extensively, I can tell you that the Python 2.7 text model is broken and dangerous, due to the silent bytes-unicode coercion and the misguided use of ASCII instead of UTF-8 as the default text encoding. Many people don't realize this and will argue that it's not broken, because they have never fed non-ASCII text through their app to watch it blow up! And once they realize they have a problem, they have to deal with a rat's nest of silent bytes-unicode coercions happening all over their app, sometimes impossible to fix because the offending code lives in libraries outside their control.
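To make the failure mode concrete, a minimal Python 2 sketch (the variable values are made up, but the behaviour is exactly the silent coercion described above):

    # Python 2: mixing bytes (str) and unicode is silently "fine" as long as
    # everything happens to be ASCII...
    greeting = "Hello, " + u"world"    # implicit decode via the default ascii codec; works
    # ...and only blows up once non-ASCII bytes show up at runtime:
    name = "Jos\xc3\xa9"               # UTF-8 bytes for an accented name, e.g. read from a file
    try:
        message = u"Hello, " + name    # Python 2 quietly tries name.decode('ascii')
    except UnicodeDecodeError as e:
        print e                        # 'ascii' codec can't decode byte 0xc3 in position 3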

There is a good discussion to be had on whether a language should prioritize bytes or unicode strings as the main data type, but there is no excuse for the "ticking time bomb" design that pre-3 Python has for strings and the default encoding.

For this reason alone I'm very happy that 2.7 is starting to lose its grip. Its continued support is a problem, and I have no love for people who are trying to hold on to it.

There are many other features in 3 that I can no longer live without - most of them now available through backport modules - but type hints and asyncio can't be easily backported, and people are starting to use them extensively.




Yeah, but if you are dealing only with a subset of the English language in the U.S., and the API endpoint you are scraping wants to serve all peoples in all locales in all situations, you are fucked if you want to use Python 3 and its csv module.

You genuinely are better off using Python 2.7.x and its naive approach to text.


I don't understand what you mean by "your API endpoint that you are scraping wants to serve to all peoples in all locales in all situations".

That would mean to me that the API endpoint could be sending me Unicode, in which case Python 3's Unicode-aware CSV is going to work great, and Python 2's csv is fucked. The limitations of Python 2's csv module were one of the key reasons my company moved to Python 3.

On Python 3, if you want to be naive about text (not sure why you're celebrating only working in a subset of English, but you have this option), you could open the file as Latin-1 and get the same results as Python 2.
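If anyone wants to see what that looks like, something along these lines (the filename is just an example):

    import csv

    # Python 3: Latin-1 maps every byte 0-255 to a code point, so this never
    # raises and round-trips the raw bytes, roughly mimicking Python 2's
    # byte-oriented behaviour.
    with open("data.csv", encoding="latin-1", newline="") as f:
        for row in csv.reader(f):
            print(row)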

Many CSVs are made with Excel. Excel's only form of Unicode CSV is tab-separated UTF-16. Python 2's csv can't parse those at all, can it?
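For what it's worth, in Python 3 reading one of those exports is straightforward, assuming the file is UTF-16 with a BOM as Excel's "Unicode Text" export typically is ("export.txt" is just a stand-in filename):

    import csv

    # Python 3: the utf-16 codec consumes the BOM and sorts out endianness;
    # the csv module only needs to be told about the tab delimiter.
    with open("export.txt", encoding="utf-16", newline="") as f:
        for row in csv.reader(f, delimiter="\t"):
            print(row)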


> Python 2's csv can't parse those at all, can it?

Nope, not without re-encoding to UTF-8 before parsing (learned that the hard way and found it's easier to just take Excel files as input).

Python 2's csv module works on bytes and basically only handles ASCII-compatible supersets, assuming your special characters (quote chars, field and record separators) are plain ASCII.
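Roughly what the re-encoding dance looks like on Python 2 - a sketch along the lines of the recipe in the old csv docs; the helper name is mine:

    import codecs
    import csv

    def read_utf16_tsv(path):
        # Python 2 sketch: csv only chews on bytes, so transcode the UTF-16
        # file to UTF-8 lines first, then decode each cell back to unicode.
        with codecs.open(path, encoding='utf-16') as f:
            utf8_lines = (line.encode('utf-8') for line in f)
            for row in csv.reader(utf8_lines, delimiter='\t'):
                yield [cell.decode('utf-8') for cell in row]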



