this is incredible - give me unicode support in python 2. asyncio? probably...

masklinn · on Dec 10, 2016

> give me unicode support in python 2

Nope. Can't be done without getting Python 3 either way, because Python 3's text model is not compatible with Python 2's. That is why the core team allowed the other breaking changes, because software was going to be broken in the first place.

sandGorgon · on Dec 10, 2016

not entirely true - http://python-future.org/unicode_literals.html

it's not 100% seamless, but it's almost there.

smallnamespace · on Dec 10, 2016

> Changing to `unicode_literals` will likely introduce regressions on Python 2 that require an initial investment of time to find and fix. The APIs may be changed in subtle ways that are not immediately obvious.

Unless you're willing to put in the time and energy to test extensively, this is a recipe for disaster. You're paying at least 80% of the cost of a full Py2 -> Py3 port (since looking for regressions is a huge part of that cost), for only a fraction of the benefit.
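To make the risk concrete, here is a rough sketch of what the switch does to a bare literal. It compiles the same source text with and without the `unicode_literals` compiler flag so both modes can be compared in one file; the helper name `literal_type` is made up for illustration.

```python
import __future__
import sys


def literal_type(source, unicode_literals):
    """Evaluate a string literal with or without the switch; return its type name."""
    flags = __future__.unicode_literals.compiler_flag if unicode_literals else 0
    # dont_inherit=True keeps this module's own __future__ settings out of the result.
    code = compile(source, "<literal>", "eval", flags, True)
    return type(eval(code)).__name__


for switch in (False, True):
    # Bare literal vs. explicit bytes literal under each setting.
    print(switch, literal_type('"abc"', switch), literal_type('b"abc"', switch))
```

On Python 2 the bare literal goes from `str` to `unicode` when the switch is on, while `b"abc"` stays `str`. On Python 3 the switch changes nothing. Every py2 call site that received the old `str` is a place where a regression can hide.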

dagss · on Dec 10, 2016

Being able to change things one part of the code base at a time, or one feature/data field at a time, is a HUGE thing. For production systems you just cannot put the backlog on pause for a month or two while porting. You can, however, fix things incrementally over the course of some years.

smallnamespace · on Dec 10, 2016

That's probably true, but unicode_literals isn't the right tool for an incremental port, because it follows neither good py2 nor good py3 text handling conventions.

It would become a substantial detour and end up being a bit more total work.

Also, porting can be done in parallel to normal dev. You aim your port at a specific release while you continue fixing bugs. When the port is done, you port over all the patches. Repeat until port and original version converge.

dagss · on Dec 10, 2016

Isn't it as simple as a) do proper string handling in py2 unicode and b) if you write more u"" in a file than "", flip the switch and write "" and b"" instead?

smallnamespace · on Dec 10, 2016

Not quite, because the py2 library is often not compatible with unicode literals, and also because if you flip the switch in a module it changes the API of any functions that returned strings. That might break calling code from other modules.

So then you might have to add code to work around these issues, code that is needed in neither pure py2 nor pure py3 -- hence making it a weird detour.

In general, it is more compelling to use `unicode_literals` when back-porting new or existing Python 3 code to Python 2/3 than when porting existing Python 2 code to 2/3 [1]

[1] http://python-future.org/unicode_literals.html
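As an illustration of that kind of workaround, here is a sketch of a "native string" coercion helper. The name `to_native` and the UTF-8 default are assumptions for this example, not something taken from python-future.

```python
import sys

PY2 = sys.version_info[0] == 2


def to_native(value, encoding="utf-8"):
    """Coerce bytes or text to the interpreter's native `str` type (hypothetical helper)."""
    if isinstance(value, str):
        return value  # already the native type on this interpreter
    if PY2:
        return value.encode(encoding)  # unicode -> py2 str (bytes)
    return value.decode(encoding)  # bytes -> py3 str (text)


print(type(to_native(b"Content-Type")) is str)
```

A helper like this only earns its keep in code that has to run on both versions and hand strings to APIs that insist on the native type. Pure py2 or pure py3 code would never need it.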

dagss · on Dec 10, 2016

Right.. our codebase is already unicode all over the place because otherwise we could not i18n properly. So we basically have py2 unicode-correct code with encode/decode in the proper places for interfacing with stdlib. I didn't consider people might use str and not unicode for text..
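A minimal sketch of that layout, assuming UTF-8 at the boundaries (the function names are invented for the example): decode on the way in, work in text, encode on the way out.

```python
def read_name(raw, encoding="utf-8"):
    """Boundary in: bytes from a file/socket/stdlib call become text."""
    return raw.decode(encoding)


def greet(name):
    """Core logic only ever sees text."""
    return u"Hello, " + name


def write_line(text, encoding="utf-8"):
    """Boundary out: text becomes bytes again just before leaving the program."""
    return text.encode(encoding)


payload = write_line(greet(read_name(b"Zo\xc3\xab")))
print(payload == b"Hello, Zo\xc3\xab")
```

Code written this way runs the same on Python 2.7 and on Python 3.3+, which is probably why a codebase in this shape is much cheaper to port.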

dagss · on Dec 10, 2016

In fact, the first step is to never use "", always use b"", I guess..

masklinn · on Dec 10, 2016

As the page notes, many Py2 APIs are completely broken with unicode literals, some noisily and others silently (due to implicit ascii encoding/decoding), leading to hard-to-diagnose regressions in the codebase under Python 2.
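The silent case is easy to miss, so here is a sketch of it. Python 3 has no implicit coercion, so the helper below (hypothetical name `py2_style_concat`) spells out what Python 2 does behind your back when text and bytes meet: it decodes the bytes as ASCII.

```python
def py2_style_concat(text, data):
    """Mimic Python 2's implicit coercion: bytes are decoded as ASCII when mixed with text."""
    return text + data.decode("ascii")


# ASCII-only bytes: works, so the latent bug goes unnoticed in testing.
print(py2_style_concat(u"id=", b"42"))

# The same code path with UTF-8 encoded bytes only blows up in production.
try:
    py2_style_concat(u"name=", b"Zo\xc3\xab")
except UnicodeDecodeError:
    print("fails only once non-ASCII data shows up")
```

This is a simulation written to run on either version, not the interpreter's actual code path, but the failure mode is the same: tests with ASCII data pass, and the first non-ASCII byte raises far away from where the literal was changed.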

I think leveraging PEP 414 and carefully using bytes, unicode or "native" literals is a much more resilient mode of operation.
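For reference, a small sketch of the three kinds of literal in one module, with no `__future__` import. The variable names and values are just examples.

```python
raw_header = b"\x89PNG"  # bytes on both Python 2 and Python 3
title = u"Zo\u00eb"      # text on both (the u"" prefix is accepted again from 3.3, per PEP 414)
env_key = "PATH"         # "native" str: bytes on Python 2, text on Python 3

print(isinstance(raw_header, bytes), len(title), type(env_key) is str)
```

Each literal states its intent at the point where it is written, so nothing changes type behind the back of code elsewhere in the module.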

ubernostrum · on Dec 10, 2016

> give me unicode support in python 2

If by this you mean "give me exactly Python 3's text model without breaking my unmodified Python 2 code", you should be aware that A) this is impossible because B) the Python 3 text model is backwards-incompatible with unmodified Python 2 code, and C) that's kinda why Python 3 existed and was backwards-incompatible.
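A tiny sketch of the incompatibility, using a made-up `mix` function that stands in for typical unmodified Python 2 code:

```python
import sys


def mix(text, data):
    """Concatenate text and bytes the way unmodified Python 2 code often does."""
    return text + data


try:
    result = mix(u"key=", b"value")
except TypeError:
    result = None  # Python 3: str and bytes never combine implicitly

print(sys.version_info[0], result)
```

Python 2 quietly decodes the bytes and returns text; Python 3 raises `TypeError`. There is no setting that gives both behaviours at once, which is the point being made here.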

orf · on Dec 10, 2016

Python 2 supports unicode and asyncio.

sandGorgon · on Dec 10, 2016

i know it does - but it would be nice to have the everything-is-unicode mechanism of python 3. Asian developers hit unicode problems before US-based developers because of the natural differences in underlying OS language.

lloeki · on Dec 10, 2016

But then it wouldn't be compatible with Python 2 code, so you might as well use Python 3.

sandGorgon · on Dec 10, 2016

http://python-future.org/unicode_literals.html

nope - you can mix them in python 2 code. The way it should have been done in the first place.

orf · on Dec 10, 2016

... You can't have your cake and eat it. If you want Unicode everywhere then you use Python 3, because it's a breaking change. Saying you'd want it in Python 2 is a confused way of saying "I need to use Python 3".