“Worked?” Nothing about the Python 2 to 3 transition “worked,” and people are still aghast that Python 2 is being EOL’d.



I've been coding in Python for 15 years, and from what I see in the numerous places I get to work, yes, it worked.

It's just that the aghast people are the most vocal. You don't hear from the 90% of people who are happy; they don't take the time to speak up. But the unhappy complain all the time.


I’m not speaking about developers complaining; I’m speaking about the fact that it’s still a fractured ecosystem, and that plenty of popular programs, packages, and scripts that are still in use, or even still actively developed, rely on or integrate with Python 2 (exclusively).


> I’m not speaking about developers complaining; I’m speaking about the fact that it’s still a fractured ecosystem, and that plenty of popular programs, packages, and scripts that are still in use, or even still actively developed, rely on or integrate with Python 2 (exclusively).

Such as? What actively developed tools or ecosystems rely on or integrate with Python 2 exclusively?


There are people still using Windows XP.

There are programs still using COBOL.

That's life in IT.


Yeah I'm one of the happy ones. I always loathed Python 2's text handling; Python 3 was a huge improvement from my perspective.


I'm several years into the "upgrade" and find myself still swearing daily at the idiocy of the whole thing. Hundreds of scripts used maybe once a year, such as the 'dups.py' I tried to run today, are broken by missing parentheses and a function that moved.

Utterly pointless and reputationally ruinous. I don't do serious work in Python any more.


That has...nothing to do with python3 though, that's just your broken code being broken.


I recognize it may be difficult to understand, but this is a thread about backwards compatibility. Of course if I had any faith in the contemporary Python community, I would still be treating Python as a serious programming language and not be suggesting this is a difficult concept for someone to grasp.


I'm not sure what you're getting at. Perhaps if you could translate to less-smug, the rest of us could understand.


There's really not much to it, but if it's a foreign concept it may require a kind of conceptual leap. Essentially, this code worked perfectly well, and suddenly it no longer worked. From your perspective, as you quite effortlessly put it, this code is broken. For others elsewhere in more conservative parts of the world, it was not my code that broke, but the surrounding ecosystem.

There is a fabulously unending depth to explore in the chasm lying between these opposing world views. It would be more than possible to write a book on the topic and fail to cover it all; however, here are some of the most important aspects, from my perspective at least:

* given a perfectly functional tool relied on heavily by its user to perform their job, and given that tool suddenly decides to change shape such that it no longer fits the user's hand without retraining, nor fits with the remainder of the user's toolset, including custom tools the user has invested in producing, the continued utility of the no-longer-functioning tool is called into question, along with a deserved reappraisal of the tool's applicability in the context of the user's original intended problem domain.

* when the reason for its reshaping is to solve what are highly important problems from the perspective of the tool, but much less so from the perspective of the average user, and that user's application of the tool to real world problems, it can no longer be said that the tool is simply an implement that may be called and relied upon at any point in future -- the tool develops a chaotic life and importance all of its own, and may choose to reshape once again at any future moment (and indeed in this case it has). It is no longer a tool, but some sentient entity demanding unpredictable ongoing costs and attention paid all of its own.

* given a tool that promises to cease functioning 'correctly' at any future moment based on its own whim, preferences, industry fashions and styles, in an ecosystem where many similar such tools exist that explicitly promise not to cease functioning over the same time period, it is a fool's errand to pick the tool that promises to externalize additional costs on the user when alternatives exist that avoid any such cost.

* given tool designers who externalize almost frivolously minor technical costs on to every user, where each 'minor' change is amplified perhaps 10000 times over and directly translates into expensive engineering time, the question is easily raised whether the philosophy of the tool is appropriate for its advertised utility, and whether continued reliance on the tool makes business sense. In economic terms, what was the cost to productivity of the retraining and re-tooling of users compared to any alleged future productivity improvement?

* had I written these scripts in bash, C# or C++, they would not have broken even remotely to the same degree. Of course these are not some completely unevolving entities either, however all take the promise of forwards compatibility deadly seriously, and it is more than possible to find 10-20 year old programs written in C++ or bash that continue functioning to the present day. From my perspective, they are therefore excellent and highly dependable tools.


I think in your rush to appear superior, you missed the much simpler explanation. Your first comment was easy to misunderstand:

> print vs print() is what I think the parent comment was referring to.

is, as another person mentioned, likely what you meant, but it was easy to take your comment as meaning that you had mismatched parens somewhere, which would imply code broken under Python 2 as well as 3.

> * given a tool that promises to cease functioning 'correctly' at any future moment based on its own whim, preferences, industry fashions and styles, in an ecosystem where many similar such tools exist that explicitly promise not to cease functioning over the same time period, it is a fool's errand to pick the tool that promises to externalize additional costs on the user when alternatives exist that avoid any such cost.

This is clearly a falsehood, if the tool provides additional value. To bring things back to the topic at hand, if C++ were allowed to break ABI compatibility in very specific ways, it could be faster. STL container types are slower than many third-party ones (absl, for example).

Which is to say that if you want "the best" that C++ has to offer, you have to be willing to have your libraries make backwards incompatible changes.

To jump back to python,

> given tool designers who externalize almost frivolously minor technical costs on to every user, where each 'minor' change is amplified perhaps 10000 times over and directly translates into expensive engineering time

I disagree that this happened. The examples you give are trivially fixable with 2to3. There are harder problems that 2to3 doesn't solve, but it sounds like you don't have any, so the frivolously minor technical costs are frivolously minor, and translate into running `2to3 my_code/` one time to fix all the missing parens and most of the moved functions.
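To make that concrete, here is a minimal sketch of the kind of rewrite 2to3 performs. The script name and contents are hypothetical, loosely modeled on the dups.py complaint above, not actual code from it:

    # Hypothetical Python 2 original:
    #
    #   import urllib2
    #   print "duplicate:", path
    #   data = urllib2.urlopen(url).read()
    #
    # After running `2to3 -w dups.py`, roughly:
    import urllib.request                          # urllib2 was folded into urllib.request

    def report_duplicate(path, url):
        print("duplicate:", path)                  # print statement -> print() function
        return urllib.request.urlopen(url).read()  # same call, new module path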

> bash that continue functioning to the present day

I have yet to encounter a 20-year-old program written in bash that functions to this day. It might be syntactically valid, but it won't do what it was intended to do.


I mistakenly thought you wanted clarity on my reasoning; instead it seems I've been made a fool of by providing an opportunity for you to argue a position I already understood and couldn't care less about. It's funny, this is also pretty much the reason I stopped relying on Python.

You're welcome to absorb all the externalized costs your heart desires, but in future please consider reviewing HN's rules before bandying attacks like "smug" and "superior".


Understanding one's views doesn't imply agreeing with them. A tool is only useful if, well, it's useful. A slow C++ is less useful than a fast one. So how is it that you can claim so strongly that the value from backwards compatibility is greater than the value from speed by default?

Bluntly, if I could break the ABI to get a 10% speed boost across the board, is that not worth it?

And I'm well aware of the site guidelines. I certainly don't think asking someone to tone down the holier-than-thou in their comments is a violation of them. Just the opposite, it's encouraged. So I'll continue to ask that you do so if you choose to respond.


print vs print() is what I think the parent comment was referring to.
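For anyone who hasn't hit it, the difference is just this (the error text below is what recent Python 3 releases print; older ones just said "invalid syntax"):

    # Python 2: print is a statement
    #   print "hello"
    # Python 3: the same line is a syntax error, e.g.:
    #   SyntaxError: Missing parentheses in call to 'print'. Did you mean print("hello")?
    print("hello")    # Python 3: print is an ordinary function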


The Unicode issue is one of the most painful sticking points. Handling legacy filesystems that may contain previously valid files (filenames containing anything other than / or \0) in various encoding-soup nightmares? Python 3 is NOT the tool for that job! Which means you can't write any systems-tool stuff in Python 3, because there's a good chance it'll blow up unexpectedly, or you suddenly have to handle all sorts of things that in any other language you could just GIGO and move on.


Python 3 promotes the most likely scenario: you are on a modern system with normal looking filenames.

It would be unwise to design an API that promotes a niche need like dealing with a legacy file system with corrupted file names. This is what Python 3 fixed, making the easy things easy, and the complicated things possible, not the other way around.

But Python 3 is absolutely up to the task of handling legacy file systems with random encoding mixed in; you just need to tell it explicitly that you are doing so.

Let's create a file with a completely garbage name, made of random bytes, which is allowed on Unix:

    >>> import os, sys
    >>> sys.version_info          
    sys.version_info(major=3, minor=7, micro=5, releaselevel='final', serial=0)
    >>> with open(os.urandom(32), 'wb') as f: 
    ...     f.write(os.urandom(200)) 
If you pass bytes to any file system function, it will return file names as bytes:

    >>> filename = os.listdir(b'.')[0]
    >>> type(filename)
    <class 'bytes'>
    >>> filename[:10]   
    b'\xf8-U\xa5\x1dq\xad?\xbf\xa2'
And you can just open that:

    >>> data = open(filename, 'rb').read() 
    >>> data[:10]
    b'E\x05\xce*M \xf5\xfeK\x18'
    >>> type(data)
    <class 'bytes'>
You do exactly what you did with Python 2: treat the files entirely as raw bytes, without thinking about the content.

Some APIs in Python require text. If you want to pass the file names to those APIs, you can use surrogateescape, which lets you convert back and forth between arbitrary bytes and UTF-8 text without losing information:

    >>> as_text = filename.decode('utf8', errors='surrogateescape')
    >>> as_text[:10]
    '\udcf8-U\udca5\x1dq\udcad?\udcbf\udca2'
    >>> type(as_text)
    <class 'str'>
    >>> as_text.encode('utf8',  errors='surrogateescape') == filename
    True
This is a good thing: it forces the dev to be explicit about the places in the code where they are dealing with a specific scenario. It also makes the price of doing so opt-in, not opt-out.

If you have to do a robust version of this with Python 2, you will have to do that anyway: at some point mixed encodings will bite you if you don't have a neutral representation for them. Python 2 gave you the illusion of robustness, because it said "yes" to most operations.

I remember quite well that a lot of Python 2 programs didn't work in Europe because your user directory would contain your name, which could be non-ASCII. With Python 2, dealing with that was opt-in. It's the opposite philosophy, and it caused so many crashes.
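A minimal sketch of that failure mode, with an illustrative home directory (the Python 2 part is shown as comments; the last two lines run under Python 3):

    # Python 2 (illustrative): the OS hands your code bytes, you mix in unicode,
    # and the implicit ASCII decode crashes on the first non-ASCII user name:
    #
    #   home = '/home/andr\xc3\xa9'        # UTF-8 bytes for "andré"
    #   print u'Backing up ' + home
    #   UnicodeDecodeError: 'ascii' codec can't decode byte 0xc3 ...
    #
    # Python 3: paths are text by default, so the same operation just works.
    home = '/home/andr\u00e9'
    print('Backing up ' + home)            # Backing up /home/andré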


I failed to mention that it's only useful to do this manually if you need to decode paths with mixed encodings, or share said paths with the rest of the world.

If all you need is to work with arbitrary mixed bags of file names, you can just pass str, and Python will automatically and transparently use surrogateescape everywhere.
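Continuing the session above (assuming the same directory with that single garbage-named file, so `data` and `filename` still refer to the earlier values), a minimal sketch of that default:

    >>> name = os.listdir('.')[0]             # pass str, get str back
    >>> type(name)
    <class 'str'>
    >>> open(name, 'rb').read() == data       # the surrogate-escaped str still opens the file
    True
    >>> os.fsencode(name) == filename         # and round-trips to the original bytes
    True

os.fsencode and os.fsdecode are the stock helpers for that round trip, so you rarely need to spell out the error handler yourself.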


I think that the main complaint is that the early versions of python 3 had deep issues with this and it took years to fix them.


The first versions of Python 3 were indeed not suitable for serious work. Python 3 started to be usable from 3.3, interesting from 3.4, comfortable from 3.5, and objectively way better than 2.7 from 3.6.

It's not a surprise, as most software needs iterations to get good. Python 2.7 did not start out as the amazing tool it is now, and having started my career with 2.4, I remember some funny stuff.

This is why Python 2.7 was kept around for more than a decade after Python 3 first came out.

Now, 3.6 came out in 2016. It solves many issues 2.7 had and adds tons of goodies. It's very ergonomic and can be installed easily. It's great software.

It's 2020, let's enjoy the goodness of Python 3.


But this is exactly why people keep piling on about Python 3: as an "upgrade" it was worse than what it tried to replace for 8 years.


Sure.

And now it has been better for the last 5 years, and does solve the problems it intended to solve.


No one disputes that today's Python 3 is better than today's Python 2, especially for new projects. (A point can be made that Python 2 is now perfectly frozen in time and so 100% stable, but most people do not care that much.)

The only thing this means is that the botched upgrade did not end up killing Python; it says nothing about whether it was done badly or not.

(I have nothing against Python; I just believe it is important to understand why and how what happened happened, so we can avoid similar errors in the future)



