In my experience, people with large Python projects often overblow the difficulty of porting in their heads. Unless you have a very rare, irreplaceable dependency or some terrible C extension, porting is easy.
It's tedious, yes. Boring even. But most projects get away with 2 weeks of investment. And yes, it pays back. Python 3 is a vastly superior language when it comes to introducing fewer bugs or debugging existing ones. That's just not the stuff that was sold to the public. People want to hear about better perf or fancy features, not better error messages or banned logical errors.
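To make "banned logical errors" concrete, here is a minimal sketch of two things Python 2 silently accepted and Python 3 rejects. The helper names are made up for illustration:

```python
# Two operations that Python 2 accepted silently and Python 3 refuses.
# Helper names are hypothetical, for illustration only.

def compare_mixed(a, b):
    """Return a < b, or a short error string if the interpreter refuses."""
    try:
        return a < b
    except TypeError as exc:
        return "TypeError: {}".format(exc)


def concat_mixed(text, data):
    """Return text + data, or a short error string if the interpreter refuses."""
    try:
        return text + data
    except TypeError as exc:
        return "TypeError: {}".format(exc)


# Python 2 orders an int and a str by an arbitrary rule and returns a bool;
# Python 3 raises TypeError instead.
print(compare_mixed(1, "2"))

# In Python 2, b"def" is just a str, so this concatenates;
# Python 3 keeps text and bytes apart and raises TypeError.
print(concat_mixed("abc", b"def"))
```

Run under Python 3, both calls print a TypeError message instead of a quietly wrong value, which is the kind of bug that used to surface weeks later in production.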
The hardest code base I saw ported was the Twisted project. It's a good counterexample: it was long, hard, and required very gifted people (thanks Hawkowl!).
But most projects are not like that. They are full of dumb operational logic that can be ported 80% by 2to3, plus a bit of manual fixing.
The fact is, people love to complain, so you will hear a lot from people with that particular example from the movie industry, or that guy who coded a Fortran extension and was in such a tough situation. Well guess what, that's not what most migrations are about.
Most migrations are Django/Flask websites, math teachers' exercises, physicists' scripts, sysadmin tools, etc. Straightforward stuff. I should know, I've been moving from industry to industry for 10 years, all my clients are different, but most of them end up with similar stuff in their repo because Pareto is a thing.
Nonetheless, the screams from the first group scared the rest.
I think it's less about the difficulty of manual changes and more about the cost to verify that all of the changes actually work.
Dynamic typing on the language side and a less-than-perfect test suite on the user side are not a good combination for large projects facing a project-wide migration.
+1 to that. Add on top of it that, by design, a lot of Python coders are not professional programmers, and you get yourself a better model of why migrating took so long. People coded an untested, unstructured project that worked on their machine. It was enough to get the job done, and was a big reason they chose this language in the first place. But migrating that is not fun. Plus they didn't understand their own system, so touching it on such a large scale seemed overwhelming, hence the "it's too hard".
It's about risk. Python is a scripting language. That means that it's often auxiliary to a primary programming language. Your project might be in C++ but you use Python to package it or generate some assets, etc. Do you really care what language your auxiliary scripts are written in? No. Are you going to spend any amount of time converting them to Python 3 syntax? No. Your boss doesn't care, and you don't care. It would only end badly.
Scripting is only one use of Python. It's also a math language, and it's used for implementing servers, machine learning and statistical analytics, data processing pipelines, and more. NumPy/pandas allow fast math operations because they make Python the interface for operations implemented on C arrays underneath the surface. For many, many people, Python is the primary programming language. The ability to use the same language for scripting as for your primary application is part of the charm of Python.
The existence of the cloud means that you can scale wide instead of scaling tall, which makes developer time and complexity management more valuable than hardware efficiency. Python is optimized for primary applications in exactly that kind of environment, where the performance characteristics of something like C++ don't make up for the clumsiness of the language.
> Unless you have a very rare irreplaceable dependency or some terrible C extension, porting is easy.
In my very brief experience with Python over the last few weeks, this is very common. Half of our dependencies were abandoned before Python 3 existed; when mercurial shuts off the hg we'll even lose the source to some of them. Python 3 gets the blame, but the real problem is that the company has ignored maintenance for decades. It's entirely their own fault. There's no "business value" in maintenance until the whole lot needs to be rewritten.
Another underlying cause is package tools like pip: they make it too easy to take on dependencies with zero thought given to their maintenance windows, and management thinks they can just ignore upgrading/replacing them regularly.
> when mercurial shuts off the hg we'll even lose the source to some of them.
Are you perhaps thinking of when Atlassian removes the Mercurial repositories hosted on Bitbucket?
Mercurial isn't going anywhere or shutting down anything; it's open-source software with some very large users and committed developers.
(The majority of engineers at Facebook all work in one enormous Mercurial repo. Facebook naturally employs a few people to work full-time on Mercurial, and I don't think they're the only ones.)
Atlassian's decision about Bitbucket is regrettable. Especially regrettable is the choice to actually delete repos for so many open-source projects that may not have active maintainers (rather than keep them online but read-only). To my mind it leaves a stain on their reputation that should make anyone think twice about relying on Atlassian for years to come.
But you should be able to avoid losing the source to any of your own dependencies. Between now and May 2020, go through all your dependencies and make sure you make your own clone of all the source repos.
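A rough sketch of how you might script that, assuming you can list your dependencies with their VCS type and upstream URL. The names and URLs below are placeholders, not real repos, and the helper only builds the `hg clone` / `git clone --mirror` command lines so you can review them before running anything:

```python
import os

# Hypothetical dependency list: names and URLs are placeholders.
DEPENDENCIES = [
    {"name": "legacy-parser", "vcs": "hg",
     "url": "https://example.invalid/team/legacy-parser"},
    {"name": "report-utils", "vcs": "git",
     "url": "https://example.invalid/team/report-utils.git"},
]


def build_clone_commands(dependencies, dest_root):
    """Return one argv list per dependency, cloning it under dest_root."""
    commands = []
    for dep in dependencies:
        dest = os.path.join(dest_root, dep["name"])
        if dep["vcs"] == "hg":
            commands.append(["hg", "clone", dep["url"], dest])
        elif dep["vcs"] == "git":
            # --mirror keeps every branch and tag, not just the default branch.
            commands.append(["git", "clone", "--mirror", dep["url"], dest])
        else:
            raise ValueError("unknown VCS: {}".format(dep["vcs"]))
    return commands


if __name__ == "__main__":
    for argv in build_clone_commands(DEPENDENCIES, "vendor-backups"):
        print(" ".join(argv))
```

Each argv list can be handed to `subprocess.check_call` once you are happy with the output.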
Yeah, we have this particular problem: a Python program written in Python 2 as late as 2014 for RHEL6, using Cython extensions. It compiles and packages fine for RHEL7, but when I run it there it just crashes upon receiving a UDP packet from the network.
The original programmer is long gone, and while I am a semi-competent Python scripter and a slightly above average C programmer, I'm not looking forward to having to dig deep into this code and patch it up to keep it running.
> In my experience, people with large Python projects often overblow the difficulty of porting in their heads. ... most projects get away with 2 weeks of investment.
Where did you come up with this "2 weeks" estimate? That has not at all been my experience.
> What has been your experience [converting Python 2 to 3]?
Many months.
Step zero is education. Your team has been programming in Python 2. You need to make sure they know the differences between the environments, how to create cross-compatible code, and how to use six. They also need to set up a second dev environment, and become comfortable switching between 2 and 3 regularly.
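For anyone who hasn't seen what "cross-compatible code" looks like, here is a minimal sketch of a hand-rolled compatibility shim using only the standard library. It is an illustration, not a replacement for six; six ships its own `PY2`, `text_type`, and `iteritems`, and the `to_text` helper here is hypothetical:

```python
import sys

PY2 = sys.version_info[0] == 2

if PY2:
    text_type = unicode  # noqa: F821 (this name only exists on Python 2)

    def iteritems(d):
        """Iterate over (key, value) pairs without building a list."""
        return d.iteritems()
else:
    text_type = str

    def iteritems(d):
        """Iterate over (key, value) pairs without building a list."""
        return iter(d.items())


def to_text(value, encoding="utf-8"):
    """Return value as text on both interpreters, decoding bytes if needed."""
    if isinstance(value, bytes):
        return value.decode(encoding)
    return text_type(value)


for key, value in sorted(iteritems({"a": 1, "b": 2})):
    print(to_text(key), to_text(value))
```

The point is that the version check lives in one place, and the rest of the code base calls `to_text` and `iteritems` without caring which interpreter it runs on.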
Step 1/2 is prioritizing. Determine how important the move to Python 3 is, and what other features and dev work you will have to sacrifice to make it happen. While having a nice plan in place may provide some level of comfort to management, you can be sure it will be thrown out, amended, extended, and/or ignored throughout the project. Upgrading Python dev environments brings no near- or mid-term value to the company. So expect developers to continue their current workload while also attending to this tech debt.
Step one is getting the libraries into shape. This is easy if your project only relies on actively developed libraries with good teams behind them, but nearly impossible for abandoned libraries with no new commits in the last few years. Sure, in retrospect, it was a bad idea to use these libraries, but at the time, they were incredibly powerful or popular. Generally, for abandoned libraries, it's easier to find a newer Python 3 library than to work on fixing someone else's code. But this may mean rewriting large parts of an app, and may alter functionality, so constant communication with product managers and a flexible approach are necessary.
Second step is working on your own code. One or two modules is no problem. But more than a dozen takes time. Scripts like 2to3 are not helpful; you need to use tools like six and modernize.
Third step is testing. If you have a good team, then you should already have good tests with known coverage. In that case, you should make sure your coverage matches, and take a very hard look at the code that is not covered. If your tests cover less than, say, 60% of the code, you need to invest quite a bit of time in either testing directly or building many more tests.
Step 4 is partial rollout. There will inevitably be issues you didn't think about or catch, so you need to plan the roll-out carefully and either split traffic or at the very least be able to quickly revert.
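One way to split traffic, sketched under the assumption that every request carries some stable key (a user ID, say). The function name and the percentage knob are hypothetical; the idea is just a deterministic hash bucket, so the same user always lands on the same side and setting the percentage to 0 is your quick revert:

```python
import hashlib


def use_py3_backend(request_key, rollout_percent):
    """Deterministically route a share of keys to the Python 3 deployment.

    request_key: stable text identifier for the caller (assumed to exist).
    rollout_percent: integer from 0 (nobody) to 100 (everybody).
    """
    digest = hashlib.sha256(request_key.encode("utf-8")).hexdigest()
    bucket = int(digest, 16) % 100  # stable bucket in the range 0..99
    return bucket < rollout_percent


# The same key gets the same answer on every call, so a user does not
# bounce between the two stacks in the middle of a session.
for user in ["alice", "bob", "carol"]:
    print(user, use_py3_backend(user, 10))
```

Raising the percentage only ever adds keys to the Python 3 side, so anyone already routed there stays there as the rollout widens.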
Step 5 is to watch carefully for customer complaints. Any complaint may or may not be related to the switch from 2 to 3, and you need someone on your team who stays on top of that, and can communicate the issue to the correct module owner.
Step 6 is deciding when to drop support for Python 2 altogether, as there will have to be a time where you have the two environments running side-by-side while you're testing. After all of this work, you'd think this part would be easy, but in every organization some folks will be wary about dropping support for something that already works.
Obviously, this isn't just a linear progression: you can do some of these in parallel, and you will likely have to take a step back a few times. My back-of-the-napkin, experience-based, non-scientific estimate suggests that a 100k LOC project can be managed by two developers in two to three weeks, but a 1M LOC project (including libraries) is roughly 5-8X that much work.
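Spelling that napkin math out, taking the numbers above at face value and assuming the 5-8X multiplier applies to total developer-weeks (the function name is just for illustration):

```python
def effort_range(devs, weeks_low, weeks_high, mult_low=1, mult_high=1):
    """Return (low, high) effort in developer-weeks."""
    return (devs * weeks_low * mult_low, devs * weeks_high * mult_high)


small = effort_range(2, 2, 3)        # 100k LOC: two devs, two to three weeks
large = effort_range(2, 2, 3, 5, 8)  # 1M LOC: 5-8X that much work

print(small)  # (4, 6) developer-weeks
print(large)  # (20, 48) developer-weeks

# With the same two developers, that is 10 to 24 calendar weeks.
print(tuple(weeks // 2 for weeks in large))  # (10, 24)
```

So the large case lands somewhere between two and a half and six months for a two-person team, before any of the stepping back mentioned above.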
The entire science stack is "terrible C extensions". This is also the kind of tight-budgeted stuff that needs relatively rare Python/C developers, not just your average overpaid Python web developer.
I'm not saying we would've cured cancer if people didn't have to deal with this bullshit, but it's definitely a setback.
The science stack is also the one that started migrating the earliest. Hell, numpy started supporting Python 3 eight years ago, while five years ago people were still complaining so much they got the EOL delayed. Not saying there are not some specific things that were late or never happened, but again, they are about a minority of projects. The thing is, this minority is making all the noise.
Given the reach and popularity of Python, of course you will always find plenty of testimonies saying they suffered. Again, it's important to remember you can't make everybody happy, especially if your job is important. If porting is easy for 90% of projects, you did great work, period. 10% of millions of users having it hard is still going to raise a lot of voices against you by the sheer power of numbers, but you did ok.
There is also dishonesty. Most people reporting this problem or that problem didn't actually encounter it. They are reporting somebody else's experience they heard about because they wanted to make a point. Or they vaguely tried something and ran away after they saw that 10 minutes of fiddling didn't solve it.
In 10 years going from company to company, I met only 2 projects that were hard to port, but many, many more complainers. When I looked at what was really going on, the fact was just that they were afraid to port, and so just repeated all the stuff other people told them would go wrong. Or they tried running a few scripts for 5 minutes on Python 3 and gave up after seeing too many error messages.
I know there are honest situations in the lot. But again, the most honest people are not making noise, because they just ported their code and noticed it wasn't the hardship they've been told.
> The science stack is also the one that started migrating the earliest.
...but the last to actually finish migrating. In fact, it's still ongoing.
> Given the reach and popularity of Python, of course you will always find plenty of testimonies saying they suffered
Everyone suffered. Everyone had to deal with Python2 versus Python3 bullshit. It's not just about migrating some codebase.
Everyone had to use an inferior version of Python, whether it was because library choice was limited (Python3) or because the language was intentionally left without feature backports (Python2). There was a huge amount of plain "brokenness" in the ecosystem and it was a huge waste of everyone's time.
> Everyone suffered. Everyone had to deal with Python2 versus Python3 bullshit. It's not just about migrating some codebase.
Updating software versions is part of the job. Like creating tests, writing documentation, training newbies and dealing with customers. It's not hard just because we don't like to do it. Porting to Python 3 was not hard for most people. They just really, really didn't want to do it.
I get it. I didn't want to add that to my plate either. Like I didn't want to migrate from my CentOS 7, I didn't want to move from MySQL to Postgres and I didn't want to learn the entire setup of Webpack. 3 times. But "suffering" is a big word that has no place for the vast majority of projects.
Like I said, it is not just about updating software. You can't realistically update all your dependencies by yourself.
You are stuck with a tough choice: Do I start out with Python3 and tons of broken packages? Do I limit myself to Python2 and face a costly migration later on? Do I incur the extra cost of supporting both? This is the choice you had to face for the better part of ten years of migration. Perhaps it's not obvious that all Python-based software was worse for it, but that's what happened.
I'm not talking about some web backend service where most of what you do is trivial stuff. You can write and re-write that in almost anything, it doesn't matter.
You're right, most packages appear to be Python3 now. Nevertheless, a lot of client code has been written for the Python2 stack and it's still in widespread use. That stuff won't get migrated soon, if ever. Security doesn't really matter there anyway.