We turned our monolith into a bunch of microservices almost 6 years ago to the day. For a long time I was very happy with the new pattern, but over the years the weight of keeping everything updated, along with the inevitable corners that fall behind and end up with questionable security from sitting neglected for so long, has really left me wondering if I'm happy with it after all.
I would love to hear some thoughts from others that made the move, especially anyone that decided to move back to a monolith repo.
A company I am affiliated with decided to rewrite their code in a microservices-oriented architecture, thinking it would only take one year. Now we're 7 years into the transition and starting to come up against hard deadlines that threaten revenue streams. It seems obvious to everyone except the leadership and the architects that this has been an unmitigated disaster. Other comments on this thread seem to indicate that many have had similar experiences.
For a more in-depth analysis of the unforeseen challenges of microservices, I would encourage careful research into how other companies have tried and failed at this. In particular, I might look at Uber's ongoing difficulties.
All I have to say to the Khan Academy engineers is to buckle up because frankly, moving from Python 2->3 is not that hard and you have no idea what you are getting yourself into.
Yes, the micro services + Golang vanity project because you think you’re Google. I really don’t think people understand error states in distributed systems very well, putting possible network partitions everywhere is not a great idea. I would strongly suggest trying a Golang monolith first and seeing if there are one or two heavily used services that need splitting off. Also monorepo. Always.
The really fascinating thing about this tendency is that Google itself never completed nor even really started the wholesale transition of programs and services to Golang/microservices. Google does have services that are micro with respect to the overall codebase. But they aren't what most people out there in the wider world would think of as micro. And Golang remains a niche language at Google, perhaps more popular than server python, but far smaller in usage than Java or C++.
Microservices have always seemed dangerous to me. "Bugs thrive in the seams between libraries, so let's put more and deeper seams in!"
Microservices have an immense cost, and you have to make sure they're worth it. Many teams years ago found it a nice pattern and implemented it because why not, and now we're at the "oops this isn't actually amazing" part of the cycle.
In my experience, the biggest benefit of microservices is decoupling teams.
Developer productivity is very hard to maintain in a monolithic app as the number of developers increases and the legacy code piles up. Breaking up services and giving each dev team control over their own codebases enables them to develop their own products at their own pace.
If you only have one dev team, microservices are a lot less attractive. However, there are still some benefits, such as being able to refactor parts of your codebase in isolation (including perhaps rewriting them in different languages), and the ability to individually adjust the runtime scale of different parts of your codebase.
I was already achieving that around 2008 by having each team responsible for their modules, delivered over Maven, or in the late '90s by having each team responsible for their COM modules.
No need to over-engineer modularity by throwing distributed-systems algorithms into the mix.
This. The problem with decoupling services is that there usually end up being a couple of services that are critical but not sexy.
No one wants to touch them so they sit around unmaintained until an unrelated change or unpatched security issue comes around. Suddenly you've got a big problem with a mystery codebase.
That sounds very familiar, but I'm not sure this is something that can be blamed on decoupling itself. An unpopular module is going to need as much attention as a microservice from a code point of view. For upgrades/patching, there would be a company-wide process around it that doesn't care much how the code is organised.
A 'company-wide' process is either 'squeaky wheel gets the grease' or 'no one even knows this exists,' 100% of the time in my experience. This is from going from 10k+ to 150 to 5k+ to 50 people.
I wrote company-wide, but it's not the case everywhere. At some scale you'll want department, or even project-wide process. But the policy should be fairly common - who owns it, what's the response time, how to escalate urgent things, etc.
Containerization, autoscaling, service discovery, tracing, metrics and monitoring et al - lot of it is required to do larger scale, distributed systems. Even if you do not call them microservices.
This is nonsense. You can already do that via libraries. The choice of RPC vs local procedure calls has no effect on scaling development.
IMO the only reason to use micro-services is the one they mentioned - you can have different parts of your system running on different machines so they can be spun up independently. But I think most people aren't "web scale" enough to need that anyway.
I've found there's a happy middle ground. You need medium-sized services that still share code libraries. For many companies, this often works out to 7 or 8 services. The key is to combine like business units/features, not necessarily fragmenting at every visible code boundary. A deployed "service" can really just be several HTTP paths and/or gRPC services in one repo. You still get to keep decent separation of work, deployment, versioning, dev focus, etc with these medium-sized services without sacrificing the benefits of larger, more centralized/shared reuse.
I’ve accidentally landed on an architecture like this and I’m actually pretty happy about it. It was driven by a desire to kill a large monolith slowly, by extracting key features into separate services. Sold to the customer as microservices because sexy fad of the moment. Our real motivation was we had way too much trouble recruiting in the monolith stack (.NET) and had a surplus of embedded C++, python, and JS engineers. Anyway, turns out our teams naturally self-organize around four or five domain+language clusters that effectively form separate services that are too large to really be micro, but too dissimilar to play nicely together in a monolith. Eg, python data science module wrapped in a Flask API providing physics calcs, legacy C# service providing simple data models via REST, JS react front end served from a separate node service, a weird C++/python hybrid used for an embedded device sim service, etc. It’s not what I planned as the tech lead/architect, but I think we organically reached what is really the best approach for our team. Definitely an element of Conway’s law in action, but in a good way. We are making the most of our organizational structure rather than fighting against it. Would this scale to google levels? No, probably not. But we don’t need it to, and it’s incredibly unlikely we ever would given our specific business.
Start with a monolith that has clear internal APIs that are designed so they can later be made into network APIs. This gives you the development speed of a monolith while maintaining options for the future. When you do break things out into separate services: try to make as few of them as possible and maintain the ability to build as a monolith.
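A minimal Go sketch of that idea (all names here are invented for illustration): define the internal boundary as an interface, so the in-process implementation can later be swapped for an HTTP/gRPC client without touching call sites.

```go
package main

import "fmt"

// UserService is an internal API boundary. Callers depend only on this
// interface, so the in-process implementation can later be replaced by
// a network client if the module is split into its own service.
type UserService interface {
	GetEmail(userID string) (string, error)
}

// localUsers is the in-process implementation used while running as a monolith.
type localUsers struct {
	emails map[string]string
}

func (l localUsers) GetEmail(userID string) (string, error) {
	email, ok := l.emails[userID]
	if !ok {
		return "", fmt.Errorf("user %q not found", userID)
	}
	return email, nil
}

func main() {
	// Callers hold the interface, never the concrete type.
	var svc UserService = localUsers{emails: map[string]string{"u1": "a@example.com"}}
	email, err := svc.GetEmail("u1")
	fmt.Println(email, err)
}
```

The important design constraint is that the interface takes and returns serializable values, so a future network-backed implementation of `UserService` can satisfy it verbatim.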
Forget everything you have heard about micro services. Most of it is bullshit from people who don’t actually think for themselves.
This. If you can't design a well segmented monolith, you can't design a well segmented system of microservices either. The microservices will just be buggier and much harder to fix after the fact.
Design is best evolved. If you can get a lot of that done while you still are able to run a well structured system as a monolith you can save a lot of time. It is cheaper to change an interface in Java/Golang than a REST API.
Could not agree more. I recently led a refactor of a monolithic .NET MVC app and took this exact approach. We made all of the controllers thin, with almost no logic at all beyond specifying the route and dependency injection. Then redirected the request to a “service”, which originally was just a reworked combination of the old controller/model logic hidden behind a common service interface. Then, slowly we replaced the C# services with microservices. So we went from ball of spaghetti to monolithic service oriented architecture lite to actual microservices with the monolith converted into an API gateway. If we didn’t have independent motives for going to microservices, sticking to the clean and well organized internal APIs of the refactored monolith would have been totally fine.
I moved back to monolith and am very happy. I think of the monolith now as a collection of modules. The rule is now, one should be able to drag any of the modules to the top-level of our monorepo and create a new microservice pretty easily when the time comes. I think the microservices book (that came from that Uber engineer...?) suggests a rule of 5 engineers per service.
How are you preventing transaction coupling? For example, modules A and B are called by C. C starts a transaction that wraps A and B. If you move B up as a network feature, you lose the transactionality.
Good question, and this rule is meant to be bent in those scenarios. I try to avoid these dependencies if at all possible, but if not possible, the "writes" for those modules all belong to a single service, and any other service, depending on how "pure" I need to be in the project, will make network calls to the other service, or just grab that data directly from the database.
For a more concrete example, I recently built a service ("scraper") that scraped data and upserted a large tree of structured data to postgres in a transaction. Writes were only allowed from scraper, but "api" could SELECT data for reporting to the frontend "web" as much as it wanted. In the future, "api" might be refactored to make internal HTTP requests to "scraper," so they could have totally separate databases.
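The transaction-coupling question above can be sketched like this (the `Tx` type and module names are stand-ins, not real code from either commenter): while A and B are modules in one process, C can pass them a shared transaction; once B moves behind a network call, that shared transaction is no longer possible.

```go
package main

import "fmt"

// Tx is a stand-in for a database transaction, used only for illustration.
type Tx struct{ ops []string }

func (t *Tx) Exec(op string) { t.ops = append(t.ops, op) }

// Modules A and B both write through the same *Tx, so C can commit or
// roll back their combined writes atomically. If B becomes a separate
// service reached over the network, it can no longer share this Tx;
// the caller would need compensation logic, or all writes would have
// to stay in a single-writer service as described above.
func moduleA(tx *Tx) { tx.Exec("INSERT order") }
func moduleB(tx *Tx) { tx.Exec("INSERT invoice") }

func moduleC() []string {
	tx := &Tx{}
	moduleA(tx)
	moduleB(tx)
	// With a real database/sql.Tx, this is where tx.Commit() would
	// persist both writes together, or tx.Rollback() would drop both.
	return tx.ops
}

func main() {
	fmt.Println(moduleC())
}
```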
Our approach to services at Khan Academy is likely a bit different from most. We're sticking with a monorepo (the code for all services lives in one repository). We have a single go.mod file at the top of the repo, so all services use the _same versions_ of dependencies.
We're still building out our deployment system to better support multiple services, but we're planning to redeploy all of the services when library code changes (which is something we're trying to minimize).
All of this ensures that we don't have trouble with services lagging behind on critical updates.
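A monorepo with a single top-level go.mod, as described, might be laid out roughly like this (a hypothetical sketch, not Khan Academy's actual structure):

```text
repo/
  go.mod            # one module: every service resolves the same
  go.sum            # version of each third-party dependency
  pkg/              # shared library code (deliberately kept small)
  services/
    web/cmd/main.go     # each service builds to its own binary...
    api/cmd/main.go     # ...but `go get -u example.com/dep` in one
                        # place upgrades the dependency for all of them
```

Because Go resolves dependencies per module rather than per directory, one go.mod is what guarantees the "same versions everywhere" property the comment describes.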
I don't really understand moving to microservices if you aren't going to give the service teams the autonomy to make their own decisions and move at their own pace. Microservices always seemed to me to be more of an organizational strategy than a technical one - if you have services, but they can't operate independently, it feels a bit like you are just creating a distributed monolith.
We're making a certain set of tradeoffs. For example, we're not adopting the "write code in whatever language you want" form of microservices that some folks adopt, because we don't feel like we're large enough to support that.
Like I said, though, we do want to minimize the library footprint. The vast majority of deploys in this new world will be single service deploys, with the benefits that come with that. We already deploy our monolith several times a day. These services will speed that further.
It'll be interesting to see how it works. I would probably fight to not have shared dependencies because the relatively small benefit does not seem to merit the increased coupling between teams - if all teams need to agree before a library can be upgraded, I can't imagine it will be very easy to keep libraries up-to-date. To a certain extent it depends on how large your engineering org is, which I don't know.
I've been in companies that moved from monolith to microservices and I saw it work well except when teams had such tight cross-service dependencies that they had to get other teams' ok before making changes to internal details of their service. Then developer velocity was slower than before because it took time to make the cross-team discussion happen and political capital to make other team care when they have other priorities.
We'll see how it plays out, but the situation that I've described isn't really different from the one we have today (because we have a monolith). Hopefully, it will be better because of how the Go project is trying to get library maintainers to follow semver and avoid breaking changes. When we need to upgrade a dependency, we can do so in one diff, catching the errors with the compiler and test runs. If the upgrade seems risky, we'll watch that deploy carefully, and we already have a process for "risky" deploys. Plus, these are likely some of the easiest changes to rollback if need be, because they are unlikely to change persisted data.
Ultimately, though, if we find that this plan reduces velocity, it won't be that hard to change later.
We have some dataflow jobs written in Kotlin (we blogged about that in June 2018[1]). We also have an internal service written in Kotlin.
We ideally want one language. But Apache Beam (which is behind Google Dataflow) doesn't yet have production support for Go. More importantly, though, we have no time pressure on switching the Kotlin code over, so that's a long way out.
So, no choice of language and no choice of the libraries used. What’s “everything else” exactly? Sounds like missing out on the more interesting parts of a micro service architecture.
Yeah, I'll never work for a company[1] where service teams are free to choose any language; 2 or 3 options at most is fine, but more than that is a hard no. I'll have to read/work on that code sooner or later, and I have no time to be dealing with a hodgepodge of languages
1. In the 10-5000 employee range: tech giants are a different beast when it comes to team accountability.
> I'll have to read/work on that code sooner or later
In the microservices organizations I've seen, this isn't true. The other service teams provide an API and, like any other SaaS you use, you do not need to be able to read the implementation. You would only work on that code if you switch service teams.
Sounds similar to some work I’ve been doing, so thanks for unknowingly validating my design!
At my employer, I’m spearheading a wholesale reimplementation of outdated process automation software, turning everything into Django web apps.
I’ve been working with a monorepo and monolithic deployments to maintain development velocity but recently started transitioning the CI/CD pipeline to deploy each application/service in the monorepo independently. The pipeline packages common assets (including, e.g., manage.py, common HTML templates, and the dependency spec...all housed in the same monorepo) into each app directory before the deploy stage.
Meanwhile, local developers clone the entire monorepo, and when they launch localhost, all of the services come online simultaneously. (That’s the goal, at least!)
I was already excited to see my work come to fruition, and now I’ll be keeping an eye on Khan Academy, too!
That does sound pretty similar (though our services all have the luxury of serving nothing other than GraphQL!).
Our current plan for local development is to continue cloning the monorepo and firing up all of the services. Go services don't take a whole lot of resources, so we think this plan will work fine for quite a while.
I'm in an org that runs multiple services in Go. The dependencies on library stuff have been a very minimal need. I think you are optimizing for an imaginary problem. With microservices, a team should have non-breaking API versions running and work with other teams to transition to a new API version when needed. If the underlying UUID lib or Kafka lib changes, teams may or may not need to update, but they can do so on their own time.
IMHO a microservice should follow the unix philosophy of doing one thing well. But interfacing with other services is not as simple as unix pipes. In particular it is more of a request/response communication. Consequently, interfacing needs to be thought through carefully and made as simple as possible. They should evolve much more slowly than individual service code. While you may use (g)rpc or http at a lower level, your specific protocols will have many more constraints. Ideally you have codified assertions about their behavior in tests early on.
Note that they are not all going to start/stop at exactly the same time, and during development they may even crash, so each microservice should survive such transitions. You may have two different versions of service code, or even completely different implementations, or two different versions of the API in use at the same time. This can happen as different services evolve at different rates. And you may want to transition to a new version in a piecemeal manner so as to not bring the whole service down. All these considerations complicate things. So ideally you have factored out these common tasks into shared libraries/packages. And ideally you write your code such that, if necessary, more than one service can be compiled into the same binary for performance reasons.
In a monolith some things become easier since everything dies at once! But some things become more complicated, such as supporting evolving code. And monoliths require more discipline to keep things modular. Over time this gets harder and harder. Lack of modularity means you have to understand a lot more code, and when you evolve things, more code will have to change and there may be unforeseen side effects. And scaling can become harder.
The much greater visibility into which services are behind and have weaker security practices is actually one of the things I love about our microservices. We recently (a year ago) broke up our monolithic codebase into a service-oriented architecture (I would hesitate to call our services micro, personally), and I was astounded at all of the hidden security issues and random code in the far-reaching corners of the monolith that hadn't been touched or thought about in years.
It is much easier (imo) to pop into a repo of one of our services, look through the code in its entirety, and see when things were last touched and where the issues are. I would make the argument that the "inevitable corners that fall behind and have questionable security" are inevitable in any codebase that grows to a certain complexity, and microservices (or SOA in general) make it much easier to see those things as they are decomposed.
Well, I can only agree with all that... But the move to services adds a lot of surface that needs its own security and architecture maintenance.
The more you break down your code (the smaller the size of the services), the more maintenance need is created from the division, and the easier it is to fall behind on it.
A recent project I was exposed to has been struggling with Microservices and a multi-repo setup. Even with CI/CD and a lot of good tooling around their setup.
The overhead introduced by such a setup in a corporate environment with not-so-well-thought-out requirements and design is ridiculous. Keeping track of dependencies, arcane knowledge of inter-service dependency quirks being siloed and hidden, keeping individual services up to date, dealing with older services and their interaction with newer services until they "migrate" to newer tooling/common code, etc.
Everyone then skirts around the fact that the problem could potentially be microservices or the multi-repo setup. Instead, they throw process, sign-offs, complicated promotion pipelines, and just plain warm bodies at the problem in an attempt to mitigate it. But the damage is done, velocity has slowed to a crawl, and everyone is miserable, especially when having to explain the whole thing to newcomers.
The best ideas can be implemented poorly. A software design principle that works for nimble Silicon Valley startups doesn't work in your big corporate environment? Big surprise.
When I worked at a pretty large software co, a team there was always adopting the latest techniques and tools, but their deployment pipeline was an over-architected disaster that nobody could reliably deploy. Reason? Their CI/CD system consisted of a person manually clicking around to build Jenkins jobs. There will be other such silly nonsense (hopefully less extreme) in other corporate environments too.
Like many things in life, there are no absolutes. So rather than going to opposite ends of the spectrum, check out modular monoliths.
This approach is a nice balance (IMHO), since you start off with monolith but it is broken down into cleanly separated modules. Each module can potentially become its own micro-service if or when the time comes.
In terms of implementation, this can be done easily. For example, we do it on the JVM with Kotlin, where each microservice is its own project that produces a binary/jar. All the projects are part of a multi-project build. Lastly, we have a common project for shared code, utils, types, enums, interfaces, etc., and an application project that loads/sets up all the microservices from each project. Works great so far. And when you do have to break a project out into its own service, the effort is fairly manageable.
Same. I tried microservices before and while it cleaned up the code base, I didn't quite like the results down the road. Some things go stale, you multiply ops * number of microservices, a change in one can mean a change in multiple others. I'm not against services in general, but not a fan of so called microservices.
I’ve been through a transition with about 60 devs that went DDD plus microservices.
A few monoliths ended up as a couple of hundred services, and looking back I feel we got basically all positives.
What other people say about scaling teams is true, but I have a few other points as well:
- personally, I spent 6 months writing tooling for service lifecycle management and setting strict conventions. This was before the microservices decision and I was recruited to do “devops” which gave me a lot of head room. :p
- when talks about microservices surfaced I had to fight for several months to get the lead devs on-board with tooling and conventions.
From the infra/cm/systems management side we're used to managing many thousands of "configuration items" - devs usually are not, and many underestimate the value of proper automated lifecycle management.
- once everyone was on-board we all used a common language. Big win.
- I could form a “devops” team to help develop the tooling further and the infra platform as well.
- almost all teams worked in mobs - amongst other things it really helped with the ownership part, something that’s absolutely crucial. Well defined domains and accountable mob teams, just awesome!
- quality rose by a mile. Small commits and somewhat robust tooling under the eyes of a mob.
- graph the pains. If outdated versions are a problem - put it on a graph, green, yellow, red. Show services in relation to other services - the application is the sum of all connected services.
Can you explain what you mean by service lifecycle management with regard to microservices, please, or do you have a book on it? I'm currently studying SE and it's the first time it has come up. Thank you :)
Well - I’ve got a “service management” background, so it’s completely natural to talk in these terms. :)
A service has, in a way, two interfaces: one business (the work it's doing), and one technical (how it does it; a port publishing an endpoint or whatever).
No business process to support, no service to manage.
The lifecycle of the technical service will consist of a bunch of actions that will be taken, perpetually, until the sunsetting/decommissioning of the business process.
Actions: init, code pushed, deploy, monitor, trace, update, decommissioning etc.
You take these actions, put them in a lifecycle circle, and you have a nice PowerPoint!
As much of this cycle as possible, preferably everything, has to be governed by conventions and automation.
- Automatic follow-up if a service has no upstream or downstream services, for example. Why do we have a dangling service?!
- Or, you have a key service tied to an SLA, but its upstreams are not matching it?
- You have services that have not been touched in a timely manner.
- etc... Just drop the relevant team an automated Slack message with the option to initiate whatever is required to keep the lifecycle churning.
With many thousands of assets/CIs (configuration items), almost everything has to be automated or you will eventually grind to a stop.
If you can couple business process to automated technical service management - big wins!
To me, it’s kind of what ”DevOps” should be about, from a technical perspective:
Take the best/reasonable parts from ITIL (concepts/principles), mix it with principles from the agile manifesto and the 12-factor app.
Automate the lot of it.
Doing this in practice gives you dev & ops.
It’s quite a journey that is more difficult the bigger you are. It scales though, so start small, prove the concepts, and grow organically.
The more I write and talk about it, the more I realize that it is about "externalities", so to speak.
A microservice is just code - one small piece that does one, tightly defined thing.
We’ve been doing code forever, and smaller pieces of it are easier to deal with and reason about.
The structure around keeping 100s or 1000s of moving pieces in concert is where a lot of the work is shifted.
It takes teamwork as well as a common vision and language.
The above sentence touches on “culture”.
> I would love hear some thoughts from others that made the move, especially anyone that decided to move back to a monolith repo.
We had a big monolith where I work.
We’ve been slowly, but surely isolating parts of the monolith as separate deliverables, extracted into their own repos. But only when appropriate, and not as a forced exercise.
The remaining “monolith” is still pretty big, but it does (mostly) represent one logical deliverable, so the effort to split it up has somewhat stalled.
There’s been small points of friction, but nothing near as painful as we used to have it. No way we’re going back.
So everything in moderation. Micro services architecture is a tool. Use it when it’s the right one.
It's very limited once you start doing complex things. For example, let's say you are building a websocket server. You will have a hard time writing a type-safe websocket handler that processes the client payloads for all the events...
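One common (if verbose) way to get type-checked handling of heterogeneous websocket events in Go is an envelope with a type tag plus `json.RawMessage`. This sketch (event names invented) shows the pattern the comment is calling tedious; each branch is fully type-checked, but every new event means another case:

```go
package main

import (
	"encoding/json"
	"fmt"
)

// Envelope is the common wrapper clients send; Payload stays raw bytes
// until the event type is known.
type Envelope struct {
	Type    string          `json:"type"`
	Payload json.RawMessage `json:"payload"`
}

type ChatMessage struct {
	Text string `json:"text"`
}

type Typing struct {
	User string `json:"user"`
}

// dispatch decodes the payload into the concrete type for each event.
func dispatch(raw []byte) (string, error) {
	var env Envelope
	if err := json.Unmarshal(raw, &env); err != nil {
		return "", err
	}
	switch env.Type {
	case "chat":
		var m ChatMessage
		if err := json.Unmarshal(env.Payload, &m); err != nil {
			return "", err
		}
		return "chat: " + m.Text, nil
	case "typing":
		var t Typing
		if err := json.Unmarshal(env.Payload, &t); err != nil {
			return "", err
		}
		return t.User + " is typing", nil
	default:
		return "", fmt.Errorf("unknown event type %q", env.Type)
	}
}

func main() {
	out, err := dispatch([]byte(`{"type":"chat","payload":{"text":"hi"}}`))
	fmt.Println(out, err)
}
```

Whether this counts as "type safe enough" is exactly the judgment call being argued about: the decoding boundary is stringly typed, but everything after it is not.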
I started doing Rust/Crystal, and both of them are better than Go (performance, type system).
Yet, whenever I build something for work, I come back to Go :-(, even though I told myself to use Rust or Crystal.
Then I realized that Go is a practical language. It compiles fast, which makes testing easier. The cross-compiler makes it easy to build binaries that run on anything. And Go's limitations make it very consistent in how you do things. This makes working with Go faster overall, even though it slows you down in other areas.
So I think Go is a language people easily fall into, because it has the iteration speed of an interpreted language like Ruby/Python (or even faster) during development, with a better performance/type-safety story.
Not for any meaningful work in my experience. As a matter of fact, I found the change/compile/run loop in golang to be slower on projects I've been working on due to the fact that it doesn't support incremental compilation, so any change I make ends up recompiling the entire program and writing out a 100+MB binary anyway. Compared to a Scala project I worked on before (and Scala is notorious for slow compiles), after the first compilation, all modifications happen very quickly as only the respective classes are re-compiled.
> Here's Uncle Bob's take on testing with Go
Again, this doesn't apply for any non-trivial/large project. On a project I'm working on, it literally takes 7-8 minutes to do a clean build + run all unit tests in golang.
Go absolutely does incremental builds by default and has been like that since I can remember. Packages are only rebuilt when their source or their dependencies change.
Same for tests which are cached by default so during typical development only a subset of tests are executed and compilation time can be a big part. Leave full tests for CI.
An anecdote I found from 5 years ago:
On my 1.7GHz processor it takes 10 seconds to build the whole standard library from scratch (300k lines of code).
just trying to understand - you guys think moving a Python2 monolith to Python 3 is too painful, and so you are going to port all the code from Python2 to a completely new language (Go), change the architecture (monolith -> microservices) and move the HTTP API to React + GraphQL, all in one year?
2020 is going to be in an interesting year at Khan Academy ;-)
The move from Python 2 to 3 would likely have also involved changing the architecture so they could migrate components incrementally. Since they were going to do that regardless, and if they already wanted to change the interfaces from HTTP to GraphQL, this is a natural time to do it. Though, this migration has nothing to do with React--they were already and will continue to be using it.
That isn't what they said. They carefully explained that they could migrate to Python3 but that the benefit of doing so was small so they looked at the performance benefits of using other languages and decided that the performance benefit was large. The performance benefits of using Go (or Kotlin) were the deciding factor.
My guess is that the framework for them thinking about this is that they have already been thinking about migrating languages to get better performance and the Python3 migration seems like completely wasted effort if you then throw it all away to go to another language shortly after.
2020 is absolutely going to be an interesting year.
One thing that might not have come across clearly in the blog post: we're already well on the path to using React everywhere (we started using React a week after it became public… 6 years ago?). We made the decision to move to GraphQL in 2017, so we've already got a lot in our GraphQL schema. Finishing those switchovers will make our move to Go happen more quickly.
> Of course the obvious thing missing in the article is how they expect to deliver new business features while recoding everything in a new language.
For new features, the new parts of the GraphQL schema for those features will be written in Go as part of the new services. Our frontend is already in large part a single page app in React which requests data via GraphQL, so the frontend for the features will look just the same as it would on our monolith.
> If we moved from Python to a language that is an order of magnitude faster, we can both improve how responsive our site is and decrease our server costs dramatically.
I see people say things like this a lot but my experience is that while other languages are 10x or more faster than python in some benchmarks it's very rare that computation time dominates server latency or that servers are running at 60%+ cpu across all cores.
If 90% of your service latency is not directly on the cpu and/or you haven't profiled to see that the performance bottleneck is evenly distributed across all tasks, then it's super dangerous to migrate to a new language thinking that will fix it.
I hope people inside Khan Academy know this and it's just a clickbait blog. If they really think "go is 10x faster than python so we'll only need 1 server for every 10 when we migrate" then I think they'll be disappointed.
* It’s not just that Go is faster to run; it’s also faster to iterate on. If Python is neither, it’s offering little benefit.
* They moved from a monolith to a microservices architecture; the concern that any of the services in the request path could add latency simply because the overall runtime is slow is a legitimate one.
* Their primary deployment method is Google App Engine, where you are billed by CPU used. Any change that consumes less CPU has a tangible effect on their costs.
Having worked in large async Twisted Python and in Go, the experience of our teams is that Go is faster to iterate on by a long shot. We've replaced most of the old Python. We just brought some new devs onto my team; they were able to make new contributions to the Go stuff in short order. The Twisted Python, not so much.
It’s not bland, it’s direct. What you’re doing is conflating personal preference with actual language features. Most developers would say iteration is faster in Python, a higher-level, dynamic scripting language, than in Golang, which is lower level and extremely explicit about the data your program is using. Iteration in go is simply harder as you have to be more explicit rather than sketching something out. Changing that more explicit stuff is harder if you got it wrong while iterating. Why do you think Golang is better at iteration?
Add in extremely poor error messages, the lack of generics, having to generate loads of code for various things, the ability to crash whole services if your program does something incorrect, and excruciating error handling, and it will all slow you down.
It’s funny that none of the things you mention in your last paragraph are handled any better in Python. Without type safety, you instead have to deal with bugs caused by inane mistakes. If we’re talking about generics, the conversation has shifted from “writing scripts that do x” to “maintaining a production system”, and for the latter, Golang’s type safety, excellent out-of-the-box default tooling, and easily grokkable concurrency primitives make it far more maintainable than Python.
Things are also easier to change in Go because of the type system and interfaces. The former catches most of the obvious incompatibilities; the latter ensures that abstractions don’t leak across system boundaries, whereas in Python there is a tendency to pass do-everything objects across the system.
Error checking has improved substantially with error wrapping in Go 1.13. Not only can you locate precisely where your system failed; you also have to be explicit about handling errors. I do concede that, before error wrapping, the error handling was garbage.
Clearly I don’t agree, but it’s an interesting and detailed answer. I do agree that type systems help with refactoring, but not with iteration; there’s a fine line there. Personally I think Golang’s type system isn’t as good as it could have been... I do like what Golang purports to provide (simplicity above everything); if they added proper macros to replace code generation, and to satisfy my desire for generics, it would be a much more useful language for my needs.
Dynamic languages generally enable faster iteration in the early stages of a project, but once you have a large, mature codebase, static types allow you to work faster and produce fewer bugs. A more performant language will also run your test suite faster which can have a big impact as a project gets very large.
Also, while I agree Go’s error handling isn’t very elegant, it does force you to explicitly consider every potential error, which in my experience makes uncaught errors far less likely than a language with bubbling exceptions.
That holds true considering the total lifespan of the project. Golang is more explicit, thus requiring more time to define every type, but I've never refactored a codebase so fast and safely. In Python, the fact that it is dynamic makes it more difficult to iterate safely (statically typed vs. dynamically typed). As for error handling, it's not perfect, but the code is readable, easy to follow, and easy to reason about.
> Iteration in go is simply harder as you have to be more explicit rather than sketching something out.
I will completely agree that prototyping in Python is way faster. Python is my preferred language for throwaway/prototype code.
> Where do you get this idea from?
It's solely based on my experience. I hope it contributes to the conversation.
Well Brad, that's very dependent on what exactly the service is. Moreover, oftentimes low CPU utilization is actually a limitation of an implementation in a slow language (e.g., Python technically does have async webservers, but they add a lot of idle overhead).
Indeed, there are many benefits these more performant languages have over Python aside from raw single-core performance. For starters, more efficient concurrency and parallelism can help reduce average latency when combined with a quality async webserver. Then there are gains from shared memory across threads.
So in many cases, absolutely, you may only need 1 server vs. 10 when you migrate. It's thus not fair to say that these gains are "very rare".
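The concurrency point can be made concrete with a toy asyncio sketch; the sleep stands in for a slow downstream call that a one-request-per-worker server would pay for serially:

```python
import asyncio
import time

async def fetch(i):
    await asyncio.sleep(0.05)   # stand-in for a slow downstream call
    return i

async def main():
    t0 = time.perf_counter()
    # 10 concurrent calls complete in roughly 0.05s, not 10 * 0.05s
    results = await asyncio.gather(*(fetch(i) for i in range(10)))
    elapsed = time.perf_counter() - t0
    print(f"{len(results)} calls in {elapsed:.2f}s")
    return elapsed

elapsed = asyncio.run(main())
```

Go's goroutines give you this kind of overlap by default, with parallelism across cores on top, which is where much of the "fewer servers" saving actually comes from.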
I inherited a few Python/Django servers. One of them has workers that grow to about 1 GB over time, even though they retain absolutely no data in core (or shouldn't). The same server is used to collect data, convert things, and analyze the data. The analysis especially can take a while, which means there is a problem when more than two people try it at the same time, since it severely hinders the other tasks.
I'm now converting one server to Go (although not the heavy one), and it really runs fast and uses much, much less memory. It also starts in less than 1s, whereas the Django application takes 5 minutes because of some stupid problem in static file collection.
Python is fine as a teaching tool, to prototype in, and to use in notebooks as a wrapper around numpy, scipy, etc., but not to run in production.
> while other languages are 10x or more faster than python in some benchmarks it's very rare that computation time dominates server latency
Most applications spend most of their time waiting for the database or network. I suspect the fastest programming languages are those that have the lowest thread/process overhead. If most apps spend their time waiting, then a language with 10X lower process overhead can handle 10X more processes.
> You can also avoid using separate process for each client (NodeJS).
Yes, NodeJS solves one problem by having low process overhead, but it also fails to take advantage of parallelism in modern processors. Ideally, I'd like to see a system with both.
Sorry if this wasn't clear: Go is 10x faster than Python, yes, but we know that we're not going to reduce our server count by 90%. 50% is quite possible, though, given Go's superior threading and its good resource use. Moving away from the monolith should also give us new optimization possibilities.
Migration from Python 2 to 3 is easy and fast. I've migrated multiple large apps and it took about a day each. Most libraries that matter have been migrated; some don't even support Python 2 anymore. It's practically 2020. This should not even be a consideration. After 2 to 3 is done, they can consider again if they want to redo the stack, but first I'd focus on this small maintenance task.
Hahaha, sorry but this is a very cute thing to say, in my view. At our company we just barely finished migrating our software with nearly a million lines of legacy Python 2 code to Python 3. This took over a year of nearly exclusive migration effort, just making our code work with both. The entire migration project started way before I joined the company several years ago.
So, no, things are not as simple if you're not dealing with toy projects. And no, you can't assume that it's the same for everyone if you're not in their shoes.
Your comment is pretty much the equivalent of "I don't see a bug. Works for me."
It very much depends on how good the codebase is. I also spent a year on and off porting a large codebase from 2 to 3, and it would have gone an order of magnitude faster if the codebase were in better shape.
I agree that code quality is a big factor. But what is good code quality in Python? In our case, the oldest code is the most "pythonic" and is at the same time the worst to maintain. The better code mitigates the drawbacks of dynamic typing and by that moves away from the pythonic standard you see in many libraries.
But even if you nail the types to the board (e.g. assert isinstance(...)), use (the somewhat weak) Mypy wherever you can, and have good test coverage, you still have to grep your code base for usage of, e.g., .keys(), eyeball hundreds of modules for subtle Unicode madness or hunt for the odd division, replace every sort() that doesn't use key= yet, etc. The todos add up and someone has to go into the code and change those lines.
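For illustration, two of the behavior changes mentioned above, as they look once the code finally runs on Python 3:

```python
d = {"b": 2, "a": 1}

# Python 2: d.keys() returned a plain list you could sort in place.
# Python 3: it returns a view, so list-like usage must be made explicit.
keys = list(d.keys())
keys.sort()
assert keys == ["a", "b"]

# Python 2: 7 / 2 == 3 (floor division for ints).
# Python 3: / is true division; // gives the old behavior.
assert 7 / 2 == 3.5
assert 7 // 2 == 3
```

Each change is trivial in isolation; the cost is in finding every affected line across hundreds of modules.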
What specific “pythonic” habits have you found are more difficult to maintain? Asking out of genuine curiosity, not to challenge the premise. I work with a lot of data science people that really emphasize being pythonic, but coming from the software/static typing side of the house I always find their code style and architecture a little concerning, and I’m not sure if I’m just not getting it or if they really are writing spaghetti.
One of the biggest footguns is method naming. Most Python libraries will gladly use generic method names like "add()" or "getName()". The moment you need to rename the method or change the signature, you will have a hard time telling it apart from all the other method calls by the same name. No type inference will save you here because type inference is incomplete and will never let you find all the callers.
What you should do is use unique names. But that will give you ugly code like myFoobar.foobar_addBar(). The kind of code that makes the pythonic crowd cringe.
Another problem is making code too generic with regards to what types it consumes, instead of nailing it down to the few types you're ever going to use here. This makes it hard to reason about your code months and years down the line. How is this method used in the rest of the code base? Do all callers expect an int? What if my method now happens to return a float?
And there's also abuse of duck typing. Throw around a lot of objects, sprinkling methods and other members on to them as you go. Then when you consume the object, just look if it has the method you want to call. This makes any kind of type checking and static type inference useless.
And then there's a whole lot of Python 2 libraries where you get the feeling that the authors didn't give too much thought about whether they are dealing with str or unicode. The method might just call .encode(...) on one of its arguments without being too sure what it is.
And every one of the mistakes that result from the above practices might only pop up when your code has already been shipped to the customer site.
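A contrived sketch of the duck-typing pattern described above, and why static tooling can't help with it:

```python
class LegacyOrder:
    """Members get sprinkled on after construction, as described above."""
    pass

order = LegacyOrder()
order.total = 42                     # added ad hoc at one call site

def price(obj):
    # Callers probe for capabilities at runtime instead of declaring a type.
    if hasattr(obj, "discounted_total"):
        return obj.discounted_total
    return obj.total

# A typo like `obj.totl` above would only surface when this exact branch
# runs in production; no checker can infer LegacyOrder's real shape.
print(price(order))
```

Names here are hypothetical; the point is that attributes attached outside `__init__` and `hasattr`-based dispatch leave type inference with nothing to work from.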
Of the things you mentioned, only Unicode has been a problem, and that's exactly what 3 is fixing, so that's to be expected. The rest was automatically handled by 2to3 with only a cursory review.
We tried 2to3 and it gave us poor results. But probably because we deviated too far from being pythonic.
The str/unicode misery is one of the biggest gripes I have with Python. I'm glad this unpleasant knot has been mostly untied in Python 3. I came to the conclusion that the transition would have been much easier if Python 3 just concentrated on the separation of bytes and (unicode) strings. The other features could have been in Python 4.
It's a bit like IPv6. If it just solved the address space problem, most would have moved to it already. Instead it comes with a lot more baggage, and each additional feature has its own uphill battle for acceptance. So nearly everyone is dragging their feet, citing their pet peeve with the technology.
I'm not sure that's true because, as I said, most of it is automatic and worked well with 2to3, leaving us to deal pretty much only with Unicode. I'd certainly prefer to only have to do this upgrade once.
How do you automatically go from sort( ... some elaborate compare function ...) to sort(key=some completely different function). Yes, there's a wrapper, but it makes the code more convoluted instead of transforming it to the key-paradigm. And if you want to sort by several keys, now you will have to call sort several times.
How do you automatically infer the intention of somedict.keys()? Is it going to be used as a list or as an iterator?
Those are just off the top of my head. I don't remember all the cases where 2to3 tripped over and produced garbage. But there were too many cases to put actual faith in automatic conversion.
It might work if your code is kind of new and homogeneous. But looking at how much trouble Dropbox had, even with all the tooling and Guidos they could muster, I have the feeling that your positive experience with 2to3 might rather be the exception than the rule for old and big code bases.
> How do you automatically go from sort( ... some elaborate compare function ...) to sort(key=some completely different function). Yes, there's a wrapper
Yes, there's a wrapper. You use it, add a comment "this is wrapped in the migration to 3" and move on.
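The wrapper in question is `functools.cmp_to_key`, which adapts an old-style comparison function to the key= API:

```python
from functools import cmp_to_key

def legacy_cmp(a, b):
    """Old-style Python 2 comparator: negative, zero, or positive."""
    return (a % 10) - (b % 10)   # e.g. sort numbers by their last digit

nums = [25, 13, 42]
# Python 2: nums.sort(legacy_cmp)
# Python 3: wrap the comparator and move on, as suggested above.
nums.sort(key=cmp_to_key(legacy_cmp))
print(nums)
```

The comparator here is a made-up example; the mechanical transformation is the same for any elaborate compare function, at the cost of the code staying in the old cmp paradigm.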
> How do you automatically infer the intention of somedict.keys()? Is it going to be used as a list or as an iterator?
If it's being iterated on first thing, it's an iterator. If list methods are called on it, it's a list. This just hasn't been a problem for us; sure, it took some looking at, but it wasn't more than 30 seconds per case.
> I have the feeling that your positive experience with 2to3 might rather be the exception than the rule for old and big code bases.
Maybe so, but the codebase was ten years old and hundreds of thousands of lines.
The two projects I migrated were good-quality code at the hundreds-of-thousands-of-lines scale, though I find that metric a bit inappropriate. I agree the statement was a bit generalising and not appropriate for every project; million-line projects should have been excluded. To be honest, I find a year-long dedicated migration effort a bit excessive. But who am I to judge.
My experience was smooth with just a few hurdles around byte/string issues and that was it.
FWIW, our codebase is of similar size to that and we estimated it would take around a year to migrate, which is why spending a bit more time and ending up with a Go-based system on the other end was appealing.
Khan Academy's rationale might be flawed, as the transition path from Python 2 to Python 3 might be easy.
But in the end they would have the same stack as before, and that's clearly what they are trying to avoid. Given that, it makes sense to transition from a dynamically typed language to a statically typed one, which offers more compiler feedback.
Why do they call them "micro services" and not distributed systems? Oh right, it's because distributed systems are obviously really hard to create correctly and no sane person would ever agree to pay for that.
Nice re-branding. I can't wait for the, maybe, "consolidated computing" manifesto (aka turning microservices back into monoliths).
What if the services you are writing are independent in that they solve separate business problems, are built by separate teams, have little to no data coupling (e.g. Only basic auth), have different scalability profiles, etc? Separate services are really effective for these cases. Neither micro services nor monoliths are silver bullets. Instead it's possible for each approach to be the best approach in a particular business context.
In several decades of IT experience, I've not known or heard of a nontrivial system like the one you describe in the first part of your message.
As for the latter part, that's a disingenuous dichotomy. You don't really believe that just because an avenue exists we should include it in an evaluation?
People are willing to pay for it when you have 50+ engineers trying to push code through one deployment pipeline. There's an inflection point somewhere at which the cost of sharing deployments is no longer worth it.
I actually think this is where good software design and bounded contexts come in. You can perfectly well run a monolith with hundreds of developers if each of the sections is very well contained. There is no need to add network partitions everywhere to enforce this!
This. Usage of multiple nodes, networking, redundancy, etc. is driven by operational concerns. It shouldn't be driven by "development concerns", which is exactly the wrong reason to embark upon such a long-winded journey!
Why are you getting hung up on nomenclature? The point of the article is clear. If you feel that using 'microservices' is too trite or "buzzword-y" then that's about you and not the article.
I think some of the misunderstanding in these comments comes from not fully appreciating the perspective of not-for-profit organisations. While I can't speak for Khan Academy, I know that in every NFP organisation I have worked for there is an acute awareness that funding could dry up one day and the prime directive is to ensure that in a scenario like that, the work of the organisation can continue.
In this case, it leads to a higher concern about minimising the cost of the operational services than you might have in a for-profit organisation. In all the strategic planning I have been involved in with NFP, we always have the "what if worst case scenario arises" plan and in that plan the ability to scale down to bare minimum operational cost is key. It may not be conscious but I suspect that may be part of the reason the performance savings from moving to Go are so attractive in this case, where most profit-making companies just ask the question of whether they can afford to pay for the servers with their current margin or not and if they can they have more important things to worry about.
- The decision seems to be primarily a software architecture one, without much mention of all the other architects whose input will shape how the finished product is run and supported. In a modern software development environment, all the other parts of the org should be consulted on greenfield work to "Shift Left" anything that may need to change down the pike. Design in a silo leads to ineffective products.
- They're going from "hmm we need to upgrade from Python 2 to Python 3", to "we need to redesign everything in a new language with a radically different software architecture". This is definitely the second system effect. It's going to take years to make this thing reliable and sunset the old product.
- They're porting over the logic? Even if this is actually the right move, wouldn't a clean-room implementation potentially give better outcomes?
- Why are they continuing to use App Engine if the writing's on the wall for 2024?
I don't disagree with your pitfalls, but I do think we're working to avoid them.
To your first point, "The decision seems to be primarily a software architecture one...", this project has had involvement of the whole engineering team since the beginning. The whole org is on board with this change. It's definitely not happening in a silo.
> This is definitely the second system effect. It's going to take years to make this thing reliable and sunset the old product.
I hope not, but obviously we're not done yet, so I can't say how long it will end up taking to completely decommission the Python 2 app. What I can say is this: there are aspects to this project that are _simplifying_ our system and, for what's left moving from Python to Go, our intention is to port the business logic as close to a straight up port as we can get.
> - They're porting over the logic? Even if this is actually the right move, wouldn't a clean-room implementation potentially give better outcomes?
_That's_ second system effect, to me. We can't change everything and fix every problem now, so we're focusing on the changes that will help us move from Python to Go faster.
> - Why are they continuing to use App Engine if the writing's on the wall for 2024?
I don't think Google Cloud is disappearing in 2024, for one. Beyond that, again, we're not changing everything about our architecture. The way our data is stored is staying the same.
> The whole org is on board with this change. It's definitely not happening in a silo.
Given the seemingly strong chorus of voices responding with cautionary tales about why you might want to rethink this plan, and the number of engineers in your organization, it seems more likely that you have some dissenting voices who have either been too scared to speak up or have already been shot down.
What I meant by "the whole org is on board" isn't that there weren't other opinions. There have been multiple opinions on almost every decision we make (and we have an open process and document our decisions in a style very much like this one[1]). In the end, it's not about "shooting down" alternatives, since that's loaded language. It's about making what we think is the best choice we can with the information available to us.
Even in this thread, there's a chorus of voices sounding caution based on their limited information of what our situation looks like, but there are others who see why we're doing this, based on the same limited information.
We absolutely do know the risks of this project, which is why we're doing this as incrementally as possible.
Go + App Engine is the most unstable combination I have ever seen. While we tried to deliver a project over the course of a year, it was almost fully rewritten a couple of times because of new Go or App Engine APIs.
We're having far fewer problems with NodeJS.
And App Engine has a huge price tag.
Looking at the case of Khan Academy migrating their server stack only after about 10 years, I realize I don't have to worry that much about being locked into certain technologies (unless something is clearly untransferable, e.g. storing part of the customer data on a third-party server). After all, we might keep a stack for only 10-20 years, and the thing I'm working on now will almost certainly last less than 2-3 years.
> We’ll only generate web pages via React server side rendering, eliminating the Jinja server-side templating we’ve been using
I’ve been down this road. Deep down this road. Let me just give you a heads up on something I didn’t consider at the time: Most template languages do not parse every single node, one by one. In a sense they are just doing string concatenation. Not so with server side rendering and React. I’m not saying it can’t be done but just realize it is going to take a lot more compute power. Caching is great of course but won’t help you if you plan to customize user content during the server side rendering as well. My recommendation is that you don’t do any user authenticated stuff during SSR.
Also consider how you are going to handle cookies if you do plan to make authenticated requests to server side rendering. Also solvable but for some reason people had the hardest time understanding why we had to forward cookies to the domains we controlled in an API request and definitely not to any other servers.
I’m not sure I would pick React for an SEO driven website. It is hard to get a competitive “time to first byte”. Unless of course you can pre warm a cache of every one of your pages.
Lastly, you’re going to need Node for the SSR. I’m sure you know this but that might take you out of app engine and into cloud compute. Not a big deal but thought I’d mention.
Good luck! It is doable. If you ever want to chat about how we solved some of these problems I’d love to save you some time if I can. Hit me up in my profile email.
We've been doing SSR for quite a while now and are improving our CDN use as we go along. We already took steps to ensure that there's no user-specific information showing up in our server-side react rendering which would damage cacheability.
Our frontend infrastructure team essentially owns the React render server. I'll let them know you offered to chat.
The article says this: "Moving from Python 2 to 3 is not an easy task."
I disagree with this. It's a Python project's dependencies that make it hard to move from 2 to 3, and most libraries have been updated.
Of course, you could argue that it isn't easy to migrate a codebase from one major version of a language (or framework, or database) to another, but when you eliminate easy from your vocabulary it becomes harder to describe different levels of difficulty.
migrating from python 2 to 3 is such a large task that migrating to any other language is a comparable effort. This is not just a library problem; the language itself changed significantly.
source: no python services at my company are going to be migrated to python 3; it’s all moving to a JVM
> migrating from python 2 to 3 is such a large task that migrating to any other language is a comparable effort
I'm going to call BS on that one.
If you're having issues with Python 2, then it might make more sense to switch to another language instead of upgrade to Python 3. But going from Python 2 to 3 is much easier than switching languages completely.
Python is not a perfect language. There is no perfect language. It sounds like your company just had a reason for switching to a JVM language and the Python 2 EOL was a justification to start.
I'm skeptical too but if they're doing a lot of communication with other services and they relied on bytes and ASCII just working and the code isn't backed up by tests then I can see them having a very bad time going from Py2 to Py3.
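A minimal illustration of the bytes/ASCII hazard: Python 2 silently coerced between byte strings and text when the data happened to be ASCII, while Python 3 makes the boundary explicit.

```python
payload = "héllo"                    # str: text
wire = payload.encode("utf-8")       # bytes: what actually crosses the network
assert isinstance(wire, bytes)
assert wire.decode("utf-8") == payload

# Mixing the two, which Python 2 tolerated for ASCII data, now fails loudly:
try:
    _ = b"id=" + payload
except TypeError:
    print("explicit encode() required")
```

Code that leaned on the implicit coercion, and has no tests, only reveals itself at runtime after the port, which is exactly the "very bad time" scenario.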
N=1 and all that, but I ported 500k lines of a Python 2 monorepo to Python 3 this year and it took like two weeks, including a week spent reading Eevee’s post on the subject half a dozen times and playing with six and futurize.
Migrating from 2 to 3 can be a large task in some very rare cases, possibly, but for the most part it is practically effortless if you don't have to support both simultaneously. The biggest hurdle might be the "fear" of the Unicode change, but that can be dealt with.
Source: All python services at my current workplace are in the process of being migrated to 3.*, and I'm doing one of the main ones at the moment and it's a breeze, including compiled c-extensions.
What? I've done it on some sufficiently large code bases, and small ones, and it was done way faster than a rewrite. With tools like 2to3 you can assign it to an intern and have it done pretty quickly.
Numerous large projects and companies have publicly stated that they are stuck on Python 2 and that it's easier to migrate languages, even to ones they have to invent (Go), than to migrate to Python 3. At least one of these companies had Guido on their staff for years. Another, also with Guido on staff, needed over three years to migrate from 2 to 3. The overwhelming body of evidence shows that migrating a large project from 2 to 3 borders on impossible, but there's always someone willing to pop up on HN to say how easy it is.
Did you know that Django was not only successfully migrated from Python 2 to Python 3, it was ported in such a way that for many years it used the same codebase in both languages ...
Perhaps that's the biggest advantage of porting from 2 to 3. A lot of the code could run in both languages.
The corollary of my complaint is there will always be someone who pops up on HN with no idea how many lines of code are in a "large project" like Dropbox or YouTube.
Is Dropbox open source? Is Dropbox even a typical Python application representative of the challenges of porting from 2 to 3?
My hunch is that the challenges of porting Dropbox to any other language have less to do with Python and more to do with the need to deal with a filesystem at a lower and more granular level than what typical programming languages offer. Thus everything needs to be rewritten in bazillions of ways to handle the bazillion corner cases.
I've been in a similar boat. We've been splitting up or converting large Python 2.6/2.7 applications into Go services (and doing the same to large Perl applications) for a long time now.
Go has consistently been 10-20x more performant (allowing for dramatically reduced hardware needs), easier to maintain, and more productive to write code in than our previous Python (Twisted) and Perl (AnyEvent).
Hopefully Khan Academy has solid telemetry data in both the legacy and new code so they can quantify the benefits. They will also have a learning curve managing multiple microservices vs. a monolith. Accessing shared data will be a problem they will likely have to solve. We've opted for each service controlling its own data: no reaching into another service's data behind its back. Everything goes through APIs. This gives a microservice the ability to alter its datastore as it needs to and not be blocked by other teams' need to update how they access the data.
Debugging a distributed solution is much harder than a single service. Distributed tracing, consistent structured logging with log aggregators that let you do fancy searches (like Splunk), and application telemetry and metrics will be even more important than before.
I heard about this project from a friend who works at KA. I am concerned about the strategy, and I think the following approach would yield better results:
1. Write in Go an exact reimplementation of the current Python codebase. Use the same database schema, front-end HTML/JS, test suite, and so on. To whatever extent possible, use the same names for classes and functions. Check the reimplementation correctness by using a comparison tool that calls both the Python and Go version of a page/function/search and making sure that they produce the same results.
2. Change the production code over to the Go version, perhaps using a ramping strategy where X% of servers are running the Go code, and you gradually increase X, while monitoring vital statistics like server load and response time.
3. Now that the production site is running Go, incrementally split off components into their own services.
This approach leads you to the same destination, but with a lot less risk. It is very unhealthy to have a situation where the production site is running one codebase but all the developers are working on another codebase. Note that you will realize the benefits of Go (performance, type safety) after step 2, which is much sooner than OP's plan.
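The comparison tool in step 1 could be sketched like this; `legacy_handler` and `new_handler` are hypothetical stand-ins for the Python and Go endpoints (in reality both would be called over HTTP):

```python
def legacy_handler(user_id):
    """Stand-in for the existing Python endpoint."""
    return {"user": user_id, "score": user_id * 2}

def new_handler(user_id):
    """Stand-in for the Go reimplementation."""
    return {"user": user_id, "score": user_id * 2}

def shadow_compare(old, new, inputs):
    """Call both implementations and collect any divergent responses."""
    mismatches = []
    for args in inputs:
        a, b = old(*args), new(*args)
        if a != b:
            mismatches.append((args, a, b))
    return mismatches

diffs = shadow_compare(legacy_handler, new_handler, [(1,), (2,), (3,)])
print(f"{len(diffs)} mismatches")
```

Run against recorded production traffic, an empty mismatch list is the signal that the reimplementation is safe to ramp up.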
Joel Spolsky's classic essay about how you should never do full codebase rewrites is worth reviewing:
100% agree. We've just finished rolling out our implementation in Go, migrating a subsystem from PHP that receives around 150 req/second and demultiplexes those requests into 1500-2000 req/second to legacy backends.
The key to the success of the project was that the API was an exact match, so we could compare both implementations for the same requests. The deploy strategy for the new version:
- Replay the real traffic to the new Go service, comparing the results with the old one
- Then implement a feature toggle that enabled different traffic sources to use one backend or the other
- Keep changing backends to the new system and ensure that metrics were unaffected
Having e2e and integration tests for the Golang project was a huge help, since we could fix all differences using TDD.
Although we changed some of the implementations to take advantage of Go constructs, even a straight 1-to-1 replacement would have had a huge performance impact.
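The toggle/ramp-up described above might be sketched as a deterministic percentage router (names hypothetical); hashing the user id keeps each user pinned to the same backend for the whole ramp:

```python
import zlib

def pick_backend(user_id, rollout_percent=10):
    """Route rollout_percent of users to the new service, the rest to legacy."""
    # crc32 is stable across runs, unlike Python's salted hash().
    bucket = zlib.crc32(str(user_id).encode()) % 100
    return "new-go-service" if bucket < rollout_percent else "legacy-backend"

counts = {"new-go-service": 0, "legacy-backend": 0}
for uid in range(1000):
    counts[pick_backend(uid)] += 1
print(counts)   # roughly a 10/90 split
```

Raising `rollout_percent` step by step, while watching error rates and latency metrics, is the "keep changing backends and ensure that metrics were unaffected" loop in miniature.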
Having done this kind of migration/rewrite multiple times, the way you succeed is starting with acceptance tests that verify functionality from the API layer that is implementation agnostic.
After the tests are in place, you break off small portions into microservises and ensure tests pass.
You pass a small percentage of traffic through the new arch, fixing bugs and leveraging telemetry.
You eventually slide all traffic to the new architecture.
You want to get something receiving traffic ASAP so you start getting feedback. Often this means taking something smaller or simpler out of the legacy codebase first.
Doing a complete, side-by-side replica is a recipe for disaster. Think MVP. We've even introduced traffic routing based on feature sets, so you can route users who don't use edge-case features to the new arch (which has yet to incorporate those features) while keeping others on the old arch.
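A sketch (all names hypothetical) of the kind of routing just described: a percentage-based rollout combined with a feature-based escape hatch for users who depend on parts the new architecture doesn't cover yet.

```python
import hashlib

# Users relying on features the new arch doesn't support yet stay on
# the legacy system; everyone else is split by a stable hash of their
# id, so a given user always lands on the same side of the rollout.

UNPORTED_FEATURES = {"bulk_export"}  # edge cases still legacy-only

def route(user_id: str, user_features: set, rollout_percent: int) -> str:
    if user_features & UNPORTED_FEATURES:
        return "legacy"
    bucket = int(hashlib.md5(user_id.encode()).hexdigest(), 16) % 100
    return "new" if bucket < rollout_percent else "legacy"
```

The stable hash matters: randomizing per request would bounce users between implementations and make bug reports impossible to reproduce.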
It looks like I need to write a followup with more details about how our migration looks!
Since we're using GraphQL federation, we have a gateway which serves up our complete GraphQL schema, pulled together from a collection of services behind the gateway. We can move individual properties over from our Python monolith to new Go services and the clients will never know. Plus, we can do side-by-side testing in the gateway by making a request to the monolith and to the new service and comparing the results (something we don't have yet, but plan to).
This is definitely not a big bang rewrite. It's about as incremental as it can be, because we can move individual GraphQL properties over and the gateway stitches the result together.
I’m thinking that defining parts that can be moved to separate services, and starting to consume those, could be a way to organically transition to a new architecture.
Sounds nice in theory, but won't you always be playing catch-up? Or will development(/bugfixes) halt on the existing python version? Would it ever be acceptable to the business to commence such a rewrite without any show of value until after completion?
(Speaking as someone working at a firm that chose (before my time) to start a ground-up rewrite that ran about 3 years over estimate.)
It is crazy to see they are still using Python 2. Seeing how slow the conversion to Python 3 has been, was creating Python 3 a good decision for the Python community? Can it be argued that developing Python 2 further in a backward-compatible way would have been better? I know that evaluating this kind of thing is hard, as metrics are bound to be subjective and speculative. But I am curious whether there was any serious attempt to figure it out.
> was creating python 3 a good decision for python community? ... I know that evaluating this kind of thing is hard
I don't think there is any question here: Python 3 is a complete disaster. Years and years of engineering effort wasted on changing string libraries. Sadly, the Python leadership refuses to acknowledge the failing, perhaps because such an acknowledgement would challenge their omnipotence ... it would, and it should.
> Now, in 2019, Python 3 versions are dominant and the Python Software Foundation has said that Python 2 reaches its official end-of-life on January 1, 2020, so that they can focus their limited time fully on the future. Undoubtedly, there are still millions of lines of Python 2 out there, but the truth is undeniable: Python 2 is on its way out.
The Python 2/3 split is by far the most annoying thing about Python. I don't develop software in Python but about half the time I've had to use a Python library or program the problem of 2/3 incompatibility has cropped up. Some projects don't make it clear whether one or the other is required, leading to further confusion.
If anything, the Python 2 EOL could make a bad situation worse. Like Khan Academy, each Python 2 package maintainer will be forced to make a decision: move to Python 3, or abandon the package and maybe move to an entirely new language. I think many will choose to abandon, leaving these packages to rot.
Second on the list are the multiple package managers (or things looking like package managers).
Third on the list of annoyances are native extensions, driven by the poor performance of Python itself. These extensions make it difficult to use certain libraries across operating systems.
So as a non-Python developer I don't look forward to the occasions when I must use a Python-based piece of software.
If you're installing a package via the package manager, it will very quickly tell you if you can install it on your specific version of python. Unless you're downloading some rather obscure and un-loved library where the author didn't explicitly state which versions of python they support.
Multiple package managers: there have only been two big ones in my long-term general usage of Python: easy_install and pip, the former of which has fallen out of favor but is still semi-supported. Pretty much everything runs off of pip. What may be confusing you, and does confuse me at times as well, is the naming and the installation instructions as provided by library authors. E.g. some say "python setup.py install", others just tell you to "pip install" it. Some say to use "setuptools", etc. Others tell you to use things such as "conda" or "anaconda", or pipx, and to create virtualenvs. All secondary, but things that should ideally not distract you from just plain using pip.
3. This has also been getting a whole lot better in the last 5 or so years. Microsoft has been funding dev-time to make the ecosystem for python (including extension compilation) much more pleasant in the Windows space. Also, the package managers and library authors are doing a whole lot better in that binary distributions are much more prominent so the compilation of the extensions never has to happen on your machine.
On a personal level, I have the same problems with using Python.
I've become enamored with package management in Go — not perfect yet but efficient, simple, to the point. The backwards compatibility enforced at the version level is also great — you often find Go code years old that keeps running just fine. I like things that you set up once and may just forget, that's where real productivity is found imho — it doesn't matter that I can do X in 1h if I have to do it every other day, I'd rather spend a full week or even ten, and solve it forever.
I think Js is comparably simple but I've heard so many horror stories about dependency management that I just don't know — I've yet to use Js in prod myself at work and I don't look forward to this day.
Here's the thing: it does not matter how great a language may be while I'm writing it, because that's 10-25% of my time; what matters is that everything around, from setting up dev environments to shipping passing by devops, especially as a one-man/small team, can be done "simply enough". And that, IMHO, is where Go is miles ahead of most other languages from a philosophy standpoint.
I tend to feel very positively about Rust, for it looks to be an extraordinarily intelligently led project, with comparably 'real' benefits that extend beyond the code page (but I've yet to use it myself to confirm first-hand).
We see the importance of these topics so clearly with Py2/3: none of the problems between these two have anything to do with what's in the code, with programming; all of it has to do with the ecosystem, with the real and much larger task of maintaining codebases, managing teams and deploying 'stuff' in ways that work with the environment (whether tech, people, knowledge, politics, what have you).
The move to 3 has been a failure in that regard, and IMHO it rests on the shoulders of an entire community who chose to stick with 2 regardless of what that would mean down the road. Well, down the road is now, and the result is chaotic.
I'm not worried about Python itself — the language is incredibly popular, especially in academia, and the programmer population roughly doubles every few years, so that's a sustainable amount of new projects written in Python 3 every day. Old Python 2 projects will be but a drop in the bucket 10 years from now, simply because of the numbers effect of tech growing at such an insane rate (that's been roughly true since the late 1940s; Uncle Bob has a great take on it in his latest appearance on The Changelog podcast).
Anyway. I can't wait for py2 to die and py3 to become the only Python.
To be more specific (for anyone else who hasn't checked the link yet), the "Lifetime giving" section has:
9 donations of more than $10M,
4 donations between $5M and $10M,
20 donations between $1M and $5M.
So it looks like there has been at least $130M in donations!
As much as I personally don’t enjoy writing Go I really can’t fault them.
I still find it interesting that for a relatively obvious feature set of fast compiles, fast startup and fast runtime there really isn’t anything mainstream out there to compete with Go.
I really hope something like Kotlin, Swift, ReasonML or even AOT JVM/.NET brings something to the table soon. Or perhaps I’ll just have to wait for WASM to really take off server side.
in production is fast startup really such a boon outside of serverless? especially if you're already doing blue/green deployments, doesn't seem like it'll have much impact.
(depends what "fast" vs "slow" means - are we talking about milliseconds vs a second or two, or startup times so horrendous they cripple your devs' ability to iterate and test?)
fair enough. you say in the blog App Engine has worked well for you and you're sticking with it, so i'm assuming you considered moving to traditional servers but found it unappealing?
Yes. Google Cloud now has multiple options for autoscaling servers (App Engine Standard, App Engine Flex, and Cloud Run) with the biggest differences being how they're deployed and specifics around the scaling.
We _could_ manage our own Kubernetes clusters and such, but Cloud Run is pretty similar to that and takes away all of the management headache. There is essentially zero code difference, should we decide to change our deployment strategy later.
We're using Google Cloud Datastore for persistence, and that automatically scales in both servers and storage, so it has worked out nicely for us as well.
GraalVM looks to bring fast launch at the expense of the long-run performance optimisations from JITting. For FaaS-y purposes that's a sane tradeoff; for long-running services the startup overhead is amortised over requests anyway.
I guess I’m jumping to conclusions but it seems like a lot of lessons have been learnt since JVMs and .NET came onto the scene, and that WASM runtimes and the future languages that target them will prioritise speed at every stage.
Go already cross-compiles to WASM, so if desired, Go code can be run via WASM. But on the server, you probably rather want to run the Go code natively. For the client, this should be quite interesting.
Seems like such a waste. Is switching to python 3 really that hard? Is hardware that expensive? If this is indeed the right call it doesn't bode well for traditional scripting languages as the web scales to fewer high traffic apps. We might start to see more jvm, go (apparently) or even rust and c(++), rather than speed of development languages like Python or Ruby. Trend seems to be the reverse though, with python the second and most rapidly growing language.
Everyone’s project/code base is different but in my experience there’s been a critical mass of libraries for a few years. I presume the “it’s hard to move to 3” is dev teams wanting a new toy as much as “the rewrite is too complex”. Library use, size of code base etc are all big factors but at the end of the day, I think team motivation is really the deciding factor.
That mirrors my experience as well. Someone with influence is bored or wants to level up, so they'll drag the entire company into a long, expensive quagmire.
Unless the existing codebase is mired in technical debt and completely unsalvageable or cannot scale further, this seems like a very radical move.
I worked on a fairly large codebase that needed to be rewritten from scratch when migrating from 2 to 3, primarily because all the tests were written using a test framework that was no longer maintained. So given that you might need to start over anyway, I think it's reasonable to consider other options. That said, yeah it's difficult to understand how KA's web server costs aren't already basically zero, and how their endpoints aren't already basically instantaneous.
> That said, yeah it's difficult to understand how KA's web server costs aren't already basically zero, and how their endpoints aren't already basically instantaneous.
I find that this is the outsider's view of a great many products. Things always seem a lot simpler on the outside.
In Khan Academy's case, I think a lot of folks just think of our site as being a collection of more-or-less static pages with videos on them. There's a lot more going on than that, though. We've got a CMS that supports articles with math and interactive elements, in addition to the videos... and many, many exercises with hints. All translated into dozens of languages.
We have to remember every exercise people have done so that we know which ones to present to them next, and we need to display that progress when they look at topic pages. Oh yeah, and if they're in a classroom, we need to present that progress to teachers (or coaches/parents, outside of the classroom). Teachers can also assign content.
Plus, there's the official SAT prep, which connects to the College Board directly to provide personalized guidance about what to work on... and that's only one of the test preparation areas of our site.
And, as you can imagine, there are a bunch of other features and aspects of the features above that I'm not mentioning. It adds up.
Fair, but what percentage of spend is actually on web servers as opposed to database, data transfer, static asset storage, caches, CDN, etc? The features you listed are kind of what I expected, but I still wouldn't expect the web servers to be more than 15% or so of your hosting costs. I know you guys get a ton of traffic, but on most web sites at least 90% of traffic is logged out and doesn't even need to hit the web servers in the first place.
I don't have recent numbers in front of me, but I believe our web servers are more like 40% of our hosting costs today.
Over the past year, we've started leveraging our CDN (Fastly, who have been great) a lot more. That said, for us a lot of logged out traffic still carries the weight of logged in traffic. A logged out user can start doing math exercises and we'll keep track of what they've done. If they then create an account or log in, that activity is associated with their account.
Khan Academy may look like a content site, but in many ways it's more like a "learning app".
According to their 2018 accounts, 'information technology' costs were $5m. Salaries were listed separately at $29m so I'm guessing the $5m was mostly servers.
This is my experience too. Someone or a couple someones on the team decide they want to try out some new tech or expand their resume. Then it becomes a quest to justify the switch rather than a quest to make the best business decision.
What's really the ergonomic difference between "traditional scripting languages" and Go? I came up writing professional C code, and spent most of the last 15 years writing "traditional scripting language" code, and Go feels a lot closer to scripting than to C to me, despite compiling down to machine code.
Go has static types, and that distinguishes it from Python, Ruby, and Perl. But the trend now seems to be for languages to move towards static typing anyways; 2005-era Python was wrong about that.
It’s usually the DSL argument. In Go you can establish calling patterns for errors, but you’re really limited in terms of providing libraries with nice APIs that prevent you from making mistakes.
In Python you have a lot of ways to make sure someone does a thing. Exceptions are good for making sure an error is handled. Context managers make sure a resource is cleaned up properly.
I might be wrong but I feel like writing something like jquery (with its fluent API) would be really tough in Go.
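To illustrate the context-manager point above, here's a small sketch (names hypothetical) showing how Python can make cleanup impossible to skip, even when the body raises:

```python
from contextlib import contextmanager

events = []

@contextmanager
def resource(name):
    events.append("open " + name)        # acquire on entry
    try:
        yield name
    finally:
        events.append("close " + name)   # cleanup runs even on error

# The body raises, but the cleanup still happens:
try:
    with resource("db"):
        raise RuntimeError("boom")
except RuntimeError:
    pass
```

In Go, the closest equivalent is a `defer` inside the function that owns the resource; the difference the comment is pointing at is that a Python library author can hand callers an API where forgetting the cleanup isn't even expressible.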
That rise has been entirely accounted for by machine learning and data science. Python as a language for actual software engineering has been slowly dying for a while now.
I've worked with or known about too many places that have jumped on the microservices bandwagon and the only thing I've encountered is a lack of maturity (mine included) and a knee-jerk reaction against monolithic code, when the problem isn't the size of the codebase but its organic growth over time. Go is an excellent language for it but distributing your business logic and functionality over a network fundamentally changes the behaviour of your app and how you have to think about it; you can't just tear out bits of the monolith and make it an API.
I've come to some fairly comfortable conclusions:
1. If your team is small, don't do it. The mental overhead of that architecture will bring your team's ability to deliver to its knees.
2. If you've got a long-lived app with lots of legacy code; don't do it. You will have to maintain and add new features to the old codebase because it's easier, which means you have more things to rewrite.
3. If you're at small-ish/mid scale, don't do it. Kubernetes and similar are tools for people handling Facebook/Google/Netflix kinds of loads.
4. If your organisational structure doesn't fit (i.e. you don't have enough people to split into smaller teams); don't do it.
Scaling is considered a good problem to have, right? That means that your current, ugly monolith is actually successful, and somehow the first thought is to replace it all with a complete—but trendy—unknown?
If the engineering team is so unhappy about the architecture then Go is the wrong choice. Maybe they could consider service oriented architecture and some refactoring/tech debt time so they feel happier about that codebase. Pull some of the code out into modules and then start figuring out where bits of the codebase would actually belong, while still being one deployable.
And then after that, if you really want to, distribute it over the network, and then start thinking about porting it.
Otherwise, throw away your first successful prototype at the first instance and go all in on distributed architecture and microservices. At least then you have the luxury of figuring it out from scratch.
> I've worked with or known about too many places that have jumped on the microservices bandwagon and the only thing I've encountered is a lack of maturity (mine included) and a knee-jerk reaction against monolithic code, when the problem isn't the size of the codebase but its organic growth over time. Go is an excellent language for it but distributing your business logic and functionality over a network fundamentally changes the behaviour of your app and how you have to think about it; you can't just tear out bits of the monolith and make it an API.
This seems like a response to the blog post as opposed to the parent comment.
We absolutely recognize how added network boundaries changes the app in big ways.
One thing I wanted to mention: we're _not_ going the Kubernetes and service mesh sort of route, because our experiences thus far show us that there are still a lot of rough edges. We're sticking with App Engine because it generally just works. It scales down essentially to zero and scales up well with the traffic. So our services are all going to individually be running on App Engine.
Plus we're not going "micro" with our services. They're each fairly decent size, own specific parts of our data, and are owned by specific teams.
It's almost religious. The reaction some people have when you suggest monoliths is completely baffling. It's apparently just "known" that it is the correct approach to all problems, so, y'know, it's embarrassing for you to have even suggested otherwise.
The best systems I've worked on have all been well architected monoliths. The code is ugly as fuck in some places but that's what tech debt is.
If you release a bug to prod in your monolith, it is exactly the same as a dependent microservice releasing the same and bringing the cluster down. You get a crash either way, and at least with a monolith you're not spreading your call-stack over the network; it's all in memory.
In terms of general compute speed Go is many times faster than Python - approximately on a par with Java speed-wise but with much less memory use. That means your server costs are many times cheaper and your page latencies are much faster than Python. It's significant.
In the past companies would just compile python down to c to get the memory and perf they need. Probably would be the right answer here too, but that would not look as cool on the resume.
Moving compute intensive tasks to C would speed up many programs. That is the reason that many dynamic languages have overall good performance - they rely on C libraries for the computative heavy lifting.
This has one big drawback, however. Not only do you need to write in two different languages, the rewriting in C requires a lot of care, as the language protects you much less than the high-level language you implement for.
One big attraction of Go is that it is high-level and productive enough to be the main implementation language, yet very efficient for time-critical stuff. So you don't have to cross language barriers to implement speed-critical code, and you get full type and memory safety in the whole stack.
Interestingly, one can write Python extensions in Go, so in most cases that would be my choice these days for speeding up critical code paths in Python.
For any web-based app, it really makes sense to take the approach of using a rapid development language first, then as you need to scale, convert to something that’s compiled and focuses on speed. It’s not one or the other kind of thing — they both have a role (at least until we reach the holy grail where fast to develop is also fast to run).
Like one of the former engineers at Twitter said about their early issues with stability and when someone blamed Ruby for the performance issues - short version it was a bad architecture not the language.
Stateless web servers are one of those things which are ridiculously and easily parallelizable, you can scale a web server horizontally easily. The ROI of rewriting everything in another language as opposed to just adding more web servers would probably take years unless you’re running at a ridiculously large scale. For the cost of one developer’s fully allocated salary, you can throw a lot of hardware at performance issues with web apps.
I prefer statically typed languages, but performance isn't one of the reasons. Besides, how much processing is a typical web app doing?
What's a "rapid development" language? I know about RAD, but that was a buzzword that was only ever used to push terrible languages like Visual Basic, and somewhat less terrible ones like Embarcadero Delphi.
Dynamic languages (so you can hack together stuff and quickly bypass any roadblocks), with REPLs (quick feedback and avoiding writing tests), and low cognitive overhead (so new folks can ramp up quickly).
Some example languages that come to mind here: Python, Ruby, JavaScript, Clojure, Groovy.
Go actually comes close here, even though it doesn't have a REPL and isn't very dynamic, because of its focus on minimizing cognitive overhead and getting the job done with minimal fuss.
Other languages carry a lot of community baggage. Java is one of the worst IMO...if you try to hire Java programmers, it's going to take a lot of effort and risk to find and reject applicants who've read too many design pattern books, are architecture astronauts, or come from an enterprise-y background. The signal-to-noise ratio is just really poor.
I _like_ Python. Even released a Python web framework. I think there are many projects it's a good fit for, which is why it's continuing to be quite successful, despite the pain of Python 3.
But that doesn't mean it's a great fit for all projects. Personally, I've come to find that code in statically typed languages is easier to maintain over time, especially from a big team. I guess a lot of Python folks agree, which is why Python 3 allows static typing as well.
At a certain point, server costs _do_ add up to real money and some applications are not purely database-bound. Go's tooling makes it almost as fast to work with as a scripting language, but with much better performance. The language itself is certainly not as succinct as Python, but I think it has made reasonable tradeoffs.
Also: there's already a lot of JVM on the web.
Finally, I'll just note that _not all_ Python 3 migrations are that hard. It depends on a lot on the libraries used.
I don't think upgrading to python 3 will be as hard as rewriting the entire thing, but rewriting does come with the benefit that you are not stuck with the problems that come with dynamic typing in a huge codebase, and i'm assuming this is why sticking with python is hard.
They also said a faster language will improve their server.
Instead of rewriting with static types, they could just gradually add them – Python 3 supports static typing[1] with the actual type-checking done by external[2] tools.
This is perhaps a very big misconception with python usage. Just because it has "dynamic" or "duck" typing, doesn't mean that you have to consider chaotic and unpredictable data running through your code paths. It's just not the case.
In a typical codebase, it's probably 98% very specific and known data types linked to the variables in your code. With the remaining 2% being things that are just "easier" to solve with dynamic typing rather than coming up with complicated interface/inheritance hierarchies that you typically find in compiled languages.
And with type hinting now in Python, you have a very good way of codifying that dynamic or duck typing, such that you can expect almost 100% knowledge of all the data types coming in and out of your classes/functions. At this point, I'd argue it's got one of the most robust "type" systems out there, if one can call it that at all. Just don't use a plain text editor + mypy, or VS Code, for your Python development, and you'll be in good hands. I.e., use PyCharm.
Honestly, at this point mypy is inferior to PyCharm's built-in checking when it comes to speed, integration, and in some cases type inference as well. I've also compared it to the VS python language server and it too doesn't stack up.
The other metric would be what a person coming from established, full-fledged IDEs such as VS would expect. With PyCharm, you get intellisense almost on-par with what you get from VS for C#/VB, assuming you use type-hints that is.
Not trying to be difficult, but I'm being honest about them providing a really good python experience that is for the most part free. There is no need to putz-around with VScode, json configs, plugins, mypy, etc and still end up getting a relatively inferior experience. Doubly so for new-developers.
Hardware is cheap. Hardware is on the "accessible to a 3rd world middle class person" level of cheap.
But with enough scale it adds up, while the costs of a rewrite don't. And the difference between a language like Go and one like Python can be in the hundreds of times.
I think it's more that given their specific codebase it's similarly difficult to switch to python 3 as it is to switch to a number of other, entirely different programming languages. Once you recognize that rough equivalency, then it's worth considering the stability, compile times, necessary production resources, etc of those other programming languages.
If your project's specific dependencies are so intrinsically stuck on a python 2.x implementation you might be caught between having to redesign that dependency in-house or switching to a language where you wouldn't need to do that in-house work.
But that's almost certainly not the case. Even if their existing codebase relies very heavily on the small subset of Python2 features that require manual porting, switching to Python3 will be much less work than rewriting everything in a new language.
We have the good fortune of not having to be super secretive about our tech. We _want_ to talk about this project as we go along and share what we learn. We've already got more interesting stuff to talk about, and it'd be a lot less interesting, I think, without this context for the overall project.
As mentioned in the post, a small piece of our GraphQL schema is already in Go running in production. This blog post isn't just "we're thinking of doing this thing". It's "we've already done a bunch of research, thinking, _and_ built some of it."
Indeed, that's true. I just think we have more to gain by talking about what we're doing and seeing what the community thinks about some of our approaches rather than keeping it all private for the next year.
In my experience it's more annoying than you think, especially if you want to prevent small sneaking regressions.
Python not being statically typed also means that all the breaking type changes they made — strings now being bytes, and similar — cause crashes that may not be revealed until production, if your unit tests didn't get that oneeeee edge case right.
If I'm going to do a new project in the future, I will demand it will be a statically typed language. I'm sick of dynamic languages.
Switching to Python 3 is hard, especially if you have a massive Python codebase interacting with other systems. It's not as easy as importing unicode_literals. Unicode breaks in very subtle ways.
and rewriting it in another language will magically fix this and not introduce new bugs? besides, in my experience Python 3's clear separation between bytes and str makes these breakages much less subtle than it silently going wrong in Python 2.
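A concrete illustration of that difference: Python 2 would silently coerce between byte strings and text (sometimes blowing up with a UnicodeDecodeError far from the real bug), while Python 3 refuses to mix the two types at all, failing right at the mixing point.

```python
# Python 2: b"caf\xc3\xa9" + u"!" attempted an implicit ASCII decode,
# which could fail (or silently succeed) far from the actual bug.
# Python 3 raises TypeError immediately instead:
try:
    _ = b"caf\xc3\xa9" + "!"
    mixed_silently = True
except TypeError:
    mixed_silently = False

# The bytes/str boundary has to be crossed explicitly:
text = b"caf\xc3\xa9".decode("utf-8") + "!"
```

That explicitness is exactly why the parent comment calls Python 3 breakages "much less subtle": the error surfaces where the types meet, not somewhere downstream.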
i wish them all the best, but would've been much more impressed if they'd done it, not simply announced they were going to.
> in my experience Python 3's clear separation between bytes and str makes these breakages much less subtle than it silently going wrong in Python 2.
No question Python 3 is better than 2, but it's not better enough to justify the move. People will only move when they absolutely have to. That isn't progress, it's inefficiency.
The question isn't, is porting Py2 to Py3 easier than porting Py2 to Go. The question is, if you spend the same effort on porting Py2 to Go that you would have spent porting to Py3, and that only gets you 60% of the way there, but the ROI on that work is much higher, are you better off?
Yikes! It's a lot of effort to reduce memory use. They might be better off creating a new Go entrypoint/server that can call into CPython to reuse all their existing/tested modules (treat their Python as a microservice called by Go). They could then use Go to create/call new microservices or replace various routes on a selective basis.
I think the real problem is that they didn't properly maintain their code. Rewriting it in Go won't prevent them from dealing with this again in a few years, when this Go version reaches end of life. I would have liked to see an article on the "introducing process" side of programming.
Thanks for the comment! A couple of things about this...
1. The Go team is working very hard to ensure that there are no such compatibility issues. Code written for Go 1.0 should still compile with the Go 1.14 beta today.
2. It's possible there's more we could have done along the way, and I tend to think that statically typed languages make it easier to safely refactor more ruthlessly. But I do think we've actually done quite a bit of change incrementally along the way. Our move to React on the frontend and GraphQL on the backend have been good examples of that. Plus, we did a huge refactoring a couple of years ago to draw better boundaries in our monolith, and that has made a move to services possible.
Unpopular opinion, writing Go is faster than Python. With the compiler, strong typing, and no versioning hell, I'm much more productive in Go.
Whenever I use python I run into problems with versions and dependencies. And the whole community just tells me to use pyenv or virtualenv and it will "fix all my issues". Only it doesn't.
Just as a counterpoint, I'd say that using Python can IMHO be similarly productive to Go.
Regarding the dependencies, you have tools on top of virtualenv, such as pipenv/poetry, which handle dependencies and are easy to use. The biggest issue I've encountered would probably be when two or more dependencies require the same package, with no intersection between supported versions. I don't think Go handles this any better, though.
Type hints (and mypy for static type checking) are a must, and coupled with a good IDE, they really improve productivity. I'd say that mypy's type system is more advanced than Go's, but it falls short on type safety, since the majority of libraries don't take advantage of it yet, and Python is still a dynamic language by nature, with no runtime type checking.
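For instance, here's the kind of bug mypy catches at check time that plain Python only surfaces at runtime (the `greet` function is a made-up example, not from any real codebase):

```python
from typing import Optional

def greet(name: Optional[str]) -> str:
    # Without this None check, mypy flags name.upper() as an error,
    # since Optional[str] may be None. Plain Python would only fail
    # at runtime, and only when actually called with None.
    if name is None:
        return "hello, stranger"
    return f"hello, {name.upper()}"

print(greet("ada"))  # hello, ADA
print(greet(None))   # hello, stranger
```

The check is static, so the bug is caught even on code paths your tests never exercise.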
All these great things that Java fossils like myself have been telling Node and Ruby hipsters about. Of course they won't be caught dead writing in a language their parents use.
Same. Moved from Python to Go, don’t write python much anymore.
It’s ridiculously easy to build things in go. The default tooling works just great. It’s a nice fit with docker for building tiny containers.
Perhaps the nicest thing though is how easy it is to write fast http servers. The default server is pretty good, but there are also so many choices for faster http server frameworks. Middlewares are easy to write and share. I can’t say that I truly understood how http servers worked until I started using go.
Yes. My current estimate of the cutover, for myself, is about three weeks of solid, 40-hour/week development. After that, my Python (or other dynamic language) starts the process of seizing up: instead of rewriting some module I just put a little hack in there to make it backwards compatible with other code, since I haven't got a great way of being quite sure what's calling this code. So I use a __setitem__, or have a function that takes "a thing or an array of that thing", and I find myself increasingly reluctant to refactor the Python.
YMMV on the exact number, but that's been my experience several times now.
I know it can be done; I've seen it done, I've done it myself. But refactoring without even the rudimentary static type system Go has just becomes an increasing nightmare at scale.
And I use unit testing in Python, etc.
But, flipside, yes, Go isn't a great language for just bashing a script together in. Maybe not the worst, with a bit of library work, but not a great language.
The recommendation should be to use pyenv and virtualenv: pyenv for installing the Python versions you need for different projects and virtualenv for creating an isolated environment for each project. Using this setup, I almost never run into dependency issues.
I don't know much about Go. How does it avoid dependency conflicts?
I just cannot understand how rational engineers would choose untyped languages like Python or Ruby for large scale systems. Humans make mistakes - even very smart humans make frequent mistakes - and blast radius grows with the size of the system.
The article mentions Go's superior compile time, when compared to Kotlin. I have done a lot more Java development than Kotlin, but my recollection is that both of them compiled fairly fast.
Is Go really significantly faster to compile for similarly sized projects?
It's mostly because most setups will use maven or gradle which do much more than just compile code. They will often pull down dependencies, check for bugs and style inconsistencies, run unit tests etc. If you run plain javac against a bunch of files it will compile just as fast as Go. In any case it will be in a blink of an eye.
Yes, it’s much faster than these java and jvm based codebases (including all jvm based languages like kotlin, scala etc). Go as a language is just so much simpler.
Although the compilation overhead of the large JavaScript codebases that make up most modern websites kills any benefit from improving the build performance of some backend service.
Not only going from a monolith to microservices but also changing the language? This is a mistake that rookies make. This will be one of those post-Mortems where they will sheepishly admit they bit more than they could chew, and it wasted years of productivity.
There’s no reason to move to Go. Stick with Python for now. Migrate safely to Python 3. Once everything is stable, start breaking things up into thrift or protobuf services. They don’t even need to be microservices but you need the contract. Once that is stable, migrate to whatever language you want. But at this point you will have the well-defined API and test cases. Trying to do too much all at once is a recipe for disaster.
(Ironically, that post is about Netscape and the rewrite, Mozilla, ended up growing as a nonprofit far beyond Netscape's original scale.)
The so-risky-it's-almost-certain-to-fail approach is a big bang rewrite. Stop the world until the rewrite is done.
That's not what we're doing. A tiny bit of our GraphQL schema is _already_ running in production atop new services written in Go. We're going to do this step-by-step, while keeping the site running, _and_ adding committed features next year.
It's still a huge investment and comes with risks, but it's an incremental process and we'll be able to track the progress at every step.
So long as your data has well-defined interface boundaries on it, this is a good way to de-risk rewrites. It really depends on the current state of the codebase. If there's a tangle of dependencies, then they have to be cleared out or facaded away to a testable interface, or else that part of the port will be a shot in the dark.
And if you have the interfaces in hand, the language itself becomes considerably less important, since more of the code is subsequently dependent on your own system and not really the outer ecosystem. It's the projects that have coded "to the metal" on their existing platform that have the biggest issues with keeping up their flexibility.
Given that this is mostly about our backend systems, we've got one big interface in front: our GraphQL schema.
Our system isn't perfect, but a couple of years ago we spent a good deal of effort detangling our monolith. This plan wouldn't have been an option had we not done that.
The saying only applies when your product is software itself, not in most cases where software is just the means to deliver your product — here a website providing education in video form etc.
Besides, the reality is that most business software out there gets rewritten every 3-15 years (it really depends on use-case and conditions, but on average 4-6 years is a good bet). After some time it's just not worth it to keep refactoring; you'd rather start anew with hopefully better tech and certainly with better knowledge of your problem — they say you should write everything 3 times to make sure you really nailed it.
In many businesses, these rewrites would constitute a new major version, more comparable to the feeling we always got in the waterfall era — new version = big changes, new UI, new stuff. That's when it's possibly lethal, if you really break the thing, and that thing is your product, not a means to it.
the number of failed migrations, modernizations, and "tech transformations" in non-software industries, as well as the number of consulting outfits and their profits doesn't seem to back this up.
we can argue whether these are "rewrites", but big changes can be rewarding but are inherently risky. balancing this is hard, and rewrites uncover and introduce the unknown unknowns.
that's fair, most of those are "you get what you pay for" and hardly the fault of a competent, well-meaning, but naive dev team. problem is it's much harder to get data on small businesses.
i think one thing we can agree on is that the only thing users hate more than change is breakage.
rewrites are a valid tool in a long-term strategy, just as debt is for finance. but for most people, incremental change has a bigger, smoother RoI. it sounds like this is kind of what KA is doing. although the timeline seems aggressive and the whole thing absolutist, being stuck on Python 2 is a risk now, too.
I think we agree on the general perspective, yes — we'd probably agree as a team with known (ad hoc) conditions and goals.
Data on small businesses is hard indeed. I'm only speaking anecdotally from the MSP / software shops perspective, the "tech guys" of most businesses who don't have in-house IT. Also from a European perspective, so that might make a big difference — we're, ahem, let's say not as involved, interested, or capable in all things "technology" as a general population (I didn't say luddites but that's how it feels sometimes, compared to the vibe I get in NYC or rich Asian cities).
> Isn't there a quote somewhere along the lines of "full rewrites are suicidal?"
This is just dogma. Often it's the wrong choice, but sometimes engineers have very good reasons to re-write a system. As with all decisions, tailor the solution to the situation - not the other way around.
You can start porting your live codebase to python 3 right now - no downtime at all. You can have a 2+3 compatible codebase live in a week or two, it'll get you about 98% the way there [1]
As for automation tools for 2/3 compatibility, I used futurize on a huge codebase to great success: https://python-future.org/futurize.html. I started with its conservative stage 1 fixes.
Also, wire Travis / your CI system to have your tests run on python 2 and 3.
If you want to test your python 3 codebase live on a subdomain / same SQL/NoSQL DB, be careful about jobs/tasks! Pickle version mismatches and stuff. Use a separate redis/whatever DB for the deployments.
I appreciate all of the links and the fact that, for a lot of folks, this is the state of Python 2 to 3 migration today. What you're suggesting is, imho, the best path for most (and I appreciate you mentioning the pickle incompatibility because that is a thing which would trip people up when trying to make the move).
If we would have been able to have a 2&3 compatible codebase live in a week or two, we absolutely would have done that. Incompatible changes in some libraries we use, App Engine first gen to second gen changes (which are for the better, but still a big deal), the choice of storing some pickles permanently, plus a need to really verify unicode handling all over the place (especially in a 10 year old codebase), and other factors that aren't coming to mind at the moment mean that this is not a couple week thing for us.
Moving to Go is more work than moving to Python 3, but in our particular case it's not as much more work as people might expect.
Thanks for your reply to my reply! Sure, it may take more than two weeks, and maybe golang is a good choice for your stack. Maybe a better way of saying it is that python 2/3 compatibility can be adopted gradually while maintaining the existing codebase.
Also, I can't speak for the technical specifics of your project, but I would like to speak a bit from my prior experiences:
> Incompatible changes in some libraries we use
What are those libraries? There may be python2/3 forks available on pypi. For instance, on Peergrade we went from pyPdf -> pyPdf2, boto -> boto3 (that one is a lot of work). We were still able to stick on the python 2 codebase, but gain python 3 forward compatibility.
In the case of very specific packages, we had to do forks. We had to move some patches from a python 2 only library and port them into a python 3 only library, then do one of those version constraint things.
> App Engine first gen to second gen changes (which are for the better, but still a big deal)
I'm not familiar with app engine so can't speak to it.
> the choice of storing some pickles permanently
Can you clarify this?
I can't speak for it without seeing it, but would it be possible to serialize the data to json, use python import strings (https://devel.tech/tips/n/djms3tTe/how-django-uses-deferred-...), then write a migration to port the old pickles over, and have something more portable?
It's possible, depending on what's being stored, it'd also help you migrate to golang if the data stored in it could be consumed by a go service.
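As a rough illustration of that pickle-to-JSON idea (with made-up data, and assuming the pickled values are JSON-representable):

```python
import json
import pickle

# Pretend this blob came out of the datastore, written by old code.
legacy_blob = pickle.dumps({"user_id": 42, "scores": [1, 2, 3]})

# Load it once with Python, then re-serialize to a portable format
# that a Go service (or anything else) could also consume.
obj = pickle.loads(legacy_blob)
portable = json.dumps(obj)

print(json.loads(portable) == obj)  # True
```

The migration job would do this once per stored record; after that, nothing in the system depends on Python's pickle format anymore.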
> plus a need to really verify unicode handling all over the place (especially in a 10 year old codebase)
Yes I had a lot of pitfalls with this one. Areas to look out for are hashing functions. They are very strict in whether they're dealing with bytes or strings.
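For example, here's the hashing gotcha in isolation (nothing project-specific, just stdlib behavior):

```python
import hashlib

# Python 2: str is bytes, so hashlib.md5("hello") just works.
# Python 3: hashing requires bytes; passing str raises TypeError,
# so you must encode explicitly.
digest = hashlib.md5("hello".encode("utf-8")).hexdigest()
print(digest)  # 5d41402abc4b2a76b9719d911017c592
```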
Are you already using unicode_literals? Those can be implemented gradually in a current codebase.
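To spell out what that gradual adoption looks like, the future import goes at the top of each module as you migrate it:

```python
# With this import, bare string literals are unicode on Python 2,
# matching Python 3's default, so the same source behaves
# consistently on both interpreters.
from __future__ import unicode_literals

s = "café"
print(type(s))  # <class 'str'> on Python 3; <type 'unicode'> on Python 2
```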
How is your test suite? Sometimes having test coverage, even naively, can also act as a smoke test to catch unicode issues.
Selenium tests can get a lot of coverage for very cheap to verify behavior at a high level.
> Moving to Go is more work than moving to Python 3, but in our particular case it's not as much more work as people might expect.
I think the services + golang part is smart, and can't speak for the specifics of your codebase.
I will say in hindsight, I feel the effort / hours I put into upgrading a Python 2 codebase -> 2/3 and eventually 3-only made it much easier to breathe. Cleaner syntax, no unicode headaches, and no lingering prospect of a language exodus looming overhead.
One negative aspect we haven't talked about with "gradual" Python 2/3 migration is breakage. Some of the refactors involved risk breaking things when pushed out, even when they're purely internal code changes. Still, there's a business case for eliminating the tech debt: a Python 2 codebase, even with warts, is meeting an EOL in the next few days (https://pythonclock.org/).
Even assuming golang + microservices is the final destination, the effort of moving to Python 2/3 (or 3-only) has, amortized over the projects I've been on, paid itself back. If there are opportunities where the Python codebase could be split into apps / separate wsgi entry points, with golang services replacing them later, that could be an option. Even without golang, the (probably huge?) refactors involved in moving to Python 3 and being "(micro?)service ready" have benefits.
We did do a bunch of refactoring a couple of years ago to draw boundaries within our monolith. We almost certainly wouldn't have made this choice had we not done that work. We're also doing work in Python to smooth the way for this change. For example, we're migrating our remaining REST endpoints to GraphQL in Python before making the move to Go.
Regarding the pickles: we are essentially doing what you're suggesting. We're going to migrate them to JSON. This is one of those things that is going to be a pain to do and we'd have to do it whether we're switching to Python 3 or Go.
Also, I'll note that the Python 2 EOL is only half real. There's so much Python 2 out there that, while the PSF is no longer supporting it, there will be people supporting it as needed.
I do agree with you about Python 3 being much cleaner. But we have to do significant rewriting of our devserver, a lot of our data access code, and some other libraries that I don't remember offhand. Again, it's quite specific to our codebase. People with a "normal" Django app are unlikely to have such issues.
Ok this is going to sound ignorant, as my only experiences in backend services have been Go and Python. I don't like either. Is there something I'm missing? For simple CRUD apps, both are sufficient. But (in my limited experience), the moment I've wanted to create more complex business logic with stricter constraints, neither has been quite up to the task.
Go doesn't make things easy. It asks you to repeat yourself. I don't like the lack of basic functions like a generic map / filter function. I know Rob Pike just says "use for loops", but it feels so unnecessarily unexpressive. When I see map, I know what's going on almost immediately. For loops take more reading to understand. Nil pointers shouldn't be, yet still are, a thing (developers aren't perfect - why can't the type system help?). It feels like a straight downgrade from Python from a code clarity perspective. And it's typed, sure, but the type system doesn't let me express constraints that other languages allow me to express that would prevent entire classes of bugs. It doesn't feel worth it compared to Python. Yes, the core language ends up being comparatively "simple", but simple building blocks don't guarantee a simple overall system. And my company is very diligent from an architectural perspective.
But then when I look at Python, I'd rather just use Javascript with Lodash, esp. when it comes to the treatment of functions as first class objects[0]. Throw Typescript in there, and you get, in my opinion, a better type system than Go, so unless language performance is a major constraint (which it hasn't been for my company; our DB usage patterns are the bigger constraint instead), why would I want to use either of these rather than Typescript?
[0] Edit: Dumb and wrong, I meant its treatment of anonymous functions. I don’t like lambdas.
I have a lot of JavaScript history, but if I were optimizing for the things you want to optimize for, I would absolutely pick Kotlin above JS+TypeScript. It's got bunches of language features, including superior concurrency, and runs on a faster runtime to boot. Unlike TypeScript, it doesn't have a 20+ year old dynamic language at its heart.
In backend systems, I think that the overall architecture and management of data are much more interesting problems than the code. I feel like Go helps direct more of the thinking toward the overall rather than creating beautiful abstractions in the code.
You had me nodding in agreement until the last paragraph. Functions are first class in Python. You can pass them all over the place and use them the same way you would in JS. You don’t have Lodash but frankly it’s not needed. The stdlib, functools and itertools are pretty much all you could ever ask for.
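To illustrate, a couple of common Lodash-style operations and their stdlib counterparts (the `_.flatten` / `_.sum` names are just for comparison):

```python
from functools import reduce
from itertools import chain

nested = [[1, 2], [3, 4]]

# Roughly lodash's _.flatten:
flat = list(chain.from_iterable(nested))

# Roughly lodash's _.sum (though the sum() builtin is more idiomatic):
total = reduce(lambda acc, n: acc + n, flat, 0)

print(flat, total)  # [1, 2, 3, 4] 10
```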
This is the silliest thing I see parroted. Anonymous functions are there for one-offs. Sometimes a one-off is two lines. Breaking it out into a separate function is ridiculous.
You end up seeing functions declared within functions so that people can work around this limitation and it is grotesque.
If you need to express something complex enough that you can't do it in a single statement in Python, it deserves to have a name (or more likely, it already has one in the standard library).
But python for loops can and typical do have multiple lines, yet that “block” inside the for loop doesn’t have a name. The fact that the body of a for loop can have multiple lines but an anonymous function cannot is a purely arbitrary syntax limitation, and the principle you stated ought to apply (or not apply) equally to both use cases.
This doesn't follow: for loops are there specifically for cases of the complexity that can't be handled by other constructs (like comprehensions). Anonymous functions are not, for the complex cases you give the construct a name.
It's a very intentional language limitation, much as semantic whitespace is an intentional limitation.
And there is actually a language difference. A for loop is a statement. A lambda is an expression. Python doesn't let you put statements into expressions.
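A tiny example of that distinction (made-up functions):

```python
# A lambda body is limited to a single expression:
square = lambda x: x * x

# Multi-statement logic needs a def, even if it stays local,
# because statements like assignment can't appear in a lambda:
def square_plus_one(x):
    y = x * x
    return y + 1

print(square(3), square_plus_one(3))  # 9 10
```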
I've sometimes wondered whether code would be universally clearer if all you could do in a loop is have a one-liner or call another function/method (mainly when looking at my own code within loops and thinking WTF!!!).
Would probably cause issues when teaching programming though. It would be interesting to have a compiler switch that enforced this...
What comes to mind immediately is that you would lose the ability to `break` unless you had another convention like returning a specific value to indicate you want to break.
Thanks for writing this, that clarifies the issue more clearly and concisely than what I had in mind.
I will mention a core aspect of my argument, to add to your point.
When you make a named function and then use it separately somewhere else, there is a loss of locality. The logic is now further from its point of use. This is a cost, and sometimes it is an unreasonable one.
i'd agree with this, except you can have a function within a function in Python to limit scope and keep it local. yeah, you have to define it before the code using it - sometimes this feels like a boon though, especially compared to huge inline lambdas. so over time, my annoyance with this has decreased, and it doesn't bother me anymore.
I've started to use Python type hints, but even with typing I like to label my names things that describe local concerns (types are often more general than what you mean to say), and sometimes I'll deliberately shorten the names to make a Python lambda fit where in another language with bracketing I'd just take an extra 1-2 lines and format the line differently.
I've written snippets and data structures one way or the other, as I'm sure many have, and in my experience the results have been mixed. Given that, and the fact that reduce has no bearing on your app architecture, I see this as arguing over a small and vague gain or loss.
Large anonymous functions are a thing in every modern language I can think of except python. It’s the only sensible solution in the case of functional method chaining, which is much more readable than nested comprehensions. Python just got this one wrong.
Python can't replicate the structure of this JavaScript code:
fooBaz(x => {
// multiple lines of logic
})
In python, you are forced to move the logic away from the point of usage, which means you lose some locality. This loss is sometimes not worth getting a "name" for the logic.
Of course, sometimes it is better to create a named function away from the point of usage, but I don't think that is always the case.
There are a lot of cases like GUI programming where you need a lot of handlers which are suitable for lambda because they don't have a good name, and should never be called manually, and often they have multi-lines of codes.
Nesting functions like this can have a bad performance impact because functions in Python are objects. Normally, all of those function objects are instantiated once when you load the module. However nested functions will be instantiated at runtime every time their parent is called, even if they aren’t used. This cost is perceptible in hot paths.
That’s an interesting principle, something like “a block of code that is complex enough to be written across multiple lines of code ought to be given a name,” but standard idiomatic Python already violates that principle by featuring multiple lines in the body of a for loop. The fact that the body of a for loop can be multiple lines but the “body” of a lambda cannot is an arbitrary syntax inconsistency.
That's a couple characters over most languages I normally use. Plus lambdas have a number of strange warts and restrictions (no multi-line lambdas?) that leave me feeling that Python really dislikes traditional functional programming constructs.
Python has become less functional over time. It’s also not a particularly good OO language either. It’s really a more descriptive sort of bash that became ubiquitous despite its shortcomings.
While functions are first class in Go, the limitations of the type system make them less ergonomic to use, I think. Without user defined generics, many of the common uses of first-class functions become a lot less convenient to use.
In typescript, if I have a bunch of "User" objects where each user has an "name" and "age" field, and I want to print a comma separated list of all those names formatted as "name -- age", it's simple. I write:
let userNamesList = users.map(u => `${u.name} -- ${u.age}`).join(", ");
console.log(userNamesList);
This is possible in no small part because map is generic.
In go, to get the same level of expressiveness, I'd have to write the following stuff around it:
func mapUsersToStrings(f func(u User) string, users []User) []string {
	result := make([]string, 0, len(users))
	for i := 0; i < len(users); i++ {
		result = append(result, f(users[i]))
	}
	return result
}
And only after writing that boilerplate can I finally write the equivalent one-liner.
That boilerplate, of having to write a specialized map/filter/etc function with a for loop for every combination of types you transform between (mapUsersToStrings, mapUsersToAddresses, mapUsersToAges) is really annoying.
It's harder to point to many other cases because, simply enough, people don't write such cases. The fact of the matter is just that go libraries make very sparing use of first-class functions because the lack of generics prevents it from working well, and so we end up with an entire language ecosystem where code is harder to read and with fewer simple abstractions reused across it.
You can still write any code without good generics or first-class functions, you'll just have more programmers writing more code with less clean abstractions.
I dislike JS, but love TS. But really, who cares about me anyway?
The big picture is that JS is the English of software languages. It’s not beautiful, it steals from everyone else, and it’s never the best tool, but it’s becoming - more and more - rarely the wrong tool.
I expect this trend to continue. Some genius will get JS to compile to native bytecode (WASM is the intermediate step). And, it too will be neither the best language for the job, nor the wrong tool for the job.
The proof is in the pudding, nothing so wrong (JS) can be so right - and yet it survives, nay thrives.
As an aside, I find English to be a beautiful language. By stealing words from so many origins (Saxon words, Latin words and Old Norse words), there's so much versatility.
The Saxon words are direct and burly. E.g. oak, brash, death, iron, etc. The Latin words are multisyllabic (e.g. multisyllabic :) ). The Norse words are just plain fun (e.g. Yule, law, heathen, oaf).
It's nice to be able to choose depending on context. Similarly, I find versatility to be nice in JS, as the community goes from embracing OOP to a more functional style.
Go and Python would have to be the two easiest to use and most productive languages I've ever used. Other languages might have more powerful features but Go and Python just get more coding done more quickly and more efficiently. Other people's personal tastes may vary - and obviously do - but you can't dismiss these languages when they're so obviously very effective for many people.
Python is NOT an easy language at all. Easy to get started with, but when you are deep diving into language features, it's starting to get WAY MORE COMPLEX than Go.
I had similar feelings regarding anonymous functions before, having come from a more FP background.
Ways to make your Python experience nicer:
- almost never try to use map or filter. List/set/dictionary comprehensions are more pythonic and will be easier in the long run
- learn about standard library stuff, especially the methods on dictionaries, as well as the collections package. If you’re thinking about lodash, Python tends to have replacements that end up being better tbh
- if you really want a local function, just creating an inner def statement is fine. But KISS should usually make this rare.
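As a concrete example of the first point, the same transformation with map/filter versus a comprehension (the sample data is made up):

```python
users = [{"name": "Ada", "age": 36}, {"name": "Alan", "age": 41}]

# map/filter version:
names = list(map(lambda u: u["name"],
                 filter(lambda u: u["age"] > 40, users)))

# The comprehension reads more directly and needs no lambdas:
names = [u["name"] for u in users if u["age"] > 40]

print(names)  # ['Alan']
```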
People here don't like C# for some reason, I write really complex business logic with it every day and it is not only easy to write, it is maintainable as well. My own JS code with lot less complexity is way harder to understand after a while.
Go seems to be optimized for onboarding new developers (particularly straight out of school) quickly, rather than for the long term comfort of developers using it. There are Rob Pike quotes that speak to the first part of that at least.
If I was a business owner, I'd love Go. But what I can't really figure out is why so many devs love it. It's far from the worst thing in the world, I don't hate it, but I just can't get excited over it either.
From what I can tell by reading what Go developers write, they like it precisely because it's not something to get excited over. It seems like the kind of language where once you learn it, you don't have to keep up with a bunch of blog posts detailing all the cool new things being added to it, and decisions over stuff like formatting are made for you. I primarily use Rust for hobby projects, and the steady stream of new features/libraries can get tiring. You don't have to use the new stuff in your own code, but if you want to use popular libraries, you'll probably be stuck using the new features. Making everything async seems to be the new hotness, and sometimes I wish the language would be nothing to get excited over.
As someone who writes — I mean prose, whether fiction or not (mostly not: essays, technical, etc) — I had to come to terms quite early with the dichotomy you explain here.
There is a place for the "beautiful language", the parts of it and the ways of using it that make it a pleasure, as a writer and as a reader. This, unsurprisingly, usually demands a whole lot of additional work on top of the 'direct meaning'.
Then there's a place, in casual communication, in business, in marketing, in essays as well, in technical docs, in speeches, in a lot of places, for the 'direct meaning', or close to that. The efficient use of language, when all form recedes in favor of meaning, of concepts, of getting that 'other' to get what you mean.
It's just that, in human language, you do it all with the same tools, we use formal or topical subsets of a vastly larger ensemble. In programming, we're more likely to use different languages [themselves subsets of human language if you think about it, but let's forget that for the sake of simplicity].
And there are programmers among us who love to dabble in the form, like some writers would spend 10%, twice, ten times the effort crafting just "better form" over an already well-defined idea/story. While other programmers, or at other times, just focus on getting things done. Cue the spectrum in-between.
So it all depends what we put in our code, as programmers, as human beings I guess. Is that thing a personal statement? Or is it just garbage code to temporarily expedite some roadblock? How do we approach complexity, bottom-up from the simplest elements/code, or top-down with the most expressive almost-meta entities? Maybe some side-line out-of-the-box angle? See how we'd word all of these, in human languages, as in code. We just wouldn't say the same, nor code (select languages) the same.
I don't know if I explained it well. But looking at it from a human language writer, it all seems clear now. The whole rat race of languages, the churches, the sheer effort put into form when meaning has already been solved 10 times by others, the strong NIH syndromes... it's all so common in traditional writers circles. We're all just writers, really!
I think there can exist a happy middleground between constant churn and what Go currently provides. Some nice quality of life features like pattern matching and list comprehensions would go a long way towards making it a more pleasant language, without making developers feel like they're in a constant struggle to stay current.
It also lends itself to patterns in code. One thing that I learned about Python coming from Perl was that there was a “Pythonic” solution. Go seems to take that idea further with gofmt and a small but powerful std lib. I can typically glance at how someone is configuring their http.Server and understand the intention of the code.
The "pythonic" way was a mantra that has been obsolete for a long time. With Python, there are many (unsatisfying) ways to declare dependencies, many ways to format code (I hope "black" will prevail), many ways to format a string, many ways to create a struct/record...
The lack of quality batteries included also leads to "there is more than one way to do it" through the choice of unofficial libraries. My experience is that the extensive standard library of Go brings more normalisation.
> But what I can't really figure out is why so many devs love it.
I think, in order to understand why so many people love Go, you first need to understand why so many people love C.
Go is C with most of the warts removed (for application development). GC, easy strings, maps, easier first-class function syntax, code formatting, modern standard library, easy concurrency, etc.
Yes, there are plenty of places that C can go that Go cannot - operating systems and embedded devices are high up on that list. But for back end application development, it's great as long as your application is of a moderate size and you don't need specialized data structures.
I very much agree. There is a reason why a lot of software is still implemented in C. There are historic reasons of course, but also because C is just powerful enough to do so, and that gives you a small language, which removes a lot of complexity. Of course, in the case of C, this comes along with the warts of lacking memory safety and limited type safety, which are responsible for a lot of bugs. Other languages like Pascal and Modula-2 improved a lot in these respects. Yet all of those languages remain simple.
That is the reason I like Go so much. It is still a very simple language, but it improves on the key points: memory safety, a GC, better type checking, and a few high-level constructs, above all first-class functions and closures. These enable many of the features of "higher" programming languages. You can map functions over lists in Go quite fine.
Huh. "A simple X" is a wily phrase, because everything is exceedingly "simple" in the aspect its designer had in mind[1], but complex from other perspectives. C may be simple in terms of how you would describe it building on the interface provided by a typical CPU, but to someone who only knows Scheme or Haskell, learning the machine model is no small task; I don't think the 683-page length of the C11 specification can be attributed to its authors being long-winded. I don't really see Go as having inherited C's simplicity-in-terms-of-differences-from-a-machine-model. If they were optimizing for the same kind of "simplicity" that C has, they'd have to be out of their minds to add garbage collection, strings, maps, lambdas, and goroutines. Go, like Scheme, Haskell, and Python, is a different "simple" from C.
1: unless designed by a committee with no such single-minded aim
Defers, channels, inexpensive goroutines, a mostly consistent standard library, very good performance, garbage collection, being similar to C, strongly typed, and readability are the reasons why I've enjoyed writing Go exclusively for the past couple of years. Almost none of these is exclusive to Go, but the intersection of all these certainly is.
There's a class of problems (servers running a simple protocol) for which a basic type system, convenient green threads, and a lot of optimizations are all you need for great results.
For those, Go would beat anything that lacks convenient green threads (TypeScript included, also Java, C#, C++), anything that is dynamically typed, and anything that is interpreted.
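A quick illustration of how cheap those green threads are; the count of 100,000 goroutines is arbitrary, and each one here does trivial work.

```go
package main

import (
	"fmt"
	"sync"
	"sync/atomic"
)

// countConcurrently spawns n goroutines that each bump a shared
// counter. Each goroutine starts with a few KB of stack, so n in the
// hundreds of thousands is routine, unlike with OS threads.
func countConcurrently(n int) int64 {
	var total int64
	var wg sync.WaitGroup
	for i := 0; i < n; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			atomic.AddInt64(&total, 1)
		}()
	}
	wg.Wait()
	return total
}

func main() {
	fmt.Println(countConcurrently(100000))
}
```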
Go is mainly loved by devs who would rather build products than write beautiful code. Go is for product builders who use coding as a means, not an end.
Most of the "means vs. end" dichotomy occurs at a level above choice of language. Every language permits both beautiful code and ugly code. Every language allows you to optimize. I write Go every day and I strive to write beautiful and performant code, for the sake of writing beautiful and performant code. Go may not allow you express e.g. quicksort as beautifully as in Haskell, but its simplicity and pragmatism lend themselves to a different kind of beauty.
I find the tooling to be very attractive, especially since modules came out. go test, go mod, go vendor, etc... Working in the Go ecosystem is mostly pleasant and easy.
There are two kinds of devs: devs who mainly like the feeling of writing code well, and devs who mainly like building products. Go is by the latter for the latter.
It's not meant to be exciting; it's meant to be effective, efficient, and reliable.
The performance of Go is not particularly impressive. Especially when it is used to write busy servers with many simultaneous requests. And as a language it is not inspiring either.
Not sure why business owners should love it either. With too few features in the language, coding productivity suffers in the long run. Also, if a developer needs Go because other languages are "too complex", maybe said developer can't produce proper code anyway.
C++ is not a narrow niche. We do see it pretty much everywhere and it continues to be the most used language for systems programming, games, embedded systems, and pretty much anything else where performance is critical.
That is a niche, because most applications are IO bound and not compute bound. What even qualifies as “systems programming” is ill defined. Far more critical business systems run on Java and C# servers than C++. If there are C++ components they tend to be small parts along the critical path and not the primary implementation language.
"most applications are IO bound" - there is a whole world of non-IO-bound software running on cars, toasters, airplanes, desktops, hospital equipment, etc. I do not think it is any smaller than that niche of web spaghetti being churned out by undergrads.
I disagree with the notion that most applications are IO bound only. This is something people often say uncritically, but in my experience is false. Just using a non-native Electron or even Java application feels very sluggish and when you look at the Waterfall on slow web pages, what's slowing it down is very often unrelated to "I/O".
Secondly, C/C++ is like the third or fourth most commonly listed programming language in job listings. If you think all but 2 or 3 languages are niche, that is not what the word means.
C/C++ is not a language. I am also someone who has used C and C++ for years as part of my work and has mostly moved on to TypeScript, because there isn't much reason to use C or C++ anymore unless you are in one of those niches where you need to still program at that level.
Most software problems are not about making things run faster; they are about combining existing components in new ways and figuring out how to orchestrate it all.
I don’t care about copy elision, heap fragmentation, perfect forwarding, when my performance is being lost in the communication between services. What I need is a better architecture and more scaling, not concerning myself with if this loop is being vectorized, or that object is being moved instead of copied, and other minutia which inevitably ends up wasting your time when writing C++.
I don't necessarily disagree that not everything needs to be optimized for performance, but I would just argue that the use cases for TypeScript are far more niche than the use cases for C++. There's more to software than just web stuff.
Most stuff is web stuff now. And I’m not talking about front end, we do a lot of back end work in TypeScript because node is lighter than the JVM which makes it a better choice for lambdas. I’d say it also has more sophisticated static typing than Java or C++, while also allowing dynamic typing in the few cases where it is convenient. Having the front and back end written in the same language also reduces impedance between teams. A lot of our tooling is even written in it now, deprecating many Ruby scripts.
Most stuff is not web now. There is software in everything everywhere not just web sites.
Niche does not mean "stuff I don't personally use at my job", but that is the only definition under which TypeScript is not niche and C++ is. C++ in 2019 had the 4th most job listings according to Indeed. Calling that a niche is absurd, especially in comparison to TypeScript.
The thing about C++ is that you need to recruit specifically for C++ programmers in a way that you don’t need to recruit for programmers in many other languages. The barrier of entry for C++ is high enough that you can’t just take your typical developer and ask them to write good C++, it’s a language which requires far more effort to become competent in.
There are also many places which just ask for Java/C++ experience for no apparent reason. Amazon is like this, all of their job listings mention C++ but only a very small percentage of the code base is in C++. There is at least 10 times as much Ruby code and it is part of systems that most engineers will have to work with, but no job application mentions that.
Java killed C++ for general-purpose use in the 90s. It's never your first option unless you are in areas like games, graphics, some embedded work, or quantitative trading.
"All right, but apart from the sanitation, the medicine, education, wine, public order, irrigation, roads, a fresh water system, and public health, what have the Romans ever done for us?"
Go and Python are for getting things done effectively, not nitpicking beauty of code.
That's why the authors of YouTube built YouTube in Python, and the authors of Khan Academy (re)built Khan Academy in Go, instead of bickering about nitpicks on HN.
Can you elaborate on your criticism of Python? Other than the incorrect claim that functions are not first-class objects, there isn't much I can extract to respond to.
A for loop could mean anything, and not necessarily what the author intended. The rule of least power (https://en.wikipedia.org/wiki/Rule_of_least_power) gives you guarantees instead: list.filter(p) always calls p O(n) times and returns a list of the same type with some or all of the same elements; list.map(f) always calls f O(n) times and returns a list of the same length; p and f can be reused and depend only on a single element. A good platform will inline list.filter(p).head into machine code that exits a loop early, code that you don't have to write or review.
Making them work with strong types would require generics, though, which would introduce a lot of complexity to the language. It has been debated for years.
It's pretty trivial to write your own custom map/filter/reduce though.
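For instance, a sketch of hand-rolled Map/Filter/Reduce. Note that since this discussion, generics did land in Go 1.18, which lets you write these once instead of once per concrete type; on older Go you would write the same loops with fixed element types.

```go
package main

import "fmt"

// Map applies f to every element and returns the results.
func Map[T, U any](xs []T, f func(T) U) []U {
	out := make([]U, 0, len(xs))
	for _, x := range xs {
		out = append(out, f(x))
	}
	return out
}

// Filter keeps the elements for which p returns true.
func Filter[T any](xs []T, p func(T) bool) []T {
	var out []T
	for _, x := range xs {
		if p(x) {
			out = append(out, x)
		}
	}
	return out
}

// Reduce folds xs into a single value, starting from acc.
func Reduce[T, U any](xs []T, acc U, f func(U, T) U) U {
	for _, x := range xs {
		acc = f(acc, x)
	}
	return acc
}

func main() {
	xs := []int{1, 2, 3, 4}
	evens := Filter(xs, func(x int) bool { return x%2 == 0 })
	squares := Map(evens, func(x int) int { return x * x })
	sum := Reduce(squares, 0, func(a, b int) int { return a + b })
	fmt.Println(sum) // 4 + 16 = 20
}
```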
No, Go does not need this and I do not want it to get these useless "features". Go is simple and it gets things done with an amazing standard library. Use a different programming language if this does not suite you. Every programming language ends up with its own implementation of these "concepts" that it is hard to keep track. What is so difficult about for loops? It's simple for everyone and you don't need to learn new concepts because it works.
Fundamental? IMHO, they are just syntactic sugar on a foreach loop. Which in many ways is bad because it provides yet another way to express the same concept, without a significant difference in the method of execution. What am I missing?
Map and filter allow for a very concise and clear indication of the intention of the code, so it is much quicker for the reader to parse what it is doing.
Edit: Also, for loops are just a way of implementing a mapping, where you have data A that you want transformed into data A*. The map is the fundamental concept here, not the for loop.
Which is more generic, the 'for' or the 'map'? I think it's the 'for', because it has additional capabilities contained in the base concept, which is why I would call it more "fundamental".
In fact in many languages much of filter() can be implemented in the for construct rather than its body.
There are two sorts of 'fundamental'. You are correct that the 'for' is just syntactic sugar for 'goto' which is more fundamental than map/filter in the sense that it translates into assembly/machine language, upon which all programs are built. Then you can say the silicon logic gates are even more fundamental, you can do even more things with them, etc.
The map/filter is more fundamental in a mathematical sense. If a list of A is turned into a list of A*, the 'mapping' is the fundamental concept; the 'for' loop is just an implementation detail. Maybe the computer did it in parallel, or in random order, or I asked you to turn a pile of towels into a folded stack of towels. The same way, if I say 'want to watch a movie', the movie is the fundamental concept; whether it's digital, film, etc. is irrelevant.
They don’t have to be glorified for loops, even if they usually are. Map and Filter should be capable of running in parallel or asynchronously, and some languages provide this.
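A sketch of the parallel variant in Go, using one goroutine per element; a real implementation would batch the work across a bounded worker pool rather than spawning a goroutine per item.

```go
package main

import (
	"fmt"
	"sync"
)

// parallelMap runs f on every element in its own goroutine. Each
// goroutine writes to a distinct index of out, so no locking is needed.
func parallelMap(xs []int, f func(int) int) []int {
	out := make([]int, len(xs))
	var wg sync.WaitGroup
	for i, x := range xs {
		wg.Add(1)
		go func(i, x int) {
			defer wg.Done()
			out[i] = f(x)
		}(i, x)
	}
	wg.Wait()
	return out
}

func main() {
	fmt.Println(parallelMap([]int{1, 2, 3}, func(x int) int { return x * 10 }))
}
```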
And many languages have parallel for constructs. The OpenMP #pragma omp parallel for extension in C/C++, for example. Of course it's possible to create constructs which provide the message passing for clustered environments too.
And maybe that is part of the problem with a generic 'map': you're not really sure of the underlying implementation. Is it parallel, clustered, serial, etc.? So you end up with map(), parallel_map(), mpi_map(), etc., and how do you control the parallelism? Do you just let it default, or do you have levers to control the batching and interleaving? Pretty soon it's not such a simple construct anymore.
Do you mean a code comment could help the reader understand, or you want a comment about why for loops are less obvious than other more functional approaches?
> “Typescript is a hack on top of [a] [supposedly] server side language that is [awful]. ( Node doesn't even support int64 )”
"supposedly" in this context meaning Js is not worth being called a "server side language", in the commenter's opinion (citing lack of int64 support for example). This also implies in subtext that "server side languages" would be of a higher category/value than languages for whatever other use-cases.
I personally don't agree with any of that, just helping communication here.
> Go, however, used a lot less memory, which means that it can scale down to smaller instances.
I could be mistaken, but this sounds like they went ahead with the default JVM settings, where it tends to use as much memory as it is allowed to (which makes sense from a utilization and efficiency perspective). If memory usage is a concern, the JVM can be tuned for it.
I have found that there can be drastic differences in Go performance in the particular way you structure the program. Writing Go code in a Python-like way is going to be less performant than if you run escape analysis every compile and make deliberate effort to stay on the stack.
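For example, whether a value lands on the stack or the heap is decided by escape analysis, which you can inspect with go build -gcflags=-m. The point type and functions below are made up for illustration; the compiler's exact diagnostic wording varies by version.

```go
package main

import "fmt"

type point struct{ x, y int }

// sumStack keeps its point on the stack: the value never escapes
// the function, so no heap allocation happens.
func sumStack() int {
	p := point{1, 2}
	return p.x + p.y
}

// newPoint's result escapes to the heap: the pointer outlives the
// call. `go build -gcflags=-m` reports the &point literal as
// escaping to the heap here.
func newPoint() *point {
	return &point{3, 4}
}

func main() {
	q := newPoint()
	fmt.Println(sumStack(), q.x+q.y)
}
```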
> Writing Go code in a Python-like way is going to be less performant than if you run escape analysis every compile and make deliberate effort to stay on the stack.
The fact that people are doing this tedious optimization is strong evidence of the fact that Go needs a generational garbage collector with bump allocation in the nursery.
The JVM has a fast generational GC, and as a result you don't have to do this kind of optimization to get good allocation performance.
See what? This is an unofficial benchmark game, based on implementations people bothered to provide. Not some scientific test, and nothing guarantees these are the best implementations. And even if they were, they're not representative of server/long-running program behavior (where JIT quality and especially GC implementation matter).
Even so, the page linked starts with "Back in April 2010, Russ Cox charitably suggested that only fannkuch-redux, fasta, k-nucleotide, mandlebrot, nbody, reverse-complement and spectral-norm were close to fair comparisons".
Since Russ Cox is one of the Go co-developers, let's see the ones he accepts as fair. Of those, 5 are present on the page; on 3 of them Java wins with decent margins (fannkuch-redux: 18%, k-nucleotide: 21%, reverse-complement: 17.5%) and on 2 Go barely comes out ahead (fasta: 6.3%, spectral-norm: 6%).
Highly debatable. Even then, you see that golang ranks in the 102nd place for the plaintext benchmark, almost 6x slower than Netty, which is very established in the JVM world. Same with the JSON benchmark, golang is in the 126th place, ~3.5x slower than Netty and Vertx.
> but Go usually use between 3 and 10x less memory.
The JVM by default will use whatever memory is assigned to it. This makes sense from an efficiency and utilization point of view, as it generally aims for maintaining good throughput (whereas golang is only tuned for latency). The JVM now ships with new low latency GCs (ZGC and Shenandoah) which are currently available in experimental phase.
The first golang result in the plaintext benchmark ranks 8th (atreugo-prefork), which is incidentally immediately followed by firenio-http-lite, a Java library.
My point still stands: this is a third-party library that is not commonly used, and it depends on another third-party library, fasthttp, which itself comes with caveats from its author.
Doesn't matter, it's still Go code, and contrary to many languages on that benchmark (Java included), all Go code is built in Go, so there is no runtime / external library written in C/C++. (One of the reasons why Go is also slow on the regex benchmarks.)
That is for the purpose of "measuring the quality of the generated code when both compilers are presented with what amounts to the same program." But Go also allows pointers to fields and array elements and by design reduces pointer indirections and allocations in general, which those comparisons don't capture.
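A sketch of what those interior pointers look like; the user type and field names here are made up for illustration.

```go
package main

import "fmt"

type user struct {
	name string
	hits [4]int
}

func main() {
	u := user{name: "kh"}

	// Interior pointers: Go lets you point at a struct field or an
	// array element directly, so updates happen in place with no
	// boxing or extra indirection -- something the JVM's object
	// model does not expose.
	n := &u.name
	h := &u.hits[2]

	*n = "khan"
	*h = 7

	fmt.Println(u.name, u.hits[2])
}
```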
> Go was written for servers, Kotlin for mobile apps.
Not really. Kotlin was just a better JVM language developed by the JetBrains folks before Google adopted it as a first class Android language, but I don't believe it was specifically developed for mobile initially.
The open question is if Java will start borrowing the best features from Kotlin and it will become less relevant (like Scala). To Kotlin's credit, it has better IDE support, it feels simpler than Java (whereas Scala feels more complex), and it cleans up a lot of fundamental language issues that Java can't get away from.
Kotlin is just a less verbose, more modern language targeting the JVM. It has nothing to do with mobile apps beyond that Google, much later, decided to support it for Android development.
Go is an efficient platform, however in these switches it usually goes something like "we took something hugely overbuilt, with layers and layers and layers and abstractions and abstractions, and rebuilt it with minimalist Go and now it's faster", which, of course it is.
I don't think that's true. You're confusing the fact that Kotlin has become the officially supported language for Android development with the idea that it was written for mobile apps.
You may be confusing this with the adoption of Kotlin for Android. That happened much more recently; Kotlin has been open source since 2012 and was designed as a direct (compatible) replacement for Java.
I wouldn’t say that Kotlin was written for mobile apps. JetBrains, the company that created Kotlin, doesn’t even have mobile apps AFAIK. It’s just meant to be a more modern JVM language and that means it does anything Java does just better. Servers included.
Kotlin looks nice now, but in the long term it's a bad choice: since they're stuck on Java 8, they will always lag behind the real JVM and won't be able to catch up without heavy modifications.
Interesting read. It always surprises me that companies go many years running major apps and infrastructure on interpreter-based languages like Python and Ruby in the first place. It's an incredible waste of energy, compute, and, for web apps, sometimes users' time.
Developers need to be a lot more disciplined about performance and efficiency. I'm glad Khan went to Go, but man all those years wasted.