We turned our monolith into a bunch of microservices almost 6 years ago to the day. For a long time I was very happy with the new pattern, but over the years the weight of keeping everything updated, along with the inevitable corners that fall behind and end up with questionable security from sitting neglected for so long, has really left me wondering if I'm happy with it after all.
I would love to hear some thoughts from others that made the move, especially anyone that decided to move back to a monolith repo.
A company I am affiliated with decided to rewrite their code in a microservices-oriented architecture, thinking it would only take one year. Now we're 7 years into the transition and starting to come up against hard deadlines that threaten revenue streams. It seems obvious to everyone except the leadership and the architects that this has been an unmitigated disaster. Other comments on this thread seem to indicate that many have had similar experiences.
For a more in-depth analysis of the unforeseen challenges of microservices, I would encourage careful research into how other companies have tried and failed at this. In particular, I might look at Uber's ongoing difficulties.
All I have to say to the Khan Academy engineers is to buckle up because frankly, moving from Python 2->3 is not that hard and you have no idea what you are getting yourself into.
Yes, the micro services + Golang vanity project because you think you’re Google. I really don’t think people understand error states in distributed systems very well, putting possible network partitions everywhere is not a great idea. I would strongly suggest trying a Golang monolith first and seeing if there are one or two heavily used services that need splitting off. Also monorepo. Always.
The really fascinating thing about this tendency is that Google itself never completed nor even really started the wholesale transition of programs and services to Golang/microservices. Google does have services that are micro with respect to the overall codebase. But they aren't what most people out there in the wider world would think of as micro. And Golang remains a niche language at Google, perhaps more popular than server python, but far smaller in usage than Java or C++.
Microservices have always seemed dangerous to me. "Bugs thrive in the seams between libraries, so let's put more and deeper seams in!"
Microservices have an immense cost, and you have to make sure they're worth it. Many teams years ago found it a nice pattern and implemented it because why not, and now we're at the "oops this isn't actually amazing" part of the cycle.
In my experience, the biggest benefit of microservices is decoupling teams.
Developer productivity is very hard to maintain in a monolithic app as the number of developers increases and the legacy code piles up. Breaking up services and giving each dev team control over their own codebases enables them to develop their own products at their own pace.
If you only have one dev team, microservices are a lot less attractive. However, there are still some benefits, such as being able to refactor parts of your codebase in isolation (including perhaps rewriting them in different languages), and the ability to individually adjust the runtime scale of different parts of your codebase.
I was already achieving that around 2008 by having each team responsible for their modules, delivered over Maven, or in the late '90s by having each team responsible for their COM modules.
No need to over-engineer modularity by throwing distributed-systems algorithms into the mix.
This. The problem with decoupling services is that there usually end up being a couple of services that are critical but not sexy.
No one wants to touch them so they sit around unmaintained until an unrelated change or unpatched security issue comes around. Suddenly you've got a big problem with a mystery codebase.
That sounds very familiar, but I'm not sure this is something that can be blamed on decoupling itself. An unpopular module is going to need as much attention as a microservice from a code point of view. For upgrades/patching, there would be a company-wide process around it that doesn't care much how the code is organised.
A 'company-wide' process is either 'squeaky wheel gets the grease' or 'no one even knows this exists,' 100% of the time in my experience. This is from going from 10k+ to 150 to 5k+ to 50 people.
I wrote company-wide, but it's not the case everywhere. At some scale you'll want department, or even project-wide process. But the policy should be fairly common - who owns it, what's the response time, how to escalate urgent things, etc.
Containerization, autoscaling, service discovery, tracing, metrics and monitoring et al - lot of it is required to do larger scale, distributed systems. Even if you do not call them microservices.
This is nonsense. You can already do that via libraries. The choice of RPC vs local procedure calls has no effect on scaling development.
IMO the only reason to use micro-services is the one they mentioned - you can have different parts of your system running on different machines so they can be spun up independently. But I think most people aren't "web scale" enough to need that anyway.
I've found there's a happy middle ground. You need medium-sized services that still share code libraries. For many companies, this often works out to 7 or 8 services. The key is to combine like business units/features, not necessarily fragmenting at every visible code boundary. A deployed "service" can really just be several HTTP paths and/or gRPC services in one repo. You still get to keep decent separation of work, deployment, versioning, dev focus, etc with these medium-sized services without sacrificing the benefits of larger, more centralized/shared reuse.
I’ve accidentally landed on an architecture like this and I’m actually pretty happy about it. It was driven by a desire to kill a large monolith slowly, by extracting key features into separate services. Sold to the customer as microservices because sexy fad of the moment. Our real motivation was we had way too much trouble recruiting in the monolith stack (.NET) and had a surplus of embedded C++, python, and JS engineers. Anyway, turns out our teams naturally self-organize around four or five domain+language clusters that effectively form separate services that are too large to really be micro, but too dissimilar to play nicely together in a monolith. Eg, python data science module wrapped in a Flask API providing physics calcs, legacy C# service providing simple data models via REST, JS react front end served from a separate node service, a weird C++/python hybrid used for an embedded device sim service, etc. It’s not what I planned as the tech lead/architect, but I think we organically reached what is really the best approach for our team. Definitely an element of Conway’s law in action, but in a good way. We are making the most of our organizational structure rather than fighting against it. Would this scale to google levels? No, probably not. But we don’t need it to, and it’s incredibly unlikely we ever would given our specific business.
Start with a monolith that has clear internal APIs that are designed so they can later be made into network APIs. This gives you the development speed of a monolith while maintaining options for the future. When you do break things out into separate services: try to make as few of them as possible and maintain the ability to build as a monolith.
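A minimal Go sketch of that idea (all names here are invented for illustration): define the internal boundary as an interface, so the in-process implementation can later be swapped for an HTTP/gRPC client without touching call sites.

```go
package main

import "fmt"

// UserService is an internal API boundary. Callers depend only on this
// interface, so the in-process implementation can later be replaced by
// a network client if the module is split into its own service.
type UserService interface {
	GetEmail(userID string) (string, error)
}

// localUsers is the in-process implementation used while running as a monolith.
type localUsers struct {
	emails map[string]string
}

func (l localUsers) GetEmail(userID string) (string, error) {
	email, ok := l.emails[userID]
	if !ok {
		return "", fmt.Errorf("user %q not found", userID)
	}
	return email, nil
}

func main() {
	// Callers hold the interface, never the concrete type.
	var svc UserService = localUsers{emails: map[string]string{"u1": "a@example.com"}}
	email, err := svc.GetEmail("u1")
	fmt.Println(email, err)
}
```

The important design constraint is that the interface takes and returns serializable values, so a future network-backed implementation of `UserService` can satisfy it verbatim.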
Forget everything you have heard about micro services. Most of it is bullshit from people who don’t actually think for themselves.
This. If you can't design a well segmented monolith, you can't design a well segmented system of microservices either. The microservices will just be buggier and much harder to fix after the fact.
Design is best evolved. If you can get a lot of that done while you still are able to run a well structured system as a monolith you can save a lot of time. It is cheaper to change an interface in Java/Golang than a REST API.
Could not agree more. I recently led a refactor of a monolithic .NET MVC app and took this exact approach. We made all of the controllers thin, with almost no logic at all beyond specifying the route and dependency injection. Then redirected the request to a “service”, which originally was just a reworked combination of the old controller/model logic hidden behind a common service interface. Then, slowly we replaced the C# services with microservices. So we went from ball of spaghetti to monolithic service oriented architecture lite to actual microservices with the monolith converted into an API gateway. If we didn’t have independent motives for going to microservices, sticking to the clean and well organized internal APIs of the refactored monolith would have been totally fine.
I moved back to monolith and am very happy. I think of the monolith now as a collection of modules. The rule is now, one should be able to drag any of the modules to the top-level of our monorepo and create a new microservice pretty easily when the time comes. I think the microservices book (that came from that Uber engineer...?) suggests a rule of 5 engineers per service.
How are you preventing transaction coupling? For example, modules A and B are called by C. C starts a transaction that wraps A and B. If you move B up as a network feature, you lose the transactionality.
Good question, and this rule is meant to be bent in those scenarios. I try to avoid these dependencies if at all possible, but if not possible, the "writes" for those modules all belong to a single service, and any other service, depending on how "pure" I need to be in the project, will make network calls to the other service, or just grab that data directly from the database.
For a more concrete example, I recently built a service ("scraper") that scraped data and upserted a large tree of structured data to postgres in a transaction. Writes were only allowed from scraper, but "api" could SELECT data for reporting to the frontend "web" as much as it wanted. In the future, "api" might be refactored to make internal HTTP requests to "scraper," so they could have totally separate databases.
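The transaction-coupling question above can be sketched like this (the `Tx` type and module names are stand-ins, not real code from either commenter): while A and B are modules in one process, C can pass them a shared transaction; once B moves behind a network call, that shared transaction is no longer possible.

```go
package main

import "fmt"

// Tx is a stand-in for a database transaction, used only for illustration.
type Tx struct{ ops []string }

func (t *Tx) Exec(op string) { t.ops = append(t.ops, op) }

// Modules A and B both write through the same *Tx, so C can commit or
// roll back their combined writes atomically. If B becomes a separate
// service reached over the network, it can no longer share this Tx;
// the caller would need compensation logic, or all writes would have
// to stay in a single-writer service as described above.
func moduleA(tx *Tx) { tx.Exec("INSERT order") }
func moduleB(tx *Tx) { tx.Exec("INSERT invoice") }

func moduleC() []string {
	tx := &Tx{}
	moduleA(tx)
	moduleB(tx)
	// With a real database/sql.Tx, this is where tx.Commit() would
	// persist both writes together, or tx.Rollback() would drop both.
	return tx.ops
}

func main() {
	fmt.Println(moduleC())
}
```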
Our approach to services at Khan Academy is likely a bit different from most. We're sticking with a monorepo (the code for all services lives in one repository). We have a single go.mod file at the top of the repo, so all services use the _same versions_ of dependencies.
We're still building out our deployment system to better support multiple services, but we're planning to redeploy all of the services when library code changes (which is something we're trying to minimize).
All of this ensures that we don't have trouble with services lagging behind on critical updates.
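A monorepo with a single top-level go.mod, as described, might be laid out roughly like this (a hypothetical sketch, not Khan Academy's actual structure):

```text
repo/
  go.mod            # one module: every service resolves the same
  go.sum            # version of each third-party dependency
  pkg/              # shared library code (deliberately kept small)
  services/
    web/cmd/main.go     # each service builds to its own binary...
    api/cmd/main.go     # ...but `go get -u example.com/dep` in one
                        # place upgrades the dependency for all of them
```

Because Go resolves dependencies per module rather than per directory, one go.mod is what guarantees the "same versions everywhere" property the comment describes.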
I don't really understand moving to microservices if you aren't going to give the service teams the autonomy to make their own decisions and move at their own pace. Microservices always seemed to me to be more of an organizational strategy than a technical one - if you have services, but they can't operate independently, it feels a bit like you are just creating a distributed monolith.
We're making a certain set of tradeoffs. For example, we're not adopting the "write code in whatever language you want" form of microservices that some folks adopt, because we don't feel like we're large enough to support that.
Like I said, though, we do want to minimize the library footprint. The vast majority of deploys in this new world will be single service deploys, with the benefits that come with that. We already deploy our monolith several times a day. These services will speed that further.
It'll be interesting to see how it works. I would probably fight to not have shared dependencies because the relatively small benefit does not seem to merit the increased coupling between teams - if all teams need to agree before a library can be upgraded, I can't imagine it will be very easy to keep libraries up-to-date. To a certain extent it depends on how large your engineering org is, which I don't know.
I've been in companies that moved from monolith to microservices and I saw it work well except when teams had such tight cross-service dependencies that they had to get other teams' ok before making changes to internal details of their service. Then developer velocity was slower than before because it took time to make the cross-team discussion happen and political capital to make other team care when they have other priorities.
We'll see how it plays out, but the situation that I've described isn't really different from the one we have today (because we have a monolith). Hopefully, it will be better because of how the Go project is trying to get library maintainers to follow semver and avoid breaking changes. When we need to upgrade a dependency, we can do so in one diff, catching the errors with the compiler and test runs. If the upgrade seems risky, we'll watch that deploy carefully, and we already have a process for "risky" deploys. Plus, these are likely some of the easiest changes to rollback if need be, because they are unlikely to change persisted data.
Ultimately, though, if we find that this plan reduces velocity, it won't be that hard to change later.
We have some dataflow jobs written in Kotlin (we blogged about that in June 2018[1]). We also have an internal service written in Kotlin.
We ideally want one language. But Apache Beam (which is behind Google Dataflow) doesn't yet have production support for Go. More importantly, though, we have no time pressure on switching the Kotlin code over, so that's a long way out.
So, no choice of language and no choice of the libraries used. What’s “everything else” exactly? Sounds like missing out on the more interesting parts of a micro service architecture.
Yeah, I'll never work for a company[1] where service teams are free to choose any language; 2 or 3 options at most is fine, but more than that is a hard no. I'll have to read/work on that code sooner or later, and I have no time to be dealing with a hodgepodge of languages
1. In the 10-5000 employee range: tech giants are a different beast when it comes to team accountability.
> I'll have to read/work on that code sooner or later
In the microservices organizations I've seen, this isn't true. The other service teams provide an API and, like any other SaaS you use, you do not need to be able to read the implementation. You would only work on that code if you switch service teams.
Sounds similar to some work I’ve been doing, so thanks for unknowingly validating my design!
At my employer, I’m spearheading a wholesale reimplementation of outdated process automation software, turning everything into Django web apps.
I’ve been working with a monorepo and monolithic deployments to maintain development velocity but recently started transitioning the CI/CD pipeline to deploy each application/service in the monorepo independently. The pipeline packages common assets (including, e.g., manage.py, common HTML templates, and the dependency spec...all housed in the same monorepo) into each app directory before the deploy stage.
Meanwhile, local developers clone the entire monorepo, and when they launch localhost, all of the services come online simultaneously. (That’s the goal, at least!)
I was already excited to see my work come to fruition, and now I’ll be keeping an eye on Khan Academy, too!
That does sound pretty similar (though our services all have the luxury of serving nothing other than GraphQL!).
Our current plan for local development is to continue cloning the monorepo and firing up all of the services. Go services don't take a whole lot of resources, so we think this plan will work fine for quite a while.
I'm in an org that runs multiple services in Go. The dependencies on library stuff have been a very minimal need. I think you are optimizing for an imaginary problem. With microservices, a team should have non-breaking API versions running and work with other teams to transition to a new API version when needed. If the underlying UUID lib or Kafka lib changes, teams may or may not need to update, but they can do so on their own time.
IMHO a microservice should follow the unix philosophy of doing one thing well. But interfacing with other services is not as simple as unix pipes. In particular it is more of a request/response communication. Consequently, interfacing needs to be thought through carefully and made as simple as possible. They should evolve much more slowly than individual service code. While you may use (g)rpc or http at a lower level, your specific protocols will have many more constraints. Ideally you have codified assertions about their behavior in tests early on.
Note that they are not all going to start/stop at exactly the same time, and during development they may even crash, so each microservice should survive such transitions. You may have two different versions of service code, or even completely different implementations, or two different versions of the API in use at the same time. This can happen as different services evolve at different rates. And you may want to transition to a new version in a piecemeal manner so as to not bring the whole service down. All these considerations complicate things. So ideally you have factored out these common tasks into shared libraries/packages. And ideally you write your code such that, if necessary, more than one service can be compiled into the same binary for performance reasons.
In a monolith some things become easier since everything dies at once! But some things become more complicated, such as supporting evolving code. And monoliths require more discipline to keep things modular. Over time this gets harder and harder. Lack of modularity means you have to understand a lot more code, and when you evolve things, more code will have to change and there may be unforeseen side effects. And scaling can become harder.
The much greater visibility into which services are behind and have weaker security practices is actually one of the things I love about our microservices. We recently (a year ago) broke up our monolithic codebase into a service-oriented architecture (I would hesitate to call our services micro, personally), and I was astounded at all of the hidden security issues and random code in the far-reaching corners of the monolith that hadn't been touched or thought about in years.
It is much easier (imo) to pop into a repo of one of our services, look through the code in its entirety, and see when things were last touched and where the issues are. I would make the argument that the "inevitable corners that fall behind and have questionable security" are inevitable in any codebase that grows to a certain complexity, and microservices (or SOA in general) make it much easier to see those things as they are decomposed.
Well, I can only agree with all that... But the move to services adds a lot of surface that needs its own security and architecture maintenance.
The more you break down your code (the smaller the size of the services), the more maintenance need is created from the division, and the easier it is to fall behind on it.
A recent project I was exposed to has been struggling with Microservices and a multi-repo setup. Even with CI/CD and a lot of good tooling around their setup.
The overhead introduced by such a setup in a corporate environment with not-so-well-thought-out requirements and design is ridiculous. Keeping track of dependencies, arcane knowledge of inter-service dependency quirks being siloed and hidden, keeping individual services up to date, dealing with older services and their interaction with newer services until they "migrate" to newer tooling/common code, etc.
Everyone then skirts around the fact that the problem could potentially be microservices or the multi-repo setup. Instead, they throw process, sign-offs, complicated promotion pipelines, and just plain warm bodies at the problem in an attempt to mitigate it. But the damage is done, velocity has slowed to a crawl, and everyone is miserable, especially when having to explain the whole thing to newcomers.
The best ideas can be implemented poorly. A software design principle that works for nimble Silicon Valley startups doesn't work in your big corporate environment? Big surprise.
When I worked at a pretty large software co, a team there was always adopting the latest techniques and tools, but their deployment pipeline was an over-architected disaster that nobody could reliably deploy. Reason? Their CI/CD system consisted of a person manually clicking around to build Jenkins jobs. There will be other such silly nonsense (hopefully less extreme) in other corporate environments too.
Like many things in life, there are no absolutes. So rather than going to opposite ends of the spectrum, check out modular monoliths.
This approach is a nice balance (IMHO), since you start off with monolith but it is broken down into cleanly separated modules. Each module can potentially become its own micro-service if or when the time comes.
In terms of implementation, this can be done easily. For example, we do it on the JVM with Kotlin, where each microservice is its own project that produces a binary/jar. All the projects are part of a multi-project build. Lastly, we have a common project for shared code, utils, types, enums, interfaces, etc., and an application project that loads/sets up all the microservices from each project. Works great so far. And when you do have to break a project out into its own service, the effort is fairly manageable.
Same. I tried microservices before and while it cleaned up the code base, I didn't quite like the results down the road. Some things go stale, you multiply ops * number of microservices, a change in one can mean a change in multiple others. I'm not against services in general, but not a fan of so called microservices.
I’ve been through a transition with about 60 devs that went DDD plus microservices.
A few monoliths ended up as a couple of hundred services, and looking back I feel we got basically all positives.
What other people say about scaling teams is true, but I have a few other points as well:
- personally, I spent 6 months writing tooling for service lifecycle management and setting strict conventions. This was before the microservices decision and I was recruited to do “devops” which gave me a lot of head room. :p
- when talks about microservices surfaced I had to fight for several months to get the lead devs on-board with tooling and conventions.
From the infra/cm/systems management side we're used to managing many thousands of "configuration items" - devs usually are not, and many underestimate the value of proper automated lifecycle management.
- once everyone was on-board we all used a common language. Big win.
- I could form a “devops” team to help develop the tooling further and the infra platform as well.
- almost all teams worked in mobs - amongst other things it really helped with the ownership part, something that’s absolutely crucial. Well defined domains and accountable mob teams, just awesome!
- quality rose by a mile. Small commits and somewhat robust tooling under the eyes of a mob.
- graph the pains. If outdated versions are a problem - put it on a graph, green, yellow, red. Show services in relation to other services - the application is the sum of all connected services.
Can you explain what you mean by service lifecycle management with regard to microservices, please, or do you have a book on it? I'm currently studying SE and it's the first time it has come up. Thank you :)
Well - I’ve got a “service management” background, so it’s completely natural to talk in these terms. :)
A service has, in a way, two interfaces: one business (the work it's doing), and one technical (how it does it; a port publishing an endpoint or whatever).
No business process to support, no service to manage.
The lifecycle of the technical service will consist of a bunch of actions that will be taken, perpetually, until the sunsetting/decommissioning of the business process.
Actions: init, code pushed, deploy, monitor, trace, update, decommissioning etc.
You take these actions, put them in a lifecycle circle, and you have a nice PowerPoint!
As much of this cycle as possible, preferably everything, has to be governed by conventions and automation.
- Automatic follow-up if a service has no upstream or downstream services, for example. Why do we have a dangling service?!
- Or, you have a key service tied to an SLA, but its upstreams are not matching it?
- You have services that have not been touched in a timely manner.
- etc... Just drop the relevant team an automated Slack message with the option to initiate whatever is required to keep the lifecycle churning.
With many thousands of assets/CIs (configuration items), almost everything has to be automated or you will eventually grind to a stop.
If you can couple business process to automated technical service management - big wins!
To me, it’s kind of what ”DevOps” should be about, from a technical perspective:
Take the best/reasonable parts from ITIL (concepts/principles), mix it with principles from the agile manifesto and the 12-factor app.
Automate the lot of it.
Doing this in practice gives you dev & ops.
It’s quite a journey that is more difficult the bigger you are. It scales though, so start small, prove the concepts, and grow organically.
The more I write and talk about it, the more I realize that it is about "externalities", so to speak.
A microservice is just code - one small piece that does one, tightly defined thing.
We’ve been doing code forever, and smaller pieces of it are easier to deal with and reason about.
The structure around keeping 100s or 1000s of moving pieces in concert is where a lot of the work is shifted.
It takes teamwork as well as a common vision and language.
The above sentence touches on “culture”.
> I would love hear some thoughts from others that made the move, especially anyone that decided to move back to a monolith repo.
We had a big monolith where I work.
We’ve been slowly, but surely isolating parts of the monolith as separate deliverables, extracted into their own repos. But only when appropriate, and not as a forced exercise.
The remaining “monolith” is still pretty big, but it does (mostly) represent one logical deliverable, so the effort to split it up has somewhat stalled.
There’s been small points of friction, but nothing near as painful as we used to have it. No way we’re going back.
So everything in moderation. Micro services architecture is a tool. Use it when it’s the right one.
It's very limited once you start doing complex things. For example, let's say you are building a websocket server. You will have a hard time writing a type-safe websocket handler that processes the client payloads for all the events...
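One common (if verbose) way to get type-checked handling of heterogeneous websocket events in Go is an envelope with a type tag plus `json.RawMessage`. This sketch (event names invented) shows the pattern the comment is calling tedious; each branch is fully type-checked, but every new event means another case:

```go
package main

import (
	"encoding/json"
	"fmt"
)

// Envelope is the common wrapper clients send; Payload stays raw bytes
// until the event type is known.
type Envelope struct {
	Type    string          `json:"type"`
	Payload json.RawMessage `json:"payload"`
}

type ChatMessage struct {
	Text string `json:"text"`
}

type Typing struct {
	User string `json:"user"`
}

// dispatch decodes the payload into the concrete type for each event.
func dispatch(raw []byte) (string, error) {
	var env Envelope
	if err := json.Unmarshal(raw, &env); err != nil {
		return "", err
	}
	switch env.Type {
	case "chat":
		var m ChatMessage
		if err := json.Unmarshal(env.Payload, &m); err != nil {
			return "", err
		}
		return "chat: " + m.Text, nil
	case "typing":
		var t Typing
		if err := json.Unmarshal(env.Payload, &t); err != nil {
			return "", err
		}
		return t.User + " is typing", nil
	default:
		return "", fmt.Errorf("unknown event type %q", env.Type)
	}
}

func main() {
	out, err := dispatch([]byte(`{"type":"chat","payload":{"text":"hi"}}`))
	fmt.Println(out, err)
}
```

Whether this counts as "type safe enough" is exactly the judgment call being argued about: the decoding boundary is stringly typed, but everything after it is not.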
I started doing Rust/Crystal, and both of them are better than Go (performance, type system).
Yet, whenever I build something for work, I come back to Go :-(, even though I told myself to use Rust or Crystal.
Then I realized that Go is a practical language. It compiles fast, which makes testing easier. The cross-compiler makes it easy to build binaries that run on anything. And Go's limitations make it very consistent in how you do things. This makes working with Go faster overall, even though it slows you down in other areas.
So I think Go is a language people easily fall into, because it has the iteration speed of an interpreted language like Ruby/Python (or even faster) during development, with a better performance/type-safety story.
Not for any meaningful work in my experience. As a matter of fact, I found the change/compile/run loop in golang to be slower on projects I've been working on due to the fact that it doesn't support incremental compilation, so any change I make ends up recompiling the entire program and writing out a 100+MB binary anyway. Compared to a Scala project I worked on before (and Scala is notorious for slow compiles), after the first compilation, all modifications happen very quickly as only the respective classes are re-compiled.
> Here's Uncle Bob's take on testing with Go
Again, this doesn't apply for any non-trivial/large project. On a project I'm working on, it literally takes 7-8 minutes to do a clean build + run all unit tests in golang.
Go absolutely does incremental builds by default and has been like that since I can remember. Packages are only rebuilt when their source or their dependencies change.
Same for tests which are cached by default so during typical development only a subset of tests are executed and compilation time can be a big part. Leave full tests for CI.
An anecdote I found from 5 years ago:
On my 1.7GHz processor it takes 10 seconds to build the whole standard library from scratch (300k lines of code).
just trying to understand - you guys think moving a Python2 monolith to Python 3 is too painful, and so you are going to port all the code from Python2 to a completely new language (Go), change the architecture (monolith -> microservices) and move the HTTP API to React + GraphQL, all in one year?
2020 is going to be in an interesting year at Khan Academy ;-)
The move from Python 2 to 3 would likely have also involved changing the architecture so they could migrate components incrementally. Since they were going to do that regardless, and if they already wanted to change the interfaces from HTTP to GraphQL, this is a natural time to do it. Though, this migration has nothing to do with React--they were already and will continue to be using it.
That isn't what they said. They carefully explained that they could migrate to Python3 but that the benefit of doing so was small so they looked at the performance benefits of using other languages and decided that the performance benefit was large. The performance benefits of using Go (or Kotlin) were the deciding factor.
My guess is that the framework for them thinking about this is that they have already been thinking about migrating languages to get better performance and the Python3 migration seems like completely wasted effort if you then throw it all away to go to another language shortly after.
2020 is absolutely going to be an interesting year.
One thing that might not have come across clearly in the blog post: we're already well on the path to using React everywhere (we started using React a week after it became public… 6 years ago?). We made the decision to move to GraphQL in 2017, so we've already got a lot in our GraphQL schema. Finishing those switchovers will make our move to Go happen more quickly.
> Of course the obvious thing missing in the article is how they expect to deliver new business features while recoding everything in a new language.
For new features, the new parts of the GraphQL schema for those features will be written in Go as part of the new services. Our frontend is already in large part a single page app in React which requests data via GraphQL, so the frontend for the features will look just the same as it would on our monolith.
> If we moved from Python to a language that is an order of magnitude faster, we can both improve how responsive our site is and decrease our server costs dramatically.
I see people say things like this a lot but my experience is that while other languages are 10x or more faster than python in some benchmarks it's very rare that computation time dominates server latency or that servers are running at 60%+ cpu across all cores.
If 90% of your service latency is not directly on the cpu and/or you haven't profiled to see that the performance bottleneck is evenly distributed across all tasks, then it's super dangerous to migrate to a new language thinking that will fix it.
I hope people inside Khan Academy know this and it's just a clickbait blog. If they really think "go is 10x faster than python so we'll only need 1 server for every 10 when we migrate" then I think they'll be disappointed.
* It’s not just that Go is faster to run; it’s also faster to iterate on. If Python is neither, it’s offering little benefit.
* They moved from a monolith to a microservices architecture; the concern that any of the services in the request path could add latency simply because the overall runtime is slow is a legitimate one.
* Their primary deployment method is Google App Engine, where you are billed by CPU used. Any change that consumes less CPU has a tangible effect on their costs.
Having worked in large async Twisted Python and in Go, the experience of our teams is that Go is faster to iterate on by a long shot. We've replaced most of the old Python. We just brought some new devs onto my team; they were able to make new contributions to the Go stuff in short order. The Twisted Python, not so much.
It’s not bland, it’s direct. What you’re doing is conflating personal preference with actual language features. Most developers would say iteration is faster in Python, a higher-level, dynamic scripting language, than in Golang, which is lower level and extremely explicit about the data your program is using. Iteration in go is simply harder as you have to be more explicit rather than sketching something out. Changing that more explicit stuff is harder if you got it wrong while iterating. Why do you think Golang is better at iteration?
Add in extremely poor error messages, the lack of generics, having to generate loads of code for various things, the ability to crash whole services if your program does something incorrect, and excruciating error handling, and it will all slow you down.
It’s funny that none of the things you mention in your last paragraph are handled any better in Python. Without type safety, you instead have to deal with bugs caused by inane mistakes. If we’re talking about generics, the conversation has shifted from “writing scripts that do x” to “maintaining a production system”, and for the latter, Golang’s type safety, excellent out-of-the-box default tooling, and easily grokkable concurrency primitives make it far more maintainable than Python.
Things are also easier to change in Go because of the type system and interfaces. The former catches most of the obvious incompatibilities; the latter ensures that abstractions don’t leak across system boundaries, whereas in Python there is a tendency to pass do-everything objects across the system.
Error checking has improved substantially with error wrapping in Go 1.13. Not only can you locate precisely where your system failed; you also have to be explicit about handling errors. I do concede that, before error wrapping, the error handling was garbage.
Clearly I don’t agree, but it’s an interesting and detailed answer. I do agree that type systems help with refactoring, but not with iteration; there’s a fine line there. Personally I think Golang’s type system isn’t as good as it could have been... I do like what Golang purports to provide (simplicity above everything); if they added proper macros to replace code generation, and to satisfy my desire for generics, it would be a much more useful language for my needs.
Dynamic languages generally enable faster iteration in the early stages of a project, but once you have a large, mature codebase, static types allow you to work faster and produce fewer bugs. A more performant language will also run your test suite faster which can have a big impact as a project gets very large.
Also, while I agree Go’s error handling isn’t very elegant, it does force you to explicitly consider every potential error, which in my experience makes uncaught errors far less likely than a language with bubbling exceptions.
That holds true considering the total lifespan of the project. Golang is more explicit, thus requiring more time to define every type, but I've never refactored a codebase so fast and safely. In Python, the fact that it is dynamic makes it more difficult to iterate safely (statically typed vs. dynamically typed). As for error handling, it's not perfect, but the code is readable, easy to follow, and easy to reason about.
> Iteration in go is simply harder as you have to be more explicit rather than sketching something out.
I will completely agree that prototyping in Python is way faster. Python is my preferred language for throwaway/prototype code.
> Where do you get this idea from?
It's solely based on my experience. I hope it contributes to the conversation.
Well Brad, that's very dependent on what exactly the service is. Moreover, oftentimes low CPU utilization is actually a limitation of an implementation in a slow language (e.g., Python technically does have async webservers, but they add a lot of idle overhead).
Indeed, there are many benefits these more performant languages have over Python aside from raw single-core performance. For starters, more efficient concurrency and parallelism can help reduce average latency when combined with a quality async webserver. Then there are gains from shared memory across threads.
So in many cases, absolutely, you may only need 1 server vs. 10 when you migrate. It's thus not fair to say that these gains are "very rare".
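The concurrency point can be made concrete with a toy asyncio sketch; the sleep stands in for a slow downstream call that a one-request-per-worker server would pay for serially:

```python
import asyncio
import time

async def fetch(i):
    await asyncio.sleep(0.05)   # stand-in for a slow downstream call
    return i

async def main():
    t0 = time.perf_counter()
    # 10 concurrent calls complete in roughly 0.05s, not 10 * 0.05s
    results = await asyncio.gather(*(fetch(i) for i in range(10)))
    elapsed = time.perf_counter() - t0
    print(f"{len(results)} calls in {elapsed:.2f}s")
    return elapsed

elapsed = asyncio.run(main())
```

Go's goroutines give you this kind of overlap by default, with parallelism across cores on top, which is where much of the "fewer servers" saving actually comes from.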
I inherited a few Python/Django servers. One of them has workers that grow to about 1 GB over time, even though they retain absolutely no data in core (or shouldn't). The same server is used to collect data, convert things, and analyze the data. The analysis especially can take a while, which means there is a problem when more than two people try it at the same time, since it severely hinders the other tasks.
I'm now converting one server to Go (although not the heavy one), and it really runs fast and uses much, much less memory. It also starts in less than 1s, whereas the Django application takes 5 minutes because of some stupid problem in static file collection.
Python is fine as a teaching tool, to prototype in, and to use in notebooks as a wrapper around numpy, scipy, etc., but not to run in production.
> while other languages are 10x or more faster than python in some benchmarks it's very rare that computation time dominates server latency
Most applications spend most of their time waiting for the database or network. I suspect the fastest programming languages are those that have the lowest thread/process overhead. If most apps spend their time waiting, then a language with 10X lower process overhead can handle 10X more processes.
> You can also avoid using separate process for each client (NodeJS).
Yes, NodeJS solves one problem by having low process overhead, but it also fails to take advantage of parallelism in modern processors. Ideally, I'd like to see a system with both.
Sorry if this wasn't clear: Go is 10x faster than Python, yes, but we know that we're not going to reduce our server count by 90%. 50% is quite possible, though, given Go's superior threading and its good resource use. Moving away from the monolith should also give us new optimization possibilities.
Migration from Python 2 to 3 is easy and fast. I've migrated multiple large apps and it took about a day each. Most libraries that matter have been migrated; some don't even support Python 2 anymore. It's practically 2020. This should not even be a consideration. After 2 to 3 is done, they can consider again if they want to redo the stack, but first I'd focus on this small maintenance task.
Hahaha, sorry but this is a very cute thing to say, in my view. At our company we just barely finished migrating our software with nearly a million lines of legacy Python 2 code to Python 3. This took over a year of nearly exclusive migration effort, just making our code work with both. The entire migration project started way before I joined the company several years ago.
So, no, things are not as simple if you're not dealing with toy projects. And no, you can't assume that it's the same for everyone if you're not in their shoes.
Your comment is pretty much the equivalent of "I don't see a bug. Works for me."
It very much depends on how good the codebase is. I also spent a year on and off porting a large codebase from 2 to 3, and it would have gone an order of magnitude faster if the codebase were in better shape.
I agree that code quality is a big factor. But what is good code quality in Python? In our case, the oldest code is the most "pythonic" and is at the same time the worst to maintain. The better code mitigates the drawbacks of dynamic typing and by that moves away from the pythonic standard you see in many libraries.
But even if you nail the types to the board (e.g. assert isinstance(...)), use (the somewhat weak) Mypy wherever you can, and have good test coverage, you still have to grep your code base for usage of, e.g., .keys(), eyeball hundreds of modules for subtle Unicode madness or hunt for the odd division, replace every sort() that doesn't use key= yet, etc. The todos add up and someone has to go into the code and change those lines.
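For illustration, two of the behavior changes mentioned above, as they look once the code finally runs on Python 3:

```python
d = {"b": 2, "a": 1}

# Python 2: d.keys() returned a plain list you could sort in place.
# Python 3: it returns a view, so list-like usage must be made explicit.
keys = list(d.keys())
keys.sort()
assert keys == ["a", "b"]

# Python 2: 7 / 2 == 3 (floor division for ints).
# Python 3: / is true division; // gives the old behavior.
assert 7 / 2 == 3.5
assert 7 // 2 == 3
```

Each change is trivial in isolation; the cost is in finding every affected line across hundreds of modules.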
What specific “pythonic” habits have you found are more difficult to maintain? Asking out of genuine curiosity, not to challenge the premise. I work with a lot of data science people that really emphasize being pythonic, but coming from the software/static typing side of the house I always find their code style and architecture a little concerning, and I’m not sure if I’m just not getting it or if they really are writing spaghetti.
One of the biggest footguns is method naming. Most Python libraries will gladly use generic method names like "add()" or "getName()". The moment you need to rename the method or change the signature, you will have a hard time telling it apart from all the other method calls by the same name. No type inference will save you here because type inference is incomplete and will never let you find all the callers.
What you should do is use unique names. But that will give you ugly code like myFoobar.foobar_addBar(). The kind of code that makes the pythonic crowd cringe.
Another problem is making code too generic with regards to what types it consumes, instead of nailing it down to the few types you're ever going to use here. This makes it hard to reason about your code months and years down the line. How is this method used in the rest of the code base? Do all callers expect an int? What if my method now happens to return a float?
And there's also abuse of duck typing. Throw around a lot of objects, sprinkling methods and other members on to them as you go. Then when you consume the object, just look if it has the method you want to call. This makes any kind of type checking and static type inference useless.
And then there's a whole lot of Python 2 libraries where you get the feeling that the authors didn't give too much thought about whether they are dealing with str or unicode. The method might just call .encode(...) on one of its arguments without being too sure what it is.
And every one of the mistakes that result from the above practices might only pop up when your code has already been shipped to the customer site.
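A contrived sketch of the duck-typing pattern described above, and why static tooling can't help with it:

```python
class LegacyOrder:
    """Members get sprinkled on after construction, as described above."""
    pass

order = LegacyOrder()
order.total = 42                     # added ad hoc at one call site

def price(obj):
    # Callers probe for capabilities at runtime instead of declaring a type.
    if hasattr(obj, "discounted_total"):
        return obj.discounted_total
    return obj.total

# A typo like `obj.totl` above would only surface when this exact branch
# runs in production; no checker can infer LegacyOrder's real shape.
print(price(order))
```

Names here are hypothetical; the point is that attributes attached outside `__init__` and `hasattr`-based dispatch leave type inference with nothing to work from.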
Of the things you mentioned, only Unicode has been a problem, and that's exactly what 3 is fixing, so that's to be expected. The rest was automatically handled by 2to3 with only a cursory review.
We tried 2to3 and it gave us poor results. But probably because we deviated too far from being pythonic.
The str/unicode misery is one of the biggest gripes I have with Python. I'm glad this unpleasant knot has been mostly untied in Python 3. I came to the conclusion that the transition would have been much easier if Python 3 just concentrated on the separation of bytes and (unicode) strings. The other features could have been in Python 4.
It's a bit like IPv6. If it just solved the address space problem, most would have moved to it already. Instead it comes with a lot more baggage, and each additional feature has its own uphill battle for acceptance. So nearly everyone is dragging their feet, citing their pet peeve with the technology.
I'm not sure that's true because, as I said, most of it is automatic and worked well with 2to3, leaving us to deal pretty much only with Unicode. I'd certainly prefer to only have to do this upgrade once.
How do you automatically go from sort( ... some elaborate compare function ...) to sort(key=some completely different function). Yes, there's a wrapper, but it makes the code more convoluted instead of transforming it to the key-paradigm. And if you want to sort by several keys, now you will have to call sort several times.
How do you automatically infer the intention of somedict.keys()? Is it going to be used as a list or as an iterator?
Those are just off the top of my head. I don't remember all the cases where 2to3 tripped over and produced garbage. But there were too many cases to put actual faith in automatic conversion.
It might work if your code is kind of new and homogeneous. But looking at how much trouble Dropbox had, even with all the tooling and Guidos they could muster, I have the feeling that your positive experience with 2to3 might rather be the exception than the rule for old and big code bases.
> How do you automatically go from sort( ... some elaborate compare function ...) to sort(key=some completely different function). Yes, there's a wrapper
Yes, there's a wrapper. You use it, add a comment "this is wrapped in the migration to 3" and move on.
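The wrapper in question is `functools.cmp_to_key`, which adapts an old-style comparison function to the key= API:

```python
from functools import cmp_to_key

def legacy_cmp(a, b):
    """Old-style Python 2 comparator: negative, zero, or positive."""
    return (a % 10) - (b % 10)   # e.g. sort numbers by their last digit

nums = [25, 13, 42]
# Python 2: nums.sort(legacy_cmp)
# Python 3: wrap the comparator and move on, as suggested above.
nums.sort(key=cmp_to_key(legacy_cmp))
print(nums)
```

The comparator here is a made-up example; the mechanical transformation is the same for any elaborate compare function, at the cost of the code staying in the old cmp paradigm.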
> How do you automatically infer the intention of somedict.keys()? Is it going to be used as a list or as an iterator?
If it's being iterated on first thing, it's an iterator. If list methods are called on it, it's a list. This just hasn't been a problem for us; sure, it took some looking at, but it wasn't more than 30 seconds per case.
> I have the feeling that your positive experience with 2to3 might rather be the exception than the rule for old and big code bases.
Maybe so, but the codebase was ten years old and hundreds of thousands of lines.
The two projects I migrated were good-quality code at the hundreds-of-thousands-of-lines scale, though I find that metric a bit inappropriate. I agree the statement was a bit generalising and not appropriate for every project; million-line projects should have been excluded. To be honest, I find a year-long dedicated migration effort a bit excessive. But who am I to judge.
My experience was smooth with just a few hurdles around byte/string issues and that was it.
FWIW, our codebase is of similar size to that and we estimated it would take around a year to migrate, which is why spending a bit more time and ending up with a Go-based system on the other end was appealing.
Khan Academy's rationale might be flawed, as the transition path from Python 2 to Python 3 might be easy.
But in the end they would have the same stack as before, and that's clearly what they are trying to avoid. Given that, it makes sense to transition from a dynamically typed language to a statically typed one, which offers more compiler feedback.
Why do they call them "micro services" and not distributed systems? Oh right, it's because distributed systems are obviously really hard to create correctly and no sane person would ever agree to pay for that.
Nice re-branding. I can't wait for the, maybe, "consolidated computing" manifesto (aka turning microservices back into monoliths).
What if the services you are writing are independent in that they solve separate business problems, are built by separate teams, have little to no data coupling (e.g. Only basic auth), have different scalability profiles, etc? Separate services are really effective for these cases. Neither micro services nor monoliths are silver bullets. Instead it's possible for each approach to be the best approach in a particular business context.
In several decades of IT experience, I've not known or heard of a nontrivial system like the one you describe in the first part of your message.
As for the latter part, that's a disingenuous dichotomy. You don't really believe that just because an avenue exists we should include it in an evaluation?
People are willing to pay for it when you have 50+ engineers trying to push code through one deployment pipeline. There's an inflection point somewhere at which the cost of sharing deployments is no longer worth it.
I actually think this is where good software design and bounded contexts come in. You can perfectly well run a monolith with hundreds of developers if each of the sections is very well contained. There is no need to add network partitions everywhere to enforce this!
This. Usage of multiple nodes, networking, redundancy, etc. is driven by operational concerns. It shouldn't be driven by "development concerns", which is exactly the wrong reason to embark upon such a long-winded journey!
Why are you getting hung up on nomenclature? The point of the article is clear. If you feel that using 'microservices' is too trite or "buzzword-y" then that's about you and not the article.
I think some of the misunderstanding in these comments comes from not fully appreciating the perspective of not-for-profit organisations. While I can't speak for Khan Academy, I know that in every NFP organisation I have worked for there is an acute awareness that funding could dry up one day and the prime directive is to ensure that in a scenario like that, the work of the organisation can continue.
In this case, it leads to a higher concern about minimising the cost of the operational services than you might have in a for-profit organisation. In all the strategic planning I have been involved in with NFP, we always have the "what if worst case scenario arises" plan and in that plan the ability to scale down to bare minimum operational cost is key. It may not be conscious but I suspect that may be part of the reason the performance savings from moving to Go are so attractive in this case, where most profit-making companies just ask the question of whether they can afford to pay for the servers with their current margin or not and if they can they have more important things to worry about.
- The decision seems to be primarily a software architecture one, without much mention of all the other architects whose input will shape how the finished product is run and supported. In a modern software development environment, all the other parts of the org should be consulted on greenfield work to "Shift Left" anything that may need to change down the pike. Design in a silo leads to ineffective products.
- They're going from "hmm we need to upgrade from Python 2 to Python 3", to "we need to redesign everything in a new language with a radically different software architecture". This is definitely the second system effect. It's going to take years to make this thing reliable and sunset the old product.
- They're porting over the logic? Even if this is actually the right move, wouldn't a clean-room implementation potentially give better outcomes?
- Why are they continuing to use App Engine if the writing's on the wall for 2024?
I don't disagree with your pitfalls, but I do think we're working to avoid them.
To your first point, "The decision seems to be primarily a software architecture one...", this project has had involvement of the whole engineering team since the beginning. The whole org is on board with this change. It's definitely not happening in a silo.
> This is definitely the second system effect. It's going to take years to make this thing reliable and sunset the old product.
I hope not, but obviously we're not done yet, so I can't say how long it will end up taking to completely decommission the Python 2 app. What I can say is this: there are aspects to this project that are _simplifying_ our system and, for what's left moving from Python to Go, our intention is to port the business logic as close to a straight up port as we can get.
> - They're porting over the logic? Even if this is actually the right move, wouldn't a clean-room implementation potentially give better outcomes?
_That's_ second system effect, to me. We can't change everything and fix every problem now, so we're focusing on the changes that will help us move from Python to Go faster.
> - Why are they continuing to use App Engine if the writing's on the wall for 2024?
I don't think Google Cloud is disappearing in 2024, for one. Beyond that, again, we're not changing everything about our architecture. The way our data is stored is staying the same.
> The whole org is on board with this change. It's definitely not happening in a silo.
Given the seemingly strong chorus of voices responding with cautionary tales about why you might want to rethink this plan, and the number of engineers in your organization, it seems more likely that you have some dissenting voices who have either been too scared to speak up or have already been shot down.
What I meant by "the whole org is on board" isn't that there weren't other opinions. There have been multiple opinions on almost every decision we make (and we have an open process and document our decisions in a style very much like this one[1]). In the end, it's not about "shooting down" alternatives, since that's loaded language. It's about making what we think is the best choice we can with the information available to us.
Even in this thread, there's a chorus of voices sounding caution based on their limited information of what our situation looks like, but there are others who see why we're doing this, based on the same limited information.
We absolutely do know the risks of this project, which is why we're doing this as incrementally as possible.
Go + App Engine is the most unstable combination I have ever seen. While we tried to deliver a project over the course of a year, it was almost fully rewritten a couple of times because of new Go or App Engine APIs.
We're having far fewer problems with NodeJS.
And App Engine has a huge price tag.
Looking at the case of Khan Academy migrating their server stack only after about 10 years, I realize I don't have to worry that much about being locked into certain technologies (unless something is clearly untransferable, e.g. storing part of the customer data on a third-party server). After all, we might keep a stack for only 10-20 years, and the thing I'm working on now will almost certainly last less than 2-3 years.
> We’ll only generate web pages via React server side rendering, eliminating the Jinja server-side templating we’ve been using
I’ve been down this road. Deep down this road. Let me just give you a heads up on something I didn’t consider at the time: Most template languages do not parse every single node, one by one. In a sense they are just doing string concatenation. Not so with server side rendering and React. I’m not saying it can’t be done but just realize it is going to take a lot more compute power. Caching is great of course but won’t help you if you plan to customize user content during the server side rendering as well. My recommendation is that you don’t do any user authenticated stuff during SSR.
Also consider how you are going to handle cookies if you do plan to make authenticated requests to server side rendering. Also solvable but for some reason people had the hardest time understanding why we had to forward cookies to the domains we controlled in an API request and definitely not to any other servers.
I’m not sure I would pick React for an SEO driven website. It is hard to get a competitive “time to first byte”. Unless of course you can pre warm a cache of every one of your pages.
Lastly, you’re going to need Node for the SSR. I’m sure you know this but that might take you out of app engine and into cloud compute. Not a big deal but thought I’d mention.
Good luck! It is doable. If you ever want to chat about how we solved some of these problems I’d love to save you some time if I can. Hit me up in my profile email.
We've been doing SSR for quite a while now and are improving our CDN use as we go along. We already took steps to ensure that there's no user-specific information showing up in our server-side react rendering which would damage cacheability.
Our frontend infrastructure team essentially owns the React render server. I'll let them know you offered to chat.
The article says this: "Moving from Python 2 to 3 is not an easy task."
I disagree with this. It's a Python project's dependencies that make it hard to move from 2 to 3, and most libraries have been updated.
Of course, you could argue that it isn't easy to migrate a codebase from one major version of a language (or framework, or database) to another, but when you eliminate easy from your vocabulary it becomes harder to describe different levels of difficulty.
migrating from python 2 to 3 is such a large task that migrating to any other language is a comparable effort. This is not just a library problem; the language itself changed significantly.
source: no python services at my company are going to be migrated to python 3; it’s all moving to a JVM
> migrating from python 2 to 3 is such a large task that migrating to any other language is a comparable effort
I'm going to call BS on that one.
If you're having issues with Python 2, then it might make more sense to switch to another language instead of upgrade to Python 3. But going from Python 2 to 3 is much easier than switching languages completely.
Python is not a perfect language. There is no perfect language. It sounds like your company just had a reason for switching to a JVM language and the Python 2 EOL was a justification to start.
I'm skeptical too but if they're doing a lot of communication with other services and they relied on bytes and ASCII just working and the code isn't backed up by tests then I can see them having a very bad time going from Py2 to Py3.
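A minimal illustration of the bytes/ASCII hazard: Python 2 silently coerced between byte strings and text when the data happened to be ASCII, while Python 3 makes the boundary explicit.

```python
payload = "héllo"                    # str: text
wire = payload.encode("utf-8")       # bytes: what actually crosses the network
assert isinstance(wire, bytes)
assert wire.decode("utf-8") == payload

# Mixing the two, which Python 2 tolerated for ASCII data, now fails loudly:
try:
    _ = b"id=" + payload
except TypeError:
    print("explicit encode() required")
```

Code that leaned on the implicit coercion, and has no tests, only reveals itself at runtime after the port, which is exactly the "very bad time" scenario.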
N=1 and all that, but I ported 500k lines of a Python 2 monorepo to Python 3 this year and it took like two weeks, including a week spent reading Eevee’s post on the subject half a dozen times and playing with six and futurize.
Migrating from 2 to 3 can be a large task in some very rare cases, possibly, but for the most part it is practically effortless if you don't have to support both simultaneously. The biggest hurdle might be the "fear" of the Unicode change, but that can be dealt with.
Source: All python services at my current workplace are in the process of being migrated to 3.*, and I'm doing one of the main ones at the moment and it's a breeze, including compiled c-extensions.
What? I've done it on some sufficiently large code bases, and small ones, and it was done way faster than a rewrite. With tools like 2to3 you can assign it to an intern and have it done pretty quickly.
Numerous large projects and companies have publicly stated that they are stuck on Python 2 and that it's easier to migrate languages, even to ones they have to invent (Go), than to migrate to Python 3. At least one of these companies had Guido on their staff for years. Another, also with Guido on staff, needed over three years to migrate from 2 to 3. The overwhelming body of evidence shows that migrating a large project from 2 to 3 borders on impossible, but there's always someone willing to pop up on HN to say how easy it is.
Did you know that Django was not only successfully migrated from Python 2 to Python 3, it was ported in such a way that for many years it used the same codebase in both languages ...
Perhaps that's the biggest advantage of porting from 2 to 3. A lot of the code could run in both languages.
The corollary of my complaint is there will always be someone who pops up on HN with no idea how many lines of code are in a "large project" like Dropbox or YouTube.
Is Dropbox open source? Is Dropbox even a typical Python application representative of the challenges of porting from 2 to 3?
My hunch is that the challenges of porting Dropbox to any other language have less to do with Python and more to do with the need to deal with a filesystem at a lower and more granular level than what typical programming languages offer. Thus everything needs to be rewritten in bazillions of ways to handle the bazillion corner cases.
I've been in a similar boat. We've been splitting up or converting large Python 2.6/2.7 applications into Go services (and doing the same to large Perl applications) for a long time now.
Go has consistently been 10-20x more performant (allowing for dramatically reduced hardware needs), easier to maintain, and more productive to write code in than our previous Python (Twisted) and Perl (AnyEvent).
Hopefully Khan Academy has solid telemetry data in both the legacy and new code so they can quantify the benefits. They will also have a learning curve managing multiple microservices vs. a monolith. Accessing shared data will be a problem they will likely have to solve. We've opted for each service controlling its own data: no reaching into another service's data behind its back. Everything goes through APIs. This gives a microservice the ability to alter its datastore as it needs to and not be blocked by other teams' need to update how they access the data.
Debugging a distributed solution is much harder than a single service. Distributed tracing, consistent structured logging with log aggregators that let you do fancy searches (like Splunk), and application telemetry and metrics will be even more important than before.
I heard about this project from a friend who works at KA. I am concerned about the strategy, and I think the following approach would yield better results:
1. Write in Go an exact reimplementation of the current Python codebase. Use the same database schema, front-end HTML/JS, test suite, and so on. To whatever extent possible, use the same names for classes and functions. Check the reimplementation correctness by using a comparison tool that calls both the Python and Go version of a page/function/search and making sure that they produce the same results.
2. Change the production code over to the Go version, perhaps using a ramping strategy where X% of servers are running the Go code, and you gradually increase X, while monitoring vital statistics like server load and response time.
3. Now that the production site is running Go, incrementally split off components into their own services.
This approach leads you to the same destination, but with a lot less risk. It is very unhealthy to have a situation where the production site is running one codebase but all the developers are working on another codebase. Note that you will realize the benefits of Go (performance, type safety) after step 2, which is much sooner than OP's plan.
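The comparison tool in step 1 could be sketched like this; `legacy_handler` and `new_handler` are hypothetical stand-ins for the Python and Go endpoints (in reality both would be called over HTTP):

```python
def legacy_handler(user_id):
    """Stand-in for the existing Python endpoint."""
    return {"user": user_id, "score": user_id * 2}

def new_handler(user_id):
    """Stand-in for the Go reimplementation."""
    return {"user": user_id, "score": user_id * 2}

def shadow_compare(old, new, inputs):
    """Call both implementations and collect any divergent responses."""
    mismatches = []
    for args in inputs:
        a, b = old(*args), new(*args)
        if a != b:
            mismatches.append((args, a, b))
    return mismatches

diffs = shadow_compare(legacy_handler, new_handler, [(1,), (2,), (3,)])
print(f"{len(diffs)} mismatches")
```

Run against recorded production traffic, an empty mismatch list is the signal that the reimplementation is safe to ramp up.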
Joel Spolsky's classic essay about how you should never do full codebase rewrites is worth reviewing:
100% agree. We've just finished rolling out our implementation in Go, migrating a subsystem from PHP that receives around 150 req/second and demultiplexes those requests into 1500-2000 req/second to legacy backends.
The key to the success of the project was that the API was an exact match, so we could compare both implementations for the same requests. The deploy strategy for the new version:
- Replay the real traffic to the new Go service, comparing the results with the old one
- Then implement a feature toggle that enabled different traffic sources to use one backend or the other
- Keep changing backends to the new system and ensure that metrics were unaffected
Having e2e and integration tests for the Golang project was a huge help, since we could fix all differences using TDD.
Although we changed some of the implementations to take advantage of Go constructs, even a straight 1-to-1 replacement would have had a huge performance impact.
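The toggle/ramp-up described above might be sketched as a deterministic percentage router (names hypothetical); hashing the user id keeps each user pinned to the same backend for the whole ramp:

```python
import zlib

def pick_backend(user_id, rollout_percent=10):
    """Route rollout_percent of users to the new service, the rest to legacy."""
    # crc32 is stable across runs, unlike Python's salted hash().
    bucket = zlib.crc32(str(user_id).encode()) % 100
    return "new-go-service" if bucket < rollout_percent else "legacy-backend"

counts = {"new-go-service": 0, "legacy-backend": 0}
for uid in range(1000):
    counts[pick_backend(uid)] += 1
print(counts)   # roughly a 10/90 split
```

Raising `rollout_percent` step by step, while watching error rates and latency metrics, is the "keep changing backends and ensure that metrics were unaffected" loop in miniature.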
Having done this kind of migration/rewrite multiple times, the way you succeed is starting with acceptance tests that verify functionality from the API layer that is implementation agnostic.
After the tests are in place, you break off small portions into microservises and ensure tests pass.
You pass a small percentage of traffic through the new arch, fixing bugs and leveraging telemetry.
You eventually slide all traffic to the new architecture.
You want to get something receiving traffic ASAP so you start getting feedback. Often this means taking something smaller or simpler out of the legacy codebase first.
Doing a complete, side-by-side replica is a recipe for disaster. Think MVP. We've even introduced traffic routing based on feature sets, so you can route users who don't use edge-case features to the new arch (which has yet to incorporate those features) while keeping others on the old arch.
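A sketch (all names hypothetical) of the kind of routing just described: a percentage-based rollout combined with a feature-based escape hatch for users who depend on parts the new architecture doesn't cover yet.

```python
import hashlib

# Users relying on features the new arch doesn't support yet stay on
# the legacy system; everyone else is split by a stable hash of their
# id, so a given user always lands on the same side of the rollout.

UNPORTED_FEATURES = {"bulk_export"}  # edge cases still legacy-only

def route(user_id: str, user_features: set, rollout_percent: int) -> str:
    if user_features & UNPORTED_FEATURES:
        return "legacy"
    bucket = int(hashlib.md5(user_id.encode()).hexdigest(), 16) % 100
    return "new" if bucket < rollout_percent else "legacy"
```

The stable hash matters: randomizing per request would bounce users between implementations and make bug reports impossible to reproduce.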
It looks like I need to write a followup with more details about how our migration looks!
Since we're using GraphQL federation, we have a gateway which serves up our complete GraphQL schema, pulled together from a collection of services behind the gateway. We can move individual properties over from our Python monolith to new Go services and the clients will never know. Plus, we can do side-by-side testing in the gateway by making a request to the monolith and to the new service and comparing the results (something we don't have yet, but plan to).
This is definitely not a big bang rewrite. It's about as incremental as it can be, because we can move individual GraphQL properties over and the gateway stitches the result together.
I’m thinking that defining parts that can be moved to separate services, and starting to consume those, could be a way to organically transition to a new architecture.
Sounds nice in theory, but won't you always be playing catch-up? Or will development(/bugfixes) halt on the existing python version? Would it ever be acceptable to the business to commence such a rewrite without any show of value until after completion?
(Speaking as someone working at a firm that chose (before my time) to start a ground-up rewrite that ran about 3 years over estimate.)
It is crazy to see they are still using Python 2. Seeing how slow the conversion to Python 3 has been, was creating Python 3 a good decision for the Python community? Can it be argued that developing Python 2 further in a backward-compatible way would have been better? I know that evaluating this kind of thing is hard, as metrics are bound to be subjective and speculative. But I am curious whether there was any serious attempt to figure it out.
> was creating python 3 a good decision for python community? ... I know that evaluating this kind of thing is hard
I don't think there is any question here: Python 3 is a complete disaster. Years and years of engineering effort wasted on changing string libraries. Sadly, the Python leadership refuses to acknowledge the failing, perhaps because such an acknowledgement would challenge their omnipotence ... it would, and it should.
> Now, in 2019, Python 3 versions are dominant and the Python Software Foundation has said that Python 2 reaches its official end-of-life on January 1, 2020, so that they can focus their limited time fully on the future. Undoubtedly, there are still millions of lines of Python 2 out there, but the truth is undeniable: Python 2 is on its way out.
The Python 2/3 split is by far the most annoying thing about Python. I don't develop software in Python but about half the time I've had to use a Python library or program the problem of 2/3 incompatibility has cropped up. Some projects don't make it clear whether one or the other is required, leading to further confusion.
If anything, the Python 2 EOL could make a bad situation worse. Like Khan Academy, each Python 2 package maintainer will be forced to make a decision: move to Python 3, or abandon the package and maybe move to an entirely new language. I think many will choose to abandon, leaving these packages to rot.
Second on the list are the multiple package managers (or things looking like package managers).
Third on the list of annoyances are native extensions, driven by the poor performance of Python itself. These extensions make it difficult to use certain libraries across operating systems.
So as a non-Python developer I don't look forward to the occasions when I must use a Python-based piece of software.
If you're installing a package via the package manager, it will very quickly tell you if you can install it on your specific version of python. Unless you're downloading some rather obscure and un-loved library where the author didn't explicitly state which versions of python they support.
Multiple package managers: there have only been two big ones in my long-term general usage of Python: easy_install and pip, the former of which has fallen out of favor but is still semi-supported. Pretty much everything runs off of pip. What may be confusing you, and does confuse me at times as well, is the naming and the installation instructions as provided by library authors. E.g. some say "python setup.py install", others just tell you to "pip install" it. Some say to use "setuptools", etc. Others tell you to use things such as "conda" or "anaconda", or pipx, and to create virtualenvs. All secondary, but things that should ideally not distract you from just plain using pip.
3. This has also been getting a whole lot better in the last 5 or so years. Microsoft has been funding dev-time to make the ecosystem for python (including extension compilation) much more pleasant in the Windows space. Also, the package managers and library authors are doing a whole lot better in that binary distributions are much more prominent so the compilation of the extensions never has to happen on your machine.
On a personal level, I have the same problems with using Python.
I've become enamored with package management in Go — not perfect yet but efficient, simple, to the point. The backwards compatibility enforced at the version level is also great — you often find Go code years old that keeps running just fine. I like things that you set up once and may just forget, that's where real productivity is found imho — it doesn't matter that I can do X in 1h if I have to do it every other day, I'd rather spend a full week or even ten, and solve it forever.
I think Js is comparably simple but I've heard so many horror stories about dependency management that I just don't know — I've yet to use Js in prod myself at work and I don't look forward to this day.
Here's the thing: it does not matter how great a language may be while I'm writing it, because that's 10-25% of my time; what matters is that everything around, from setting up dev environments to shipping passing by devops, especially as a one-man/small team, can be done "simply enough". And that, IMHO, is where Go is miles ahead of most other languages from a philosophy standpoint.
I tend to feel very positively about Rust, for it looks to be an extraordinarily intelligently led project, with comparably 'real' benefits that extend beyond the code page (but I've yet to use it myself to confirm first-hand).
We see the importance of these topics so clearly with Py2/3: none of the problems between these two have anything to do with what's in the code, with programming; all of it has to do with the ecosystem, with the real and much larger task of maintaining codebases, managing teams and deploying 'stuff' in ways that work with the environment (whether tech, people, knowledge, politics, what have you).
The move to 3 has been a failure in that regard, and IMHO it rests on the shoulders of an entire community who chose to stick with 2 regardless of what that would mean down the road. Well, down the road is now, and the result is chaotic.
I'm not worried about Python itself — the language is incredibly popular, especially in academia, and the programmer population roughly doubles every few years, so that's a sustainable amount of new projects written in Python 3 every day. Old Python 2 projects will be but a drop in the bucket 10 years from now, simply because of the numbers effect of tech growing at such an insane rate (that's been roughly true since the late 1940s; Uncle Bob has a great take on it in his latest appearance on The Changelog podcast).
Anyway. I can't wait for py2 to die and py3 to become the only Python.
To be more specific (for anyone else who hasn't checked the link yet), the "Lifetime giving" section has:
9 donations of more than $10M,
4 donations between $5M and $10M,
20 donations between $1M and $5M.
So it looks like there has been at least $130M in donations!
As much as I personally don’t enjoy writing Go I really can’t fault them.
I still find it interesting that for a relatively obvious feature set of fast compiles, fast startup and fast runtime there really isn’t anything mainstream out there to compete with Go.
I really hope something like Kotlin, Swift, ReasonML or even AOT JVM/.NET brings something to the table soon. Or perhaps I’ll just have to wait for WASM to really take off server side.
in production is fast startup really such a boon outside of serverless? especially if you're already doing blue/green deployments, doesn't seem like it'll have much impact.
(depends what "fast" vs "slow" means - are we talking about milliseconds vs a second or two, or startup times so horrendous they cripple your devs' ability to iterate and test?)
fair enough. you say in the blog App Engine has worked well for you and you're sticking with it, so i'm assuming you considered moving to traditional servers but found it unappealing?
Yes. Google Cloud now has multiple options for autoscaling servers (App Engine Standard, App Engine Flex, and Cloud Run) with the biggest differences being how they're deployed and specifics around the scaling.
We _could_ manage our own Kubernetes clusters and such, but Cloud Run is pretty similar to that and takes away all of the management headache. There is essentially zero code difference, should we decide to change our deployment strategy later.
We're using Google Cloud Datastore for persistence, and that automatically scales in both servers and storage, so it has worked out nicely for us as well.
GraalVM looks to bring fast launch at the expense of the long-run performance optimisations from JITting. For FaaS-y purposes that's a sane tradeoff; for long-running services the startup overhead is amortised over requests anyway.
I guess I’m jumping to conclusions but it seems like a lot of lessons have been learnt since JVMs and .NET came onto the scene, and that WASM runtimes and the future languages that target them will prioritise speed at every stage.
Go already cross-compiles to WASM, so if desired, Go code can be run via WASM. But on the server, you probably rather want to run the Go code natively. For the client, this should be quite interesting.
Seems like such a waste. Is switching to python 3 really that hard? Is hardware that expensive? If this is indeed the right call it doesn't bode well for traditional scripting languages as the web scales to fewer high traffic apps. We might start to see more jvm, go (apparently) or even rust and c(++), rather than speed of development languages like Python or Ruby. Trend seems to be the reverse though, with python the second and most rapidly growing language.
Everyone’s project/code base is different but in my experience there’s been a critical mass of libraries for a few years. I presume the “it’s hard to move to 3” is dev teams wanting a new toy as much as “the rewrite is too complex”. Library use, size of code base etc are all big factors but at the end of the day, I think team motivation is really the deciding factor.
That mirrors my experience as well. Someone with influence is bored or wants to level up, so they'll drag the entire company into a long, expensive quagmire.
Unless the existing codebase is mired in technical debt and completely unsalvageable or cannot scale further, this seems like a very radical move.
I worked on a fairly large codebase that needed to be rewritten from scratch when migrating from 2 to 3, primarily because all the tests were written using a test framework that was no longer maintained. So given that you might need to start over anyway, I think it's reasonable to consider other options. That said, yeah it's difficult to understand how KA's web server costs aren't already basically zero, and how their endpoints aren't already basically instantaneous.
> That said, yeah it's difficult to understand how KA's web server costs aren't already basically zero, and how their endpoints aren't already basically instantaneous.
I find that this is the outsider's view of a great many products. Things always seem a lot simpler on the outside.
In Khan Academy's case, I think a lot of folks just think of our site as being a collection of more-or-less static pages with videos on them. There's a lot more going on than that, though. We've got a CMS that supports articles with math and interactive elements, in addition to the videos... and many, many exercises with hints. All translated into dozens of languages.
We have to remember every exercise people have done so that we know which ones to present to them next, and we need to display that progress when they look at topic pages. Oh yeah, and if they're in a classroom, we need to present that progress to teachers (or coaches/parents, outside of the classroom). Teachers can also assign content.
Plus, there's the official SAT prep, which connects to the College Board directly to provide personalized guidance about what to work on... and that's only one of the test preparation areas of our site.
And, as you can imagine, there are a bunch of other features and aspects of the features above that I'm not mentioning. It adds up.
Fair, but what percentage of spend is actually on web servers as opposed to database, data transfer, static asset storage, caches, CDN, etc? The features you listed are kind of what I expected, but I still wouldn't expect the web servers to be more than 15% or so of your hosting costs. I know you guys get a ton of traffic, but on most web sites at least 90% of traffic is logged out and doesn't even need to hit the web servers in the first place.
I don't have recent numbers in front of me, but I believe our web servers are more like 40% of our hosting costs today.
Over the past year, we've started leveraging our CDN (Fastly, who have been great) a lot more. That said, for us a lot of logged out traffic still carries the weight of logged in traffic. A logged out user can start doing math exercises and we'll keep track of what they've done. If they then create an account or log in, that activity is associated with their account.
Khan Academy may look like a content site, but in many ways it's more like a "learning app".
According to their 2018 accounts, 'information technology' costs were $5m. Salaries were listed separately at $29m so I'm guessing the $5m was mostly servers.
This is my experience too. Someone or a couple someones on the team decide they want to try out some new tech or expand their resume. Then it becomes a quest to justify the switch rather than a quest to make the best business decision.
What's really the ergonomic difference between "traditional scripting languages" and Go? I came up writing professional C code, and spent most of the last 15 years writing "traditional scripting language" code, and Go feels a lot closer to scripting than to C to me, despite compiling down to machine code.
Go has static types, and that distinguishes it from Python, Ruby, and Perl. But the trend now seems to be for languages to move towards static typing anyways; 2005-era Python was wrong about that.
It’s usually the DSL argument. In Go you can establish calling patterns for errors, but you’re really limited in terms of providing libraries with nice APIs that prevent you from making mistakes.
In Python you have a lot of ways to make sure someone does a thing. Exceptions are good for making sure an error is handled. Context managers make sure a resource is cleaned up properly.
I might be wrong but I feel like writing something like jquery (with its fluent API) would be really tough in Go.
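To illustrate the context-manager point above, here's a small sketch (names hypothetical) showing how Python can make cleanup impossible to skip, even when the body raises:

```python
from contextlib import contextmanager

events = []

@contextmanager
def resource(name):
    events.append("open " + name)        # acquire on entry
    try:
        yield name
    finally:
        events.append("close " + name)   # cleanup runs even on error

# The body raises, but the cleanup still happens:
try:
    with resource("db"):
        raise RuntimeError("boom")
except RuntimeError:
    pass
```

In Go, the closest equivalent is a `defer` inside the function that owns the resource; the difference the comment is pointing at is that a Python library author can hand callers an API where forgetting the cleanup isn't even expressible.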
That rise has been entirely accounted for by machine learning and data science. Python as a language for actual software engineering has been slowly dying for a while now.
I've worked with or known about too many places that have jumped on the microservices bandwagon and the only thing I've encountered is a lack of maturity (mine included) and a knee-jerk reaction against monolithic code, when the problem isn't the size of the codebase but its organic growth over time. Go is an excellent language for it but distributing your business logic and functionality over a network fundamentally changes the behaviour of your app and how you have to think about it; you can't just tear out bits of the monolith and make it an API.
I've come to some fairly comfortable conclusions:
1. If your team is small, don't do it. The mental overhead of that architecture will bring your team's ability to deliver to its knees.
2. If you've got a long-lived app with lots of legacy code; don't do it. You will have to maintain and add new features to the old codebase because it's easier, which means you have more things to rewrite.
3. If you're at small-ish/mid scale, don't do it. Kubernetes and similar are tools for people handling Facebook/Google/Netflix kinds of loads.
4. If your organisational structure doesn't fit (i.e. you don't have enough people to split into smaller teams); don't do it.
Scaling is considered a good problem to have, right? That means that your current, ugly monolith is actually successful, and somehow the first thought is to replace it all with a complete—but trendy—unknown?
If the engineering team is so unhappy about the architecture then Go is the wrong choice. Maybe they could consider service oriented architecture and some refactoring/tech debt time so they feel happier about that codebase. Pull some of the code out into modules and then start figuring out where bits of the codebase would actually belong, while still being one deployable.
And then after that, if you really want to, distribute it over the network, and then start thinking about porting it.
Otherwise, throw away your first successful prototype at the first instance and go all in on distributed architecture and microservices. At least then you have the luxury of figuring it out from scratch.
> I've worked with or known about too many places that have jumped on the microservices bandwagon and the only thing I've encountered is a lack of maturity (mine included) and a knee-jerk reaction against monolithic code, when the problem isn't the size of the codebase but its organic growth over time. Go is an excellent language for it but distributing your business logic and functionality over a network fundamentally changes the behaviour of your app and how you have to think about it; you can't just tear out bits of the monolith and make it an API.
This seems like a response to the blog post as opposed to the parent comment.
We absolutely recognize how added network boundaries changes the app in big ways.
One thing I wanted to mention: we're _not_ going the Kubernetes and service mesh sort of route, because our experiences thus far show us that there are still a lot of rough edges. We're sticking with App Engine because it generally just works. It scales down essentially to zero and scales up well with the traffic. So our services are all going to individually be running on App Engine.
Plus we're not going "micro" with our services. They're each fairly decent size, own specific parts of our data, and are owned by specific teams.
It's almost religious. The reaction some people have when you suggest monoliths is completely baffling. It's apparently just "known" that it is the correct approach to all problems, so, y'know, it's embarrassing for you to have even suggested otherwise.
The best systems I've worked on have all been well architected monoliths. The code is ugly as fuck in some places but that's what tech debt is.
If you release a bug to prod in your monolith, it is exactly the same as a dependent microservice releasing the same and bringing the cluster down. You get a crash either way, and at least with a monolith you're not spreading your call-stack over the network; it's all in memory.
In terms of general compute speed Go is many times faster than Python - approximately on a par with Java speed-wise but with much less memory use. That means your server costs are many times cheaper and your page latencies are much faster than Python. It's significant.
In the past companies would just compile python down to c to get the memory and perf they need. Probably would be the right answer here too, but that would not look as cool on the resume.
Moving compute intensive tasks to C would speed up many programs. That is the reason that many dynamic languages have overall good performance - they rely on C libraries for the computative heavy lifting.
This has one big drawback, however. Not only do you need to write in two different languages, the rewriting in C requires a lot of care, as the language protects you much less than the high-level language you implement for.
One big attraction of Go is that it is high-level and productive enough to be the main implementation language, yet very efficient for time-critical stuff. So you don't have to cross language barriers to implement speed-critical code, and you get full type and memory safety in the whole stack.
Interestingly, one can write Python extensions in Go, so in most cases that would be my choice these days for speeding up critical code paths in Python.
For any web-based app, it really makes sense to take the approach of using a rapid development language first, then as you need to scale, convert to something that’s compiled and focuses on speed. It’s not one or the other kind of thing — they both have a role (at least until we reach the holy grail where fast to develop is also fast to run).
Like one of the former engineers at Twitter said about their early issues with stability and when someone blamed Ruby for the performance issues - short version it was a bad architecture not the language.
Stateless web servers are one of those things which are ridiculously and easily parallelizable, you can scale a web server horizontally easily. The ROI of rewriting everything in another language as opposed to just adding more web servers would probably take years unless you’re running at a ridiculously large scale. For the cost of one developer’s fully allocated salary, you can throw a lot of hardware at performance issues with web apps.
I prefer statically typed languages, but performance isn't one of the reasons. Besides, how much processing is a typical web app doing?
What's a "rapid development" language? I know about RAD, but that was a buzzword that was only ever used to push terrible languages like Visual Basic, and somewhat less terrible ones like Embarcadero Delphi.
Dynamic languages (so you can hack together stuff and quickly bypass any roadblocks), with REPLs (quick feedback and avoiding writing tests), and low cognitive overhead (so new folks can ramp up quickly).
Some example languages that come to mind here: Python, Ruby, JavaScript, Clojure, Groovy.
Go actually comes close here, even though it doesn't have a REPL and isn't very dynamic, because of its focus on minimizing cognitive overhead and getting the job done with minimal fuss.
Other languages carry a lot of community baggage. Java is one of the worst IMO...if you try to hire Java programmers, it's going to take a lot of effort and risk to find and reject applicants who've read too many design pattern books, are architecture astronauts, or come from an enterprise-y background. The signal-to-noise ratio is just really poor.
I _like_ Python. Even released a Python web framework. I think there are many projects it's a good fit for, which is why it's continuing to be quite successful, despite the pain of Python 3.
But that doesn't mean it's a great fit for all projects. Personally, I've come to find that code in statically typed languages is easier to maintain over time, especially from a big team. I guess a lot of Python folks agree, which is why Python 3 allows static typing as well.
At a certain point, server costs _do_ add up to real money and some applications are not purely database-bound. Go's tooling makes it almost as fast to work with as a scripting language, but with much better performance. The language itself is certainly not as succinct as Python, but I think it has made reasonable tradeoffs.
Also: there's already a lot of JVM on the web.
Finally, I'll just note that _not all_ Python 3 migrations are that hard. It depends on a lot on the libraries used.
I don't think upgrading to python 3 will be as hard as rewriting the entire thing, but rewriting does come with the benefit that you are not stuck with the problems that come with dynamic typing in a huge codebase, and i'm assuming this is why sticking with python is hard.
They also said a faster language will improve their server.
Instead of rewriting with static types, they could just gradually add them – Python 3 supports static typing[1] with the actual type-checking done by external[2] tools.
This is perhaps a very big misconception with python usage. Just because it has "dynamic" or "duck" typing, doesn't mean that you have to consider chaotic and unpredictable data running through your code paths. It's just not the case.
In a typical codebase, it's probably 98% very specific and known data types linked to the variables in your code. With the remaining 2% being things that are just "easier" to solve with dynamic typing rather than coming up with complicated interface/inheritance hierarchies that you typically find in compiled languages.
And with type hinting now in Python, you have a very good way of codifying that dynamic or duck typing, such that you can expect almost 100% knowledge of all the data types coming in and out of your classes/functions. At this point, I'd argue it's got one of the most robust "type" systems out there, if one can call it that at all. Just don't use a plain text editor + mypy, or VS Code, for your Python development, and you'll be in good hands. I.e., use PyCharm.
Honestly, at this point mypy is inferior to PyCharm's built-in checking when it comes to speed, integration, and in some cases type inference as well. I've also compared it to the VS python language server and it too doesn't stack up.
The other metric would be what a person coming from established, full-fledged IDEs such as VS would expect. With PyCharm, you get intellisense almost on-par with what you get from VS for C#/VB, assuming you use type-hints that is.
Not trying to be difficult, but I'm being honest about them providing a really good python experience that is for the most part free. There is no need to putz-around with VScode, json configs, plugins, mypy, etc and still end up getting a relatively inferior experience. Doubly so for new-developers.
Hardware is cheap. Hardware is on the "accessible to a 3rd world middle class person" level of cheap.
But with enough scale it adds up, while the costs of a rewrite don't. And the difference between a language like Go and one like Python can be in the hundreds of times.
I think it's more that given their specific codebase it's similarly difficult to switch to python 3 as it is to switch to a number of other, entirely different programming languages. Once you recognize that rough equivalency, then it's worth considering the stability, compile times, necessary production resources, etc of those other programming languages.
If your project's specific dependencies are so intrinsically stuck on a python 2.x implementation you might be caught between having to redesign that dependency in-house or switching to a language where you wouldn't need to do that in-house work.
But that's almost certainly not the case. Even if their existing codebase relies very heavily on the small subset of Python2 features that require manual porting, switching to Python3 will be much less work than rewriting everything in a new language.
We have the good fortune of not having to be super secretive about our tech. We _want_ to talk about this project as we go along and share what we learn. We've already got more interesting stuff to talk about, and it'd be a lot less interesting, I think, without this context for the overall project.
As mentioned in the post, a small piece of our GraphQL schema is already in Go running in production. This blog post isn't just "we're thinking of doing this thing". It's "we've already done a bunch of research, thinking, _and_ built some of it."
Indeed, that's true. I just think we have more to gain by talking about what we're doing and seeing what the community thinks about some of our approaches rather than keeping it all private for the next year.
In my experience it's more annoying than you think, especially if you want to prevent small sneaking regressions.
Python not being statically typed also means that all the breaking type changes they made — strings now being bytes, and similar — cause crashes that may not be revealed until production, if your unit tests didn't get that oneeeee edge case right.
If I'm going to do a new project in the future, I will demand it will be a statically typed language. I'm sick of dynamic languages.
Switching to Python 3 is hard, especially if you have a massive Python codebase interacting with other systems. It's not as easy as importing unicode_literals. Unicode breaks in very subtle ways.
and rewriting it in another language will magically fix this and not introduce new bugs? besides, in my experience Python 3's clear separation between bytes and str makes these breakages much less subtle than it silently going wrong in Python 2.
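A concrete illustration of that difference: Python 2 would silently coerce between byte strings and text (sometimes blowing up with a UnicodeDecodeError far from the real bug), while Python 3 refuses to mix the two types at all, failing right at the mixing point.

```python
# Python 2: b"caf\xc3\xa9" + u"!" attempted an implicit ASCII decode,
# which could fail (or silently succeed) far from the actual bug.
# Python 3 raises TypeError immediately instead:
try:
    _ = b"caf\xc3\xa9" + "!"
    mixed_silently = True
except TypeError:
    mixed_silently = False

# The bytes/str boundary has to be crossed explicitly:
text = b"caf\xc3\xa9".decode("utf-8") + "!"
```

That explicitness is exactly why the parent comment calls Python 3 breakages "much less subtle": the error surfaces where the types meet, not somewhere downstream.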
i wish them all the best, but would've been much more impressed if they'd done it, not simply announced they were going to.
> in my experience Python 3's clear separation between bytes and str makes these breakages much less subtle than it silently going wrong in Python 2.
No question Python 3 is better than 2, but it's not better enough to justify the move. People will only move when they absolutely have to. That isn't progress, it's inefficiency.
The question isn't, is porting Py2 to Py3 easier than porting Py2 to Go. The question is, if you spend the same effort on porting Py2 to Go that you would have spent porting to Py3, and that only gets you 60% of the way there, but the ROI on that work is much higher, are you better off?
Yikes! It's a lot of effort to reduce memory use. They might be better off creating a new Go entrypoint/server that can call into CPython to reuse all their existing/tested modules (treat their Python as a microservice called by Go). They could then use Go to create/call new microservices or replace various routes on a selective basis.
I think the real problem is that they didn't properly maintain their code. Rewriting it in Go won't prevent them from dealing with this again in a few years, when this Go version reaches end of life. I would have liked to see an article on the "introducing process" side of programming.
Thanks for the comment! A couple of things about this...
1. The Go team is working very hard to ensure that there are no such compatibility issues. Code written for Go 1.0 should still compile with the Go 1.14 beta today.
2. It's possible there's more we could have done along the way, and I tend to think that statically typed languages make it easier to safely refactor more ruthlessly. But I do think we've actually done quite a bit of change incrementally along the way. Our move to React on the frontend and GraphQL on the backend have been good examples of that. Plus, we did a huge refactoring a couple of years ago to draw better boundaries in our monolith, and that has made a move to services possible.
Unpopular opinion, writing Go is faster than Python. With the compiler, strong typing, and no versioning hell, I'm much more productive in Go.
Whenever I use python I run into problems with versions and dependencies. And the whole community just tells me to use pyenv or virtualenv and it will "fix all my issues". Only it doesn't.
Just as a counterpoint, I'd say that using Python can IMHO be similarly productive to Go.
Regarding the dependencies, you have tools on top of virtualenv, such as pipenv/poetry, which handle dependencies and are easy to use. The biggest issue I've encountered would probably be when two or more dependencies require the same package, with no intersection between supported versions. I don't think Go handles this any better, though.
Type hints (and mypy for static type checking) are a must, and coupled with a good IDE, they really improve productivity. I'd say that mypy's type system is more advanced than Go's, but it falls short on type safety, since the majority of libraries don't take advantage of it yet, and Python is still a dynamic language by nature, with no runtime type checking.
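For instance, here's the kind of bug mypy catches at check time that plain Python only surfaces at runtime (the `greet` function is a made-up example, not from any real codebase):

```python
from typing import Optional

def greet(name: Optional[str]) -> str:
    # Without this None check, mypy flags name.upper() as an error,
    # since Optional[str] may be None. Plain Python would only fail
    # at runtime, and only when actually called with None.
    if name is None:
        return "hello, stranger"
    return f"hello, {name.upper()}"

print(greet("ada"))  # hello, ADA
print(greet(None))   # hello, stranger
```

The check is static, so the bug is caught even on code paths your tests never exercise.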
All these great things that Java fossils like myself have been telling Node and Ruby hipsters about. Of course they won't be caught dead writing in a language their parents use.
Same. Moved from Python to Go, don’t write python much anymore.
It’s ridiculously easy to build things in go. The default tooling works just great. It’s a nice fit with docker for building tiny containers.
Perhaps the nicest thing though is how easy it is to write fast http servers. The default server is pretty good, but there are also so many choices for faster http server frameworks. Middlewares are easy to write and share. I can’t say that I truly understood how http servers worked until I started using go.
Yes. My current estimate of the cutover, for myself, is about three weeks of solid, 40-hour/week development. After that, my Python (or other dynamic language) starts the process of seizing up: instead of rewriting some module I just put a little hack in there to make it backwards compatible with other code, since I haven't got a great way of being quite sure what's calling this code. So I use a __setitem__, or have a function that takes "a thing or an array of that thing", and I find myself increasingly reluctant to refactor the Python.
YMMV on the exact number, but that's been my experience several times now.
I know it can be done; I've seen it done, I've done it myself. But refactoring without even the rudimentary static type system Go has just becomes an increasing nightmare at scale.
And I use unit testing in Python, etc.
But, flipside, yes, Go isn't a great language for just bashing a script together in. Maybe not the worst, with a bit of library work, but not a great language.
The recommendation should be to use pyenv and virtualenv: pyenv for installing the Python versions you need for different projects and virtualenv for creating an isolated environment for each project. Using this setup, I almost never run into dependency issues.
I don't know much about Go. How does it avoid dependency conflicts?
I just cannot understand how rational engineers would choose untyped languages like Python or Ruby for large scale systems. Humans make mistakes - even very smart humans make frequent mistakes - and blast radius grows with the size of the system.
The article mentions Go's superior compile time, when compared to Kotlin. I have done a lot more Java development than Kotlin, but my recollection is that both of them compiled fairly fast.
Is Go really significantly faster to compile for similarly sized projects?
It's mostly because most setups will use maven or gradle which do much more than just compile code. They will often pull down dependencies, check for bugs and style inconsistencies, run unit tests etc. If you run plain javac against a bunch of files it will compile just as fast as Go. In any case it will be in a blink of an eye.
Yes, it’s much faster than these java and jvm based codebases (including all jvm based languages like kotlin, scala etc). Go as a language is just so much simpler.
Although the compilation overhead of the large JavaScript codebases that make up most modern websites kills any benefit from improving the build performance of some backend service.
Not only going from a monolith to microservices but also changing the language? This is a mistake that rookies make. This will be one of those post-Mortems where they will sheepishly admit they bit more than they could chew, and it wasted years of productivity.
There’s no reason to move to Go. Stick with Python for now. Migrate safely to Python 3. Once everything is stable, start breaking things up into thrift or protobuf services. They don’t even need to be microservices but you need the contract. Once that is stable, migrate to whatever language you want. But at this point you will have the well-defined API and test cases. Trying to do too much all at once is a recipe for disaster.
(Ironically, that post is about Netscape and the rewrite, Mozilla, ended up growing as a nonprofit far beyond Netscape's original scale.)
The so-risky-it's-almost-certain-to-fail approach is a big bang rewrite. Stop the world until the rewrite is done.
That's not what we're doing. A tiny bit of our GraphQL schema is _already_ running in production atop new services written in Go. We're going to do this step-by-step, while keeping the site running, _and_ adding committed features next year.
It's still a huge investment and comes with risks, but it's an incremental process and we'll be able to track the progress at every step.
So long as your data has well-defined interface boundaries on it, this is a good way to de-risk rewrites. It really depends on the current state of the codebase. If there's a tangle of dependencies, then they have to be cleared out or facaded away to a testable interface, or else that part of the port will be a shot in the dark.
And if you have the interfaces in hand, the language itself becomes considerably less important, since more of the code is subsequently dependent on your own system and not really the outer ecosystem. It's the projects that have coded "to the metal" on their existing platform that have the biggest issues with keeping up their flexibility.
Given that this is mostly about our backend systems, we've got one big interface in front: our GraphQL schema.
Our system isn't perfect, but a couple of years ago we spent a good deal of effort detangling our monolith. This plan wouldn't have been an option had we not done that.
The saying only applies when your product is software itself, not in most cases where software is just the means to deliver your product — here a website providing education in video form etc.
Besides, the reality is that most business software out there gets rewritten every 3-15 years (it really depends on use-case and conditions, but on average 4-6 years is a good bet). After some time it's just not worth it to keep refactoring; you'd rather start anew with hopefully better tech and certainly with better knowledge of your problem — they say you should write everything 3 times to make sure you really nailed it.
In many businesses, these rewrites would constitute a new major version, more comparable to the feeling we always got in the waterfall era — new version = big changes, new UI, new stuff. That's when it's possibly lethal, if you really break the thing, and that thing is your product, not a means to it.
the number of failed migrations, modernizations, and "tech transformations" in non-software industries, as well as the number of consulting outfits and their profits doesn't seem to back this up.
we can argue whether these are "rewrites", but big changes can be rewarding but are inherently risky. balancing this is hard, and rewrites uncover and introduce the unknown unknowns.
that's fair, most of those are "you get what you pay for" and hardly the fault of a competent, well-meaning, but naive dev team. problem is it's much harder to get data on small businesses.
i think one thing we can agree on is that the only thing users hate more than change is breakage.
rewrites are a valid tool in a long-term strategy, just as debt is for finance. but for most people, incremental change has a bigger, smoother RoI. it sounds like this is kind of what KA is doing. although the timeline seems aggressive and the whole thing absolutist, being stuck on Python 2 is a risk now, too.
I think we agree on the general perspective, yes — we'd probably agree as a team with known (ad hoc) conditions and goals.
Data on small businesses is hard indeed. I'm only speaking anecdotally from the MSP / software shops perspective, the "tech guys" of most businesses who don't have in-house IT. Also from a European perspective, so that might make a big difference — we're, ahem, let's say not as involved, interested, or capable in all things "technology" as a general population (I didn't say luddites but that's how it feels sometimes, compared to the vibe I get in NYC or rich Asian cities).
> Isn't there a quote somewhere along the lines of "full rewrites are suicidal?"
This is just dogma. Often it's the wrong choice, but sometimes engineers have very good reasons to re-write a system. As with all decisions, tailor the solution to the situation - not the other way around.
You can start porting your live codebase to python 3 right now - no downtime at all. You can have a 2+3 compatible codebase live in a week or two, it'll get you about 98% the way there [1]
As for automation tools for 2/3 compatibility, I used futurize on a huge codebase to great success: https://python-future.org/futurize.html. I started with its conservative stage 1 fixes.
Also, wire Travis / your CI system to have your tests run on python 2 and 3.
If you want to test your python 3 codebase live on a subdomain / same SQL/NoSQL DB, be careful about jobs/tasks! Pickle version mismatches and stuff. Use a separate redis/whatever DB for the deployments.
I appreciate all of the links and the fact that, for a lot of folks, this is the state of Python 2 to 3 migration today. What you're suggesting is, imho, the best path for most (and I appreciate you mentioning the pickle incompatibility because that is a thing which would trip people up when trying to make the move).
If we would have been able to have a 2&3 compatible codebase live in a week or two, we absolutely would have done that. Incompatible changes in some libraries we use, App Engine first gen to second gen changes (which are for the better, but still a big deal), the choice of storing some pickles permanently, plus a need to really verify unicode handling all over the place (especially in a 10 year old codebase), and other factors that aren't coming to mind at the moment mean that this is not a couple week thing for us.
Moving to Go is more work than moving to Python 3, but in our particular case it's not as much more work as people might expect.
Thanks for your reply to my reply! Sure, it may take more than two weeks, and maybe golang is a good choice for your stack. Maybe a better way of saying it is that python 2/3 compatibility can be adopted gradually while maintaining the existing codebase.
Also, I can't speak for the technical specifics of your project, but I would like to speak a bit from my prior experiences:
> Incompatible changes in some libraries we use
What are those libraries? There may be python2/3 forks available on pypi. For instance, on Peergrade we went from pyPdf -> pyPdf2, boto -> boto3 (that one is a lot of work). We were still able to stick on the python 2 codebase, but gain python 3 forward compatibility.
In the case of very specific packages, we had to do forks. We had to move some patches from a python 2 only library and port them into a python 3 only library, then do one of those version constraint things.
> App Engine first gen to second gen changes (which are for the better, but still a big deal)
I'm not familiar with app engine so can't speak to it.
> the choice of storing some pickles permanently
Can you clarify this?
I can't speak for it without seeing it, but would it be possible to serialize the data to json, use python import strings (https://devel.tech/tips/n/djms3tTe/how-django-uses-deferred-...), then write a migration to port the old pickles over, and have something more portable?
It's possible, depending on what's being stored, it'd also help you migrate to golang if the data stored in it could be consumed by a go service.
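As a rough illustration of that pickle-to-JSON idea (with made-up data, and assuming the pickled values are JSON-representable):

```python
import json
import pickle

# Pretend this blob came out of the datastore, written by old code.
legacy_blob = pickle.dumps({"user_id": 42, "scores": [1, 2, 3]})

# Load it once with Python, then re-serialize to a portable format
# that a Go service (or anything else) could also consume.
obj = pickle.loads(legacy_blob)
portable = json.dumps(obj)

print(json.loads(portable) == obj)  # True
```

The migration job would do this once per stored record; after that, nothing in the system depends on Python's pickle format anymore.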
> plus a need to really verify unicode handling all over the place (especially in a 10 year old codebase)
Yes I had a lot of pitfalls with this one. Areas to look out for are hashing functions. They are very strict in whether they're dealing with bytes or strings.
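For example, here's the hashing gotcha in isolation (nothing project-specific, just stdlib behavior):

```python
import hashlib

# Python 2: str is bytes, so hashlib.md5("hello") just works.
# Python 3: hashing requires bytes; passing str raises TypeError,
# so you must encode explicitly.
digest = hashlib.md5("hello".encode("utf-8")).hexdigest()
print(digest)  # 5d41402abc4b2a76b9719d911017c592
```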
Are you already using unicode_literals? Those can be implemented gradually in a current codebase.
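To spell out what that gradual adoption looks like, the future import goes at the top of each module as you migrate it:

```python
# With this import, bare string literals are unicode on Python 2,
# matching Python 3's default, so the same source behaves
# consistently on both interpreters.
from __future__ import unicode_literals

s = "café"
print(type(s))  # <class 'str'> on Python 3; <type 'unicode'> on Python 2
```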
How is your test suite? Sometimes having test coverage, even naively, can also act as a smoke test to catch unicode issues.
Selenium tests can get a lot of coverage for very cheap to verify behavior at a high level.
> Moving to Go is more work than moving to Python 3, but in our particular case it's not as much more work as people might expect.
I think the services + golang part is smart, and can't speak for the specifics of your codebase.
I will say in hindsight, I feel the effort / hours I put into upgrading a Python 2 codebase -> 2/3 and eventually 3-only made it much easier to breathe. Cleaner syntax, no unicode headaches, and no lingering prospect of a language exodus looming overhead.
One negative aspect we haven't talked about with "gradual" Python 2/3 migration is breakage. Some of the refactors involved risk breaking things when pushed out, even when they're purely internal code changes. Still, there's a business case for eliminating the tech debt: a Python 2 codebase, even with warts, is meeting an EOL in the next few days (https://pythonclock.org/).
Even assuming golang + microservices is the final destination, the effort of moving to Python 2/3 (or 3-only) has, amortized over the projects I've been on, paid itself back. If there are opportunities where the Python codebase could be split into apps / separate wsgi entry points, with golang services replacing them later, that could be an option. Even without golang, the (probably huge?) refactors involved in moving to Python 3 and being "(micro?)service ready" have benefits.
We did do a bunch of refactoring a couple of years ago to draw boundaries within our monolith. We almost certainly wouldn't have made this choice had we not done that work. We're also doing work in Python to smooth the way for this change. For example, we're migrating our remaining REST endpoints to GraphQL in Python before making the move to Go.
Regarding the pickles: we are essentially doing what you're suggesting. We're going to migrate them to JSON. This is one of those things that is going to be a pain to do and we'd have to do it whether we're switching to Python 3 or Go.
Also, I'll note that the Python 2 EOL is only half real. There's so much Python 2 out there that, while the PSF is no longer supporting it, there will be people supporting it as needed.
I do agree with you about Python 3 being much cleaner. But we have to do significant rewriting of our devserver, a lot of our data access code, and some other libraries that I don't remember offhand. Again, it's quite specific to our codebase. People with a "normal" Django app are unlikely to have such issues.
Ok this is going to sound ignorant, as my only experiences in backend services have been Go and Python. I don't like either. Is there something I'm missing? For simple CRUD apps, both are sufficient. But (in my limited experience), the moment I've wanted to create more complex business logic with stricter constraints, neither has been quite up to the task.
Go doesn't make things easy. It asks you to repeat yourself. I don't like the lack of basic functions like a generic map / filter function. I know Rob Pike just says "use for loops", but it feels so unnecessarily unexpressive. When I see map, I know what's going on almost immediately. For loops take more reading to understand. Nil pointers shouldn't be, yet still are, a thing (developers aren't perfect - why can't the type system help?). It feels like a straight downgrade from Python from a code clarity perspective. And it's typed, sure, but the type system doesn't let me express constraints that other languages allow me to express that would prevent entire classes of bugs. It doesn't feel worth it compared to Python. Yes, the core language ends up being comparatively "simple", but simple building blocks don't guarantee a simple overall system. And my company is very diligent from an architectural perspective.
But then when I look at Python, I'd rather just use Javascript with Lodash, esp. when it comes to the treatment of functions as first class objects[0]. Throw Typescript in there, and you get, in my opinion, a better type system than Go, so unless language performance is a major constraint (which it hasn't been for my company; our DB usage patterns are the bigger constraint instead), why would I want to use either of these rather than Typescript?
[0] Edit: Dumb and wrong, I meant its treatment of anonymous functions. I don’t like lambdas.
I have a lot of JavaScript history, but if I were optimizing for the things you want to optimize for, I would absolutely pick Kotlin above JS+TypeScript. It's got bunches of language features, including superior concurrency, and runs on a faster runtime to boot. Unlike TypeScript, it doesn't have a 20+ year old dynamic language at its heart.
In backend systems, I think that the overall architecture and management of data are much more interesting problems than the code. I feel like Go helps direct more of the thinking toward the overall rather than creating beautiful abstractions in the code.
You had me nodding in agreement until the last paragraph. Functions are first class in Python. You can pass them all over the place and use them the same way you would in JS. You don’t have Lodash but frankly it’s not needed. The stdlib, functools and itertools are pretty much all you could ever ask for.
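To illustrate, a couple of common Lodash-style operations and their stdlib counterparts (the `_.flatten` / `_.sum` names are just for comparison):

```python
from functools import reduce
from itertools import chain

nested = [[1, 2], [3, 4]]

# Roughly lodash's _.flatten:
flat = list(chain.from_iterable(nested))

# Roughly lodash's _.sum (though the sum() builtin is more idiomatic):
total = reduce(lambda acc, n: acc + n, flat, 0)

print(flat, total)  # [1, 2, 3, 4] 10
```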
This is the silliest thing I see parroted. Anonymous functions are there for one-offs. Sometimes a one-off is two lines. Breaking it out into a separate function is ridiculous.
You end up seeing functions declared within functions so that people can work around this limitation and it is grotesque.
If you need to express something complex enough that you can't do it in a single statement in Python, it deserves to have a name (or more likely, it already has one in the standard library).
But python for loops can and typical do have multiple lines, yet that “block” inside the for loop doesn’t have a name. The fact that the body of a for loop can have multiple lines but an anonymous function cannot is a purely arbitrary syntax limitation, and the principle you stated ought to apply (or not apply) equally to both use cases.
This doesn't follow: for loops are there specifically for cases of the complexity that can't be handled by other constructs (like comprehensions). Anonymous functions are not, for the complex cases you give the construct a name.
It's a very intentional language limitation, much as semantic whitespace is an intentional limitation.
And there is actually a language difference. A for loop is a statement. A lambda is an expression. Python doesn't let you put statements into expressions.
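A tiny example of that distinction (made-up functions):

```python
# A lambda body is limited to a single expression:
square = lambda x: x * x

# Multi-statement logic needs a def, even if it stays local,
# because statements like assignment can't appear in a lambda:
def square_plus_one(x):
    y = x * x
    return y + 1

print(square(3), square_plus_one(3))  # 9 10
```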
I've sometimes wondered whether code would be universally clearer if all you could do in a loop is have a one-liner or call another function/method (mainly when looking at my own code within loops and thinking WTF!!!).
Would probably cause issues when teaching programming though. It would be interesting to have a compiler switch that enforced this...
What comes to mind immediately is that you would lose the ability to `break` unless you had another convention like returning a specific value to indicate you want to break.
Thanks for writing this, that clarifies the issue more clearly and concisely than what I had in mind.
I will mention a core aspect of my argument, to add to your point.
When you make a named function and then use it separately somewhere else, there is a loss of locality. The logic is now further from its point of use. This is a cost, and sometimes it is an unreasonable one.
i'd agree with this, except you can have a function within a function in Python to limit scope and keep it local. yeah, you have to define it before the code using it - sometimes this feels like a boon though, especially compared to huge inline lambdas. so over time, my annoyance with this has decreased, and it doesn't bother me anymore.
I've started to use Python type hints, but even with typing I like to label my names things that describe local concerns (types are often more general than what you mean to say), and sometimes I'll deliberately shorten the names to make a Python lambda fit where in another language with bracketing I'd just take an extra 1-2 lines and format the line differently.
I've written snippets and data structures one way or the other, as I'm sure many have, and in my experience the results have been mixed. Given that, and the fact that reduce has no bearing on your app architecture, I see this as arguing over a small and vague gain or loss.
Large anonymous functions are a thing in every modern language I can think of except python. It’s the only sensible solution in the case of functional method chaining, which is much more readable than nested comprehensions. Python just got this one wrong.
Python can't replicate the structure of this JavaScript code:
fooBaz(x => {
// multiple lines of logic
})
In python, you are forced to move the logic away from the point of usage, which means you lose some locality. This loss is sometimes not worth getting a "name" for the logic.
Of course, sometimes it is better to create a named function away from the point of usage, but I don't think that is always the case.
There are a lot of cases like GUI programming where you need a lot of handlers which are suitable for lambda because they don't have a good name, and should never be called manually, and often they have multi-lines of codes.
Nesting functions like this can have a bad performance impact because functions in Python are objects. Normally, all of those function objects are instantiated once when you load the module. However nested functions will be instantiated at runtime every time their parent is called, even if they aren’t used. This cost is perceptible in hot paths.
That’s an interesting principle, something like “a block of code that is complex enough to be written across multiple lines of code ought to be given a name,” but standard idiomatic Python already violates that principle by featuring multiple lines in the body of a for loop. The fact that the body of a for loop can be multiple lines but the “body” of a lambda cannot is an arbitrary syntax inconsistency.
That's a couple characters over most languages I normally use. Plus lambdas have a number of strange warts and restrictions (no multi-line lambdas?) that leave me feeling that Python really dislikes traditional functional programming constructs.
Python has become less functional over time. It’s also not a particularly good OO language either. It’s really a more descriptive sort of bash that became ubiquitous despite its shortcomings.
While functions are first class in Go, the limitations of the type system make them less ergonomic to use, I think. Without user defined generics, many of the common uses of first-class functions become a lot less convenient to use.
In typescript, if I have a bunch of "User" objects where each user has an "name" and "age" field, and I want to print a comma separated list of all those names formatted as "name -- age", it's simple. I write:
let userNamesList = users.map(u => `${u.name} -- ${u.age}`).join(", ");
console.log(userNamesList);
This is possible in no small part because map is generic.
In go, to get the same level of expressiveness, I'd have to write the following stuff around it:
func mapUsersToStrings(f func(u User) string, users []User) []string {
	result := make([]string, 0, len(users))
	for i := 0; i < len(users); i++ {
		result = append(result, f(users[i]))
	}
	return result
}
And only after writing that boilerplate can I finally write the equivalent one-liner.
That boilerplate, of having to write a specialized map/filter/etc function with a for loop for every combination of types you transform between (mapUsersToStrings, mapUsersToAddresses, mapUsersToAges) is really annoying.
It's harder to point to many other cases because, simply enough, people don't write such cases. The fact of the matter is just that go libraries make very sparing use of first-class functions because the lack of generics prevents it from working well, and so we end up with an entire language ecosystem where code is harder to read and with fewer simple abstractions reused across it.
You can still write any code without good generics or first-class functions, you'll just have more programmers writing more code with less clean abstractions.
I dislike JS, but love TS. But really, who cares about me anyway?
The big picture is that JS is the English of software languages. It’s not beautiful, it steals from everyone else, and it’s never the best tool, but it’s becoming - more and more - rarely the wrong tool.
I expect this trend to continue. Some genius will get JS to compile to native bytecode (WASM is the intermediate step). And, it too will be neither the best language for the job, nor the wrong tool for the job.
The proof is in the pudding, nothing so wrong (JS) can be so right - and yet it survives, nay thrives.
As an aside, I find English to be a beautiful language. By stealing words from so many origins (Saxon words, Latin words and Old Norse words), there's so much versatility.
The Saxon words are direct and burly. E.g. oak, brash, death, iron, etc. The Latin words are multisyllabic (e.g. multisyllabic :) ). The Norse words are just plain fun (e.g. Yule, law, heathen, oaf).
It's nice to be able to choose depending on context. Similarly, I find versatility to be nice in JS, as the community goes from embracing OOP to a more functional style.
Go and Python would have to be the two easiest to use and most productive languages I've ever used. Other languages might have more powerful features but Go and Python just get more coding done more quickly and more efficiently. Other people's personal tastes may vary - and obviously do - but you can't dismiss these languages when they're so obviously very effective for many people.
Python is NOT an easy language at all. Easy to get started with, but when you are deep diving into language features, it's starting to get WAY MORE COMPLEX than Go.
I had similar feelings regarding anonymous functions before, having come from a more FP background.
Ways to make your Python experience nicer:
- almost never try to use map or filter. List/set/dictionary comprehensions are more pythonic and will be easier in the long run
- learn about standard library stuff, especially the methods on dictionaries, as well as the collections package. If you’re thinking about lodash, Python tends to have replacements that end up being better tbh
- if you really want a local function, just creating an inner def statement is fine. But KISS should usually make this rare.
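As a concrete example of the first point, the same transformation with map/filter versus a comprehension (the sample data is made up):

```python
users = [{"name": "Ada", "age": 36}, {"name": "Alan", "age": 41}]

# map/filter version:
names = list(map(lambda u: u["name"],
                 filter(lambda u: u["age"] > 40, users)))

# The comprehension reads more directly and needs no lambdas:
names = [u["name"] for u in users if u["age"] > 40]

print(names)  # ['Alan']
```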
People here don't like C# for some reason, I write really complex business logic with it every day and it is not only easy to write, it is maintainable as well. My own JS code with lot less complexity is way harder to understand after a while.
Go seems to be optimized for onboarding new developers (particularly straight out of school) quickly, rather than for the long term comfort of developers using it. There are Rob Pike quotes that speak to the first part of that at least.
If I was a business owner, I'd love Go. But what I can't really figure out is why so many devs love it. It's far from the worst thing in the world, I don't hate it, but I just can't get excited over it either.
From what I can tell by reading what Go developers write, they like it precisely because it's not something to get excited over. It seems like the kind of language where once you learn it, you don't have to keep up with a bunch of blog posts detailing all the cool new things being added to it, and decisions over stuff like formatting are made for you. I primarily use Rust for hobby projects, and the steady stream of new features/libraries can get tiring. You don't have to use the new stuff in your own code, but if you want to use popular libraries, you'll probably be stuck using the new features. Making everything async seems to be the new hotness, and sometimes I wish the language would be nothing to get excited over.
As someone who writes — I mean prose, whether fiction or not (mostly not: essays, technical, etc) — I had to come to terms quite early with the dichotomy you explain here.
There is a place for the "beautiful language", the parts of it and the ways of using it that make it a pleasure, as a writer and as a reader. This, unsurprisingly, usually demands a whole lot of additional work on top of the 'direct meaning'.
Then there's a place, in casual communication, in business, in marketing, in essays as well, in technical docs, in speeches, in a lot of places, for the 'direct meaning', or close to that. The efficient use of language, when all form recedes in favor of meaning, of concepts, of getting that 'other' to get what you mean.
It's just that, in human language, you do it all with the same tools, we use formal or topical subsets of a vastly larger ensemble. In programming, we're more likely to use different languages [themselves subsets of human language if you think about it, but let's forget that for the sake of simplicity].
And there are programmers among us who love to dabble in the form, like some writers would spend 10%, twice, ten times the effort crafting just "better form" over an already well-defined idea/story. While other programmers, or at other times, just focus on getting things done. Cue the spectrum in-between.
So it all depends what we put in our code, as programmers, as human beings I guess. Is that thing a personal statement? Or is it just garbage code to temporarily expedite some roadblock? How do we approach complexity, bottom-up from the simplest elements/code, or top-down with the most expressive almost-meta entities? Maybe some side-line out-of-the-box angle? See how we'd word all of these, in human languages, as in code. We just wouldn't say the same, nor code (select languages) the same.
I don't know if I explained it well. But looking at it from a human language writer, it all seems clear now. The whole rat race of languages, the churches, the sheer effort put into form when meaning has already been solved 10 times by others, the strong NIH syndromes... it's all so common in traditional writers circles. We're all just writers, really!
I think there can exist a happy middleground between constant churn and what Go currently provides. Some nice quality of life features like pattern matching and list comprehensions would go a long way towards making it a more pleasant language, without making developers feel like they're in a constant struggle to stay current.
It also lends itself to patterns in code. One thing that I learned about Python coming from Perl was that there was a “Pythonic” solution. Go seems to take that idea further with gofmt and a small but powerful std lib. I can typically glance at how someone is configuring their http.Server and understand the intention of the code.
The "pythonic" way was a mantra that has been obsolete for a long time. With Python, there are many (unsatisfying) ways to declare dependencies, many ways to format code (I hope "black" will prevail), many ways to format a string, many ways to create a struct/record...
The lack of quality batteries included also leads to "there is more than one way to do it" through the choice of unofficial libraries. My experience is that the extensive standard library of Go brings more normalisation.
> But what I can't really figure out is why so many devs love it.
I think, in order to understand why so many people love Go, you first need to understand why so many people love C.
Go is C with most of the warts removed (for application development). GC, easy strings, maps, easier first-class function syntax, code formatting, modern standard library, easy concurrency, etc.
Yes, there are plenty of places that C can go that Go cannot - operating systems and embedded devices are high up on that list. But for back end application development, it's great as long as your application is of a moderate size and you don't need specialized data structures.
I very much agree. There is a reason why a lot of software is still implemented in C. There are historic reasons of course, but also because C is just powerful enough to do so, and that gives you a small language, which removes a lot of complexity. Of course, in the case of C, this comes along with the warts of lacking memory safety and limited type safety, which are responsible for a lot of bugs. Other languages like Pascal and Modula-2 improved a lot in these respects. Yet all of those languages remain simple.
That is the reason I like Go so much. It is still a very simple language, but it improves on the key points: memory safety, a GC, better type checking, and a few high-level constructs, above all first-class functions and closures. These enable many of the features of "higher" programming languages. You can map functions over lists in Go quite fine.
Huh. "A simple X" is a wily phrase, because everything is exceedingly "simple" in the aspect its designer had in mind[1], but complex from other perspectives. C may be simple in terms of how you would describe it building on the interface provided by a typical CPU, but to someone who only knows Scheme or Haskell, learning the machine model is no small task; I don't think the 683-page length of the C11 specification can be attributed to its authors being long-winded. I don't really see Go as having inherited C's simplicity-in-terms-of-differences-from-a-machine-model. If they were optimizing for the same kind of "simplicity" that C has, they'd have to be out of their minds to add garbage collection, strings, maps, lambdas, and goroutines. Go, like Scheme, Haskell, and Python, is a different "simple" from C.
1: unless designed by a committee with no such single-minded aim
Defers, channels, inexpensive goroutines, a mostly consistent standard library, very good performance, garbage collection, being similar to C, strongly typed, and readability are the reasons why I've enjoyed writing Go exclusively for the past couple of years. Almost none of these is exclusive to Go, but the intersection of all these certainly is.
There's a class of problems (servers running a simple protocol) for which a basic type system, convenient green threads, and a lot of optimizations are all you need for great results.
For those, Go would beat anything that lacks convenient green threads (TypeScript included, also Java, C#, C++), anything that is dynamically typed, and anything that is interpreted.
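A quick illustration of how cheap those green threads are; the count of 100,000 goroutines is arbitrary, and each one here does trivial work.

```go
package main

import (
	"fmt"
	"sync"
	"sync/atomic"
)

// countConcurrently spawns n goroutines that each bump a shared
// counter. Each goroutine starts with a few KB of stack, so n in the
// hundreds of thousands is routine, unlike with OS threads.
func countConcurrently(n int) int64 {
	var total int64
	var wg sync.WaitGroup
	for i := 0; i < n; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			atomic.AddInt64(&total, 1)
		}()
	}
	wg.Wait()
	return total
}

func main() {
	fmt.Println(countConcurrently(100000))
}
```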
Go is mainly loved by devs who would rather build products than write beautiful code. Go is for product builders who use coding as a means, not an end.
Most of the "means vs. end" dichotomy occurs at a level above choice of language. Every language permits both beautiful code and ugly code. Every language allows you to optimize. I write Go every day and I strive to write beautiful and performant code, for the sake of writing beautiful and performant code. Go may not allow you express e.g. quicksort as beautifully as in Haskell, but its simplicity and pragmatism lend themselves to a different kind of beauty.
I find the tooling to be very attractive, especially since modules came out. go test, go mod, go vendor, etc... Working in the Go ecosystem is mostly pleasant and easy.
There are two kinds of devs: devs who mainly like the feeling of writing code well, and devs who mainly like building products. Go is by the latter for the latter.
It's not meant to be exciting; it's meant to be effective, efficient, and reliable.
The performance of Go is not particularly impressive. Especially when it is used to write busy servers with many simultaneous requests. And as a language it is not inspiring either.
Not sure why business owners should love it either. With too few features in the language, coding productivity suffers in the long run. Also, if a developer needs Go because other languages are "too complex", maybe said developer can't produce proper code anyway.
C++ is not a narrow niche. We do see it pretty much everywhere and it continues to be the most used language for systems programming, games, embedded systems, and pretty much anything else where performance is critical.
That is a niche, because most applications are IO bound and not compute bound. What even qualifies as “systems programming” is ill defined. Far more critical business systems run on Java and C# servers than C++. If there are C++ components they tend to be small parts along the critical path and not the primary implementation language.
"most applications are IO bound" - there is a whole world of non-IO-bound software running on cars, toasters, airplanes, desktops, hospital equipment, etc. I do not think it is any smaller than that niche of web spaghetti being churned out by undergrads.
I disagree with the notion that most applications are IO bound only. This is something people often say uncritically, but in my experience is false. Just using a non-native Electron or even Java application feels very sluggish and when you look at the Waterfall on slow web pages, what's slowing it down is very often unrelated to "I/O".
Secondly, C/C++ is like the third or fourth most commonly listed programming language in job listings. If you think all but 2 or 3 languages are niche, that is not what the word means.
C/C++ is not a language. I am also someone who has used C and C++ for years as part of my work and has mostly moved on to TypeScript, because there isn't much reason to use C or C++ anymore unless you are in one of those niches where you need to still program at that level.
Most software problems are not about making things run faster; they are about combining existing components in new ways and figuring out how to orchestrate it all.
I don’t care about copy elision, heap fragmentation, perfect forwarding, when my performance is being lost in the communication between services. What I need is a better architecture and more scaling, not concerning myself with if this loop is being vectorized, or that object is being moved instead of copied, and other minutia which inevitably ends up wasting your time when writing C++.
I don't necessarily disagree that not everything needs to be optimized for performance, but I would just argue that the use cases for TypeScript are far more niche than the use cases for C++. There's more to software than just web stuff.
Most stuff is web stuff now. And I’m not talking about front end, we do a lot of back end work in TypeScript because node is lighter than the JVM which makes it a better choice for lambdas. I’d say it also has more sophisticated static typing than Java or C++, while also allowing dynamic typing in the few cases where it is convenient. Having the front and back end written in the same language also reduces impedance between teams. A lot of our tooling is even written in it now, deprecating many Ruby scripts.
Most stuff is not web now. There is software in everything everywhere not just web sites.
Niche does not mean "stuff I don't personally use at my job", but that is the only definition under which TypeScript is not niche and C++ is. C++ in 2019 had the 4th most job listings according to Indeed. Calling that a niche is absurd, especially in comparison to TypeScript.
The thing about C++ is that you need to recruit specifically for C++ programmers in a way that you don’t need to recruit for programmers in many other languages. The barrier of entry for C++ is high enough that you can’t just take your typical developer and ask them to write good C++, it’s a language which requires far more effort to become competent in.
There are also many places which just ask for Java/C++ experience for no apparent reason. Amazon is like this, all of their job listings mention C++ but only a very small percentage of the code base is in C++. There is at least 10 times as much Ruby code and it is part of systems that most engineers will have to work with, but no job application mentions that.
Java killed C++ for general-purpose use in the 90s. It's never your first option unless you are in areas like games, graphics, some embedded work, or quantitative trading.
"All right, but apart from the sanitation, the medicine, education, wine, public order, irrigation, roads, a fresh water system, and public health, what have the Romans ever done for us?"
Go and Python are for getting things done effectively, not nitpicking beauty of code.
That's why the authors of YouTube built YouTube in Python, and the authors of Khan Academy (re)built Khan Academy in Go, instead of bickering about nitpicks on HN.
Can you elaborate on your criticism of Python? Other than the incorrect claim that functions are not first-class objects, there isn't much I can extract to respond to.
A for loop could mean anything, and not necessarily what the author intended. The rule of least power (https://en.wikipedia.org/wiki/Rule_of_least_power) gives you guarantees instead: list.filter(p) always calls p O(n) times and returns a list of the same type with some or all of the same elements; list.map(f) always calls f O(n) times and returns a list of the same length; p and f can be reused and depend only on a single element. A good platform will inline list.filter(p).head into machine code that exits a loop early, code that you don't have to write or review.
Making them work with strong types would require generics, though, which would introduce a lot of complexity to the language. It has been debated for years.
It's pretty trivial to write your own custom map/filter/reduce though.
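For instance, a sketch of hand-rolled Map/Filter/Reduce. Note that since this discussion, generics did land in Go 1.18, which lets you write these once instead of once per concrete type; on older Go you would write the same loops with fixed element types.

```go
package main

import "fmt"

// Map applies f to every element and returns the results.
func Map[T, U any](xs []T, f func(T) U) []U {
	out := make([]U, 0, len(xs))
	for _, x := range xs {
		out = append(out, f(x))
	}
	return out
}

// Filter keeps the elements for which p returns true.
func Filter[T any](xs []T, p func(T) bool) []T {
	var out []T
	for _, x := range xs {
		if p(x) {
			out = append(out, x)
		}
	}
	return out
}

// Reduce folds xs into a single value, starting from acc.
func Reduce[T, U any](xs []T, acc U, f func(U, T) U) U {
	for _, x := range xs {
		acc = f(acc, x)
	}
	return acc
}

func main() {
	xs := []int{1, 2, 3, 4}
	evens := Filter(xs, func(x int) bool { return x%2 == 0 })
	squares := Map(evens, func(x int) int { return x * x })
	sum := Reduce(squares, 0, func(a, b int) int { return a + b })
	fmt.Println(sum) // 4 + 16 = 20
}
```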
No, Go does not need this and I do not want it to get these useless "features". Go is simple and it gets things done with an amazing standard library. Use a different programming language if this does not suite you. Every programming language ends up with its own implementation of these "concepts" that it is hard to keep track. What is so difficult about for loops? It's simple for everyone and you don't need to learn new concepts because it works.
Fundamental? IMHO, they are just syntactic sugar on a foreach loop. Which in many ways is bad because it provides yet another way to express the same concept, without a significant difference in the method of execution. What am I missing?
Map and filter allow for a very concise and clear indication of the intention of the code, so it is much quicker for the reader to parse what it is doing.
Edit: Also, for loops are just a way of implementing a mapping, where you have data A that you want transformed into data A*. The map is the fundamental concept here, not the for loop.
Which is more generic, the 'for' or the 'map'? I think it's the 'for', because it has additional capabilities contained in the base concept, which is why I would call it more "fundamental".
In fact in many languages much of filter() can be implemented in the for construct rather than its body.
There are two sorts of 'fundamental'. You are correct that the 'for' is just syntactic sugar for 'goto' which is more fundamental than map/filter in the sense that it translates into assembly/machine language, upon which all programs are built. Then you can say the silicon logic gates are even more fundamental, you can do even more things with them, etc.
The map/filter is more fundamental in a mathematical sense. If a list of A is turned into a list of A*, the 'mapping' is the fundamental concept; the 'for' loop is just an implementation detail. Maybe the computer did it in parallel, or in random order, or I asked you to turn a pile of towels into a folded stack of towels. The same way, if I say 'want to watch a movie', the movie is the fundamental concept; whether it's digital, film, etc. is irrelevant.
They don’t have to be glorified for loops, even if they usually are. Map and Filter should be capable of running in parallel or asynchronously, and some languages provide this.
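A sketch of the parallel variant in Go, using one goroutine per element; a real implementation would batch the work across a bounded worker pool rather than spawning a goroutine per item.

```go
package main

import (
	"fmt"
	"sync"
)

// parallelMap runs f on every element in its own goroutine. Each
// goroutine writes to a distinct index of out, so no locking is needed.
func parallelMap(xs []int, f func(int) int) []int {
	out := make([]int, len(xs))
	var wg sync.WaitGroup
	for i, x := range xs {
		wg.Add(1)
		go func(i, x int) {
			defer wg.Done()
			out[i] = f(x)
		}(i, x)
	}
	wg.Wait()
	return out
}

func main() {
	fmt.Println(parallelMap([]int{1, 2, 3}, func(x int) int { return x * 10 }))
}
```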
And many languages have parallel for constructs. The OpenMP #pragma omp parallel for extension in C/C++, for example. Of course it's possible to create constructs which provide the message passing for clustered environments too.
And maybe that is part of the problem with a generic 'map': you're not really sure of the underlying implementation. Is it parallel, clustered, serial, etc.? So you end up with map(), parallel_map(), mpi_map(), etc., and how do you control the parallelism? Do you just let it default, or do you have levers to control the batching and interleaving? Pretty soon it's not such a simple construct anymore.
Do you mean a code comment could help the reader understand, or you want a comment about why for loops are less obvious than other more functional approaches?
> “Typescript is a hack on top of [a] [supposedly] server side language that is [awful]. ( Node doesn't even support int64 )”
"supposedly" in this context meaning Js is not worth being called a "server side language", in the commenter's opinion (citing lack of int64 support for example). This also implies in subtext that "server side languages" would be of a higher category/value than languages for whatever other use-cases.
I personally don't agree with any of that, just helping communication here.
> Go, however, used a lot less memory, which means that it can scale down to smaller instances.
I could be mistaken, but this sounds like they went ahead with the default JVM settings, where it tends to use as much memory as it is allowed to (which makes sense from a utilization and efficiency perspective). If memory usage is a concern, the JVM can be tuned for it.
I have found that there can be drastic differences in Go performance in the particular way you structure the program. Writing Go code in a Python-like way is going to be less performant than if you run escape analysis every compile and make deliberate effort to stay on the stack.
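For example, whether a value lands on the stack or the heap is decided by escape analysis, which you can inspect with go build -gcflags=-m. The point type and functions below are made up for illustration; the compiler's exact diagnostic wording varies by version.

```go
package main

import "fmt"

type point struct{ x, y int }

// sumStack keeps its point on the stack: the value never escapes
// the function, so no heap allocation happens.
func sumStack() int {
	p := point{1, 2}
	return p.x + p.y
}

// newPoint's result escapes to the heap: the pointer outlives the
// call. `go build -gcflags=-m` reports the &point literal as
// escaping to the heap here.
func newPoint() *point {
	return &point{3, 4}
}

func main() {
	q := newPoint()
	fmt.Println(sumStack(), q.x+q.y)
}
```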
> Writing Go code in a Python-like way is going to be less performant than if you run escape analysis every compile and make deliberate effort to stay on the stack.
The fact that people are doing this tedious optimization is strong evidence of the fact that Go needs a generational garbage collector with bump allocation in the nursery.
The JVM has a fast generational GC, and as a result you don't have to do this kind of optimization to get good allocation performance.
See what? This is an unofficial benchmark game, based on implementations people bothered to provide. Not some scientific test, and nothing guarantees these are the best implementations. And even if they were, they're not representative of server/long-running program behavior (where JIT quality and especially GC implementation matter).
Even so, the page linked starts with "Back in April 2010, Russ Cox charitably suggested that only fannkuch-redux, fasta, k-nucleotide, mandlebrot, nbody, reverse-complement and spectral-norm were close to fair comparisons".
Since Russ Cox is one of the Go co-developers, let's see the ones he accepts as fair. Of those, 5 are present on the page; on 3 of them Java wins with decent margins (fannkuch-redux: 18%, k-nucleotide: 21%, reverse-complement: 17.5%) and on 2 Go barely comes out ahead (fasta: 6.3%, spectral-norm: 6%).
Highly debatable. Even then, you see that golang ranks in the 102nd place for the plaintext benchmark, almost 6x slower than Netty, which is very established in the JVM world. Same with the JSON benchmark, golang is in the 126th place, ~3.5x slower than Netty and Vertx.
> but Go usually use between 3 and 10x less memory.
The JVM by default will use whatever memory is assigned to it. This makes sense from an efficiency and utilization point of view, as it generally aims for maintaining good throughput (whereas golang is only tuned for latency). The JVM now ships with new low latency GCs (ZGC and Shenandoah) which are currently available in experimental phase.
The first golang result in the plaintext benchmark ranks 8th (atreugo-prefork), which is incidentally immediately followed by firenio-http-lite, a Java library.
My point still stands: this is a third-party library that is not commonly used, and it depends on another third-party library, fasthttp, which itself comes with caveats from its author.
Doesn't matter, it's still Go code, and contrary to many languages on that benchmark (Java included), all Go code is built in Go, so there is no runtime / external library written in C/C++. (One of the reasons why Go is also slow on the regex benchmarks.)
That is for the purpose of "measuring the quality of the generated code when both compilers are presented with what amounts to the same program." But Go also allows pointers to fields and array elements and by design reduces pointer indirections and allocations in general, which those comparisons don't capture.
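A sketch of what those interior pointers look like; the user type and field names here are made up for illustration.

```go
package main

import "fmt"

type user struct {
	name string
	hits [4]int
}

func main() {
	u := user{name: "kh"}

	// Interior pointers: Go lets you point at a struct field or an
	// array element directly, so updates happen in place with no
	// boxing or extra indirection -- something the JVM's object
	// model does not expose.
	n := &u.name
	h := &u.hits[2]

	*n = "khan"
	*h = 7

	fmt.Println(u.name, u.hits[2])
}
```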
> Go was written for servers, Kotlin for mobile apps.
Not really. Kotlin was just a better JVM language developed by the JetBrains folks before Google adopted it as a first class Android language, but I don't believe it was specifically developed for mobile initially.
The open question is if Java will start borrowing the best features from Kotlin and it will become less relevant (like Scala). To Kotlin's credit, it has better IDE support, it feels simpler than Java (whereas Scala feels more complex), and it cleans up a lot of fundamental language issues that Java can't get away from.
Kotlin is just a less verbose, more modern language targeting the JVM. It has nothing to do with mobile apps beyond that Google, much later, decided to support it for Android development.
Go is an efficient platform, however in these switches it usually goes something like "we took something hugely overbuilt, with layers and layers and layers and abstractions and abstractions, and rebuilt it with minimalist Go and now it's faster", which, of course it is.
I don't think that's true. You're confusing the fact that Kotlin has become the officially supported language for Android development with the idea that it was written for mobile apps.
You may be confusing this with the adoption of Kotlin for Android. That happened much more recently; Kotlin has been open source since 2012 and was designed as a direct (compatible) replacement for Java.
I wouldn’t say that Kotlin was written for mobile apps. JetBrains, the company that created Kotlin, doesn’t even have mobile apps AFAIK. It’s just meant to be a more modern JVM language and that means it does anything Java does just better. Servers included.
Kotlin looks nice now, but in the long term it's a bad choice: since they're stuck on Java 8, they will always lag behind the real JVM and won't be able to catch up without heavy modifications.
Interesting read. It always surprises me that companies go many years running major apps and infrastructure on interpreter-based languages like Python and Ruby in the first place. It's an incredible waste of energy, compute, and, for web apps, sometimes users' time.
Developers need to be a lot more disciplined about performance and efficiency. I'm glad Khan went to Go, but man all those years wasted.