By the title alone, this means nothing. How much would it cost otherwise? What is the percentage savings?
In TFA, it gets better though: "Steve: That’s pretty easy. When I started on the spam team, we had close to 1,400 servers running. When we converted several parts to Elixir, we reduced that by around 95%. One of the systems that ran on 200 Python servers now runs on four Elixir servers (it can actually run on two servers, but we felt that four provided more fault tolerance). The combined effect of better architecture and Elixir saved Pinterest over $2 million per year in server costs. In addition, the performance and reliability of the systems went up despite running on drastically less hardware. When our notifications system was running on Java, it was on 30 c32.xl instances. When we switched over to Elixir, we could run on 15. Despite running on less hardware, the response times dropped significantly, as did errors."
> When our notifications system was running on Java, it was on 30 c32.xl instances. When we switched over to Elixir, we could run on 15.
Would be curious to know how they tried to optimise the Java stack.
On every benchmark I've seen, the JVM is faster than Elixir in every which way. The exception is memory, where people often over-provision the JVM rather than look at where their code might be over-allocating or leaking.
The advantages of Elixir are not performance-related.
There is a lot of focus on raw performance in web-related services, when in reality most of their running time is spent waiting on IO. If there are two things the BEAM excels at, they are IO and turning almost any problem into half a dozen processes that are scheduled and run in parallel, if not geographically distributed, with 1/50th the effort of any other language.
We live in a world with 32+ core CPUs. If your load is not spread uniformly all over those cores, you're losing a ton of performance. Handling requests over separate threads, like 99% of languages do, still isn't enough if all the business logic runs on the same thread.
I'm currently writing a web crawler in Elixir, and it is easier to design it so that every request is fetched and processed in parallel than to write the naive sequential version you'd knock out in half a day in any other language.
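For readers who want the shape of that fan-out without installing Elixir, here is a rough Python analog. It is a minimal sketch, assuming `asyncio` and a hypothetical `fetch` coroutine that only simulates network latency with `asyncio.sleep`. It is not the commenter's crawler, and asyncio tasks are cooperative coroutines on one OS thread rather than preemptively scheduled BEAM processes spread over cores.

```python
import asyncio

# Hypothetical stand-in for an HTTP GET: sleeps to simulate IO latency
# and returns a fake page body. A real crawler would call an HTTP client here.
async def fetch(url: str) -> str:
    await asyncio.sleep(0.05)
    return f"<html>{url}</html>"

async def crawl_one(url: str, limit: asyncio.Semaphore) -> tuple[str, int]:
    async with limit:            # cap the number of requests in flight
        body = await fetch(url)
    return url, len(body)        # "processing" here is just measuring the page

async def crawl(urls: list[str], max_in_flight: int = 10) -> dict[str, int]:
    limit = asyncio.Semaphore(max_in_flight)
    # One task per URL; gather runs them concurrently and keeps input order
    results = await asyncio.gather(*(crawl_one(u, limit) for u in urls))
    return dict(results)

if __name__ == "__main__":
    pages = [f"https://example.com/page/{i}" for i in range(20)]
    sizes = asyncio.run(crawl(pages))
    print(len(sizes))  # 20
```

The semaphore is plumbing that Elixir's `Task.async_stream` (with its `max_concurrency` option) handles for you; in this sketch it has to be passed around by hand.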
Though if people have been singing a language's praises consistently on a single point for decades, as they have the BEAM's on this one, it's usually not without merit.
They wrote it from scratch with the benefit of all the knowledge they had gathered after running the old system for years. A 2X improvement would not be surprising to me, even if they had rewritten it in the same language.
According to others in this discussion they also made architecture changes (DB, Kafka etc.). Do we know if that improved the performance?
There is no objective way we can tell if Elixir had any performance impact. It could have been due to the rewrite, the architecture change or a combination of both.
Elixir/BEAM's (Erlang Virtual Machine) frugality isn't just theoretical; it's got real-world creds. It was originally tailored for 1980s telecom switches (a fleet of single-core, extremely low-powered machines). Fast forward, and you've got a setup that's less demanding on your A/C and optimizes multi-core usage like a champ. It uses the same concurrency abstractions whether it's 2 cores across two machines or 64 cores on the same machine; it makes no difference to the BEAM.
Take hot code reloading and actor-model-based concurrency as prime examples. It's like getting AWS-level functionality without the steep bill for a lot of companies.
Though, I gotta admit, it used to be a hard sell for CPU-heavy workloads, especially number crunching. But Elixir is stepping it up with its Nx library, so that's changing.
Examples of companies cashing in on BEAM's efficiency:
Bleacher Report: Went from 150 servers down to 5. No joke.
Discord: Handles millions of real-time users without breaking a sweat or the bank.
Financial Times: Their content recommendation engine got both efficient and cost-effective.
Change.org: More petitions, fewer servers.
Podium: A million SMS messages a day and didn't have to massively scale hardware.
> a fleet of single core low, extremely powered machines.
what are "extremely powered machines"?
> It's like getting AWS-level functionality without the steep bill
Which part of AWS functionality? Beanstalk-style load balancing is free. AWS compute is not free, but neither is compute free with Elixir or whatever stack you run.
Totally get your point about AWS having free-tier services and compute never being free, regardless of the stack. My point wasn't that BEAM offers free compute, but rather that its inherent features can sometimes make certain AWS services redundant. For instance, Elixir has built-in fault tolerance with its actor model and supervision trees. This means that even when a process fails, it gets restarted automatically without messing up other processes, kind of like what you'd use Auto Scaling and backup services for on AWS.
Similarly, distributed Erlang allows Elixir to run across multiple nodes. This could cut down the need for extra AWS instances or orchestration layers like Elastic Beanstalk. And when it comes to deployments, Elixir's hot code swapping can simplify what might otherwise require rolling updates or blue-green deployments with Elastic Load Balancers in the AWS ecosystem.
On the concurrency front, Elixir is designed for handling a high number of users and tasks simultaneously, which might reduce your reliance on EC2 or Lambda. Phoenix, Elixir's web framework, even has real-time capabilities baked in, so you don't need extra services like API Gateway's WebSocket APIs for that.
Finally, Elixir's actor model can serve as an in-memory message queue, which could potentially negate the need for something like AWS's SQS. So, while you're still incurring compute costs, the need for additional AWS services could be lessened, thereby simplifying your architecture and perhaps lowering overall costs.
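To make the supervision-tree point above concrete, here is a toy one_for_one supervisor in Python. It is a sketch under heavy simplifying assumptions: children are plain functions that run to completion rather than long-lived processes, and the restart limit is a simple total count, whereas real OTP supervisors use a restart intensity (a maximum number of restarts within a time period). `Supervisor`, `run_child` and `flaky` are hypothetical names, not any library's API.

```python
from typing import Callable

class Supervisor:
    """Toy one_for_one supervisor: restart only the child that failed,
    and escalate if that child fails more than max_restarts times."""

    def __init__(self, children: dict[str, Callable[[], str]], max_restarts: int = 3) -> None:
        self.children = children
        self.max_restarts = max_restarts
        self.restarts: dict[str, int] = {name: 0 for name in children}

    def run_child(self, name: str) -> str:
        while True:
            try:
                return self.children[name]()
            except Exception as exc:
                self.restarts[name] += 1
                if self.restarts[name] > self.max_restarts:
                    # Restart limit exceeded: fail ourselves and let our parent decide
                    raise RuntimeError(f"{name} exceeded its restart limit") from exc
                # Otherwise loop around and start a fresh copy of the child

    def run_all(self) -> dict[str, str]:
        # one_for_one: a crash in one child never restarts its siblings
        return {name: self.run_child(name) for name in self.children}

attempts = {"flaky": 0}

def flaky() -> str:
    attempts["flaky"] += 1
    if attempts["flaky"] < 3:                 # crash on the first two starts
        raise ConnectionError("upstream not ready")
    return "flaky ok"

sup = Supervisor({"steady": lambda: "steady ok", "flaky": flaky})
print(sup.run_all())    # {'steady': 'steady ok', 'flaky': 'flaky ok'}
print(sup.restarts)     # {'steady': 0, 'flaky': 2}
```

The escalation branch is what turns a flat supervisor into a tree: a supervisor that gives up becomes, in turn, a failing child of its own parent.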
Not to just toss around anecdotes, but I once rewrote an email service in Elixir for a company from a literal sketch on a piece of printer paper describing what their old system did. The new service ran on 1 server vs. half a dozen, was faster at crunching through their mail queue, and used far fewer resources. Some tasks are embarrassingly parallel, and the BEAM excels at those tasks. Sure, you might want features it doesn't have for certain systems, but for some things it really is the right tool.
WhatsApp took over the world running on Erlang/BEAM, with barely any servers and a few engineers. I honestly don't know what could be a better success story than that, but Discord has also done pretty well. The BEAM + Rust combination is looking scarily effective right now.
I'd be willing to accept the argument that WhatsApp happened to have assembled an uncommonly good team, but it is a signal.
Yeah, the BEAM with some Rust NIFs is a great combo. I'd definitely consider it in the future for many types of problems, and for anything involving an HTTP or GraphQL interface.
If I had said 1/4th the effort would that have invalidated my argument? I pulled that figure out of my arse from experience. YMMV.
You'll note I'm not selling anything here, and no one is paying me a commission.
A junior who got "swindled" by my claims and spends a weekend learning Elixir becomes a better programmer and earns another feather in their cap. How tragic.
Junior devs, if you want to become a senior greybeard like me, learn anything that tickles your fancy, and ignore anyone that says it isn't worth your time. Even learning COBOL will make you a better programmer. I can only promise that Elixir is more fun than COBOL.
Erlang and BEAM were designed to handle Ericsson's telephony services. Piping lots of stuff in parallel in a fault-tolerant fashion is the main use case around which they were designed.
To all junior developers--this developer doesn't know what he is talking about.
Erlang and BEAM, the things underlying Elixir, were specifically engineered to be used in a reliable, distributed fashion including a gigantic amount of idioms and support that no other language comes close to.
Erlang's bit syntax is hands down the best byte-stream serialization that exists--from the perspective of both performance and expressiveness. OTP is a documented set of idioms and behaviors for building systems that are meant to be highly distributed and deal with failure gracefully. These include behaviors like in-place upgrades, supervisors which shut down and restart failing processes, error delegation, etc.
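Python has no bit syntax, so the closest standard-library tool is `struct` plus manual masking, which helps show what Erlang's binary pattern matching saves you. Below is a sketch decoding the first four bytes of an IPv4 header, a layout chosen purely for illustration. In Erlang the same decode is roughly the single pattern `<<Version:4, IHL:4, TOS:8, TotalLen:16, _/binary>> = Packet`.

```python
import struct

def parse_ipv4_prefix(packet: bytes) -> dict[str, int]:
    # "!BBH": network (big-endian) byte order, unsigned byte, unsigned byte, unsigned 16-bit
    ver_ihl, tos, total_len = struct.unpack_from("!BBH", packet, 0)
    return {
        "version": ver_ihl >> 4,        # high nibble of the first byte
        "ihl": ver_ihl & 0x0F,          # low nibble: header length in 32-bit words
        "tos": tos,
        "total_length": total_len,
    }

header = bytes([0x45, 0x00, 0x00, 0x54]) + bytes(16)
print(parse_ipv4_prefix(header))
# {'version': 4, 'ihl': 5, 'tos': 0, 'total_length': 84}
```

The nibble split is where the gap shows: `struct` only understands whole bytes, so sub-byte fields need shifts and masks that the Erlang pattern expresses directly as field widths.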
Erlang was meant for genuine "five nines" reliability. No other language comes close (maybe Ada does--I'll let their proponents chime in about that).
You can do that easily in modern Java--even for older JVMs, tools like Netty and later Vert.x have been around forever. Or in Node, even more easily.
Elixir/BEAM do have some benefits that are worth considering for many projects. But they absolutely are not special in this regard, and that's the junior-developer trap about which the person to whom you replied was referring.
You may be enamored with the "nothing new under the sun" idea that "Turing complete is Turing complete, anything you can do in Python you can do in Brainfuck" but as someone who has written code professionally in about a dozen languages, no, you can't just as easily get the same kind of parallelism out of Java as you can Elixir. To assert otherwise is factually false.
Is it possible to get to the same result? Yes. It is not, however, anywhere in Elixir's ballpark of "easily". Do not discount the power of language-level encouragement, not just support. Especially when you're working with junior devs, if "the right thing" and "the thing the language wants you to write" are not in alignment, everything is much, much harder. Erlang and Elixir actively encourage easily parallelized code. Java actively encourages tangles of objects.
Java, kind of, with Akka or similar, although even then one must always be aware of blocking. Loom should help. Node: not really, unless something dramatic has changed. Using 32 cores is going to require 32 separate Node (OS-level) processes, and you're on your own for providing communication between them, plus callbacks aren't nearly as intuitive as the BEAM process model (think green threads).
I mean sure, but given the topic at hand is getting performance out of your 32-core server, I'm not sure that's a super relevant observation. "I've been given way more hardware than I need" is an entirely different problem I think most of us would be happy to tackle.
I'm a seasoned Java and Node developer, but have never touched Elixir/Erlang. Could you spell out for me the benefits Elixir provides over Java concurrent code? Is it actually a performance gain, or simply a nicer syntax? I am a bit confused by the claims of this post and of the earlier comment. Thanks a lot
So, when you're working with Elixir or any language on the BEAM VM, you're in a world where data is immutable and processes are isolated. It's like having Akka's actor model but at the VM level, so it's super integrated.
The BEAM VM itself is a different beast compared to the JVM. It's more like its own mini-OS designed for real-time multitasking. Each process has its own garbage collection, and it's all non-blocking. So if one process goes belly-up, it doesn't take the whole system with it. Imagine a network of telephone switches; if one gets zapped by lightning, the rest keep chugging along. That's the level of fault tolerance Erlang and BEAM were designed for.
Now, speaking of fault tolerance, let's talk about how easy it is to mess up an Akka system if you're not careful. Say a new Java dev joins your team and doesn't get Akka's actor model. They might introduce shared mutable state between actors, which is a big no-no and can lead to all sorts of race conditions. Or they might do something like put a blocking operation inside an actor, which can hog resources and mess up the whole system's performance. Akka's great, but if you don't follow its principles, you can still shoot yourself in the foot.
So, the beauty of Elixir and BEAM is that a lot of these good practices are enforced by the VM itself. You get that fault tolerance and concurrency baked right in, without having to rely on every engineer knowing all the best practices.
I recently saw a talk on YouTube about "structured concurrency" in Java. It looked pretty interesting. But it seemed to me that the way to achieve parallelism is to start with a procedural code flow, and when you come to a part that can be parallelised, you split it into a bunch of tasks in a scope, and that scope monitors how they are executed. Once the results are accumulated, you go back to the procedural flow. This is similar to what is done in Go IMO, and is a pretty good technique.
In Elixir, on the other hand, you could create a module which is like a server process. You can start this server in a procedural flow, or you can "connect" it to a supervisor by giving it the startup information needed and a strategy to be used on how to restart the process in case it crashes for some reason.
A client process (or processes) with its own module can then send messages to the server, which will handle the incoming messages in its inbox sequentially. If you squint at it from an angle, modules look like classes in that they provide a way to separate code logically.
This way of doing concurrency takes getting used to and requires a larger initial learning investment. But it feels cleaner and is less prone to user error. In Go, for example, you have to be careful with closures and shadowing, which can result in shared memory and hard-to-debug errors, even though the initial investment in learning Go is much smaller.
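The server-process pattern described above can be approximated in Python with a thread draining a `queue.Queue`. This is a minimal sketch with hypothetical names: one thread owns the state and handles one message at a time, so four client threads can increment the counter without a lock. Unlike on the BEAM, nothing stops other Python code from reaching into `_count` directly; the isolation here is a convention, not a VM guarantee.

```python
import queue
import threading

class CounterServer:
    """Toy 'server process': one thread drains one mailbox, one message at a time."""

    def __init__(self) -> None:
        self.mailbox: queue.Queue = queue.Queue()
        self._count = 0  # state touched only by the server's own thread
        self._thread = threading.Thread(target=self._loop, daemon=True)
        self._thread.start()

    def _loop(self) -> None:
        while True:
            msg = self.mailbox.get()          # blocks until a message arrives
            if msg[0] == "stop":
                return
            if msg[0] == "incr":              # asynchronous "cast": no reply
                self._count += msg[1]
            elif msg[0] == "get":             # synchronous "call": reply on the sender's queue
                msg[1].put(self._count)

    # Client-side helpers, analogous to the API functions a server module exposes
    def incr(self, n: int = 1) -> None:
        self.mailbox.put(("incr", n))

    def get(self) -> int:
        reply: queue.Queue = queue.Queue(maxsize=1)
        self.mailbox.put(("get", reply))
        return reply.get()

    def stop(self) -> None:
        self.mailbox.put(("stop",))
        self._thread.join()

server = CounterServer()
clients = [threading.Thread(target=lambda: [server.incr() for _ in range(1000)])
           for _ in range(4)]
for c in clients:
    c.start()
for c in clients:
    c.join()
print(server.get())  # 4000
server.stop()
```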
When a person makes a claim, especially such a ridiculous one, it is perfectly valid to outright reject that claim without any argument. Why? Because no argument was provided in the first place.
I would say that a person claiming that it provides better results with 1/50 the effort is a claim that needs substantiation and was born from a position of hype, yes.
The Go runtime has similar capabilities to the BEAM runtime when it comes to concurrent workloads. Go has the benefit of being a type-safe compiled language, which gives it speed benefits. But using either one of them instead of Java is probably going to be a huge win for most teams on concurrent workloads.
> The go runtime has similar capabilities as the BEAM runtime when it comes to concurrent workloads.
Only if you think that the BEAM is similar to being able to easily spawn a function on a separate thread and having channels.
Last I checked, goroutines had no mailboxes, supervisors, process monitoring, registry, and its scheduler has a much smaller scope and featureset than the BEAM's.
I swear it's obvious when people comment about the Erlang ecosystem without having really used it for anything.
Exactly, you've hit the nail on the head. While Go's runtime does offer some nice concurrency features, like goroutines and channels, it's just not on the same level as BEAM when it comes to a comprehensive approach to fault tolerance and system resilience.
BEAM's got this whole ecosystem built around it, right? Mailboxes, supervisors, process monitoring, and a registry—these are all first-class citizens in the Erlang world. And let's not forget the scheduler; it's like comparing a Swiss Army knife to a simple pocket knife when you look at BEAM's scheduler next to Go's.
It's kinda funny when people talk about BEAM and Erlang as if they're just another runtime or language. (it's more OS like than traditional VM like) They're really more like a whole philosophy of how to build robust, fault-tolerant systems. And if you haven't actually built something substantial with it, you're likely to miss out on what makes it so special.
I've written non-trivial code in both Erlang and Go. Beam doesn't have anything to do with supervisors that's an OTP thing. Mailboxes can be simulated with channels, and there are conceptual similarities. Process monitoring and the registry are unique to BEAM, it's true, and they have useful properties to leverage. But they aren't core to how the BEAM handles concurrent processing. From the programmer's point of view, the schedulers work on a similar conceptual mechanism when writing concurrent code, with differing optimizations in each.
In my experience I think my statement still stands.
> Beam doesn't have anything to do with supervisors that's an OTP thing.
The key VM feature you're missing is process isolation. Without it, supervisors are not really possible - you can implement something that vaguely looks like them, but it won't provide the fault tolerance guarantees.
Imagine for example an Erlang process that leaks files. At some point it hits an EMFILE, gets killed, and the supervisor restarts it. The system will then go back to operating normally.
Now imagine a Goroutine doing the same. It hits EMFILE, exits, and the supervisor restarts it. This doesn't help anything: it just hits the same error and the system is unusable. There's no way for the VM to guarantee cleanup when a Goroutine exits, because it doesn't isolate Goroutines and track which one owns which resources.
Links and monitors are tools to extend the same behavior to user-managed resources like DB connections and so on. The responsibility for cleanup if a process crashes while holding a DB connection falls on the DB connection library, not on the error handling inside the crashing process.
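A toy Python model of the difference described above. It is a sketch, not how BEAM is implemented: a hypothetical `Runtime` records which "process" opened each resource and closes all of them when that process exits, normally or by crashing, so a restarted worker does not inherit its predecessor's leak. Without that ownership bookkeeping (the goroutine case), the second run would hit the limit on its very first open.

```python
import itertools
from typing import Callable

class Resource:
    """Stand-in for a file descriptor, socket or DB connection."""
    def __init__(self, rid: int) -> None:
        self.rid = rid
        self.closed = False

    def close(self) -> None:
        self.closed = True

class Runtime:
    """Hypothetical runtime that records which process owns which resource."""
    def __init__(self, limit: int) -> None:
        self.limit = limit                                  # like a per-OS-process fd limit
        self.owned: dict[int, list[Resource]] = {}
        self._ids = itertools.count()

    def open(self, pid: int) -> Resource:
        if sum(len(rs) for rs in self.owned.values()) >= self.limit:
            raise OSError("EMFILE: too many open resources")
        res = Resource(next(self._ids))
        self.owned.setdefault(pid, []).append(res)          # remember the owner
        return res

    def run(self, pid: int, body: Callable[["Runtime", int], None]) -> None:
        try:
            body(self, pid)
        except OSError:
            pass                                            # the "process" crashed
        finally:
            # On exit, normal or not, everything the process owned is released
            for res in self.owned.pop(pid, []):
                res.close()

def leaky_worker(rt: Runtime, pid: int) -> None:
    for _ in range(10):
        rt.open(pid)                                        # never closes anything

rt = Runtime(limit=5)
rt.run(pid=1, body=leaky_worker)   # hits the limit after 5 opens and crashes
rt.run(pid=2, body=leaky_worker)   # the restarted worker starts from a clean slate
print(rt.owned)                    # {}
```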
> Beam doesn't have anything to do with supervisors that's an OTP thing
Are you aware that Erlang was designed to be a VM (the BEAM), a language for that VM (Erlang), and a comprehensive set of patterns and infrastructure for fault tolerance (OTP)?
Those naive sequential services don't really exist in prod though.
It would be a dark day to discover my AWS t1.337metal was blocking 63 cores on I/O when nearly every modern lang has a wealth of async functionalities just awaiting to be exploited.
IO lists, which are the foundation of anything that builds a string incrementally, are handled with vectored IO syscalls (readv/writev) out of the box. In most other languages you'd have to handle that yourself, or resort to the endless allocation and memory copying that Erlang is able to avoid.
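A rough Python illustration of the iolist idea, assuming a Unix system where `os.writev` is available. `flatten_iolist` and `write_iolist` are hypothetical helpers: the response is built as nested lists of chunks instead of by string concatenation, and all chunks go out in one vectored write. It is simplified: real Erlang iolists may also contain integers in 0..255, which this sketch does not accept.

```python
import os

def flatten_iolist(iolist) -> list:
    """Flatten a nested list of bytes/str chunks into a flat list, without joining them."""
    out: list = []
    for item in iolist:
        if isinstance(item, (bytes, bytearray, memoryview)):
            out.append(item)                  # keep the chunk as-is, no copy
        elif isinstance(item, str):
            out.append(item.encode())
        else:
            out.extend(flatten_iolist(item))  # nested list: recurse
    return out

def write_iolist(fd: int, iolist) -> int:
    # A single writev() hands every chunk to the kernel; no joined buffer is built.
    # Real code must also handle partial writes, which this sketch ignores.
    return os.writev(fd, flatten_iolist(iolist))

name = "world"
body = ["hello ", name]
response = ["HTTP/1.1 200 OK\r\n",
            ["Content-Length: ", str(len("hello ") + len(name)), "\r\n\r\n"],
            body]
r, w = os.pipe()
write_iolist(w, response)
os.close(w)
print(os.read(r, 1024))
# b'HTTP/1.1 200 OK\r\nContent-Length: 11\r\n\r\nhello world'
os.close(r)
```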
>Handling requests over separate threads, like 99% of languages do, still isn't enough if all the business logic runs on the same thread.
I mean, if your business logic is inherently serial it makes little difference if you run it in a single thread or if each serial segment in between IO requests is run in a different thread. One way or another it's not going to get parallelized.
All of what you mentioned applies equally well, if not better, to the JVM, especially now that it has virtual threads. The article does not go into details about the implementation of the Java program. My guess is that they were not using async code, or didn't profile it to see what the bottlenecks were. Rewriting is always fun, especially in the latest flavor-of-the-day stack.
It’s not even remotely similar. Node’s cluster is just bog-standard OS subprocesses running their own event loop.
To spread work over multiple cores with the cluster (or even worker_threads) modules, you have to do so explicitly and manually. It’s essentially the same model you get with pthreads, or Java, or Python.
BEAM is a completely different model, the “processes” are internal tasks, which the runtime will schedule appropriately over available cores (the scheduler has been multithreaded for 15 years or so), spawning processes is as common as spawning tasks in JS, except each of these is scheduled on an SMP runtime.
BEAM will literally send processes between machines (with distributed Erlang) more easily than Node will balance load over cores.
At the Abstractions conference in Pittsburgh in 2016, Joe Armstrong was hanging out in the hallway with his swag bag, just a regular engineer complaining about Jira and his manager (he had no interest in managing) and asking people's opinions about the schedule and lunch places. We were looking at the program for the next sessions and someone said there was a talk about ideas for adding concurrency features to Node, and we said great, let's go, and a few of us went to stand in the back of that one.
On a number of points the presenter was proposing, like message passing, immutable structures, and process tree management, the presenter would say, "but Erlang's had this feature for many years..." and the room would laugh and turn around and acknowledge Joe. He was modest but the validation must have been nice.
I'm unfamiliar with BEAM. How does this compared to goroutines? Obviously they won't migrate between machines, but concurrency feels very easy and ergonomic in Go.
Goroutines were pretty directly inspired by Erlang processes, so in terms of primitives I'd say they are very similar, aside from Go lacking the distributed features you already mentioned.
Where Erlang/Elixir add value beyond goroutines is what OTP (kind of the standard library) provides on top: pre-built abstractions like GenServer for long-running processes, GenStage for producer/consumer pipelines, and supervision trees for making sure your processes haven't crashed and restarting them and their dependents if they have.
At the most basic level it's a bit similar: there is a multithreaded runtime which schedules work in userland. Green threads if you will.
The devil, however, is in what happens once you go beyond the trivial.
First, the units of work operate completely differently. BEAM follows the actor model rather than CSP, meaning every actor has an address / mailbox through which it receives messages; any actor can send messages to any other actor it knows about, and actors can process their mailboxes however they want.
But BEAM is also completely strict and steadfast about its actor model: its actors are processes, each has its own stack but also its own heap, and when one actor sends a message to another, the message content gets copied (or moved) from one stack or heap to another; processes do not share memory[0]. Incidentally, this is what makes distribution relatively transparent (not entirely so, but impressively still): if everything you do to interact with others is send asynchronous one-way messages, it doesn't really matter whether they're in the same OS process, in a different OS process, or on a different machine entirely.
The reason BEAM works like that, however, is not theoretical purity; it is in service of reliability, which is the second big difference between BEAM and Go: BEAM error handling is inter-process, not intra-process. BEAM's error-handling philosophy is that processes encounter errors for all sorts of reasons, and when that happens you can't know the entire state of the process, so you just kill it[1]; it should have one of its buddies linked to it, whose job is to handle this and orchestrate what happens next.
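Python data is mutable and shared by default, so the nearest illustration of "processes do not share memory" is to copy every message on send. This is a minimal sketch with hypothetical `Process` and `send` names; `copy.deepcopy` stands in for the copy into the receiver's heap, and the sketch ignores the shared, reference-counted binaries mentioned in footnote [0].

```python
import copy
from collections import deque

class Process:
    """Toy process: a name and a private mailbox."""
    def __init__(self, name: str) -> None:
        self.name = name
        self.mailbox: deque = deque()

def send(to: Process, msg) -> None:
    # Copy the message into the receiver so nothing is shared by reference:
    # later changes on the sender's side can never be seen by the receiver.
    to.mailbox.append(copy.deepcopy(msg))

sender_state = {"items": [1, 2, 3]}
receiver = Process("receiver")
send(receiver, sender_state)
sender_state["items"].append(4)      # the sender keeps mutating its own data
print(receiver.mailbox.popleft())    # {'items': [1, 2, 3]}
```

Because the receiver can never observe a half-finished update from the sender, killing either side mid-operation cannot corrupt the other, which is the property the error-handling model above relies on.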
BEAM has built-in support for linking and monitoring. In the context of erlang, linking means that if one process dies (crashes), the other is sent a special message which also kills it. This message can be received as a normal message instead, in order to handle the crash of your sibling (in which case you receive various metadata on the crash). Monitoring means you just want to receive the crash signal. The reason you might prefer linking to monitoring is that if you're a manager of other processes and you crash, you probably want all the processes you manage to die as well. Which doesn't happen with monitors.
That is because BEAM has its origins in telecommunications, where reliability means redundancy, and oversight. So the way you structure an application in beam (often) is a tree of processes, where many of the processes have oversight of a subtree, handle fuckups (maybe by restarting, maybe by something else), serve as entry point to their workers, etc..., and if one of the leaves dies that's just a signal sent to its parent, which might just die and signal its parent, which will handle it somehow. This is the design principle known as the supervision tree: https://www.erlang.org/doc/design_principles/des_princ#super...
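The link and monitor semantics described above can be modeled in a few lines of Python. This is a toy model with hypothetical `Proc` and `Node` classes, not how BEAM implements them, and the message shapes are simplified (a real `'DOWN'` message also carries a monitor reference and more). Links are bidirectional and propagate abnormal exits unless the receiver traps exits; monitors are one-way and only deliver a notification.

```python
from dataclasses import dataclass, field

@dataclass
class Proc:
    pid: int
    trap_exit: bool = False                           # like process_flag(trap_exit, true)
    alive: bool = True
    links: set = field(default_factory=set)          # bidirectional
    monitors: set = field(default_factory=set)       # pids watching this process
    inbox: list = field(default_factory=list)

class Node:
    """Toy model of link/monitor semantics; not how BEAM implements them."""
    def __init__(self) -> None:
        self.procs: dict = {}

    def spawn(self, pid: int, trap_exit: bool = False) -> Proc:
        self.procs[pid] = Proc(pid, trap_exit)
        return self.procs[pid]

    def link(self, a: int, b: int) -> None:
        self.procs[a].links.add(b)
        self.procs[b].links.add(a)

    def monitor(self, watcher: int, target: int) -> None:
        self.procs[target].monitors.add(watcher)

    def exit(self, pid: int, reason: str) -> None:
        p = self.procs[pid]
        if not p.alive:
            return
        p.alive = False
        for w in list(p.monitors):                    # monitors: just a notification
            if self.procs[w].alive:
                self.procs[w].inbox.append(("DOWN", pid, reason))
        for other in list(p.links):                   # links: the exit signal propagates
            o = self.procs[other]
            if not o.alive:
                continue
            o.links.discard(pid)
            if o.trap_exit:                           # trapped: arrives as a plain message
                o.inbox.append(("EXIT", pid, reason))
            elif reason != "normal":                  # untrapped: the linked process dies too
                self.exit(other, reason)

node = Node()
sup = node.spawn(1, trap_exit=True)     # manager that handles crashes itself
worker = node.spawn(2)
buddy = node.spawn(3)                   # linked, not trapping exits
observer = node.spawn(4)                # only monitors the worker
node.link(1, 2)
node.link(2, 3)
node.monitor(4, 2)
node.exit(2, "crash")
print(buddy.alive, sup.inbox, observer.inbox)
# False [('EXIT', 2, 'crash')] [('DOWN', 2, 'crash')]
```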
The third big difference is more philosophical and has to do with code reuse: because of (2) above, a lot of Erlang / BEAM / OTP is about communicating between processes in a subtree, moving messages between them, exit-signal strategies, etc., which leads to behaviours (https://www.erlang.org/doc/design_principles/des_princ#behav...). These are pretty alien because they're more or less mini-frameworks: not only are "mini" and "framework" usually put opposite one another, but many people don't really want to hear about frameworks.
But that's what they are: behaviours encode entire prototypical lifecycle and communication patterns, where the user / implementer of the behaviour fills in the "business" bits.
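The "generic part plus callbacks" split can be sketched in Python as an abstract base class. This is a hypothetical, single-threaded toy whose callback names are borrowed from gen_server (`init`, `handle_call`, `handle_cast`, `terminate`), but whose signatures and return values are simplified and are not OTP's. The base class owns the lifecycle and message routing; the subclass supplies only the business logic.

```python
from abc import ABC, abstractmethod
from typing import Any

class Behaviour(ABC):
    """Generic part, written once: drives the lifecycle and routes messages."""

    def run(self, arg: Any, messages: list) -> list:
        replies: list = []
        state = self.init(arg)
        try:
            for kind, payload in messages:
                if kind == "call":                    # request that expects a reply
                    reply, state = self.handle_call(payload, state)
                    replies.append(reply)
                else:                                 # fire-and-forget request
                    state = self.handle_cast(payload, state)
        finally:
            self.terminate(state)                     # in this toy, always called
        return replies

    # Specific part: the "business bits" each implementation fills in
    @abstractmethod
    def init(self, arg: Any) -> Any: ...

    @abstractmethod
    def handle_call(self, request: Any, state: Any) -> tuple: ...

    @abstractmethod
    def handle_cast(self, request: Any, state: Any) -> Any: ...

    def terminate(self, state: Any) -> None:          # optional callback with a default
        pass

class Stack(Behaviour):
    def init(self, arg: list) -> list:
        return list(arg)

    def handle_call(self, request, state):
        if request == "pop":
            return state[-1], state[:-1]
        return len(state), state                      # anything else: "size"

    def handle_cast(self, request, state):
        return state + [request]                      # push

    def terminate(self, state):
        self.final_state = state

s = Stack()
print(s.run([1], [("cast", 2), ("call", "pop"), ("call", "size")]))  # [2, 1]
```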
Oh yeah and beam comes with an entire suite of introspectability tooling, which is kinda linked to (2): all the oversight thing ends up at people, so you can connect to a runtime and look at and touch all the things, more or less.
BEAM is a bit of an OS running on an OS, really, probably closer in philosophy to the image-based languages of the 80s, in part because it is a language from the 80s. It's not quite image-based, though; in another way it is designed to go even further and just run forever, as it includes built-in support for hot code reloading and migrations (though from what I remember that's not super great or fun: it was quite messy and involved to actually do properly).
By comparison to all that, goroutines are just threads which happen to be cheap so you can have lots.
[0] kinda, some objects live on shared heaps as an optimisation but they're immutable and reference counted so it's an implementation detail.
[1] and here if actors share any memory, an actor might be dying in the middle of updating or holding onto shared state, which means its error corrupts other actors touching the same state
Async/await and any module won't save you from global state, data races, and the fact that you're running on an imperative language with mutable state. Additionally, the ergonomics are not the same, so even if you could replicate the BEAM in Node or any other language, you'd have to be a masochist to do it.
Lastly, the concurrency primitives are part of the entire runtime, not a set of external libraries maintained by whoever, which might be incompatible with other libraries you want to use.
I think Node is still single core by default? Elixir (or rather the Beam) will handle core utilisation so if you start a load of Elixir processes they’ll be spread across multiple cores.
Node itself has never been single-threaded. The execution model for JavaScript is single-threaded, so there’s no working around that, but libuv uses threads to build async IO on top of blocking operations.
Then there’s worker threads, which are pretty similar to web workers AIUI, that give you parallel execution for cpu-intensive work.
Obviously, though, none of these facilities compare with BEAM
It doesn't effectively do that. It does about 10% of that, ineffectively.
Instead of running a single VM with full knowledge of how to run lightweight processes, designed to fully take advantage of modern multi-core CPUs with multiple guarantees enforced by the runtime you have multiple single-threaded VMs awkwardly communicating with each other over a bolted-on API
Yeah but then you have to handle a lot of the synchronization of memory. It is hard to make you realise what is possible on the erlang vm without having tried it.
In particular, it is preemptive. This... Makes a lot of stuff easier.
NodeJS was designed to run single-threaded. Sure, you can use cluster module to run it with multiple but there's memory overhead and the ergonomics of sharing state and message-passing is nowhere near GenServer. Not to mention all the other benefits of BEAM.
But about the specifics of implementing a web-crawler, a NodeJS way to implement it would be to parallelize using lambdas.
They rewrote, which is known to help too. Going from 30 to 15 instances is not bad but it's very likely that a Java-to-Java rewrite would have helped go down too.
The big one however is going from 200 Python servers to 4 Erlang ones: a 50x reduction is quite something, and a Python-to-Python rewrite would not have achieved a 50x gain:
> All this is possible because Elixir, and the Erlang platform underneath, are fundamentally designed for always-online software with many users. When you use the right tool for the job, the benefits are clear.
In short, the Elixir system is doing something completely different. And they are also not counting some new pieces like the database cluster and Kafka-based aggregators, etc.
> The big one however is going from 200 Python servers to 4 Erlang ones
If your rewrite or refactor gives you a 10X improvement in performance, that's not an optimization but a bugfix, unless you are a researcher who has just found a revolutionary algorithm.
It is a matter of mindset. I work on software with soft-real-time, storage-capacity and power-consumption constraints, there is a constant flux of "small" feature requests, and I cannot "throw hardware at the problem" at will as they apparently did until it became unquestionable that the problem had to be fixed properly.
I'd recommend adopting this mindset even if one doesn't have those constraints (without erring on the side of "faster than necessary", though), because it's usually difficult to assess how much time and money are leaking through the inefficiencies one accepts for questionable reasons. They were sort of lucky to have a bill to show it to them.
A proactive stance in this regard goes a long way.
> They rewrote, which is known to help too. Going from 30 to 15 instances is not bad but it's very likely that a Java-to-Java rewrite would have helped go down too.
A java-to-java rewrite could have been as much if not more painful than a java-to-elixir rewrite, especially if the service is highly concurrent.
Rewriting synchronous code as asynchronous in Java is a lot of work and not fun at all IMO.
Given the recent arrival of virtual threads in Java 21 this may not be necessary any longer but at the time I think it was a perfectly reasonable choice.
> it's very likely that a Java-to-Java rewrite would have helped go down too.
it depends.
The Elixir programming paradigm might be one where you are able to write efficient but still highly concurrent code more easily, whereas it would take more work to do the same in Java.
Elixir / Erlang does open up a lot of paradigms. However, it'll be interesting to see how Java fares with its new virtual threads. There was already Akka, so it should happen fairly quickly.
Still I'd prefer Elixir. The BEAM VM just runs lighter.
Flagged for being silly flamebait. The Python projects you have personal experience with might have been poorly run but that’s not representative of the language, and it’s not going to lead to a conversation where anyone learns something.
The syntax and semantics of the language change in non-reverse-compatible ways between every minor release. This is independent of project management.
As an example, between 2.3 and 2.5, the syntax for package variables was changed (and then the semantics were changed between 2.5 and 2.7). There is nothing you can do as a python user to ameliorate the impact of such changes other than to not use those language features.
Can you explain how "managing my project better" would have allowed me to avoid the impact of this change?
> The syntax and semantics of the language change in non-reverse-compatible ways between every minor release.
Even accepting this, and assuming the average project was bitten by every single one, the release cadence for minor versions is about once per year (recently almost exactly that, in October), and minor versions are supported for 5 years. So this would justify updates every year if you were a maximally eager adopter, or every five years with a maximally conservative approach that only uses in-support versions, or somewhere in between for less extreme cases, not every three months.
> As an example, between 2.3 and 2.5, the syntax for package variables was changed (and then the semantics were changed between 2.5 and 2.7).
2.7 was released 13 years ago. Why would you reach that far back for a relevant example?
That’s three years, not months, and consider that it’s possible that how we develop software as a field has matured over multiple decades. The edge case you’re referring to didn’t even affect most packages in the 2000s, so it’s quite a stretch to say that something which happened in 2006 embodies how Python is developed now.
Don't take this the wrong way, but I don't really care about how your 2.3 apps were or were not affected. It's not my job to maintain them. The apps I cared about were the apps I had to maintain. And it turns out that if you have tens of thousands of lines of Python, you eventually hit a problem that needs to be fixed.
So sure... if you have a 200 line program, maybe you won't hit any code whose semantics have changed. Large apps will (and still do.)
So say “when I worked on a Python project many years ago, we had a lot of problems with one release” – people might find it weird that you’re bringing up old history, but nobody is going to doubt that you personally had an unpleasant experience. It’s okay not to like Python!
What’s getting criticism are these huge sweeping claims like “you rewrite your code every three months” or “syntax and semantics of the language change in non-reverse-compatible ways between every minor release”, which you have been completely unable to support, and the attempts to dismiss anyone else’s different experiences as somehow less valid.
I don’t tend to rewrite Python code any more often than is needed due to feature changes, or occasionally refactoring to pay off some maintenance friction caused by a design that has in practice turned out to be suboptimal.
When you move from one minor rev of python to the next, some language feature changes (either syntax or semantics or features no longer work.)
For instance... if you use async io in 2.x, the debugger stops working. Between 2.3, 2.5 and 2.7, the syntax of package variable scoping changed and then the semantics changed from package to class variables.
If you used a feature like package variables in your code in 2.3, that code would not work in 2.5. If you fixed it in 2.5, the semantics changed so that if you defined a package variable according to the 2.5 syntax, but it was defined within a class, it became a class variable.
> When you move from one minor rev of python to the next, some language feature changes
Even if there was a breaking change affecting your project every minor version, to have the cadence of backward-compatibility-induced changes you suggest, you’d have to be switching Python versions forward about four times as fast as they are released, which, if you started with the oldest in-support version at the beginning of the project, you could only sustain for about a year and a half before running out of versions to switch forward to.
> If you used a feature like package variables in your code in 2.3,
Then you are probably out yelling at kids to get off your lawn; 2.3 being out of support for 12 years.
That's great that your 5 line scripts don't use features that change between revs, but people who have to maintain large python apps have to book time to pore over the latest language version's definition, update our linting tools to find where in the codebase we use a deprecated feature, change the code, update the tests, retest and redeploy.
Not to mention getting a version clean dependency closure. Though we have forked and rewritten some of the non-standard modules we're dependent on to be less broken and to give credit where due, it does seem like standard modules supporting python3 are version clean unlike python2 and 1.6.
The heuristic we use is about an hour of dev time per 750 lines of code so our 70,000 line legacy python app takes somewhere around 100 dev hours per minor revision upgrade.
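As a quick check of the heuristic above, here is a minimal Python sketch (the function name and constant are illustrative, not from the commenter's tooling): 70,000 lines at roughly 750 lines per dev-hour comes to about 93 hours, consistent with "somewhere around 100".

```python
# Hypothetical helper reproducing the commenter's rule of thumb:
# about one developer-hour per 750 lines for each minor-version upgrade.
LINES_PER_DEV_HOUR = 750


def upgrade_hours(lines_of_code: int, lines_per_hour: int = LINES_PER_DEV_HOUR) -> float:
    """Estimated developer-hours to review, fix, retest, and redeploy one upgrade."""
    return lines_of_code / lines_per_hour


estimate = upgrade_hours(70_000)
print(f"{estimate:.1f} dev hours")  # prints "93.3 dev hours"
```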
Compare this to a legacy C application written in 1989. How do we port it to the latest version of C? We just copy it and compile. That community went to a lot of trouble to ensure code written in previous versions of the language still worked. The last time I heard of a language feature being deprecated was in 2011 (though I think gcc recently dropped support for trigraphs).
In my opinion, your python baby is ugly. It was ugly in 1.6. It was ugly in 2.x. And it remains ugly in the 3.x era. You should come to terms with the fact that some people just don't like python.
> You should come to terms with the fact that some people just don't like python.
Nobody cares about that - it’s a given that any language will have fans and detractors and most of us are mature enough to focus on what works for the projects and teams we’re part of.
What we’re objecting to is portraying your experience as a global truth. If you don’t like it, sure, but unverifiable hyperbole isn’t contributing anything but noise. This could be your opportunity to learn what tools or practices people use or consider whether the way you want to use the language is at odds with the core developers’ view.
You’re the only one doing that. Nobody here has questioned that you had an unpleasant project to work on - we’re only arguing that it’s not representative of the experience now (your initial hyperbole) or even 15 years ago when the 2.3-2.5 transition would have happened. Many of us have worked on larger codebases in that timeframe with very different experiences.
They threw DB[s] and Kafka into the mix. Python would get them the same net gain, if not more, with less dev cost. Python's I/O workloads perform on par with Go/NodeJS (see FastAPI's 3rd-party quarterly benchmarks as an example).
If they rebrand Python to Metal or some other name, people would recommend it left and right. It's just suffering from bandwagon criticism. Yet it remains one of the top 3 languages for years, covering several domains.
Python is the second best language for everything. Even with its warts (which any language gets after 30 years), it's a very solid, defensible choice.
However, it does not have a great concurrency story relative to languages that were built concurrency first (Go, Erlang), and it's fair to acknowledge that.
I think it's more important to look at the re-architecting than the different language. Second, I think certain architectures - like the actor model described in the article - work better and more intuitive if you use a different language.
That said, I'm sure a 2x performance improvement could've been achieved in Java as well if they had re-architected. They could also have made a lateral move to a different JVM language, like Scala, which also has an actor concurrency model + accompanying syntax.
> certain architectures... work better and more intuitive if you use a different language.
This is the key point that people miss when pretending that languages are interchangeable. The entire point of making a programming language is to make certain ways of solving problems easier to express. This constitutes a language's "pretty path". By providing such pretty paths, languages necessarily create less desirable paths, which will be painful to slog through.
If you try writing a functional pipeline in Java, you're going to have a much worse time than doing the same in Elixir. If you try to do Object-Oriented class towers in Scheme, it's going to be painful. Etc., etc. You can write a Rust program and a C program that compile to the exact same binary, but I can put a whole stack of cash on which one's going to be easier.
This is an important point. Sometimes when you need to do a major rearchitecture of a system it can help to choose a language that is more appropriate to that architecture. The Elixir/Erlang ecosystem has a better story for Actor Model development so it makes sense to choose them for the new architecture. It depends on the team and the specifics of the new architecture because the devil is always in the details. It may be that the new language isn't enough of a win to justify the switch and sometimes the win from the new architecture is big enough that sticking with the current language makes sense.
But a knee-jerk response of "this is mostly just good because they rewrote/rearchitected it" ignores the benefits of using a language or technology that fits the new architecture better.
That’s exactly the thing. Why would you bother optimizing the code, looking for overallocations and leaks, and tweaking parameters, when you can just pick a friendlier language for the same benefits?
I think the question is whether they significantly changed the architecture at the same time. For example, reading the description of the Python migration sounds like they applied a lot of experience which would have benefited any language, and micro-optimizations like what you described would have been a rounding error on those larger changes:
Yep. We got a 10x improvement on throughput for our backend runtime, which was in Java, by moving to a better architecture for performance hotspots... using Java again.
In a rewrite with a different design/architecture, that new design typically accounts for most gains, rather than language.
A language may make some parts of that rewrite simpler.
I've gotten 100x improvement with no code change by just adding an index in the database table. An inexperienced developer might have blamed the database and insisted on moving to NoSQL because of "web scale". If they got the chance to rewrite it, they could have pointed to the performance increase as a proof that they were right.
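To make the index point concrete, here is a small sketch using Python's built-in `sqlite3` module rather than whatever database the commenter used; the `orders` table and `idx_orders_customer` index are hypothetical. It only shows the query planner switching from a full table scan to an index search once the index exists; the exact plan wording varies between SQLite versions.

```python
import sqlite3

# In-memory database; the table, columns, and index name are hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER, total REAL)")
conn.executemany(
    "INSERT INTO orders (customer_id, total) VALUES (?, ?)",
    [(i % 1000, i * 1.5) for i in range(10_000)],
)

QUERY = "SELECT * FROM orders WHERE customer_id = ?"


def query_plan(sql: str) -> str:
    """Return the planner's description of how it would run `sql` for customer 42."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql, (42,)).fetchall()
    # Each row has four columns; the human-readable detail is the last one.
    return " | ".join(row[3] for row in rows)


before = query_plan(QUERY)  # a full table scan, e.g. "SCAN orders"
conn.execute("CREATE INDEX idx_orders_customer ON orders (customer_id)")
after = query_plan(QUERY)   # an index lookup, e.g. "SEARCH orders USING INDEX idx_orders_customer (customer_id=?)"
print(before)
print(after)
```

The query itself returns the same rows either way; only the access path changes, which is why this kind of fix needs no code change.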
They really should teach benchmarking more widely in the industry. Even though I'm readily here to sing the praises of Elixir when warranted, nothing beats actually profiling, end to end, the workload(s) that need to be improved. Sometimes it really is optimizing the database that matters most, like adding an index (or using window functions or stored procedures, as in many cases I've had in the past).
>Would be curious to know how they tried to optimise the Java stack.
Fairly safe to say not at all.
The JVM is extremely fast, very efficient and very scalable (you can write java code that scales linearly with available cores). If performance or scalability is a metric to care about, it is nearly impossible to outscore java. You can, with very skilled C/C++ developers, but it's going to be difficult to find those people and it'll be a lot of work. If you need extreme performance on a reasonable budget, you can't do better than java. I know java isn't hip anymore, so this is not a popular truth, but it is.
Then I have to pay Azul for the faster JVM, and I may still not sufficiently cut my server costs, or the cost of the JVM outpaces the reduction in server costs.
2 million a year is several developers compensation saved every year. It also opens the door to more savings down the road, potentially, as existing workloads may discover they can use the same approaches to reduce cost / overhead.
I was just adding that, in addition to the truth that normal Java could just do it, if that is really not enough, then paying for a few Azul licenses is an additional option that can save you time and money.
Elixir is slower than plain PHP according to the TechEmpower benchmarks. I'm not even sure how that's possible, but it is. By something like a factor of 2, IIRC. I'm not sure how Elixir is that slow, since it's compiled.
The TechEmpower benchmarks are... quite infamous in the Elixir community.
Long story short, they are running in debug mode, badly written, not optimised, with bad OS level settings. And every time the community have tried to contribute fixes, the experience has been... Really bad.
So we stopped trying. If things have changed we could try again, but... we just wrote them off.
As far as I can tell, you’re basing this on a single thread from a prior TechEmpower round. The results were cleaned up in a subsequent round, but you're ignoring that.
It depends on what is actually being benchmarked though; if it's the simple JSON payload, more time will be spent on HTTP parsing (done in nginx for the PHP benchmark so really really fast) and some JSON parsing (done in a C library for the PHP benchmark so really really fast). Basically, how much PHP is actually being benchmarked?
Are you looking at a benchmark that compares real-world usage, or a microbenchmark like hello world or a small JSON payload?
PHP is interpreted and Elixir is compiled. Comparing them to Ruby and Python makes no sense, as they are interpreted as well.
The fact that Elixir, a compiled language known for speed, is slower than PHP is surprising. As for "Most PHP libs are wrappers over C code": that's just not true. Most PHP libs are in PHP.
> Even old pre-7 PHP was much faster than Ruby, Python, and others.
No, PHP 7 was an impressive step forward in speed because it switched to better bytecode and object representations internally, but Python wiped the floor with most versions of PHP 5 for exactly the same reason. (PHP 5.5 with opcaching was roughly comparable.)
But this comparison overall is like trying to find the strongest two-year-old...
It's a ridiculous thing to say that any version of PHP is slow. You could run PHP5 today and it's still not slow by any modern standard. Even on the hardware of 15 years ago almost any script you throw at it finishes in single digit milliseconds.
How is that slow? I'd like to meet the developers that consider this slow.
I guess it depends on what the script is doing, but I'd consider that slow. Code I've worked on in Scala takes more like 50-60 microseconds per http request for a json CRUD type of thing (plus latency to wait for the db, but it can serve other requests during that time).
Because they are different languages with different VMs and different concerns. Made for different purposes. Number crunching is slower in Erlang VM (you can call C code though).
They did savings by re-implementing their services and attribute those savings to the new tool / programming language.
I wonder what the saving would look like if they chose another tool for the second / optimized system. I doubt it would differ much if they went with Go, Java or stayed with Python.
It shows that Python with Django is literally 40 times slower than the fastest framework. Python with uvicorn is 10 times slower.
The use of languages like Python and Ruby literally results in >10x the servers being used; which not only results in higher cost, but also greater electricity use, and pollution and carbon emissions if the grid where the data center is located uses fossil fuels.
Not to mention, dynamically-typed languages are truly horrible from a code readability point of view. Large code bases are IMO difficult to read and make sense of, hard to debug, and more prone to bugs, without static types. I'm aware that Elixir is dynamically-typed, but it (along with JS) is an exception in terms of speed. Most dynamically-typed languages are quite slow. Not only do dynamically-typed languages damage the environment as they're typically an order of magnitude slower, they also lower developer productivity, and damage the robustness and reliability of the software written in it. To be clear, I'm in favor of anything that increases productivity. If Kotlin were 10 times slower, I'd be happy to pay that price, since it is genuinely a great language to work with, is statically typed, and developers are more productive in it. I'm not sure how Elixir mitigates the downsides of dynamic typing (maybe lots of 'type checks' with pattern matching?), but it would definitely be super-nice if a well-designed (Kotlin or Haskell like?) statically-typed language targeting the BEAM existed...
Since you mentioned them I think it's worth telling people to take those benchmarks with a larger grain of salt. They've become such a pissing contest that I don't know if they can be called "real-world". There doesn't seem to be much scrutiny of the implementations.
Take some Rust frameworks: they write out pre-computed header strings to the output buffer, completely bypassing what the framework's documentation recommends. Examples are actix[1] and ntex[2]. No one would ever do this in real life.
Now I like Rust, and it'd likely be some of the fastest even without these shenanigans (Axum and may-http don't do that, I believe). But I don't know if other languages/frameworks have benchmarks implemented in the same non-idiomatic way just to look better.
Hmm, the pre-computed header strings are definitely an interesting optimization, and do seem to be a bit non-idiomatic – but IMO, this is the sort of optimization the framework itself should try to do.
If the first several bytes of the header are going to be identical for many requests, a super-optimized framework would ideally memoize them and write them directly out, as this benchmark is doing.
(And if the framework itself is doing it, then any user of the benchmark would just inherit that optimization, and not have to resort to non-idiomatic optimizations...)
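A rough sketch of what framework-level memoization of the static header bytes could look like, written in Python with `functools.lru_cache`. The names `header_prefix`, `render_response`, and the reason-phrase table are hypothetical, and this is not how actix or ntex implement it; a real server would also emit per-request headers such as `Date`, which this sketch leaves out.

```python
from functools import lru_cache

# Hypothetical reason-phrase table for the statuses this sketch handles.
REASONS = {200: "OK", 404: "Not Found"}


@lru_cache(maxsize=64)
def header_prefix(status: int, content_type: str) -> bytes:
    """Static start of the response head, built once per (status, content_type)."""
    return (
        f"HTTP/1.1 {status} {REASONS[status]}\r\n"
        f"Content-Type: {content_type}\r\n"
    ).encode("ascii")


def render_response(status: int, content_type: str, body: bytes) -> bytes:
    # Only Content-Length and the body are formatted per response;
    # the cached prefix bytes are reused as-is.
    return (
        header_prefix(status, content_type)
        + b"Content-Length: " + str(len(body)).encode("ascii") + b"\r\n\r\n"
        + body
    )


resp = render_response(200, "application/json", b'{"message":"Hello, World!"}')
```

Because the cache lives inside the framework, application code keeps calling the normal API and still gets the precomputed bytes, which is the point made above.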
Have you ever looked at the implementations behind the TechEmpower benchmarks? There's NOTHING realistic in them. Those applications literally hardcode static content-length header values to be faster. They are a pretty good show of how low you can go to squeeze out performance, but not one sane person would write code like that.
> greater electricity use, and pollution and carbon emissions if the grid where the data center is located uses fossil fuels.
to
> dynamically-typed languages damage the environment as they're typically an order of magnitude slower
is quite a stretch.
Do dynamically-typed languages inherently damage the environment? Or is it the fossil fuels?
Not that the appeal to the environment matters, because later on we have this:
> If Kotlin were 10 times slower, I'd be happy to pay that price, since it is genuinely a great language to work with, is statically typed, and developers are more productive in it.
> Do dynamically-typed languages inherently damage the environment? Or is it the fossil fuels?
My opinion is that slow languages that use 10x the electricity, with no ROI for that 10x energy use, are bad.
High energy use, even if it's clean energy, implies a higher environmental toll. If a country were solely using nuclear and solar, higher energy use results in (1) more nuclear reactors constructed, and (2) more solar panels built. The manufacture and construction of both has an environment cost. Of course, with fossil fuels, the damage to the environment is potentially a lot worse.
> Not that the appeal to the environment matters
I don't think higher energy use is inherently bad. If we can improve the quality of life for human beings, then IMO a higher energy use is justified. I don't really believe in degrading our quality of life to lower our energy use.
My problem with many popular slow languages is that they have a negative ROI for the higher electricity cost. In exchange for 10x the energy use, you have a language that results in less-readable code (a serious issue), that causes more bugs / less-reliable software, etc. We literally get a negative ROI in exchange for 10x the energy use, which is absurd and illogical.
If Hindley–Milner type inference had been more prevalent in the 1990s, I have a feeling dynamically-typed languages would never have taken off. We're moving back to static typing with mypy, TypeScript, etc., but I'm hoping we soon move away entirely from using slow languages for writing servers that serve large numbers of users.
If we actually got something in exchange for it, it wouldn't bother me so much.
For the vast majority of projects this makes no difference. If you are at Facebook scale? Sure. But then, you do what they did, write a VM to speed things up.
Dynamic typing doesn't cost much more money on average and thankfully the cost of energy itself is a motivation for companies to do rewrites. If they are paying a lot in server costs and electricity then they do typically rewrite to reduce the amount of servers.
For companies they primarily need to worry about running at a profit and getting to market quickly which dynamic languages do extremely well, and the costs in electricity and carbon aren't very high when your scale is small.
> running at a profit and getting to market quickly which dynamic languages do extremely well
This (or similar variants of this) is an assertion that's commonly made about dynamically-typed languages, but I don't think it holds any water.
Less readable code (due to the lack of types) makes it a lot harder to add new features, and harder to debug code as well.
Several years ago, I briefly worked on a fairly large codebase at a startup that was written in Ruby on the backend and CoffeeScript on the front-end. There were only around 60,000 active users. Yet the dynamic typing made adding new features or fixing a bug a truly painful and fragile experience. Needlessly painful, and slow. It literally reduced developer velocity.
I think once you cross a few hundred lines, dynamic typing becomes a handicap rather than an advantage.
All of this doesn't even touch on the energy use, which I'll admit is irrelevant to most companies. Server costs, even for popular web/SaaS/etc. tech companies, are often a tiny fraction of overall cost, with most of the company's annual operating cost being employee salaries. (As I stated earlier, I don't mind a language being slower if it actually provides advantages, like improved developer productivity, in exchange for that slowness.)
The main argument here was about the energy costs. But it sounds as though you don't like dynamically typed languages. That's fine.
Facebook, Twitter and plenty of billion-dollar businesses were built with dynamic languages, and many would argue they may not have even existed if they had been written in statically typed languages, due to the slower up-front time expenditure.
I like statically typed languages for large projects but enjoy the development speed of dynamic ones. If you don't like dynamic ones, that's completely fine.
The only thing I'll concede is that using a statically-typed language without a good type inference system, might slow people down a tiny bit.
The serious downsides of dynamic typing means you might still win out (in terms of developer productivity, code readability, code reliability, and ease of debugging) even if you use a language like Java instead of PHP.
Stack Overflow is minuscule compared to Twitter or Facebook. You can just look at Basecamp if you want to pick a smaller company. They use Ruby on Rails and have been consistently doing well.
Ultimately billion dollar companies have been built on dynamic languages. There is nothing stopping you from succeeding with dynamic languages. There just isn't. There are tradeoffs and these companies made them.
I agree that there isn't much necessarily stopping a person succeeding due to the choice of language (unless it's some esoteric or otherwise unrealistic language).
Yahoo used C++ for instance (it would not be my choice, even though it's weakly statically-typed).
But, yea, like you said--there are tradeoffs.
I think dynamically-typed languages have the allure of letting you build quickly at first, but the downsides of dynamic typing start hitting pretty soon afterward.
I write a lot of small scripts in Python. It certainly is easier to throw something quickly together, especially when the data model is amorphous, with dynamic typing. But that doesn't mean I'd use Python for a large project.
Reading higher in the thread, some Elixir folks are saying that the techempower benchmarks used the wrong settings (debug mode, etc) for their Elixir benchmark.
The fastest Elixir framework on the list, phoenix, is about as fast as uvicorn. (Both around 10 times slower than dragon.)
I was mostly responding directly to these 2 statements from my parent comment:
> They did savings by re-implementing their services and attribute those savings to the new tool / programming language.
> I wonder what the saving would look like if they chose another tool for the second / optimized system. I doubt it would differ much if they went with Go, Java or stayed with Python.
I think second-system syndrome is probably a significant factor, but I can also believe that ditching Python was a significant factor too.
EDIT: I’m being rate limited because I guess my comments are too spicy for the HN mods, but anyway I agree that there’s no reason other non-Python languages would fare much worse than Elixir.
There is a reason: the BEAM is hardly prone to huge GC pauses. Bigger load results in every actor responding very slightly slower. Nothing else.
Many other systems don't have this property. They fall down under pressure.
Gosh, a huge chunk of HN is always so dismissive. At least read up a bit beforehand, man. The criticisms should be informed and benefit the readers, not only express a generic skepticism.
The BEAM just garbage collects each actor separately, and it so happens that much of the time your actor has finished before a GC happens, so you never see the cleanup.
The beam also has a spectacular failure mode: OOM whenever messages come in at a higher rate than they are processed. The lack of backpressure mechanisms mean a huge amount of beam language developers spend way too much time recreating their own way for dealing with this or pretend it is not a problem at all. This means too many libraries in the ecosystem behave totally differently under load.
I can see how you would think that, yes. In practice I haven't noticed it except in super rare cases where processes (actors) hold on to huge binaries / strings -- which is one of the weak points of BEAM's GC.
I've been bit by this. In reality you need to know your shit when it comes to tuning the Beam and GC to achieve decent performance under load without triggering OOM.
Frankly I am seeing that as a myth, you seem to have made up your mind some time ago or judged by 1-2 occasions.
I am on ElixirForum every day and worked with Elixir for 7 years and have never seen anyone "perpetuate myths". I've seen some people willing to "increase adoption" which was always met with resistance by the wider community -- we believe growth should be organic.
Pretty sad stance from you though, I have no idea why people get so ticked off when another programmer wants to tell them about a secret weapon.
If you are not willing to try it, that's fair. Say that. Claiming you know stuff about the ecosystem while a guy who is there every day is not seeing that at all comes across as... strange. Biased. And not arguing in good faith. :(
> If you are not willing to try it, that's fair. Say that. Claiming you know stuff about the ecosystem while a guy who is there every day comes across as... strange. Biased. And not arguing in good faith. :(
I am not willing to drag others, such as those that wrote the repos, into a technical discussion with people out to act as you are.
The guy you are responding to was completely calm and reasonable. Didn't say anything attacking or otherwise. I'm not sure why you are seemingly trying to cast him (and the Beam) in such a bad light, with seemingly no reason to back it up.
Both of you are afflicted by that logical fallacy of failing to understand that you not encountering a phenomenon does not mean it does not happen or is rare, it just means you didn't encounter it.
If you try telling people that did encounter that phenomenon that in practice they wouldn't/didn't then you shouldn't be surprised if they question why they started talking to you in the first place.
> Do you genuinely fail to see how your behaviour proves my point?
I genuinely see only one thing: I asked you to elaborate but you are convinced that I am pretending to discuss while I, again genuinely, actually did want to discuss.
You asserting something about me, a person whose mind you cannot read, is confusing and quite aggressive, in a very uncalled-for manner too. But as I said already -- have it your way, I disengaged because it became apparent you are not interested in discussing. OK. It's your right.
What's not OK is you claiming that I am not interested in discussing however, and I maintain that I was interested in discussing.
> You cannot just hound people with demands because you don’t like what they are saying.
1. I am not "hounding" you for anything, I asked a question.
2. You are again assuming my motivation, and I assert that you have gotten it wrong. Your claim that I "disliked what you said" is a borderline personal attack and off-topic. I was confused about why you claimed what you did and wanted you to elaborate, to find out what made you think like that and whether I could change your mind with a few anecdotes and some facts (that are hard to look up because they require scanning a forum; yet they are there and are visible to everyone who engages with the platform).
BTW, if you really knew anything at all about the Elixir ecosystem, you would know that its creator, to this day, engages with users on ElixirForum and asks for their feedback on what they find lacking. That is exactly the sort of engagement and genuine discussion spirit that you claim I (as a part of the Elixir community) don't have.
That alone invalidates your point entirely.
I am disengaging second and final time, let future readers decide for themselves.
Calling someone “biased” and “acting in bad faith” is a personal attack and violates this site’s rules. People get rate limited for far less on this site.
I'd argue taking things out of context and deliberately painting the commenter in a bad light is not a nice forum discourse.
I said, very plainly and visibly, that my parent commenter's unwillingness to back up negative claims COMES ACROSS as biased and ARGUING (NOT "acting") in bad faith.
Come on now, this stuff is not hard, the message is literally up there. Not sure why you had to editorialize it and thus misconstrue it?
As I plainly explained, right at the top, it is not theoretical. There is no point engaging people with evidence if they are so dismissive of basic facts.
But then that is also true here. Your claims about him not saying what he plainly did are just bizarre.
TBH at the load we had we got substantial savings by eventually replacing usage of gen_server as well, though that probably isn’t a good idea much of the time.
The OOMs were largely being caused by calls to and from other services (i.e. kafka) so the answer proved to be in controlling the rate at which things come in and out at the very edge.
From what I saw I got the impression the Beam devs assumed memory and CPU usage go together so a system that is under load memory wise would also be CPU wise, but this isn’t the case if your fan out and gather involves holding large* values on which the response is based, even if for tiny amounts of time.
EDIT: *large meaning "surprisingly small" if you're coming from other universes.
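The comment doesn't say how the rate was controlled at the edge. One common technique is a token bucket; the Python sketch below is illustrative only, and the `TokenBucket` class and its injectable clock are assumptions rather than the commenter's implementation.

```python
import time
from typing import Callable


class TokenBucket:
    """Admit on average `rate` requests per second, allowing bursts up to `capacity`."""

    def __init__(self, rate: float, capacity: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.rate = rate
        self.capacity = capacity
        self.clock = clock            # injectable so the example can use a fake clock
        self.tokens = capacity        # start full, so an initial burst is admitted
        self.last = clock()

    def allow(self) -> bool:
        now = self.clock()
        # Refill in proportion to elapsed time, capped at capacity.
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False                  # reject (or defer) the request at the edge


# Usage with a fake clock: 2 tokens/second, bursts of up to 3.
fake_now = [0.0]
bucket = TokenBucket(rate=2, capacity=3, clock=lambda: fake_now[0])
burst = [bucket.allow() for _ in range(5)]  # [True, True, True, False, False]
fake_now[0] = 1.0                           # one second later, 2 tokens have refilled
later = [bucket.allow() for _ in range(3)]  # [True, True, False]
```

Rejecting at admission time keeps excess work from ever being queued, which is the opposite of letting it pile up in mailboxes downstream.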
The underlying problem we had* was the rate work was being completed was lower than the rate requests were coming in which causes the mailboxes on the actors to grow indefinitely.
In golang the approximate equivalent is a buffered channel that would start blocking once its buffer is full, but the beam will just keep piling those messages onto the queue as fast as possible until OOM. This is obviously a philosophical tradeoff.
* I should qualify that each request here had a life in low double digit ms, but there were millions of them per second, and these were very big machines.
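The difference between an unbounded mailbox and a bounded buffer can be sketched with Python's `queue.Queue` standing in for both (an analogy, not BEAM or Go code; the capacity of 100 is arbitrary). The unbounded queue accepts everything and grows with the backlog, while the bounded one refuses work once full, which the producer can treat either as a signal to block, like a full Go buffered channel, or to drop the message.

```python
import queue

# Unbounded queue: stands in for a BEAM mailbox; put() never refuses work.
mailbox = queue.Queue()

# Bounded queue: stands in for a Go buffered channel with capacity 100.
bounded = queue.Queue(maxsize=100)

dropped = 0
for request in range(1_000):        # producer outpaces a consumer that never drains
    mailbox.put(request)            # always succeeds; memory grows with the backlog
    try:
        # put() without a timeout would block here, like sending on a full channel;
        # put_nowait() raises queue.Full instead, so we can shed load.
        bounded.put_nowait(request)
    except queue.Full:
        dropped += 1                # drop (or reject) rather than queue unboundedly

print(mailbox.qsize(), bounded.qsize(), dropped)  # prints "1000 100 900"
```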
Why weren't your processes dropping messages? Also I think you can tell the VM to not allow the process to exceed a certain message size and trigger some sort of rate limiting or scaling out
Edit: huh. I could swear the VM had memory limit options. Guess not. Time to rewrite it in zig!
Yeah, I think that's the assumption people had been operating under.
That team would have thoroughly endorsed a zig rewrite! It was a very odd situation where most of us liked erlang the language but found the beam to be an annoying beast, whereas most of the world seems to be the opposite.
That does not in any way explain a drop of 95%, which IMO is ridiculous and points to other issues.
The system they created now is totally different from the one they had. It’s more efficient by an insane margin. Choice of language seems like it would have trouble breaking the top 5 major reasons.
I've migrated Rails apps to Elixir before. We went from 15 servers to 3, and one of those was basically an "if crap hits the fan" spare; we could easily have gotten away with 2.
It's worrying that a supposedly high-quality forum like HN receives comments with no substance. If you have an actual counter-argument, let's discuss. If not, well, not an interesting exchange.
You keep insisting HN is not providing comments up to your standards. Let’s not go there, OK? I just disagree with your analysis. I think it is too simplistic to state that GCs cause this and that Elixir somehow magically causes 95% efficiency boosts.
All I’m saying is that if you can drop 1330 servers just like that, there might be something more going on than Python’s slowness.
This is from experience. I have seen people create slow and fast systems with just about any tech. I can make Elixir crawl, I can assure you of that.
I have seen Python apps use 10 servers and reduced it to one as well. Same tech, just a more efficient mindset. It’s IMO a bit too simplistic to say systems with GCs fall over when under load.
Sure, if you want to expand the discussion to "everyone can make every tech stack act badly" then you might have had an argument. I don't find that argument compelling however -- it's borderline meaningless.
Also nobody used the word "magically" before you did. Note that.
What's your argument exactly? That Elixir is overrated? Or something else?
Furthermore, I am not insisting on my standards of the quality of comments. I am under the impression that's the expected quality of comments on HN at large.
Thank you for recognizing that we were going nowhere. Apologies if my tone was sharp.
I am not evangelizing tech -- I am a polyglot and I use what I find is best suited for a job, and Elixir happens to cover quite a lot of ground. That's all really. I also use Rust and Golang quite a bit.
I simply get ticked off when people start demeaning something without seriously working with it or even reading a bit beforehand. Sorry if I mistakenly put you in that group.
You’re comparing Elixir to Python and Rails. Many of us have seen Python replaced with other languages for an astronomical improvement. Python and Ruby are the slowest category of languages; they’re easily beaten and you need to offer some evidence as to why the improvement was derived from migrating to Elixir specifically rather than moving away from Python/Rails.
I'm not dismissive of Elixir, I'm dismissive that Elixir magically solved this problem in a way other languages couldn't. If you have some supporting evidence or rationale as to why Elixir is uniquely able to solve this problem, I'm happy to hear it, but so far you've offered up "low latency GC" which isn't unique to Elixir and itself doesn't adequately explain the degree of improvement over Python (GC latency alone doesn't reduce from 200 servers to 4). Again, I'm happy to entertain arguments about why Elixir is uniquely able to improve performance, but I'm not going to take it on faith (which you interpret as 'dismissive of Elixir').
Okay, how about "a runtime that has been extremely carefully crafted for the lowest possible latency all the way to the point of the hardware falling over"?
It's very hard to provide evidence unless we make a screen-share call where I show you real time dashboards of services being bombarded with thousands of requests per second and for you to see for yourself how the median latency numbers climb from 25ms to 45ms and then fall back to 20-30ms after the burst load subsides.
I find it difficult to just describe this because as much as I've seen it many times in practice, it's also practically impossible (NDAs and compliance nightmares) to demonstrate it to a programmer outside the companies I've worked with without violating all sorts of laws. :(
But yes, basically: a super latency optimized runtime, a GC that's not very sophisticated but it elegantly dodges most GC problems by simply releasing all memory linked to an actor as soon as it quits (and Erlang/Elixir encourage you to spawn many of those in the right conditions; not for every single thing though), and one of the very fastest dynamic languages in the world, probably second only to JS's V8.
Add to all of that my experience with several other programming languages and their hosting solutions, which were tripping over themselves when 1000 req/s started coming in (looking at you, Ruby 3.X and Puma and a few other servers; or PHP, or Python).
TL;DR: reliability is much better, latency is predictable.
Weirdest thing is: people don't believe it. If you only knew the CTOs I worked with: they were extremely data-driven and they would not allow me or anyone to just pull all of that out of their bottom. All had to be proven with numbers, and my teams and I did that, many times.
I understand the skepticism somewhat, but you and a few others seem to look at Elixir through the lens of "too good to be true", and IMO you should try relaxing that skepticism to some extent. And try to be a little more sympathetic, because again, I literally cannot give you the hard cold data without violating at least three laws.
Go also doesn’t have huge GC pauses (and moreover, idiomatic Go generates very little garbage), and I have a hard time seeing how GC pauses would contribute so significantly. Java also allegedly has a very low-latency collector.
Apologies, I was only responding to a single point which was meant to counter another. I am well aware that Golang's GC is world-class and it's my second most loved language after Elixir.
If (and that's a very, very, very big if) the current open-source GCs really are causing too many pauses, then you can go to Azul and buy a better VM and GC, further improving performance compared to the BEAM.
Yep, agreed, I am just listing possibilities. More often than not a performance loss in Erlang/Elixir is caused by GC pauses but you can do a lot to reduce or outright eliminate those.
You need to advance a theory about how the BEAM could make a 95% improvement in a way the alternatives cannot.
For example, Go has a very good asynchronous story as well, and while it may not be exactly as good as BEAM languages, it makes up for it in straight-line execution performance.
I’ve personally rewritten a few carefully optimized Python applications in pretty naive Go (without significant rearchitecting) and witnessed 50X-1000X performance improvements. And moreover Go allows for more optimization beyond what Python allows (e.g., consolidating allocations and lifting them out of the hot path).
While you are probably right, it's also true that Python is just straight up slow, especially in normal configurations (Django, Flask, uWSGI, Lambdas, etc.), and Elixir is pretty fast while offering a great/fun/friendly dev experience plus the BEAM's HA/scaling benefits.
They could also have blown everything out of the water with C++, or probably even Go, but if Elixir can do it on 2-4 boxes, it's fast enough.
Python is 100x slower than a real language (I like Python and use it often; this is not meant as a dig at the language, just stating facts).
My favorite thing about Python is writing prototypes. The big risk with writing a prototype is that it survives into production. Using Python ensures that the code will be replaced by real code.
Python is great for other non-production code like Jupyter notebooks, numpy experiments, etc
How is calling Python "not a real language" just stating facts?
I'm not sure by which metric we are judging whether a language is real or not, but I'm fairly sure that almost anything anyone comes up with will include Python, considering it's one of the most used languages in the world at this point, and used for a fairly large variety of use cases.
TypeScript/JavaScript has many of these properties except for the GIL, and maybe it's not typically interpreted. Does that make it not a real language? Ruby also has a GIL.
I don’t even know what “proper package management” means in this context. Certainly there’s an official package management system and modules. C++ doesn’t have the latter (very fragmented user efforts and not commonly used in my experience) and barely the former (no adoption). So that means C++ isn’t a real language?
Python does have type checking by the way. Same as TypeScript - you annotate your types and you can run a program to verify your annotations. This is basically how TypeScript works although the typing in the latter is more mature by way of @types packages to let the community supplement adding typing information to third party packages. There’s no relation between the two so it’s not sound (in both cases), but in practice it’s quite useful.
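A minimal sketch of the annotate-then-check workflow described above. `apply_discount` and `bad_call` are hypothetical, and mypy is used as the example checker (pyright works similarly). The annotations are visible to tools via `typing.get_type_hints`, but CPython itself does not enforce them.

```python
from typing import get_type_hints


def apply_discount(price: float, percent: int) -> float:
    """Return price reduced by `percent` percent. Annotations are hints only."""
    return price * (100 - percent) / 100


def bad_call() -> float:
    # mypy rejects this call (argument 2 is a str, but int is annotated).
    # CPython does not check annotations; it only raises TypeError when
    # `100 - "10"` is actually evaluated.
    return apply_discount(20.0, "10")


if __name__ == "__main__":
    print(get_type_hints(apply_discount))
    print(apply_discount(200.0, 10))  # 180.0
```

As with TypeScript, the checker runs as a separate step (e.g. in CI) rather than as part of executing the program.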
Easy to learn hard to master syndrome. Your comment says nothing.
* GIL - Your use case more than likely reimplements one wheel or another. Whatever compute you're doing should be deferred to the right tool (DB, queue, etc). Otherwise, if it's I/O, you're on par with Go and NodeJS (See FastAPI 3rd party quarterly benchmarks as an example)
* No sound typing - I don't know what this means. If you're concerned with typing, you can use pydantic for highly performant type checking and input/output validation.
* No proper package management - Poetry is excellent, has been around a long time, and is the unwritten de facto tool. Pipenv, imo, is a close second. This argument feels forced, as no one argued Go wasn't a language before Go modules were solidified. The community was fragmented and people wrote/chose their own tool.
* Runtime type checking only - In a world of interprocess communication, I don't see how this is relevant. You're not using Python to write firmware. If you're not writing tests and just depend on successful compilation, you're writing bad code. Tests not only cover your point but are an excellent self-doc, an added value, imo, that isn't spoken about enough. Regardless, tests cover this.
* Interpreted - A good last point to nail in the "your comment says nothing". What does this imply? That it's easier to debug with an interpreter? That cold starts are slow and your design is flawed given the tools?
Anyway, with C and Go bindings, most arguments against Python fall short. It has its place, yes, but a much wider one than the bandwagon regurgitates.
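The standard library has nothing like pydantic built in, so as a rough stand-in for the input-validation point above, here is a hypothetical `SignupRequest` dataclass that checks its fields at construction time. Pydantic itself does considerably more (type coercion, nested models, JSON schema generation); this sketch only illustrates validating data at the boundary.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SignupRequest:
    """Hypothetical request model, validated when constructed."""

    email: str
    age: int

    def __post_init__(self) -> None:
        # Dataclasses do not enforce annotations, so check explicitly.
        if not isinstance(self.email, str) or "@" not in self.email:
            raise ValueError(f"invalid email: {self.email!r}")
        # bool is a subclass of int, so exclude it explicitly.
        if not isinstance(self.age, int) or isinstance(self.age, bool) or self.age < 0:
            raise ValueError(f"invalid age: {self.age!r}")


if __name__ == "__main__":
    print(SignupRequest("ada@example.com", 36))
    try:
        SignupRequest("ada@example.com", "36")  # no silent coercion in this sketch
    except ValueError as exc:
        print(exc)  # invalid age: '36'
```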
It's really strange to me how people continue to build services using python, knowing it is 100 times slower than appropriate languages, and then get surprised by it being 100 times slower, so they eventually rewrite it.
It’s extremely rare for a system to be 10x slower, much less 100x, and developer productivity is huge. When you see huge numbers being tossed around for an entire system, they almost always mean “our first architecture wasn’t right for the problem” and the question to ask is how much time it would have taken the same team to discover the correct shape of the problem with the other candidates.
I think it is a lack of know-how. Most businesses/managements are not capable of making the decision to go for an ecosystem like Elixir, because they either don't even know it exists (that is also true for many devs) or they do not dare to do anything non-conventional or non-mainstream, or they have the wrong impression, that the "programming language does not matter". (Well, it does! Since it connects you to an ecosystem that comes with it and its language design choices influence how easily you can do things ...)
So then Python comes along and you find loads of devs for that. Once Python is entrenched, businesses have a hard time telling their devs to actually learn something new. And few devs will already explore things like Elixir on their own in their free time. And so they continue to hire Python devs.
(One could also replace "Python" with "Java" or "NodeJS" or similar, the principles remain the same.)
I think it's a generally practiced strategy at this point to spew out dev blog posts on company blogs to build SEO or act as an ad, regardless of quality.
The problem with Elixir is that it is such a foreign language to most junior developers, and a radical shift to dig into for anyone coming from object-oriented and other higher-level languages. This is worsened by the fact that there are not many Elixir jobs, on top of comparatively limited development tooling and IDE support.
To someone who starts their job on an Elixir codebase, it is just not a smooth onboarding at all. While the performance aspect is unparalleled compared to most of the popular scripting languages in the last decade, the price to pay to settle into Elixir seems huge to me.
sounds like a feature not a bug. Poor onboarding is a cultural problem in my experience, not really a technology one.
When you get junior (or even non junior) developers onboarded in a new language, you have a unique opportunity to break them of bad habits and expand horizons.
Yes, there is a cost to it as it extends in the short term the time it takes to get developers ramped up, however the long tail payoff is huge
Anything the business can control for: architectural designs, server costs, approaches to building out features / services for the business etc.
When you can mold someone's experience of modeling a domain via a new language, they become very efficient at it, since they have no prior notions to fall back on.
How many times have developers gone down the wrong path because of "X did it this way" type thinking? When you can sufficiently remove that, so all that is left to think about is the problem space, you do make more gains around that problem space.
My thesis from (albeit anecdotal) experience, is that when you have developers working in a new paradigm (often, this corresponds with a new language) you have better chances at establishing these things than having to consistently try and override a developers prior notions about how something should work / look.
The trade off is higher ramp times and slower on-boarding, of course. In the short term, it can be more costly.
We just did a Prometheus migration that I suspect will take us 5 years to break even on the development effort investment. And I'm not even counting opportunity costs, which were immeasurable.
I like Elixir and I want it to do well, but bad articles make that harder, not easier.
I wonder how much of their traffic is angry people who ended up there by mistake. They could save a lot more than $2M if they just set robots.txt to disallow everything.
Maybe to you, but my wife uses Pinterest a ton. It is where a lot of the women I know go to for ideas for just about everything, from house decor to even the ideas for our wedding.
Pinterest is useful, just not for us nerdy guys. I am not sure why Google keeps it, though, or how they benefit, unless Pinterest uses AdSense exclusively, in which case one could conclude it's some sort of partnership. You would think Google would be smarter about who to send over to Pinterest if that's the case.
They are referring to how they link out their images and SEO content that ultimately one will find unusable & frustrating.
I have experienced this many times when looking for a specific image on Google Images, I will click on the link that goes to Pinterest only to find the image is not there.
IMO this is more a Google problem than a Pinterest problem. Google does the same thing with text search (returning results that have few if any of your search terms), possibly over-relying on Open Graph tags and other such metadata.
...and yet whenever Google deprioritizes over-emphasis on SEO, Pinterest never seems to go away... Is it some sort of partnership they have that keeps Pinterest so prevalent? Or did Google one day just decide to promote Pinterest more? I'm sure they promote Facebook and other social media site results too.
I concur. It's such a pain to go back and requote random words in my query on every search because Google decided it knew what I wanted better than I do.
As someone who uses Pinterest for its intended purpose, like home decor inspiration, fashion ideas, etc.. It’s actually very nice!
It’s structured like a moodboard, a format that fits me nicely whenever I’m looking for ideas…
However! Even though I enjoy and use Pinterest, it’s never what I want to see in Google Images. If I need to find the artist of a drawing for example, Pinterest is often spammed to me in the results, often with a link leading to absolutely nowhere. Usually the Pin is locked behind a login, which is annoying since I only use the app on my phone.
There’s tons of reasons to dislike Pinterest on Google, it’s not just a dichotomy between nerds and women (not to mention that these groups have overlap anyway, of course).
Or allow convenient usage without registration. It seems whatever I look for is mostly there; I just go away because I don't want to sign up and log in, nor be tracked. I wouldn't really mind if there were some reasonable, non-intrusive, content-relevant ads, though.
I might even sign up some day if I weren't repelled by it being required every time I come. I even stopped reading Quora and, most recently, Twitter because of this: they started requiring signing in, and I don't want to stay signed in and be tracked, even though I actually have accounts on both.
It earns them money, and people aren't using competitors; Google has seemed to give up on the quality of results a long time ago in favor of plain volume and whatever numbers they use to measure success.
I was never sure where all the hate is coming from. Pinterest is actually one of my favourite apps, that I use regularly – and I use very few apps. I actually really love it and after years I still find new uses for it. As far as I can tell, they have a pretty loyal user base.
google search is worthless now, not just for images. i only ever use it with the `site:` operator or the research subdomain (these are the only ones that seem to have retained a functional version of google's algorithm). i use pinterest itself for general image discoverability and it works really well.
The hate comes from them hiding where the original content is from. They hijack search results and then pull you into an endless circle of harvested content, making it difficult to find the link to the original source of the material.
this is not true? every single pin has a big button underneath which takes you to the original webpage where it was saved from. and if it was manually uploaded by the user and lacks this kind of metadata, you can scan it visually and it shows you all the other uploads on pinterest which match it approximately, and usually one of the matches has better metadata.
this has been inconsistent over the years. maybe they 'fixed' it, but that wasn't the case before. they were extremely parasitical. you'd click the button and it would just take you to more pins
that only happens when the image was manually uploaded by a user (i.e from gallery). there's not much they can do about it. in that case you can scan it to find visual matches and hopefully one of them has metadata about the original source (but it's not perfect and you have to scroll a bit).
I'll take your word for it, you're welcome to it, but I'd rather it just didn't show up so high in search results.
Oh, and if it could stop recommending pro-anorexic content to my teenager, I'd like that too. It's banned on the LAN but I can't enforce that elsewhere.
It's a dumpster fire of SEO dark patterns and unmoderated shitty content and shouldn't be ranked so high in searches. Google should have routed around them years ago.
if the origin link isn't working, that means it was taken offline by the original source. in that case, imo it's a good thing that at least the image was archived on their servers. especially since each one has a comments section where someone might post information about it.
I don’t know about Elixir specifically, but python is slllllooooowwww. If the operation is CPU bound, you can easily get a 100X performance improvement by rewriting carefully optimized Python in naive Go, Java, Rust, C#, etc. And if you make an optimization pass on that you can usually eke out another 10X.
Even on I/O bound operations, in Python you have to choose between the error-prone async framework if you want to improve resource utilization or you stick to the synchronous world and accept extremely low resource saturation.
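A minimal asyncio sketch of the trade-off described above; `fake_io` is a hypothetical stand-in for a network call, simulated with `asyncio.sleep`. Awaiting calls one after another behaves like synchronous code, with total time roughly the sum of the waits, while `asyncio.gather` overlaps them so total time is close to a single wait.

```python
import asyncio
import time


async def fake_io(delay: float) -> float:
    """Stand-in for a network call; asyncio.sleep yields to the event loop."""
    await asyncio.sleep(delay)
    return delay


async def sequential(n: int, delay: float) -> float:
    start = time.perf_counter()
    for _ in range(n):
        await fake_io(delay)  # each call waits for the previous one to finish
    return time.perf_counter() - start


async def concurrent(n: int, delay: float) -> float:
    start = time.perf_counter()
    await asyncio.gather(*(fake_io(delay) for _ in range(n)))  # waits overlap
    return time.perf_counter() - start


if __name__ == "__main__":
    print(f"sequential: {asyncio.run(sequential(20, 0.05)):.2f}s")  # about n * delay
    print(f"concurrent: {asyncio.run(concurrent(20, 0.05)):.2f}s")  # about one delay
```

The error-prone part shows up when a blocking call (for example `time.sleep` or a synchronous HTTP client) sneaks into a coroutine: it stalls the entire event loop, and nothing in the language stops you from writing it.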
Either way, I can entirely believe that another language would beat Python on both counts. I’ve seen similar results rewriting a Python system in Go with extremely minimal rearchitecting.
The silliest thing is that the title credits the improvement to moving toward Elixir rather than moving away from Python (or maybe their case really is ideal for the BEAM VM and wouldn’t translate easily to, say, Go’s runtime model although I doubt it).
For sure; Elixir comes with a whole new architecture as well, and they COULD have gained a significant performance improvement if they rewrote / re-architected it in Python.
However, could they have done a 50x performance improvement in Python? And what about the other numbers, like speed and concurrency?
That said, I'm confident they crunched the numbers and did the tradeoffs; after all, adding another language and/or architecture will make your company more complex, makes hiring more complex.
Depends. Were they already using gunicorn? What about CPython? Was the code written by a junior developer without much consideration for memoization and dictionary access?
And sure, you can get some performance improvements by rewriting things in those languages, at the expense of losing the entire Python open-source ecosystem.
So I would need way more information about the previous system to take this even remotely seriously.
Python scales quite reasonably for most small to medium companies.
Anecdotally, I am beginning to hear more and more about organisations moving away from high level cloud infrastructure (such as lambda and cloud gateway) and going back to plain old virtual servers (like EC2), or even on-prem. Often the cost of supposedly "cheap" cloud environments is WAY more than you might expect and all booked as operational rather than capital expenditure (the latter being often preferable to shareholders).
My previous company went from very expensive cloud CI/CD servers to on-prem off-the-shelf servers.
The cost and the incredible performance gains we got by moving to a bunch of local computers were enough to make the whole thing pay for itself in about two months. Yep, the physical computers cost less than two months of cloud. Plus the productivity gains from waiting minutes instead of hours.
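The "pays for itself" claim is a payback-period calculation. A worked form, with symbols introduced here for illustration ($C_{\text{hw}}$ is the one-off hardware cost; $c_{\text{cloud}}$ and $c_{\text{on-prem}}$ are monthly running costs):

$$
T_{\text{payback}} = \frac{C_{\text{hw}}}{c_{\text{cloud}} - c_{\text{on-prem}}}\ \text{months}
$$

If on-prem running costs are small next to the cloud bill, this reduces to $C_{\text{hw}} / c_{\text{cloud}}$, which is why "the hardware cost less than two months of cloud" and "it paid for itself in about two months" amount to the same statement.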
Maintenance was never a problem, and we didn't need to hire new people to take extra care of the servers.
My current company is thinking of doing the same for AI servers. It's just too expensive in AWS.
I'm not sure why that would be too surprising to you. In the enterprise organizations I've worked with over the past few decades, IT strategy is always long-term and always cost-based. 10-15 years ago, organizations moved from the basement to placing their owned hardware in rented racks in data centers run by third-party organizations, because it was cheaper. Then Azure came along and made it sort of a "no-brainer" to move into Azure, because you already had a lot of Microsoft products and Azure was cheap. Now, with so many Azure price hikes and those third-party data centers improving their business models, the pendulum is swinging away from Azure.
That doesn't mean that the move into Azure wasn't the right one at the time, or that it was more expensive than not going into Azure. It's simply that the market evolves.
I wonder if this will be another effect of higher-than-0 interest rates. With tech companies choosing to leave cloud service providers in order to reduce their costs.
To be honest, the world at large doesn't have an anorexia problem but an obesity epidemic.
50 million children under the age of 5 are already obese (and I think 93% of obese children will stay obese their entire life).
So... Anorexia may be a hill to die on and it's not fun for parents of anorexic kids but priorities, priorities and priorities.
The real eating disorder worldwide leads to obesity, not anorexia and I wish all the hate and energy spent thin-shaming was instead redirected towards fighting the more important eating disorder.
The numbers from the WHO here are scary and only ever growing:
Obesity can take the form of an eating disorder, but I can tell you that teenage girls aren't "pinning" photos of morbidly obese people and ending up in a spiral that lands them in hospital.
But I hope parent commenter never has to live personally through finding out just how life destroying the predominant cultural attitudes around body form & food around us are. I wouldn't wish it on my worst enemy. Obesity is by far the least of concerns.
Honestly, most of the people I know who have serious problems with obesity have some kind of metabolic disorder. Diabetic or pre-diabetic.
And I think this has become far more common than people will admit. We're burning out our pancreases through processed & high sugar, low-GI foods because the industrial food supply is poorly managed and under-regulated.
But the broader point beyond my specific snipe -- which is personal and driven by deep anger over a serious problem affecting me personally -- the broader issue is not only that Pinterest is on the whole an anti-social actor because of its garbage moderation and promotion of harmful content in its algorithms, it is more broadly a bad actor in the web ecosystem generally.
They hijack search results by scraping and stealing content, hiding the original link, and try to trap you in their circle of links.
In this day and age of people getting sued and slapped down for legitimate webscraping, it boggles my mind that Pinterest gets away with what they do.
On the whole a swamp of an unethical company. F*ck pinterest.
(But love that the commenter couldn't help but turn the thread into an underhanded comment against "fat people". Oh so typical.)
If anyone is struggling with analysis paralysis, remember it is ok to do things wrong the first time because then you can farm that sweet internet karma bragging about how you fixed your crappy first iteration.
If you always do things right the first time, you don't get to brag about putting out the fire you started.
It’s interesting because while this is an extreme case of performance improvement, the ROI doesn’t seem amazing.
"rewriting in another language reduced the number of servers by 95%" is hard to beat, but at the same time, this saves "only" 2m a year, or about 0.3% of FY22 cost of revenue (per another post)
Pinterest per employee revenue seems to be around 1m, which basically suggest that this could even be a worse than average allocation of resources.
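Spelling out the arithmetic behind those two figures, using only the numbers quoted in the comment:

$$
\frac{\$2\,\text{M}}{0.003} \approx \$667\,\text{M}\ \text{(implied FY22 cost of revenue)},
\qquad
\frac{\$2\,\text{M/yr}}{\approx \$1\,\text{M revenue per employee}} \approx 2\ \text{employees}
$$

In other words, the yearly saving is roughly the revenue attributed to two employees, which is the yardstick the comment implicitly compares against the engineering time the rewrite consumed.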
My takeaway would be "don’t bother with this kind of optimisation before you reach a scale where you can afford to do marginal improvements"
Can switching to an esoteric functional language from Python and Java really be considered reducing complexity? No matter how well it is written, I'm willing to bet that way fewer people in the company/industry understand the new codebase and can make changes to it.
Actually it does, because the chance of hitting strange edge cases grows significantly, as do your deployment times and the risk of networking problems.
Running 200 servers will force you to automate everything and also handle (relatively) rare edge cases, meaning added complexity, huge up front dev costs, and a continuous dev effort just to stay afloat. Work has to be balanced in a clever way, making tradeoffs that you otherwise wouldn't need to make. The mental model shifts from individual servers to a whole ecosystem that shows emergent behavior (= things break in a totally new way).
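One way to put rough numbers on that intuition: assume each server independently hits some rare fault with a hypothetical probability $p$ over a given period (real failures are often correlated, so treat this as an illustration only). The chance that at least one server is affected is then $1 - (1 - p)^n$. A minimal Python sketch:

```python
def p_any_failure(n_servers: int, p_per_server: float) -> float:
    """Probability that at least one of n independent servers fails: 1 - (1 - p)^n."""
    return 1.0 - (1.0 - p_per_server) ** n_servers


if __name__ == "__main__":
    p = 0.001  # hypothetical per-server fault probability per day
    for n in (4, 200):
        print(n, f"{p_any_failure(n, p):.1%}")
    # 4 0.4%
    # 200 18.1%
```

With the same per-server reliability, a 200-server fleet sees "rare" faults on a meaningful fraction of days, while a 4-server deployment almost never does.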
Maybe some of that also applies to 4 servers, but to a way lesser extent.
> Steve: We chose Elixir because we were looking for a system that was easy for programmers to understand and could take better advantage of our servers. I was intrigued at Elixir’s combination of friendly syntax, powerful metaprogramming features, and incorporation of the Actor model.
Am I wrong or is this guy in some sort of bubble where only functional languages are taught?
Is it though? I've been using dynamic langs my entire career, and static ones too!
I feel like statically typed langs (I'm looking at you, Java) are a bit more stubborn to work with, especially around API design, prototyping, and greenfield stuff.
I do like langs like Haskell and Standard ML, where they are statically typed, the type system is mathematically sound, and the types are inferred.
I want my type system to be inferred and bomb-proof, or to just get out of the way.
It's the best of both worlds.
* Reading and understanding code is much easier because the types are written down. You spend much less time figuring out what variables can contain.
* Navigating code is much faster because tools like go-to-definition, code completion and find-all-references work reliably
* Refactoring code is a lot easier - or in large projects actually tractable. In large dynamically typed projects something as simple as renaming a variable can be an impossible task.
* Obviously the one people talk about most is catching bugs. The degree it does that depends on how strong the typing is (e.g. Rust will catch many more bugs than Java). But they will all catch the embarrassing things that dynamic languages can't like typos and missing arguments.
If you've only ever written short greenfield projects you might not appreciate some of these benefits as much as you should because you wrote all the code yourself so all the details are still in your head. It's a bit like saying seatbelts are an unnecessary pain because you haven't ever been in a crash.
> types are inferred
Some local type inference is good, but Haskell / ML style global type inference is kind of the worst of both worlds. You have to satisfy the type checker, which is harder because global solver errors are always harder, and you don't get the documentation benefits of static types because the inferred type is frequently a generic type.
Rust went with local type inference only for very good reasons.
I want to point out goto definitions do work for a number of dynamic langs.
This is all preference. I've also worked on Ruby on Rails projects that are 15 years old with 500K LoC.
I admit changing stuff can be pretty sketchy if your project doesn't have great test coverage, but that all ultimately comes down to the culture of the programmers working on that code base.
I think things like strong vs weak typing or functional vs imperative have a much greater impact on code quality/ease of coding than static vs dynamic does.
I personally think programming in Java or C# is painful, but find a lang like Crystal or Standard ML very pleasant. And vice versa: I think vanilla JavaScript is painful for all the reasons you mentioned, but find langs like Ruby, Erlang, or Elixir very pleasant.
“Some languages have a greater association with defects than others, although the effect is small.” Languages associated with fewer bugs were TypeScript, Clojure, Haskell, Ruby, and Scala; while C, C++, Objective-C, JavaScript, PHP, and Python were associated with more bugs.
> I want to point out goto definitions do work for a number of dynamic langs.
Not reliably. It can work in a small subset of situations. For statically typed languages it always works.
> “Some languages have a greater association with defects than others, although the effect is small.” Languages associated with fewer bugs were TypeScript, Clojure, Haskell, Ruby, and Scala; while C, C++, Objective-C, JavaScript, PHP, and Python were associated with more bugs.
This is mixing up too many things. For example C++ has a relatively decent static type system, but obviously it's going to have way more defects than memory safe languages.
Totally hear you on the static typing benefits, but let's zoom out a bit. As I mentioned before, I've got experience with both dynamic and static languages, and I think we're missing some nuance here. Specifically, I wanna bring functional vs. imperative and strong vs. weak typing into the mix.
JavaScript's weak typing does it no favors, agreed. But that's not a universal dynamic language issue. Ruby, for example, doesn't have those type coercion headaches.
Now, about Haskell and Standard ML—these guys offer a different flavor of static typing. It's not the Java-esque rigidity; it's more flexible and, dare I say, enjoyable.
On the tooling front, I've seen dynamic languages with solid IDE support and go-to-definition features. It's not a static-only perk; it's about the ecosystem's maturity.
That study is a neat data point, but it's not the whole picture. We should consider multiple variables like paradigms and type strengths, not just the static vs. dynamic lens.
I do also want to say, in defense of Elixir (and Erlang), that with the advent of the Dialyzer tool it's effectively a gradually typed language, and soon to be (fingers crossed) a lang with a pretty unique type system. It will be dynamic but have the same guarantees as a statically typed lang.
I think Elixir is closer to an easier to read Lisp than to Haskell. In fact if someone claimed to know Rust, but couldn't figure out Elixir then I'd simply assume they were lying about their Rust experience.
Monads and Haskell are not a requirement for understanding a functional programming language.
Also, I find it hard to believe that anyone who knows N+1 programming languages would have a hard time understanding Elixir quickly, it looks like most mainstream programming languages used today, with slightly different syntax for some things.
As much as I love Elixir, it's really not just "basically Ruby with some slight modifications". The syntax is similar at a glance, but there are some major design differences (e.g. immutability); solving the same problem in Elixir and Ruby can require a completely different structure to your code.
No monads though. No pure functions, no Option types, pretty much no recursion in practice. Really it’s got none of the stuff that makes some FP languages hard to learn. (Except maybe “no for loops”).
That's true: Elixir is simpler to learn than, say, Haskell. All I'm saying is that it's not a trivial jump from Ruby to Elixir, despite the superficially similar syntax.
I don’t think eg ruby->go is a bigger jump. Learn defer, structural typing, interfaces vs learn to replace all loops with map/filter/etc + pattern matching. Feels similar, no?
> it's basically Ruby with some slight modifications.
It /looks/ like Ruby in a lot of ways, but treating it like Ruby w/ modifications is a mistake that I've seen lead to a lot of really bad usage that (at best) fails to take advantage of the underlying Beam VM, and at worst [and more commonly] actively works against it.
I picked it up pretty quickly as primarily a JS dev at the time.
I’ve also worked on several projects where I’ve pair programmed with a dev new to the language. They’re usually pretty productive within a few days. Not writing their own DSLs or going deep into OTP, but productive in terms of writing application code.
Phoenix is pretty straightforward for those who have experience with Rails, Laravel or a similar full-stack web framework.
Rust screams Erlang influence, but maybe it's just me. IIRC, in the beginning they even tried to put in a green threading VM for tail calls, which seems insane for something that you want to be a systems language.
I bet it was an architecture problem more than a language problem.
I’d be expecting to leave most of that in place and fix the hotspots that are dragging performance down. Maybe some compiled code in certain places.
Almost certainly a detailed analysis would have yielded 20 things that could be tuned.
Posts like this are almost something to be ashamed of “we rewrote an entire subsystem because we wanted to use a language we like”. That’s really failing in your responsibility to the company.
> Posts like this are almost something to be ashamed of “we rewrote an entire subsystem because we wanted to use a language we like”. That’s really failing in your responsibility to the company.
I would tend to agree. I don't understand these sorts of decisions to rewrite huge sites in niche languages and frameworks. You tend to get one engineer who really likes some language, has some clout, and is able to pitch it as a good idea. And then the blog posts about how it saves the company untold millions, while hiding all of the associated costs.
Not only is it niche, so your costs going forward will be higher, it even required investing in new "tools and strategies" just to benchmark it.
"Elixir has proven so efficient that testing the limits of our services became a challenge unto itself, requiring investment in new benchmarking tools and strategies".
I wouldn't personally gamble on a robust job market for Elixir developers in a decade. And those there are will be able to command higher wages.
Sure, they will be out there. But so were COBOL developers for Y2K.
And they got a 50x reduction in server cost - 200 servers down to 4. I have never heard of a performance optimization project with such substantial gains without a complete rewrite.
It's important to remember the BEAM is a monster. I'll believe almost anything after I've seen big projects run on such small resources. I've been busy believing anything about the BEAM since WhatsApp got so big on nothing, never mind Discord.
It's a vendor post, written to sell you something. Accordingly, it lacks any nuance or perspective and makes claims it doesn't even attempt to back up.
It’s hacker news, hooray! Personally I don’t mind the skepticism. I glean insight from it that I would not have on my own. Rarely it’s downright malevolent, but the tone is usually about the same.
I agree with the skeptics about this article, it’s a flashy headline that can’t be accurate, per the points being raised.
Yeah. There is a phase of discovery: we simply don't know (some combination of) the project goals, the user needs, the market, the business model, and the possible architectures. By doing something we find out. Often our guesses are good. Often the company politics mean we build some atrocious monster. The point here is that we are eliminating what's wrong (choices are right, not (provably) wrong, (provably) wrong, or not even wrong, as a famous physicist would mark his students).
Once this is done (maybe proof of concept, maybe research maybe years of fighting the politics) we can build out something that's not wrong.
If we time it well and get it right, it's a unicorn. It's Facebook. If we time it right but get it merely not wrong, it's Friendster or MySpace.
And of course, by the time we have learnt all that, mobile hits and we have to relearn.
I mean you're not wrong; they built working software in Python, solved a problem, scaled it, and learned about the problem it solved. Once they understood it, they could monitor it. Once they had numbers, they could look into improving it, taking their learnings on the problem and picking a different language / architecture to solve it.
I think they stayed on AWS to be fair. But also to be fair they were probably doing something dumb, and not doing that dumb thing is saving them $2M, and at the same time they move to Elixir. OTOH to be fair to Elixir maybe it made it easier to do that refactor compared to say Actor models in Java or whatever.
Does anyone have any follow-up information about the companies and people mentioned? Sadly, the last blog post from Cory O'Daniel is from 2019, and the Bleacher Report engineering blog shows 2019 as its last post. Pinterest has 81 public repos on GitHub, but only 3 with an Elixir code base.
Elixir still gets limited use at Pinterest. I think there's only one significant system still actively developed in Elixir, though it's pretty non-trivial both in terms of cost and complexity.
Yep. Massdriver [0] is built on elixir and golang.
All user facing APIs and cloud provisioning state is written in elixir and runs on pretty much just fumes. Cloud systems interaction is written in golang because cloud stuff.
Email is in profile if you’d like to chat. Always happy to talk elixir and golang
Couldn't they have just moved their python to Cython / Pyrex? Or changed their arch and kept the tech stack the same?
I wonder how much time and how many team hours were spent learning the new framework and its caveats vs. if they'd just spent a little time optimizing their existing stack.
All too common with non-perf folks making sweeping changes without rigorously measuring along the way. I guess it makes for good headlines.
Agreed. I get triggered whenever I see a headline like "We moved from tech A to tech B and reduced/increased C by X%"
No, that's not what happened! What happened is you put more thought into it the second time.
I hate this management BS; if the project fails, blame the engineers, if the project succeeds, praise the tech... They just do everything to turn engineers into commodities and most of us just play along.
I wrote the Lambda piece. You could say the re-arch saved us the money (although we could only re-arch cheaply because we already had other massive-scale serverful systems in place), but I don’t believe I could have made it as efficient and dependency-free in any other language.
I had been developing in Ruby for over a decade and had about 5 years of Node experience at the time.
I’m not a language enthusiast and mostly write in Go today.
Go is my current favourite as well. Heck, just replacing python with Go would have done similar numbers. The key is “replacing python” and “rearch”, Elixir was just the mechanism.
"The combined effect of better architecture and Elixir saved Pinterest..." I appreciate that they also call out better architecture here. People often read these kinds of headlines and think "Oh Elixir is better than python" or whatever.
I remember when Twitter had fail whales constantly and they rewrote it from Ruby to Java, I think. At the time everyone assumed it was all Ruby's fault, and that might be partly true. But it's also true that the engineers who rebuilt it understood the problem much better by then, and knew the major pain points. They also completely changed the architecture to be better suited to the problem. I submit that a lot of rewrites could happen in the same language and still have major gains.
All that being said Elixir is great, and particularly well suited to these kinds of problems.
>Bleacher Report, a top real time sports and media website, was hitting scaling problems due to their business success. With Elixir, they were able to 8x the average daily traffic load, going from 150 servers to only 5:
The site was originally written on the Ruby on Rails framework but Bleacher Report reached the point where they could no longer scale it, according to Dave Marks, BR’s senior engineering director.
And they said they could do it with 2. That is anywhere from 30x to 75x. Although I think the current work on Fiber and Async could have reduced it by 10x to 20x as well; Rails just needs to adopt it. It is unfortunate they didn't try to optimise it but instead went with Elixir.
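The Bleacher Report figures quoted above can be checked with a few lines of arithmetic. This is a minimal Python sketch using only the numbers from the quote (150 servers before; 5, or 2 in the best case, after; 8x the average daily traffic). Treating "8x the traffic" as a plain multiplier on per-server load, with load spread evenly, is an assumption; the quote doesn't say how traffic was measured.

```python
def reduction_factor(before: int, after: int) -> float:
    """How many times fewer servers are needed after the change."""
    return before / after


def per_server_load_gain(before: int, after: int, traffic_multiplier: float) -> float:
    """Rough change in load handled per server.

    Assumption: total traffic grew by `traffic_multiplier` while the fleet
    shrank from `before` to `after` servers, and load is spread evenly.
    """
    return traffic_multiplier * reduction_factor(before, after)


if __name__ == "__main__":
    print(reduction_factor(150, 5))         # 30.0
    print(reduction_factor(150, 2))         # 75.0
    print(per_server_load_gain(150, 5, 8))  # 240.0
```

Taken at face value, the 30x to 75x range only counts servers; if traffic really grew 8x at the same time, each remaining server is doing far more work than that ratio alone suggests.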
It's not unusual to gain 50-100x performance gains on code when moving from python to a compiled language without much refactoring. Dropping from 200 to 4 servers is well within the lines of performance gains one can expect.
Yes, those numbers are for code, network speeds are not directly impacted the same way. But again, does not sound unreasonable.
I'm glad to read this and I think these advantages have their place, but...
Startups rarely die because of server bill specifically. More often it's lack of PMF, slow iteration, expensive labor. $2M is pocket change for Pinterest, they probably spend more on office snacks, so even if you grow to that scale, the savings in this example are not exactly life-or-death.
What is life-or-death, however, is how easy a stack is to hire for, how many quality libraries it has (implementation speed), and how many library maintainers work on it. If it's easier to hire for Python or even Java, I'll use one of those. As for stability: we used Elixir's Oban library to schedule jobs for financial transactions, and a bug in Oban regularly crippled our card authorizer, leaving our customers unable to use their cards and causing million-dollar losses. Oban, with all due respect, probably has 10x fewer maintainers than comparable solutions/libraries in other stacks. Maybe if we had a "true Elixir guru" on our team, we could have fixed it or even rewritten it from scratch, but we live in the real world, so we ripped it out, replaced it with a more boring solution, and are much happier for it, with far fewer after-hours pager duty panic attacks. Is it hitting the servers harder? Maybe. I don't care; it just works, and the cost difference is likely negligible.
The bigger cost is that it takes a lot more devs to do the same thing in Python or especially in Java. Productivity per developer is hands-down Elixir's strongest selling point, not reducing server costs.
Python and Java have a multitude of web framework authors and yet none have made anything with the capabilities and ergonomics of Phoenix. I don't think they will either. It would take features those languages lack.
> more devs to do the same thing in Python or especially in Java
I guess it totally depends on "the thing" in question, but do you have any references for that assertion? That common web-framework-ish, ORM-ish stuff takes a lot more devs in Python, and especially in Java?
It takes very little effort to ask a question like this and an essentially unbounded amount of effort to answer it to the satisfaction of the asker, so please be understanding if this isn't as much depth as you're looking for. I'll answer in three ways:
The first reference for this assertion is anecdotal. I've been a professional dev for 13 years and grew up with family members and friends in the same line of work. I've seen and heard of many, many projects built in different languages and anecdote after anecdote has been in the direction of people getting more done more quickly in higher-level languages. Writing similar things in Elixir/Clojure/etc goes faster than it does in Ruby/Python/JavaScript, which goes faster than using Java/.Net, which goes faster than using C/Fortran/Pascal. You haven't been around for those anecdotes I've heard, but a public one you may be familiar with is Discord. They scaled to over 5 million concurrent users on just 7 server engineers IIRC. Two years later, at a much larger scale, they still only had 4 engineers focused on infrastructure and 40 total: https://news.ycombinator.com/item?id=19238221. How many cases can you think of where a team of 7 Java or Python devs built something equivalent?
Secondly, there's research. Google "Function Point Metrics". Most research on programming productivity is paywalled, but I shared one paper that isn't in one of my first Elixir-focused YT videos 5 years ago: https://www.youtube.com/watch?v=1e2_NXLxi-E&t=412s. Obviously this isn't perfect, since it doesn't cover which languages suit which domains or how well they scale with project size. Still, it's a useful data point, as are pure measures of expressiveness: https://redmonk.com/dberkholz/2013/03/25/programming-languag...
Finally, thinking from first principles, why would anyone be adopting newer languages if they didn't offer some advantages over older ones? Why would certain features, such as pattern-matching and macros regularly be adopted by languages of the past 15 years, despite being very rare in languages created in the 90s? Furthermore, why would startups—for which productivity is a matter of survival—be early adopters? The simplest explanation, in my opinion, is that there really are some productivity advantages and sometimes those advantages are enough to overwhelm the difficulty in learning something new and working in a smaller ecosystem.
Guys, Pinterest hasn't used Elixir at all for more than half a decade. I think there was a single rate limiting service that used it and I have no idea if that was swapped out or not.
I was there and it was not being used in new projects. Maybe that spam filtering service has existed for that long. I would definitely not say that pinterest has fully bought into using elixir because of cost savings which is what the article seems to be insinuating.
There was a point like 8 (ish?) years ago where Pinterest wanted to consolidate on a single backend language internally, after it became apparent Python was not suitable, and multiple contenders were being tried in different places. I think at that time Steve was advocating for Elixir to be it. However it ended up being Java.
Nowadays, kind of ironically after the consolidation, we have backend services in at least six languages I can think of. Which is just the reality of how things play out for a tech company doing acquisitions. But yeah we’re not making new ones in Elixir.
An even less significant rounding error; on the order of ~0.002% relative to AWS carbon footprint at best using FY23 forward net sales as a rough proxy.
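The ~0.002% estimate above is just the savings divided by AWS annual net sales, with revenue standing in as a rough proxy for footprint. This is a minimal Python sketch of that division. The ~$90 billion denominator is an assumed round figure for illustration, not a number from the thread; only the "$2 million per year" comes from the article.

```python
def share_percent(part: float, whole: float) -> float:
    """Express `part` as a percentage of `whole`."""
    return part / whole * 100


# Assumption: AWS annual net sales of roughly $90 billion, used purely as an
# illustrative denominator; this figure is not taken from the thread.
ASSUMED_AWS_NET_SALES = 90e9
PINTEREST_SAVINGS = 2e6  # "over $2 million per year" from the article

print(f"{share_percent(PINTEREST_SAVINGS, ASSUMED_AWS_NET_SALES):.4f}%")  # 0.0022%
```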
Since there are really many companies and people using AWS, the effect would be noticeable if everyone managed such a decrease. Calling it "an even less significant rounding error" borders on bad faith.
> Since there are really many companies and people using AWS, the effect would be noticeable if everyone managed such a decrease.
Your condition is predicated on "everyone" at Pinterest revenue scale justifying the business case of throwing away their existing tech stack and adopting Elixir with an equivalent architecture---never mind that the article is coming from a company that sells Elixir consulting services---while major CSPs voluntarily bend over to notionally substantial revenue decline...and I'm bordering on bad faith??
At least I bothered to put a supporting numerical estimate on the hypothetical; you're just handwaving greenwashed bullshit with improbable, unsupported outcomes.
Author of the “Lambda” piece [0], all of the cost savings I discussed came from API Gateway alone. The lambdas were negligible.
That being said, I don’t think we could have processed the traffic more efficiently in another language.
We were processing 12M requests per hour. You could run the entire thing on 2 vCPUs; we “over-provisioned the crap out of it” by running it on 4 pods with 2 vCPUs as the upper request limit.
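For a sense of scale, 12M requests per hour works out to a fairly modest per-core rate. This is a minimal Python sketch of the conversion. It assumes traffic is spread evenly over the hour (peaks will be higher), and it reads "4 pods with 2 vCPUs" as 8 vCPUs in total, which is an interpretation of the comment rather than something it states outright.

```python
SECONDS_PER_HOUR = 3600


def requests_per_second(requests_per_hour: float) -> float:
    """Average request rate, assuming an even spread across the hour."""
    return requests_per_hour / SECONDS_PER_HOUR


def per_vcpu(rps: float, vcpus: int) -> float:
    """Average requests per second handled by each vCPU."""
    return rps / vcpus


rps = requests_per_second(12_000_000)
print(round(rps))                    # 3333 req/s on average
print(round(per_vcpu(rps, 2)))       # 1667 req/s per vCPU on the 2-vCPU estimate
print(round(per_vcpu(rps, 4 * 2)))   # 417 req/s per vCPU on 4 pods x 2 vCPUs
```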
> Our service didn’t quite grow exponentially in use, but it did hockey stick. It went from free, to a few hundred bucks, to around $12,000 just for API Gateway. No Kinesis. No Lambda. Just API Gateway.
> A good part of this entire system still runs in Lambda, although it will be moving into Elixir over time to make it easier to reason about and develop on locally.
> What everyone should do is think about where your service is going, and can you afford those costs when you get there. If you don’t have a team of ops people and you aren’t familiar with serverful stuff, spending $30k/mo on HTTP requests might be cheaper than an ops team.
Can anyone comment on whether there are plans to exploit the new Linux io_uring APIs in the BEAM runtime? For example, could it help avoid having dedicated OS threads for file IO?
I use Go and it gives me a good sweet spot between cost and perf.
I’m nowhere close to the performance requirements of Pinterest. But my auth service runs on a single box with 1 vCPU, 512 MB RAM, and a 10 GB SSD. I use LevelDB and swap on the 10 GB. I’ve benchmarked it at 8-9k rps with a 150 ms max response time. Not bad for a few bucks a month.
I'm running seven Elixir/Phoenix apps (one of them is an umbrella with 12 apps) on a cheap shared-CPU VPS with 1 GB of memory. Disk space is actually the first thing I've run out of so far.
How many FTEs were required with elixir knowledge? This day and age it doesn’t take many engineers to burn up $2M in comp and ben. Not saying it’s a bad idea as the cost savings will accumulate over time, but it’s probably not pure savings.
Honestly, this sounds like correlation vs. causation. I don't doubt Elixir is amazing; it is. But maybe the old code was just written poorly from a performance perspective, and then they rewrote it, and now it's performant.
And to Java. That's more of a surprise. Most of the gains may have come from architecture changes though: "The combined effect of better architecture and Elixir saved Pinterest over $2 million per year in server costs".
Python is fine to build a first version; you should never pick a language or architecture because you THINK you may need its performance, that's cargo cult. Pinterest could not have predicted they needed 200 servers for this workload - and they probably didn't for a long time.
So the cost of 3-4 FTEs at a company that employs >4000. Cool.
And how much does the change cost them in terms of:
- Not being able to hire talent when they need because they don't know or aren't willing to work with Elixir.
- Spending extra time training engineers.
- Having to hire more senior engineers/those versed in functional programming vs generalists.
- The rewrite itself (I'm sure it didn't happen magically overnight).
Unless the number had a few more zeros at the end of it, there's no scenario where this project made real business sense vs being some principal engineer's vanity project. $2M is what a company like Pinterest spends on an off-site or a random exec's spot bonus. There are probably analytics dashboards that no one looks at that cost >$2M/yr to produce.
It wasn't too long ago that engineers had to walk over hot coals to get their company to use Python. All of the things you're saying were applicable to Python when the company "only" had programmers who knew Java.
Elixir is easier out of the box; Erlang, or rewriting small parts into 'pure' Erlang would be an option if needs be, but it's generally better to go with the easier of the two.
There's a joke that the reason there are so few jobs is that 1) they are very high paying, and 2) the engineers are so productive that companies only need a few of them to get work done.
Big corporations have a habit of promoting open source, then gullible developers produce exceptional work from which said big corporations make millions or billions without paying a penny back.
Then developers learn they can't pay bills with exposure tokens and employment doesn't offer the lifestyle they had hoped for.
But there is no worry, they get replaced by next cohort of gullibles believing in open source.
What I am trying to say, it should be illegal for big corporations to use open source without paying royalties to all contributors.
FWIW this is from 2018, and I would say the conclusion from those who were around Pinterest at this time and shortly after would not be nearly as positive.
A lot of enraged engineers are here arguing about the claims, ignoring the real problem at hand: Pinterest still exists. WTF is going on with people still using Pinterest?
Elixir has a REPL too! And I mean a REPL in the Lisp sense: I can open a REPL on my machine and peer into a production machine and its internals.
They could have just used different Java/Python libraries and changed the architecture for the same result. The choice of a Programming Language alone has minimal impact on performance at that scale.
> They could have just used different Java/Python libraries and changed the architecture for the same result.
Yep, their blog post is carefully worded to say "The combined effect of better architecture and Elixir" but they didn't mention how much of it is related to architecture or what specifically they did with Elixir to make things faster. It feels like a marketing piece for their consulting services.
I mean they put:
> Rewrote an #AWS APIGateway & #lambda service that was costing us about $16000 / month in #elixir. Its running in 3 nodes that cost us about $150 / month
They saved 100x here by moving from an expensive architecture (Serverless lambdas) to potentially reserved instances which are reasonably affordable, at least for cloud standards.
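The "100x" above can be pinned down from the two monthly figures in the quoted post ($16,000/month for API Gateway + Lambda vs. about $150/month for 3 nodes). This is a minimal Python sketch; it counts infrastructure spend only and ignores engineering time for the rewrite, which the post doesn't quantify.

```python
def savings(before_monthly: float, after_monthly: float) -> tuple[float, float]:
    """Return (cost ratio, annual dollars saved) for two monthly bills."""
    ratio = before_monthly / after_monthly
    annual = (before_monthly - after_monthly) * 12
    return ratio, annual


ratio, annual = savings(16_000, 150)
print(f"{ratio:.1f}x cheaper")     # 106.7x cheaper
print(f"${annual:,.0f} per year")  # $190,200 per year
```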
I remember once needing to parse XML in Python. I started with the easy approach of using the first XML parsing library I found which was xmltodict. Eventually I stumbled upon lxml which improved overall performance by 20x and I didn't have to rewrite much code at all. Sometimes it's easy to get big wins in your existing language if you know what the problem is.
A rewrite will almost always give you more optimized code, because you know what you are writing.
That said, the article says they rewrote the notification system, and Erlang/Elixir is pretty amazing for that kind of thing, especially in terms of memory footprint per long-running connection.
I mean there are plenty of examples in the wild left and right.
Have you ever seen basic DB optimization? In my own companies alone, people were just using stuff wrong.
Performance and architecture are an afterthought in our industry. A normal developer doesn't think about them.
There was one query in my company that ran slower in one region than in another, and there was also an EXPLAIN output available. No one looked at it and thought "huh, why does this simple SELECT use so much memory?" People were trying to figure out why the regions themselves were different, not what the problem with the query was.
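Reading the query plan is usually the cheapest first step. The comment doesn't say which database was involved, so this is a minimal sketch using SQLite through Python's built-in sqlite3 module as a stand-in; the users table, its columns, and the idx_users_email index are hypothetical. It shows a lookup going from a full table scan to an index search once the right index exists. The exact plan wording varies between SQLite versions, and other engines (e.g. PostgreSQL's EXPLAIN ANALYZE) report plans differently.

```python
import sqlite3

# Hypothetical schema, used only to illustrate reading a query plan.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT, email TEXT)")
conn.executemany(
    "INSERT INTO users (name, email) VALUES (?, ?)",
    [(f"user{i}", f"user{i}@example.com") for i in range(1000)],
)

QUERY = "SELECT id, name, email FROM users WHERE email = ?"


def plan(sql: str) -> str:
    """Return SQLite's EXPLAIN QUERY PLAN text for `sql`."""
    # Each plan row has four columns; the human-readable detail is the last.
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql, ("user42@example.com",)).fetchall()
    return " | ".join(row[3] for row in rows)


before = plan(QUERY)
conn.execute("CREATE INDEX idx_users_email ON users (email)")
after = plan(QUERY)

print(before)  # a full scan, e.g. "SCAN users"
print(after)   # e.g. "SEARCH users USING INDEX idx_users_email (email=?)"
```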
Unfair comparison, your Elixir example includes a module definition.
That said, code volume is never the issue, code clarity is. Right now you're comparing trivial examples, but what does in this case Pinterest's high volume spam detection code look like? I wouldn't judge a language on microscopic code snippets, or disregard them as "too noisy" when it usually isn't a deciding factor in programming languages.
> Unfair comparison, your Elixir example includes a module definition.
Huh? TekMol explicitly points out that the python code also includes a module definition. It's just that the python module definition takes up zero characters in the source file. (Note that this is not always true, but in this case it is.)
It takes more than that, in the name of the file, but the Elixir file will also have to have a name, and the comparison ignores both of those equally.