Immutability allows for incredible performance when you are working in the guts of database engines. Having assurances that a specific data log file offset cannot possibly be modified allows for substantial optimization throughout.
For users who do not need a serialized view of events, access to "near-real-time" data can be made to scale as far as needed, even if a single machine is driving all the writes. You could theoretically have a billion readers working off a single synchronous core system.
By utilizing careful abstractions, that single core system can write something on the order of 10-100 million transactions per second. We do it every day on the various financial exchanges around the world. In many cases, a single thread is what is holding up the entire exchange. Arrays of structs, hot cache lines, correct branch prediction and full pipelines are a hell of a combo.
In some practical business apps, it is also feasible to coalesce logical writes before they need to enter a fully-synchronous context, so you can get a step-down ratio of something like 100:1 on transactions that actually need to go across the network and be handled by the core system.
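For a rough idea of what that coalescing can look like, here's a minimal Go sketch (types and names are mine, purely illustrative): logical writes merge in a local buffer and only the merged batch crosses the network.

```go
package main

import "fmt"

// Op is a hypothetical logical write produced by the application layer.
type Op struct {
	Key   string
	Value int64
}

// Coalescer buffers logical writes locally and flushes one merged batch to
// the synchronous core system, so many local ops become one network hop.
type Coalescer struct {
	pending map[string]int64 // last-write-wins merge by key
	flush   func(batch map[string]int64)
}

func NewCoalescer(flush func(map[string]int64)) *Coalescer {
	return &Coalescer{pending: make(map[string]int64), flush: flush}
}

func (c *Coalescer) Write(op Op) {
	c.pending[op.Key] = op.Value // repeated writes to a key collapse into one entry
}

func (c *Coalescer) Flush() {
	if len(c.pending) == 0 {
		return
	}
	c.flush(c.pending)
	c.pending = make(map[string]int64)
}

func main() {
	c := NewCoalescer(func(batch map[string]int64) {
		fmt.Printf("sending %d coalesced writes to the core system\n", len(batch))
	})
	for i := 0; i < 100; i++ {
		c.Write(Op{Key: "account-42", Value: int64(i)}) // 100 logical writes...
	}
	c.Flush() // ...one transaction actually crosses the network
}
```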
I'm increasingly coming to realize that in programming, flexibility is something you pay for in performance, whether you use it or not. Data being mutable even if you never mutate it. Automatic bounds checks for data that you've already made sure can't go out of bounds. vtable lookups even when you're not using inheritance, because who knows, someone else might. Compilers needing to do extra work in case two pointers are aliases, even though that never makes sense for them to be from the programmer's perspective. Objects allocated one at a time, scattered around memory, even if you don't need them to be, and wouldn't mind their innards being kept together in cache aligned arrays.
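To make that last point concrete, here's a small Go toy of my own (not anyone's real code): the same data laid out as one contiguous slice of structs versus individually allocated objects scattered around the heap.

```go
package main

import "fmt"

type Particle struct {
	X, Y, Z float64
}

func main() {
	const n = 1_000_000

	// Contiguous layout: one allocation, fields packed together, so a
	// linear scan streams through cache lines predictably.
	packed := make([]Particle, n)
	sum := 0.0
	for i := range packed {
		sum += packed[i].X
	}

	// Pointer-per-object layout: n separate allocations scattered around
	// the heap; the same scan now chases pointers and eats cache misses
	// it didn't need to.
	scattered := make([]*Particle, n)
	for i := range scattered {
		scattered[i] = &Particle{}
	}
	for _, p := range scattered {
		sum += p.X
	}

	fmt.Println(sum)
}
```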
They say you can solve any problem in computer science with another level of indirection - except performance, it seems, because we're paying the cost of all this indirection all the time, even if we don't need it most of the time. There's a crazy amount of unnecessary "housekeeping" done by compilers and runtimes to make sure everything always works all the time, which requires them to always do extra work to protect against pathological cases that almost never arise.
Modern compilers and runtimes are getting better and better at recognizing the common "happy case" and optimizing for that, and that's good. But it still feels backwards - both that we can't explicitly opt out of unneeded flexibility, so instead the compiler needs to deduce that we aren't using it, and that a lot of this isn't opt-in to begin with, rather than opt-out - i.e. make things immutable by default. Or let the programmer tell the compiler a little bit more about what they need and don't need the code to do, so the compiler can do a much better job of it.
> flexibility is something you pay for in performance
A lightbulb moment I had recently was that the classic interface-based OO programming paradigm utilising virtual functions was essentially the following declaration by the programmer to the compiler: the function that is invoked can change at any time, even call-to-call; assume nothing. It is a black box with unknown, unknowable contents. Do not inline it, do not inspect it, expect the worst.
If you look at a typical web framework, with call stacks hundreds of methods deep, you'll come to the same horrifying realisation that these are all inherently un-optimisable because of that statement. The compiler can do nearly nothing, and the processor will also struggle to paper over the inefficiency. Meanwhile, in reality, the specific implementations of those interfaces never change during actual production execution! Dynamic dispatch is utilised only during development time, and even then only at compile time to inject test harnesses, mocks, or whatever.
At runtime, 99.999999% of all function calls are essentially static, and dynamic dispatch is a pure waste.
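The same tension shows up in Go, for example: a call through an interface is dynamic dispatch the compiler generally can't inline, while the identical call on the concrete type usually can be. A toy sketch (names are mine):

```go
package main

import "fmt"

type Greeter interface {
	Greet() string
}

type English struct{}

func (English) Greet() string { return "hello" }

func viaInterface(g Greeter) string {
	// Dynamic dispatch: the callee is looked up through the interface's
	// method table, so in the general case the compiler can't inline it.
	return g.Greet()
}

func viaConcrete(e English) string {
	// Static dispatch: the callee is known at compile time and is a
	// trivial candidate for inlining.
	return e.Greet()
}

func main() {
	fmt.Println(viaInterface(English{}), viaConcrete(English{}))
}
```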
Except that a smart runtime can actually work with the assumption that only one implementation of a class will be used, and inline what would otherwise be virtual methods. And, as far as I know, the JVM definitely does this, and at multiple levels too - if there's only one implementation of an interface, it can just assume that everything's static and inlineable (at the cost of deoptimizations if that turns out to be false later). If not, it can still decide to inline, and just check the type beforehand.
Optimizing JIT-compiling JVMs are often able to optimize virtual calls that only ever resolve to one implementation. They may even inline such calls. (edit: dzaima got there first.)
The Java HotSpot Performance Engine was released on April 27, 1999, built on technologies from an implementation of the programming language Smalltalk named Strongtalk, originally developed by Longview Technologies, which traded as Animorphic. The Longview virtual machine was based on the Self virtual machine, with an interpreter replacing the fast-and-dumb first compiler. When Sun cancelled the Self project, two key people, Urs Hölzle and Lars Bak left Sun to start Longview. In 1997, Sun Microsystems purchased Animorphic.
Well, the alternative to vtables is manually branching (besides completely rewriting the code to avoid different actions), which isn't that much better and still requires branch prediction. And I think JVMs also might convert a dispatch from two classes to a conditional branch, but I'm not sure.
Just as a note for anyone getting horrified at the cost of all these indirections: C code is full of function pointers passed as callbacks. Hell, there's actually not much difference between a vtable and typical C code doing the same thing. Chances are, C++ compiler writers made it more efficient. Also, C++ doesn't default to virtual, so you rarely pay the price.
I'm always on the lookout for talks about interesting and generally useful concepts, but where I can follow only a small part of them. Makes me feel like I can learn and grow into a wiser person. This certainly fits that criterion.
Ya. I've spent the last two decades removing levels of indirection, reducing call stack depth. This has put me at odds with most everyone. I've been insulted many times for not understanding "software architecture". Heh.
I started a design pattern study group shortly after GoF's book was published. I think it's a lot like jazz. 1st time thru, mind blown. 2nd time, ah, I think I'm starting to get it. 3rd time, well, duh.
Eventually students come full circle, a la "there is no spoon", and reject the veneer of "design patterns". Said another way, it's just another tool; learn to use it wisely. Learn to reject temptation.
Aspects, annotations, runtime code reflection, metaprogramming, frameworks, schema mappings, factories, singletons, and whatnot are just monkey work. (Most of the time.) Exquisite puzzles (finger traps) to suck smart people in. To distract from the real work of solving customers' problems, business needs.
I think the kids these days call it "yak shaving." Yes, I am very guilty of this. Past, present, and surely future. I'm no better than anyone else. I recognize my failings and am trying to improve.
As much as people like to talk about "footguns", that's the value proposition of C++, zero cost for anything you don't use.
Managed/interpreted langs have a different proposition: pay up front for all the features, but your complexity never increases whether you use none, one or all of them.
Well, pointer aliasing for example is a very tricky thing in C++ too. Even with strict aliasing, it is very unintuitive from the outside when you're going to trip over unexpected performance hurdles. (Does this data structure have a char* inside? Who knows, but if it does, there goes vectorization!).
If you don't dereference the char* it will get treated as an integer and be vectorizable. In C++ you don't pay for what you don't use. If you need the char* in the loop I'm not sure why you'd be thinking about vectorization. Maybe you meant parallelism?
Definitely agree. Another example of lost performance that comes to mind is (ignorance of) C struct padding. In terms of modern tools, Rust has definitely made some headway here:
• Variables are immutable by default
• The layout of a struct’s fields in memory is opaque to the programmer, allowing the compiler to reorder them to minimize padding
• As a natural consequence of the borrow checker, Rust has much more information about whether or not two references could alias than most other languages
• The programmer is very much made to feel the cost of dynamic-dispatch (via dyn), and static-dispatch is the norm
• The compiler can omit bounds checks when it can prove they’re unnecessary
• It’s easy to safely have structs that live entirely on the stack (i.e. no heap allocation)
Overall this is an unsolved problem though. It seems to me that the trend is that the more details about the internals that the language exposes to the programmer, the fewer optimizations can be made automatically. Essentially the ideal is declarative performance, where the programmer expresses the idea and the compiler figures out the optimal implementation (which is in stark contrast to most high-performance code being hand-tuned today). Take a look at something like Halide for a successful application of this philosophy: https://halide-lang.org/
> Or let the programmer tell the compiler a little bit more about what they need and don't need the code to do, so the compiler can do a much better job of it.
> Adding the readonly modifier to members that don't mutate state provides two related benefits. First, the compiler enforces your intent. That member can't mutate the struct's state. Second, the compiler won't create defensive copies of in parameters when accessing a readonly member. The compiler can make this optimization safely because it guarantees that the struct is not modified by a readonly member.
This feels like some seriously missed static analysis optimization though - why do I need to specify this? There are vanishingly few cases where the answer to "looks like you don't modify this ever, make it readonly?" is not yes.
Probably because reflection & compilation+activation of types existing outside the original compilation context is possible at runtime.
I much prefer having a language/runtime that can handle extremely complex problem scenarios out of the box (with the need for occasional tuning), rather than the other way around. It is not very often that I find myself screwing around with performance-intensive code. Usually, a few very important things are addressed at the lowest levels and then you put all your nice abstractions back on top for daily driving.
I've been working occasionally on a database engine design that takes this idea to heart. It's been hugely beneficial.
The big picture design is simply described as an append-only log of copy-on-write page states, with Kafka-style compaction to maintain whatever window of history you want. This is supplemented with a compact in-memory index that holds the LSN for the latest page states, to make accessing live data as fast as possible.
The write path goes through a single batch-committer thread to amortize fsync overhead (technically two threads in series, so I can overlap prepping the next batch with waiting on fsync). A single writer isn't a big limit since even consumer-grade SSDs can write at GiB/s now. The append-only log also means I hit the happy case on SSDs: FTLs have come a long way in equalizing sequential vs random write performance, but that comes at the cost of a lot more write cycles behind the scenes to shuffle stuff around.
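Roughly, the shape of that write path is something like the sketch below. This is a simplification rather than the actual code: one goroutine packs the next batch while a second one writes and fsyncs the previous batch, so the fsync cost is amortized over everything in the batch.

```go
package main

import (
	"log"
	"os"
)

// packLoop packs raw page writes into a batch while syncLoop is still
// fsyncing the previous batch, so the two stages overlap.
func packLoop(reqs <-chan []byte, batches chan<- []byte) {
	for r := range reqs {
		batch := append([]byte(nil), r...)
		// Fold in anything else already queued; this is the work that
		// overlaps with the fsync happening in syncLoop.
	drain:
		for {
			select {
			case r2, ok := <-reqs:
				if !ok {
					break drain
				}
				batch = append(batch, r2...)
			default:
				break drain
			}
		}
		batches <- batch
	}
	close(batches)
}

// syncLoop appends each batch to the log and issues one fsync per batch,
// amortizing its cost over every write the batch contains.
func syncLoop(f *os.File, batches <-chan []byte, done chan<- struct{}) {
	for b := range batches {
		if _, err := f.Write(b); err != nil {
			log.Fatal(err)
		}
		if err := f.Sync(); err != nil {
			log.Fatal(err)
		}
	}
	close(done)
}

func main() {
	f, err := os.OpenFile("wal.log", os.O_CREATE|os.O_WRONLY|os.O_APPEND, 0o644)
	if err != nil {
		log.Fatal(err)
	}
	reqs := make(chan []byte, 1024)
	batches := make(chan []byte, 1)
	done := make(chan struct{})
	go packLoop(reqs, batches)
	go syncLoop(f, batches, done)

	reqs <- []byte("page state v1\n")
	reqs <- []byte("page state v2\n")
	close(reqs)
	<-done // everything is on disk once this returns
}
```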
Here are two examples of how immutability has drastically simplified the design:
First, the in-memory index is just a hand-tweaked version of a Bagwell trie. Since the trie is copy-on-write, once the fsync finishes the write thread can publish the new index state with a single atomic write. Client threads never see any intermediate state; the database moves from snapshot to snapshot atomically from their view. (This is the part I'm coding up atm.)
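In Go, the publish step looks roughly like this sketch (a copied map stands in for the copy-on-write trie; it's the shape, not the real thing):

```go
package main

import (
	"fmt"
	"sync/atomic"
)

// index maps page number -> LSN of the latest page state. A real
// implementation would be a copy-on-write Bagwell trie; a copied map
// stands in for it here.
type index struct {
	latest map[uint64]uint64
}

var current atomic.Pointer[index]

// publish builds a new snapshot from the old one plus the batch that just
// fsynced, then makes it visible with a single atomic store. Readers never
// observe a half-applied batch.
func publish(batch map[uint64]uint64) {
	old := current.Load()
	next := &index{latest: make(map[uint64]uint64, len(old.latest)+len(batch))}
	for page, lsn := range old.latest {
		next.latest[page] = lsn
	}
	for page, lsn := range batch {
		next.latest[page] = lsn
	}
	current.Store(next) // the database moves from snapshot to snapshot
}

// lookup runs on client goroutines with no locks at all.
func lookup(page uint64) (uint64, bool) {
	lsn, ok := current.Load().latest[page]
	return lsn, ok
}

func main() {
	current.Store(&index{latest: map[uint64]uint64{}})
	publish(map[uint64]uint64{1: 100, 2: 101}) // after fsync of LSNs 100-101
	fmt.Println(lookup(1))
}
```

The single Store is the entire "publish"; everything before it is private to the write thread.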
A second example: I want to have an on-heap cache, to avoid overtaxing the SSD or the downfalls of a pure mmap caching approach (mmap can be the miss path for this cache, however). I'm planning on using a lock-free hash table. Since I'm working in a language that doesn't have one as a library, I'll have to roll my own. Ordinarily this would be an insane idea, as such structures are infamous for being nearly impossible to get correct, even when using formal modeling tools. But in my case, the relationship of LSN -> immutable page state is stable, so the cache can be quite sloppy. With a mutable cache I'd in essence have to ensure the CAS always went to the right place, even under resizing, etc. But with immutability, duplicates are only a potential memory waste, not a correctness problem, so I get to be quite a bit more sloppy.
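The reason the sloppiness is safe is that LSN -> page bytes is a stable mapping. A hedged sketch using sync.Map in place of the hand-rolled lock-free table (again, the shape rather than the real code):

```go
package main

import (
	"fmt"
	"sync"
)

// pageCache maps LSN -> immutable page bytes. Because a given LSN can only
// ever refer to one page state, two goroutines racing to fill the same slot
// can both "win" harmlessly: the worst case is a duplicate read from the
// log, never a stale or torn value.
type pageCache struct {
	m sync.Map // LSN (uint64) -> []byte, treated as read-only once stored
}

func (c *pageCache) Get(lsn uint64, readFromLog func(uint64) []byte) []byte {
	if v, ok := c.m.Load(lsn); ok {
		return v.([]byte)
	}
	// Miss path: go to the log (or mmap). If another goroutine beat us to
	// it, LoadOrStore returns their copy and ours just gets collected.
	page := readFromLog(lsn)
	actual, _ := c.m.LoadOrStore(lsn, page)
	return actual.([]byte)
}

func main() {
	var c pageCache
	fake := func(lsn uint64) []byte { return []byte(fmt.Sprintf("page@%d", lsn)) }
	fmt.Println(string(c.Get(42, fake)))
	fmt.Println(string(c.Get(42, fake))) // second call hits the cache
}
```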
One shouldn't treat any single thing as a panacea, but I've definitely come around to the view that immutable by default is a good approach. You get to just trivially dodge entire categories of nasty problems when doing something like the above.
I'd love to hear anything more you can share about the stuff you work on. It sounds really interesting.
I really enjoy seeing how similar themes emerge across the industry.
I've been working on a key-value store that is purpose built for low-latency NVMe devices.
I have an append-only log (spread across file segments for GC concerns), with the only contents being the nodes of a basic splay tree. The key material is randomly derived, so I don't have to worry about the sequential-integer issues on inserts. Writes to the database are modified roots of the splay tree, so the log is a continuous stream of consistent database snapshots. Readers that don't require serialization can asynchronously work with the historical result set as appropriate, without any translation or sync required. You just point the reader to a prior root node offset and everything should work out.
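To sketch the "point the reader at a prior root node offset" part in Go (structure and field names are mine, and an in-memory slice stands in for the log segments): every node is appended with the offsets of its children, so any root offset identifies a complete, immutable snapshot of the tree.

```go
package main

import "fmt"

// node is what gets appended to the log. Child links are log offsets, so a
// node written at offset N can only ever point at nodes written before it.
type node struct {
	key         uint64
	value       []byte
	left, right int64 // offsets of child nodes, -1 for none
}

// logNodes is an in-memory stand-in for the append-only file segments.
var logNodes []node

// appendNode appends a node and returns its "offset" (slice index here).
func appendNode(n node) int64 {
	logNodes = append(logNodes, n)
	return int64(len(logNodes) - 1)
}

// get walks the tree rooted at the given offset. Passing an older root
// offset reads an older, fully consistent snapshot with no locking.
func get(root int64, key uint64) ([]byte, bool) {
	for root >= 0 {
		n := logNodes[root]
		switch {
		case key == n.key:
			return n.value, true
		case key < n.key:
			root = n.left
		default:
			root = n.right
		}
	}
	return nil, false
}

func main() {
	leaf := appendNode(node{key: 1, value: []byte("a"), left: -1, right: -1})
	rootV1 := appendNode(node{key: 2, value: []byte("b"), left: leaf, right: -1})
	// An "update" of key 2 appends a new root; rootV1 remains a valid snapshot.
	rootV2 := appendNode(node{key: 2, value: []byte("b'"), left: leaf, right: -1})

	v1, _ := get(rootV1, 2)
	v2, _ := get(rootV2, 2)
	fmt.Println(string(v1), string(v2)) // b b'
}
```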
Can you explain the API a bit more? If you're batching writes in a single thread, doesn't that imply that clients don't 'know' when their writes are actually committed to disk? Or are their requests kept open until fsync?
So first a disclaimer. I've been thinking about this design or something like it in the back of my head for a couple years while keeping up to date on VLDB papers and the like. It's only recently that I've gotten serious about shipping a proof of concept. As it stands I'm just working bottom up trying to confirm the more risky components will work. So it's by no means a done deal. Obviously I think I'm on the right trail or I wouldn't be doing it though.
I'm working in golang. Client goroutines that execute read-write transactions buffer up their read and write sets, then submit them to the committer goroutine via a channel. Part of the struct they submit is a blocking channel for the committer to notify them of the result. So nothing returns to the client unless it's stable on disk. Assuming I get far enough, a future version will also optionally wait for raft replication.
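The shape of that submit/notify path, as a stripped-down Go sketch rather than the real code:

```go
package main

import (
	"fmt"
	"sync"
)

// commitReq is what a client goroutine submits: its buffered write set plus
// a channel the committer uses to report the durable outcome.
type commitReq struct {
	writes map[string][]byte
	result chan error
}

// committer drains whatever is queued, "fsyncs" once for the whole batch,
// then notifies every client in it. Nothing returns to a client before the
// batch is stable.
func committer(reqs <-chan commitReq, fsyncBatch func([]commitReq) error) {
	for first := range reqs {
		batch := []commitReq{first}
	drain:
		for {
			select {
			case r, ok := <-reqs:
				if !ok {
					break drain
				}
				batch = append(batch, r)
			default:
				break drain
			}
		}
		err := fsyncBatch(batch) // one write + fsync for the whole batch
		for _, r := range batch {
			r.result <- err
		}
	}
}

func main() {
	reqs := make(chan commitReq, 128)
	go committer(reqs, func(b []commitReq) error {
		fmt.Printf("committed batch of %d transactions\n", len(b))
		return nil // a real version appends to the log and calls File.Sync
	})

	var wg sync.WaitGroup
	for i := 0; i < 4; i++ {
		wg.Add(1)
		go func(i int) {
			defer wg.Done()
			req := commitReq{
				writes: map[string][]byte{fmt.Sprintf("k%d", i): []byte("v")},
				result: make(chan error, 1),
			}
			reqs <- req
			_ = <-req.result // the transaction is durable once this returns
		}(i)
	}
	wg.Wait()
}
```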
> single core system can write something on the order of 10-100 million transactions per second...a single thread is what is holding up the entire exchange. Arrays of structs, hot cache lines, correct branch prediction and full pipelines
Without going back to school, how would I learn how to do that? (Books I should read, good open source examples, any Coursera courses?)
The article "Immutability is not enough" is sub-par in every aspect.
The author never defines what sort of problems immutable data can help avoid, or what the benefits of immutable data are. And still, after every paragraph, he wonders why functional programming didn't help him avoid bugs in his business logic.
Like, what the hell, the author produced a bug because the order of his operations was wrong? Great, write a test and restructure your code to make it work; at least you know where to look.
Functional Programming does not fix your Game Loop, and it never promised to.
FWIW I came to the comments because I also found the “Immutability changes everything” paper nearly unreadable.
I just wanted to comment that I love your last paragraph from a conceptual level. You want immutability? Great, let's talk about what performant game design looks like in that context. Games are special, in that they kind of do everything that computers are good at, to varying degrees. The author's ideas about snapshots of relational databases inviting further schema changes in the pipeline—I guess I see the idea you're trying to introduce, but what does it actually look like in a chess program? Or a choose your own adventure game?
The article rang true for me. I heard the same FP promises that the author did, and eventually ran into the same issues in my FP programs, and left disillusioned. The article captured the experience perfectly.
I like to say that mutable state is to software as moving parts are to hardware. If you can avoid them, great! But a number of systems/designs simply require them.
Anyone interested in what an immutable programming language looks like should check out Unison [0], written by Paul Chiusano. He is known for writing the book FP in Scala.
The premise is that any edit produces a new function, but the old one still exists, so you need to migrate calls from the old version to the new version.
The benefits really start to manifest in distributed computing and package management: much less need to worry about transitive dependencies and conflicting versions.
The editable code is a projection of a data structure, so refactoring and static analysis tools will be very easy to write. Also the syntax is just a display issue and can be changed easily, everyone could read and write code in the syntax they prefer.
It does increase some complexity, but I think it is essential complexity that we currently have only poor ways to manage, and having explicit knobs to turn will mean less effort overall.
I don't understand why this would need a new language; it seems like tools would be easier. If it is for development, the whole point is to edit functions. If it is to edit live programs, one obvious and good way is to swap out a function pointer atomically and call functions by getting their pointer atomically. This doesn't really have anything to do with a different language though.
> The benefits really start to manifest in distributed computing and package management: much less need to worry about transitive dependencies and conflicting versions.
I can see how that would work better if function references are to hashes of entire functions, rather than just the names of functions.
Pat also wrote a pretty great follow-up article that expands on the "Data on the Outside vs. Data on the Inside" section, covering the intersection of immutability, temporality, schema and relational querying: https://queue.acm.org/detail.cfm?id=341501
I'll try to look at that. At a more local level you might like "Purely Functional Data Structures" by Chris Okasaki. Functional programming revolves around those.
> Normalization is not necessary in an immutable data set, however. The only reason to normalize immutable data sets may be to reduce the storage necessary for them. On the other hand, denormalized data sets may be easier and faster to process as inputs to a computation.
I disagree here. To derive new data from an immutable data source you still want normalized data. Denormalization can happen at some point for processing. Sometimes normalization is not feasible, because it often requires modelling/design. But the best general-case scenario is for data to be well-structured and normalized.
You're being kind. The author's statement is utter nonsense. If you store the same fact in two places and only change one of them, it doesn't matter whether you update in place or create a new version of the record -- your data is still corrupted.
I feel like the programming tools haven't kept up, and so to do this we also need immutable requirements and impeccable programmers. Currently, if I change the code in my application it gives different results. I'm not sure if there is a word for it, but I'm thinking of pure applications, like we have pure functions. I don't think the right tools exist for this.
Also, the "accountants don't use erasers" line feels really glib to me; it's hard work.
Yeah, my argument is that those tools are OK but a bit inadequate. Certainly not first-class solutions. It's just a thought; I'm not sure it would open up a world of possibilities, but there could be more rigorous ways of showing how the logic has changed over time and keeping that alongside the current state of the logic.
This idea can be extended by observing the following analogy:
purity is to functions
as
immutability is to data
Immutability is definitely a valuable technique that can help make software faster (sometimes), more reliable, and easier to maintain. I would take it one step further and argue that we can get a similar boost by extending the idea to functions, not just data. And the way you do that is by writing pure functions (i.e. no side effects...or controlling/limiting them in some way) and using languages that can enforce that your functions are pure.
Yes, and computation and data both become interchangeable facts. And by becoming facts, they lessen the tech debt, and start to fade into the background. I think the only way to have tractable large systems is to have pervasive immutability and purity.
Unfortunately, this just doesn't pan out in practice. There are various techniques that are much more inefficient to express with immutability and pure functions (e.g. computing a histogram). There is also a pretty big body of evidence already that doesn't show any hugely significant (say, 2x) changes in productivity, correctness or speed for using pure, strict functional languages and styles vs more traditional imperative, impure or even dynamic languages.
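The histogram case, sketched in Go for concreteness (not a functional language, but it shows the shape of the objection): the mutable version is a tight in-place loop, while a faithfully immutable version derives a fresh map per element, and even a persistent map only reduces that overhead rather than eliminating it.

```go
package main

import "fmt"

// Mutable version: one map, updated in place. This is the natural, fast way.
func histogram(xs []int) map[int]int {
	h := make(map[int]int)
	for _, x := range xs {
		h[x]++
	}
	return h
}

// "Immutable" version: never modify an existing map, always derive a new one.
// Semantically identical, but it copies the whole histogram once per element.
func histogramImmutable(xs []int) map[int]int {
	h := map[int]int{}
	for _, x := range xs {
		next := make(map[int]int, len(h)+1)
		for k, v := range h {
			next[k] = v
		}
		next[x] = h[x] + 1
		h = next
	}
	return h
}

func main() {
	xs := []int{1, 2, 2, 3, 3, 3}
	fmt.Println(histogram(xs))
	fmt.Println(histogramImmutable(xs))
}
```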
First of all, I am arguing for the base case: absent any studies, the null hypothesis is naturally that there are no major differences between programming paradigms.
On the issue of bug counts, there was one huge study that was successfully reproduced [0], which finds that there is a real but small negative correlation between FP and bug counts in large GitHub projects.
For productivity I haven't been able to find any large-scale studies that compare functional vs imperative/OOP languages. I did find one small-scale study [1] that found significantly better productivity for Haskell (and a Lisp dialect) vs Ada, C++ and a few others, but it was about a specific small prototype problem with just a handful of programmers participating (the Haskell program had 85 lines; the worst, C++, had 1105; and programs took between 10h and 54h to finish - hardly representative of large-scale coding). Most other studies on productivity that I could find [2], [3] didn't include any FP languages at all.
As I understand it, immutability usually refers to mutable data structures hidden behind an interface that allows the programmer to reason about things as if they were immutable. The naive implementation is of course to duplicate all data whenever a change is made, but pure / immutable languages use smarter things like persistent data structures behind the scenes.
If the compiler knows the original data is never needed, it can optimize away the old copies, too. At least that's the hope, I'm not sure how true it is.
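A tiny example of that structural sharing, as a persistent singly linked list in Go (my illustration): "adding" an element builds a new head that shares the entire old tail, so nothing is copied and nothing is mutated.

```go
package main

import "fmt"

// list is a persistent singly linked list: nodes are never modified after
// construction, so any reference to a node is a stable snapshot.
type list struct {
	head int
	tail *list
}

// push returns a new list whose tail *is* the old list: O(1), no copying,
// and the old version remains fully usable.
func push(l *list, v int) *list {
	return &list{head: v, tail: l}
}

func (l *list) String() string {
	s := ""
	for n := l; n != nil; n = n.tail {
		s += fmt.Sprintf("%d ", n.head)
	}
	return s
}

func main() {
	v1 := push(push(nil, 1), 2) // 2 1
	v2 := push(v1, 3)           // 3 2 1, sharing all of v1
	fmt.Println(v1, v2)
}
```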
> As I understand it, immutability usually refers to mutable data structures hidden behind an interface that allows the programmer to reason about things as if they were immutable.
That is the definition of an algorithm, not immutability.
You can look at the R data.table package. R by default is a purely functional language, and the built-in data.frame type for in-memory tabular data is immutable. That means computing anything that changes the data requires copying it all. The data.table package achieved pretty impressive speed-ups by dropping down into C and doing everything in-place. It also makes it harder to run out of memory.
Granted, some effects like this can be mitigated with sufficiently clever language design. Haskell with lazy evaluation. I believe some compilers for functional languages will reuse memory addresses they can prove nothing else refers to, so in reality they are mutating in-place even if semantically the variables are "immutable." Heck, even Python does that with the built-in immutable types. When you assign the changed data to a new name, it doesn't actually allocate new memory and just does it in-place under the hood. In cases like that, the data isn't really "immutable." It just can't be mutated by the programmer, only by the runtime or compiler.
Of course, if we want to get sufficiently pedantic, no data is ever immutable. The law may say a ledger is append-only, but nothing is stopping an accountant from just lying. A compiler may prevent you from assigning new data to an already-existing name binding, but a compiler can't stop someone from using rowhammer or an x-ray machine or something and flipping the bits in memory anyway.
Accountants don't use erasers; otherwise they may go to jail. All entries in a ledger remain in the ledger. Corrections can be made but only by making new entries in the ledger.
This is not true. Accounting is usually the most mutable data source you'll ever have. Facts arrive at random times and are then entered on a schedule. Corrections wait on someone to resolve analytical issues. Different accounting units have different timelines, and within a unit there are many threads of an unordered fact flow. It is all synced together only at specific points, usually when it's time to report officially or internally. If you ask your accountant and the only thing they drop is "busy", it's one of those weeks of the year.
When a company's quarterly results are published, they include small corrections to the previous quarter. Small fixes are OK. They are append-only, too.
This is true, because public reports are legally binding.
On topic: in my opinion, immutability at small scales is good, because modifications do not leak accidentally into the fields of your data structures (e.g. a mutable string is a bad idea in general). But at the bigger scale immutability gets in the way, if not hidden properly. It requires a less common sort of developer who has to think in an immutable/functional way, and even when they can, this increases their cognitive distance to the business logic. Properly hidden immutability helps a lot though, and we mutable guys use it a lot, despite a widespread belief otherwise. It's transactions and execution context isolation. There is nothing wrong with mutating the operative area if these mutations cannot escape it. But if you're short on primitives (a context object, an object model vs global.store and raw fields of a one-for-all object) you end up setting globally visible state that can be read from the outside, and trouble creeps in. Moreover, real-world code rarely looks like what you've seen in a tutorial, and instead of a clear `assemble(this(), that())` you have to resort to non-ordinary functional tricks and concepts in the name of it. As a result, you have to hire (or be one of) these functional guys who may be better in general, but also spend half their cognitive abilities on things that you couldn't explain to an accountant.
So yes and no, immutability changes everything, but it’s not the only way to change it. It is just the easiest manual way to feel safe if you start naked in the woods.
> Facts arrive at random times and are then entered on a schedule. Corrections wait on someone to resolve analytical issues. Different accounting units have different timelines, and within a unit there are many threads of an unordered fact flow. It is all synced together only at specific points, usually when it's time to report officially or internally. If you ask your accountant and the only thing they drop is "busy", it's one of those weeks of the year.
I fail to see how any of this implies the data is mutable.
An accountant loads raw payment data from a bank. Every payment has to be recognized and related to some analytics (a project, an expense/income class, etc). If it's not obvious, then the corresponding analytics are left empty or "_Unrecognized", which rings at the top of all related reports. Later the info arrives and they simply go to that day and fill in the fields. The change history may be useful for data-debug reasons, but the model that a regular accountant uses is of a mutable nature at their level.
Of course one could design a system that made changesets append-only and explicit for them. But they would object, the same way you’d object if instead of updating a working tree, `git pull` simply appended a set of patches to every file in it and expected you to modify source code only by appending patches yourself. Developer’s own work model is mutable (a working tree), and you can’t expect an accountant to be surprisingly fine with what you personally are not.
A bank is a perfect example of immutable accounting.
Once a debit is posted to your account (let's say it is a fraudulent transaction, or a transaction entered by mistake), nobody will go and erase it as if it never happened.
They would actually do a reversal, by creating a new entry with the same amount and the opposite sign.
So you would end up with two transactions: +100 and -100.
The same is true for most accounting systems. Once the accounting period is closed (daily in banking systems, monthly for most other companies), you can't change the closed month. All changes go in as new adjustment transactions.
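That reversal pattern is easy to model as an append-only list of entries. A hedged Go sketch (account names and fields are made up):

```go
package main

import "fmt"

// entry is one immutable ledger line. Mistakes are never edited; they are
// corrected by appending a reversing entry and then the corrected one.
type entry struct {
	account string
	amount  int64 // cents; an opposite-signed entry reverses an earlier posting
	memo    string
}

type ledger struct {
	entries []entry // append-only
}

func (l *ledger) post(e entry) {
	l.entries = append(l.entries, e)
}

func (l *ledger) balance(account string) int64 {
	var total int64
	for _, e := range l.entries {
		if e.account == account {
			total += e.amount
		}
	}
	return total
}

func main() {
	var l ledger
	l.post(entry{"acct-1", -10000, "debit posted in error"})
	l.post(entry{"acct-1", +10000, "reversal of erroneous debit"})
	l.post(entry{"acct-1", -2500, "correct debit"})

	// The balance reflects the correction, but the full history is still there.
	fmt.Println(l.balance("acct-1"), len(l.entries)) // -2500 3
}
```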
That's what I'm talking about! You don't have to go all-immutable, only at the "commit and set in stone" level (you may have many). Obsessing over completely immutable state management at all levels does more damage to your workflow than the problems it solves are worth. It's yet another silver bullet that everyone tries to bite these days.