Immutability allows for incredible performance when you are working in the guts of database engines. Having assurances that a specific data log file offset cannot possibly be modified allows for substantial optimization throughout.
For users who do not need a serialized view of events, access to "near-real-time" data can be made to scale as far as needed, even if a single machine is driving all the writes. You could theoretically have a billion readers working off a single synchronous core system.
By utilizing careful abstractions, that single core system can write something on the order of 10-100 million transactions per second. We do it every day on the various financial exchanges around the world. In many cases, a single thread is what is holding up the entire exchange. Arrays of structs, hot cache lines, correct branch prediction and full pipelines are a hell of a combo.
In some practical business apps, it is also feasible to coalesce logical writes before they need to enter a fully-synchronous context, so you can get a step-down ratio of something like 100:1 on transactions that actually need to go across the network and be handled by the core system.
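For a rough idea of what that coalescing can look like, here's a minimal Go sketch (types and names are mine, purely illustrative): logical writes merge in a local buffer and only the merged batch crosses the network.

```go
package main

import "fmt"

// Op is a hypothetical logical write produced by the application layer.
type Op struct {
	Key   string
	Value int64
}

// Coalescer buffers logical writes locally and flushes one merged batch to
// the synchronous core system, so many local ops become one network hop.
type Coalescer struct {
	pending map[string]int64 // last-write-wins merge by key
	flush   func(batch map[string]int64)
}

func NewCoalescer(flush func(map[string]int64)) *Coalescer {
	return &Coalescer{pending: make(map[string]int64), flush: flush}
}

func (c *Coalescer) Write(op Op) {
	c.pending[op.Key] = op.Value // repeated writes to a key collapse into one entry
}

func (c *Coalescer) Flush() {
	if len(c.pending) == 0 {
		return
	}
	c.flush(c.pending)
	c.pending = make(map[string]int64)
}

func main() {
	c := NewCoalescer(func(batch map[string]int64) {
		fmt.Printf("sending %d coalesced writes to the core system\n", len(batch))
	})
	for i := 0; i < 100; i++ {
		c.Write(Op{Key: "account-42", Value: int64(i)}) // 100 logical writes...
	}
	c.Flush() // ...one transaction actually crosses the network
}
```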
I'm increasingly coming to realize that in programming, flexibility is something you pay for in performance, whether you use it or not. Data being mutable even if you never mutate it. Automatic bounds checks for data that you've already made sure can't go out of bounds. vtable lookups even when you're not using inheritance, because who knows, someone else might. Compilers needing to do extra work in case two pointers are aliases, even though that never makes sense for them to be from the programmer's perspective. Objects allocated one at a time, scattered around memory, even if you don't need them to be, and wouldn't mind their innards being kept together in cache aligned arrays.
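To make that last point concrete, here's a small Go toy of my own (not anyone's real code): the same data laid out as one contiguous slice of structs versus individually allocated objects scattered around the heap.

```go
package main

import "fmt"

type Particle struct {
	X, Y, Z float64
}

func main() {
	const n = 1_000_000

	// Contiguous layout: one allocation, fields packed together, so a
	// linear scan streams through cache lines predictably.
	packed := make([]Particle, n)
	sum := 0.0
	for i := range packed {
		sum += packed[i].X
	}

	// Pointer-per-object layout: n separate allocations scattered around
	// the heap; the same scan now chases pointers and eats cache misses
	// it didn't need to.
	scattered := make([]*Particle, n)
	for i := range scattered {
		scattered[i] = &Particle{}
	}
	for _, p := range scattered {
		sum += p.X
	}

	fmt.Println(sum)
}
```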
They say you can solve any problem in computer science with another level of indirection - except performance, it seems, because we're paying the cost of all this indirection all the time, even if we don't need it most of the time. There's a crazy amount of unnecessary "housekeeping" done by compilers and runtimes to make sure everything always works all the time, which requires them to always do extra work to protect against pathological cases that almost never arise.
Modern compilers and runtimes are getting better and better at recognizing the common "happy case" and optimizing for that, and that's good. But it still feels backwards - both that we can't explicitly opt out of unneeded flexibility, so instead the compiler needs to deduce that we aren't using it, and that a lot of this isn't opt-in to begin with, rather than opt-out - i.e. make things immutable by default. Or let the programmer tell the compiler a little bit more about what they need and don't need the code to do, so the compiler can do a much better job of it.
> flexibility is something you pay for in performance
A lightbulb moment I had recently was that the classic interface-based OO programming paradigm utilising virtual functions was essentially the following declaration by the programmer to the compiler: the function that is invoked can change at any time, even call-to-call; assume nothing. It is a black box with unknown, unknowable contents. Do not inline it, do not inspect it, expect the worst.
If you look at a typical web framework, with call stacks hundreds of methods deep, you'll come to the same horrifying realisation that these are all inherently un-optimisable because of that statement. The compiler can do nearly nothing, and the processor will also struggle to paper over the inefficiency. Meanwhile, in reality, the specific implementations of those interfaces never change during actual production execution! Dynamic dispatch is utilised only during development time, and even then only at compile time to inject test harnesses, mocks, or whatever.
At runtime, 99.999999% of all function calls are essentially static, and dynamic dispatch is a pure waste.
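The same tension shows up in Go, for example: a call through an interface is dynamic dispatch the compiler generally can't inline, while the identical call on the concrete type usually can be. A toy sketch (names are mine):

```go
package main

import "fmt"

type Greeter interface {
	Greet() string
}

type English struct{}

func (English) Greet() string { return "hello" }

func viaInterface(g Greeter) string {
	// Dynamic dispatch: the callee is looked up through the interface's
	// method table, so in the general case the compiler can't inline it.
	return g.Greet()
}

func viaConcrete(e English) string {
	// Static dispatch: the callee is known at compile time and is a
	// trivial candidate for inlining.
	return e.Greet()
}

func main() {
	fmt.Println(viaInterface(English{}), viaConcrete(English{}))
}
```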
Except that a smart runtime can actually work with the assumption that only one implementation of a class will be used, and inline what would otherwise be virtual methods. And, as far as I know, the JVM definitely does this, and at multiple levels too - if there's only one implementation of an interface, it can just assume that everything's static and inlineable (at the cost of deoptimizations if that turns out to be false later). If not, it can still decide to inline, and just check the type beforehand.
Optimizing JIT-compiling JVMs are often able to optimize virtual calls that only ever resolve to one implementation. They may even inline such calls. (edit: dzaima got there first.)
The Java HotSpot Performance Engine was released on April 27, 1999, built on technologies from an implementation of the programming language Smalltalk named Strongtalk, originally developed by Longview Technologies, which traded as Animorphic. The Longview virtual machine was based on the Self virtual machine, with an interpreter replacing the fast-and-dumb first compiler. When Sun cancelled the Self project, two key people, Urs Hölzle and Lars Bak left Sun to start Longview. In 1997, Sun Microsystems purchased Animorphic.
Well, the alternative to vtables is manually branching (besides completely rewriting the code to avoid different actions), which isn't that much better and still requires branch prediction. And I think JVMs also might convert a dispatch from two classes to a conditional branch, but I'm not sure.
Just as a note for anyone getting horrified at the cost of all these indirections: C code is full of function pointers passed as callbacks. Hell, there's actually not much difference between a vtable and typical C code doing the same thing. Chances are, C++ compiler writers made it more efficient. Also, C++ doesn't default to virtual, so you rarely pay the price.
I'm always on the lookout for talks about interesting and generally useful concepts, but where I can follow only a small part of them. Makes me feel like I can learn and grow into a wiser person. This certainly fits that criterion.
Ya. I've spent the last two decades removing levels of indirection, reducing call stack depth. This has put me at odds with most everyone. I've been insulted many times for not understanding "software architecture". Heh.
I started a design pattern study group shortly after GoF's book was published. I think it's a lot like jazz. 1st time thru, mind blown. 2nd time, ah, I think I'm starting to get it. 3rd time, well, duh.
Eventually students come full circle, a la "there is no spoon", and reject the veneer of "design patterns". Said another way, it's just another tool; learn to use it wisely. Learn to reject temptation.
Aspects, annotations, runtime code reflection, metaprogramming, frameworks, schema mappings, factories, singletons, and whatnot are just monkey work. (Most of the time.) Exquisite puzzles (finger traps) to suck smart people in. To distract from the real work of solving customers' problems, business needs.
I think the kids these days call it "yak shaving." Yes, I am very guilty of this. Past, present, and surely future. I'm no better than anyone else. I recognize my failings and am trying to improve.
As much as people like to talk about "footguns", that's the value proposition of C++, zero cost for anything you don't use.
Managed/interpreted langs have a different proposition: pay up front for all the features, but your complexity never increases whether you use none, one or all of them.
Well, pointer aliasing for example is a very tricky thing in C++ too. Even with strict aliasing, it is very unintuitive from the outside when you're going to trip over unexpected performance hurdles. (Does this data structure have a char* inside? Who knows, but if it does, there goes vectorization!).
If you don't dereference the char* it will get treated as an integer and be vectorizable. In C++ you don't pay for what you don't use. If you need the char* in the loop I'm not sure why you'd be thinking about vectorization. Maybe you meant parallelism?
Definitely agree. Another example of lost performance that comes to mind is (ignorance of) C struct padding. In terms of modern tools, Rust has definitely made some headway here:
• Variables are immutable by default
• The layout of a struct’s fields in memory is opaque to the programmer, allowing the compiler to reorder them to minimize padding
• As a natural consequence of the borrow checker, Rust has much more information about whether or not two references could alias than most other languages
• The programmer is very much made to feel the cost of dynamic-dispatch (via dyn), and static-dispatch is the norm
• The compiler can omit bounds checks when it can prove they’re unnecessary
• It’s easy to safely have structs that live entirely on the stack (i.e. no heap allocation)
Overall this is an unsolved problem though. It seems to me that the trend is that the more details about the internals that the language exposes to the programmer, the fewer optimizations can be made automatically. Essentially the ideal is declarative performance, where the programmer expresses the idea and the compiler figures out the optimal implementation (which is in stark contrast to most high-performance code being hand-tuned today). Take a look at something like Halide for a successful application of this philosophy: https://halide-lang.org/
> Or let the programmer tell the compiler a little bit more about what they need and don't need the code to do, so the compiler can do a much better job of it.
> Adding the readonly modifier to members that don't mutate state provides two related benefits. First, the compiler enforces your intent. That member can't mutate the struct's state. Second, the compiler won't create defensive copies of in parameters when accessing a readonly member. The compiler can make this optimization safely because it guarantees that the struct is not modified by a readonly member.
This feels like some seriously missed static analysis optimization though - why do I need to specify this? There are vanishingly few cases where the answer to "looks like you don't modify this ever, make it readonly?" is not yes.
Probably because reflection & compilation+activation of types existing outside the original compilation context is possible at runtime.
I much prefer having a language/runtime that can handle extremely complex problem scenarios out of the box (with the need for occasional tuning), rather than the other way around. It is not very often that I find myself screwing around with performance-intensive code. Usually, a few very important things are addressed at the lowest levels and then you put all your nice abstractions back on top for daily driving.
I've been working occasionally on a database engine design that takes this idea to heart. It's been hugely beneficial.
The big picture design is simply described as an append-only log of copy-on-write page states, with Kafka-style compaction to maintain whatever window of history you want. This is supplemented with a compact in-memory index that holds the LSN for the latest page states, to make accessing live data as fast as possible.
The write path goes through a single batch-committer thread to amortize fsync overhead (technically two threads in series, so I can overlap prepping the next batch with waiting on fsync). A single writer isn't a big limit since even consumer-grade SSDs can write at GiB/s now. The append-only log also means I hit the happy case on SSDs: FTLs have come a long way in equalizing sequential vs random write performance, but that comes at the cost of a lot more write cycles behind the scenes to shuffle stuff around.
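Roughly, the shape of that write path is something like the sketch below. This is a simplification rather than the actual code: one goroutine packs the next batch while a second one writes and fsyncs the previous batch, so the fsync cost is amortized over everything in the batch.

```go
package main

import (
	"log"
	"os"
)

// packLoop packs raw page writes into a batch while syncLoop is still
// fsyncing the previous batch, so the two stages overlap.
func packLoop(reqs <-chan []byte, batches chan<- []byte) {
	for r := range reqs {
		batch := append([]byte(nil), r...)
		// Fold in anything else already queued; this is the work that
		// overlaps with the fsync happening in syncLoop.
	drain:
		for {
			select {
			case r2, ok := <-reqs:
				if !ok {
					break drain
				}
				batch = append(batch, r2...)
			default:
				break drain
			}
		}
		batches <- batch
	}
	close(batches)
}

// syncLoop appends each batch to the log and issues one fsync per batch,
// amortizing its cost over every write the batch contains.
func syncLoop(f *os.File, batches <-chan []byte, done chan<- struct{}) {
	for b := range batches {
		if _, err := f.Write(b); err != nil {
			log.Fatal(err)
		}
		if err := f.Sync(); err != nil {
			log.Fatal(err)
		}
	}
	close(done)
}

func main() {
	f, err := os.OpenFile("wal.log", os.O_CREATE|os.O_WRONLY|os.O_APPEND, 0o644)
	if err != nil {
		log.Fatal(err)
	}
	reqs := make(chan []byte, 1024)
	batches := make(chan []byte, 1)
	done := make(chan struct{})
	go packLoop(reqs, batches)
	go syncLoop(f, batches, done)

	reqs <- []byte("page state v1\n")
	reqs <- []byte("page state v2\n")
	close(reqs)
	<-done // everything is on disk once this returns
}
```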
Here are two examples of how immutability has drastically simplified the design:
First, the in-memory index is just a hand-tweaked version of a Bagwell trie. Since the trie is copy-on-write, once the fsync finishes the write thread can publish the new index state with a single atomic write. Client threads never see any intermediate state; the database moves from snapshot to snapshot atomically from their view. (This is the part I'm coding up atm.)
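In Go, the publish step looks roughly like this sketch (a copied map stands in for the copy-on-write trie; it's the shape, not the real thing):

```go
package main

import (
	"fmt"
	"sync/atomic"
)

// index maps page number -> LSN of the latest page state. A real
// implementation would be a copy-on-write Bagwell trie; a copied map
// stands in for it here.
type index struct {
	latest map[uint64]uint64
}

var current atomic.Pointer[index]

// publish builds a new snapshot from the old one plus the batch that just
// fsynced, then makes it visible with a single atomic store. Readers never
// observe a half-applied batch.
func publish(batch map[uint64]uint64) {
	old := current.Load()
	next := &index{latest: make(map[uint64]uint64, len(old.latest)+len(batch))}
	for page, lsn := range old.latest {
		next.latest[page] = lsn
	}
	for page, lsn := range batch {
		next.latest[page] = lsn
	}
	current.Store(next) // the database moves from snapshot to snapshot
}

// lookup runs on client goroutines with no locks at all.
func lookup(page uint64) (uint64, bool) {
	lsn, ok := current.Load().latest[page]
	return lsn, ok
}

func main() {
	current.Store(&index{latest: map[uint64]uint64{}})
	publish(map[uint64]uint64{1: 100, 2: 101}) // after fsync of LSNs 100-101
	fmt.Println(lookup(1))
}
```

The single Store is the entire "publish"; everything before it is private to the write thread.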
A second example: I want to have an on-heap cache, to avoid overtaxing the SSD or the downfalls of a pure mmap caching approach (mmap can be the miss path for this cache, however). I'm planning on using a lock-free hash table. Since I'm working in a language that doesn't have one as a library, I'll have to roll my own. Ordinarily this would be an insane idea, as such structures are infamous for being nearly impossible to get correct, even when using formal modeling tools. But in my case, the relationship of LSN -> immutable page state is stable, so the cache can be quite sloppy. With a mutable cache I'd in essence have to ensure the CAS always went to the right place, even under resizing, etc. But with immutability, duplicates are only a potential memory waste, not a correctness problem, so I get to be quite a bit more sloppy.
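The reason the sloppiness is safe is that LSN -> page bytes is a stable mapping. A hedged sketch using sync.Map in place of the hand-rolled lock-free table (again, the shape rather than the real code):

```go
package main

import (
	"fmt"
	"sync"
)

// pageCache maps LSN -> immutable page bytes. Because a given LSN can only
// ever refer to one page state, two goroutines racing to fill the same slot
// can both "win" harmlessly: the worst case is a duplicate read from the
// log, never a stale or torn value.
type pageCache struct {
	m sync.Map // LSN (uint64) -> []byte, treated as read-only once stored
}

func (c *pageCache) Get(lsn uint64, readFromLog func(uint64) []byte) []byte {
	if v, ok := c.m.Load(lsn); ok {
		return v.([]byte)
	}
	// Miss path: go to the log (or mmap). If another goroutine beat us to
	// it, LoadOrStore returns their copy and ours just gets collected.
	page := readFromLog(lsn)
	actual, _ := c.m.LoadOrStore(lsn, page)
	return actual.([]byte)
}

func main() {
	var c pageCache
	fake := func(lsn uint64) []byte { return []byte(fmt.Sprintf("page@%d", lsn)) }
	fmt.Println(string(c.Get(42, fake)))
	fmt.Println(string(c.Get(42, fake))) // second call hits the cache
}
```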
One shouldn't treat any single thing as a panacea, but I've definitely come around to the view that immutable by default is a good approach. You get to just trivially dodge entire categories of nasty problems when doing something like the above.
I'd love to hear anything more you can share about the stuff you work on. It sounds really interesting.
I really enjoy seeing how similar themes emerge across the industry.
I've been working on a key-value store that is purpose built for low-latency NVMe devices.
I have an append-only log (spread across file segments for GC concerns), with the only contents being the nodes of a basic splay tree. The key material is randomly derived, so I don't have to worry about the sequential-integer issues on inserts. Writes to the database are modified roots of the splay tree, so the log is a continuous stream of consistent database snapshots. Readers that don't require serialization can asynchronously work with the historical result set as appropriate, without any translation or sync required. You just point the reader to a prior root node offset and everything should work out.
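To sketch the "point the reader at a prior root node offset" part in Go (structure and field names are mine, and an in-memory slice stands in for the log segments): every node is appended with the offsets of its children, so any root offset identifies a complete, immutable snapshot of the tree.

```go
package main

import "fmt"

// node is what gets appended to the log. Child links are log offsets, so a
// node written at offset N can only ever point at nodes written before it.
type node struct {
	key         uint64
	value       []byte
	left, right int64 // offsets of child nodes, -1 for none
}

// logNodes is an in-memory stand-in for the append-only file segments.
var logNodes []node

// appendNode appends a node and returns its "offset" (slice index here).
func appendNode(n node) int64 {
	logNodes = append(logNodes, n)
	return int64(len(logNodes) - 1)
}

// get walks the tree rooted at the given offset. Passing an older root
// offset reads an older, fully consistent snapshot with no locking.
func get(root int64, key uint64) ([]byte, bool) {
	for root >= 0 {
		n := logNodes[root]
		switch {
		case key == n.key:
			return n.value, true
		case key < n.key:
			root = n.left
		default:
			root = n.right
		}
	}
	return nil, false
}

func main() {
	leaf := appendNode(node{key: 1, value: []byte("a"), left: -1, right: -1})
	rootV1 := appendNode(node{key: 2, value: []byte("b"), left: leaf, right: -1})
	// An "update" of key 2 appends a new root; rootV1 remains a valid snapshot.
	rootV2 := appendNode(node{key: 2, value: []byte("b'"), left: leaf, right: -1})

	v1, _ := get(rootV1, 2)
	v2, _ := get(rootV2, 2)
	fmt.Println(string(v1), string(v2)) // b b'
}
```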
Can you explain the API a bit more? If you're batching writes in a single thread, doesn't that imply that clients don't 'know' when their writes are actually committed to disk? Or are their requests kept open until fsync?
So first a disclaimer. I've been thinking about this design or something like it in the back of my head for a couple years while keeping up to date on VLDB papers and the like. It's only recently that I've gotten serious about shipping a proof of concept. As it stands I'm just working bottom up trying to confirm the more risky components will work. So it's by no means a done deal. Obviously I think I'm on the right trail or I wouldn't be doing it though.
I'm working in golang. Client goroutines that execute read-write transactions buffer up their read and write sets, then submit them to the committer goroutine via a channel. Part of the struct they submit is a blocking channel for the committer to notify them of the result. So nothing returns to the client unless it's stable on disk. Assuming I get far enough, a future version will also optionally wait for raft replication.
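The shape of that submit/notify path, as a stripped-down Go sketch rather than the real code:

```go
package main

import (
	"fmt"
	"sync"
)

// commitReq is what a client goroutine submits: its buffered write set plus
// a channel the committer uses to report the durable outcome.
type commitReq struct {
	writes map[string][]byte
	result chan error
}

// committer drains whatever is queued, "fsyncs" once for the whole batch,
// then notifies every client in it. Nothing returns to a client before the
// batch is stable.
func committer(reqs <-chan commitReq, fsyncBatch func([]commitReq) error) {
	for first := range reqs {
		batch := []commitReq{first}
	drain:
		for {
			select {
			case r, ok := <-reqs:
				if !ok {
					break drain
				}
				batch = append(batch, r)
			default:
				break drain
			}
		}
		err := fsyncBatch(batch) // one write + fsync for the whole batch
		for _, r := range batch {
			r.result <- err
		}
	}
}

func main() {
	reqs := make(chan commitReq, 128)
	go committer(reqs, func(b []commitReq) error {
		fmt.Printf("committed batch of %d transactions\n", len(b))
		return nil // a real version appends to the log and calls File.Sync
	})

	var wg sync.WaitGroup
	for i := 0; i < 4; i++ {
		wg.Add(1)
		go func(i int) {
			defer wg.Done()
			req := commitReq{
				writes: map[string][]byte{fmt.Sprintf("k%d", i): []byte("v")},
				result: make(chan error, 1),
			}
			reqs <- req
			_ = <-req.result // the transaction is durable once this returns
		}(i)
	}
	wg.Wait()
}
```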
> single core system can write something on the order of 10-100 million transactions per second...a single thread is what is holding up the entire exchange. Arrays of structs, hot cache lines, correct branch prediction and full pipelines
Without going back to school, how would I learn how to do that? (Books I should read, good open source examples, any Coursera courses?)
The article "Immutability is not enough" is sub-par in every aspect.
The author never defines what sort of problems immutable data can help avoid, or what the benefits of immutable data are. And still, after every paragraph, he wonders why functional programming didn't help him avoid bugs in his business logic.
Like, what the hell, the author produced a bug because the order of his operations was wrong? Great, write a test and restructure your code to make it work; at least you know where to look.
Functional Programming does not fix your Game Loop, and it never promised to.
FWIW I came to the comments because I also found the “Immutability changes everything” paper nearly unreadable.
I just wanted to comment that I love your last paragraph from a conceptual level. You want immutability? Great, let's talk about what performant game design looks like in that context. Games are special, in that they kind of do everything that computers are good at, to varying degrees. The author's ideas about snapshots of relational databases inviting further schema changes in the pipeline—I guess I see the idea you're trying to introduce, but what does it actually look like in a chess program? Or a choose your own adventure game?
The article rang true for me. I heard the same FP promises that the author did, and eventually ran into the same issues in my FP programs, and left disillusioned. The article captured the experience perfectly.
I like to say that mutable state is to software as moving parts are to hardware. If you can avoid them, great! But a number of systems/designs simply require them.
Anyone interested in what an immutable programming language looks like should check out Unison [0], written by Paul Chiusano. He is known for writing the book FP in Scala.
The premise is that any edit produces a new function, but the old one still exists, so you need to migrate calls from the old version to the new version.
The benefits really start to manifest in distributed computing and package management: much less need to worry about transitive dependencies and conflicting versions.
The editable code is a projection of a data structure, so refactoring and static analysis tools will be very easy to write. Also the syntax is just a display issue and can be changed easily, everyone could read and write code in the syntax they prefer.
It does increase some complexity, but I think it is essential complexity that we currently have only poor ways to manage, and having explicit knobs to turn will mean less effort overall.
I don't understand why this would need a new language; it seems like tools would be easier. If it is for development, the whole point is to edit functions. If it is to edit live programs, one obvious and good way is to swap out a function pointer atomically and call functions by getting their pointer atomically. This doesn't really have anything to do with a different language though.
> The benefits really start to manifest in distributed computing and package management: much less need to worry about transitive dependencies and conflicting versions.
I can see how that would work better if function references are to hashes of entire functions, rather than just the names of functions.
Pat also wrote a pretty great follow-up article that expands on the "Data on the Outside vs. Data on the Inside" section, covering the intersection of immutability, temporality, schema and relational querying: https://queue.acm.org/detail.cfm?id=341501
I'll try to look at that. At a more local level you might like "Purely Functional Data Structures" by Chris Okasaki. Functional programming revolves around those.
> Normalization is not necessary in an immutable data set, however. The only reason to normalize immutable data sets may be to reduce the storage necessary for them. On the other hand, denormalized data sets may be easier and faster to process as inputs to a computation.
I disagree here. To derive new data from an immutable data source you still want normalized data. Denormalization can happen at some point for processing. Sometimes normalization is not feasible, because it often requires modelling/design. But the best general-case scenario is for data to be well-structured and normalized.
You're being kind. The author's statement is utter nonsense. If you store the same fact in two places and only change one of them, it doesn't matter whether you update in place or create a new version of the record -- your data is still corrupted.
I feel like the programming tools haven't kept up, and so to do this we also need immutable requirements and impeccable programmers. Currently, if I change the code in my application it gives different results. I'm not sure if there is a word for it, but I'm thinking of pure applications, like we have pure functions. I don't think the right tools exist for this.
Also, the "accountants don't use erasers" line feels really glib to me; it's hard work.
Yeah, my argument is that those tools are OK but a bit inadequate. Certainly not first-class solutions. It's just a thought; I'm not sure it would open up a world of possibilities, but there could be more rigorous ways of showing how the logic has changed over time and keeping that alongside the current state of the logic.
This idea can be extended by observing the following analogy:
purity is to functions
as
immutability is to data
Immutability is definitely a valuable technique that can help make software faster (sometimes), more reliable, and easier to maintain. I would take it one step further and argue that we can get a similar boost by extending the idea to functions, not just data. And the way you do that is by writing pure functions (i.e. no side effects...or controlling/limiting them in some way) and using languages that can enforce that your functions are pure.
Yes, and computation and data both become interchangeable facts. And by becoming facts, they lessen the tech debt, and start to fade into the background. I think the only way to have tractable large systems is to have pervasive immutability and purity.
Unfortunately, this just doesn't pan out in practice. There are various techniques that are much more inefficient to express with immutability and pure functions (e.g. computing a histogram). There is also a pretty big body of evidence already that doesn't show any hugely significant (say, 2x) changes in productivity, correctness or speed for using pure, strict functional languages and styles vs more traditional imperative, impure or even dynamic languages.
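The histogram case, sketched in Go for concreteness (not a functional language, but it shows the shape of the objection): the mutable version is a tight in-place loop, while a faithfully immutable version derives a fresh map per element, and even a persistent map only reduces that overhead rather than eliminating it.

```go
package main

import "fmt"

// Mutable version: one map, updated in place. This is the natural, fast way.
func histogram(xs []int) map[int]int {
	h := make(map[int]int)
	for _, x := range xs {
		h[x]++
	}
	return h
}

// "Immutable" version: never modify an existing map, always derive a new one.
// Semantically identical, but it copies the whole histogram once per element.
func histogramImmutable(xs []int) map[int]int {
	h := map[int]int{}
	for _, x := range xs {
		next := make(map[int]int, len(h)+1)
		for k, v := range h {
			next[k] = v
		}
		next[x] = h[x] + 1
		h = next
	}
	return h
}

func main() {
	xs := []int{1, 2, 2, 3, 3, 3}
	fmt.Println(histogram(xs))
	fmt.Println(histogramImmutable(xs))
}
```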
First of all, I am arguing for the base case: absent any studies, the null hypothesis is naturally that there are no major differences between programming paradigms.
On the issue of bug counts, there was one huge study that was successfully reproduced [0], which finds that there is a real but small negative correlation between FP and bug counts in large GitHub projects.
For productivity I haven't been able to find any large-scale studies that compare functional vs imperative/OOP languages. I did find one small-scale study [1] that found significantly better productivity for Haskell (and a Lisp dialect) vs Ada, C++ and a few others, but it was about a specific small prototype problem with just a handful of programmers participating (the Haskell program had 85 lines; the worst, C++, had 1105; and programs took between 10h and 54h to finish - hardly representative of large-scale coding). Most other studies on productivity that I could find [2], [3] didn't include any FP languages at all.
As I understand it, immutability usually refers to mutable data structures hidden behind an interface that allows the programmer to reason about things as if they were immutable. The naive implementation is of course to duplicate all data whenever a change is made, but pure / immutable languages use smarter things like persistent data structures behind the scenes.
If the compiler knows the original data is never needed, it can optimize away the old copies, too. At least that's the hope, I'm not sure how true it is.
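A tiny example of that structural sharing, as a persistent singly linked list in Go (my illustration): "adding" an element builds a new head that shares the entire old tail, so nothing is copied and nothing is mutated.

```go
package main

import "fmt"

// list is a persistent singly linked list: nodes are never modified after
// construction, so any reference to a node is a stable snapshot.
type list struct {
	head int
	tail *list
}

// push returns a new list whose tail *is* the old list: O(1), no copying,
// and the old version remains fully usable.
func push(l *list, v int) *list {
	return &list{head: v, tail: l}
}

func (l *list) String() string {
	s := ""
	for n := l; n != nil; n = n.tail {
		s += fmt.Sprintf("%d ", n.head)
	}
	return s
}

func main() {
	v1 := push(push(nil, 1), 2) // 2 1
	v2 := push(v1, 3)           // 3 2 1, sharing all of v1
	fmt.Println(v1, v2)
}
```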
> As I understand it, immutability usually refers to mutable data structures hidden behind an interface that allows the programmer to reason about things as if they were immutable.
That is the definition of an algorithm, not immutability.
You can look at the R data.table package. R by default is a purely functional language, and the built-in data.frame type for in-memory tabular data is immutable. That means computing anything that changes the data requires copying it all. The data.table package achieved pretty impressive speed-ups by dropping down into C and doing everything in-place. It also makes it harder to run out of memory.
Granted, some effects like this can be mitigated with sufficiently clever language design. Haskell with lazy evaluation. I believe some compilers for functional languages will reuse memory addresses they can prove nothing else refers to, so in reality they are mutating in-place even if semantically the variables are "immutable." Heck, even Python does that with the built-in immutable types. When you assign the changed data to a new name, it doesn't actually allocate new memory and just does it in-place under the hood. In cases like that, the data isn't really "immutable." It just can't be mutated by the programmer, only by the runtime or compiler.
Of course, if we want to get sufficiently pedantic, no data is ever immutable. The law may say a ledger is append-only, but nothing is stopping an accountant from just lying. A compiler may prevent you from assigning new data to an already-existing name binding, but a compiler can't stop someone from using rowhammer or an x-ray machine or something and flipping the bits in memory anyway.
Accountants don't use erasers; otherwise they may go to jail. All entries in a ledger remain in the ledger. Corrections can be made but only by making new entries in the ledger.
This is not true. Accounting is usually the most mutable data source you'll ever have. Facts arrive at random times and are then entered on a schedule. Corrections wait on someone to resolve analytical issues. Different accounting units have different timelines, and within a unit there are many threads of an unordered fact flow. It is all synced together only at specific points, usually when it's time to report officially or internally. If you ask your accountant and the only thing they drop is "busy", it's one of those weeks of the year.
When a company's quarterly results are published, they include small corrections to the previous quarter. Small fixes are OK. They are append-only, too.
This is true, because public reports are legally binding.
On topic: in my opinion, immutability at small scales is good, because modifications do not leak accidentally into the fields of your data structures (e.g. a mutable string is a bad idea in general). But at the bigger scale immutability gets in the way, if not hidden properly. It requires a less common sort of developer who has to think in an immutable/functional way, and even when they can, this increases their cognitive distance to the business logic. Properly hidden immutability helps a lot though, and we mutable guys use it a lot, despite a widespread belief otherwise. It's transactions and execution context isolation. There is nothing wrong with mutating the operative area if these mutations cannot escape it. But if you're short on primitives (a context object, an object model vs global.store and raw fields of a one-for-all object) you end up setting globally visible state that can be read from the outside, and trouble creeps in. Moreover, real-world code rarely looks like what you've seen in a tutorial, and instead of a clear `assemble(this(), that())` you have to resort to non-ordinary functional tricks and concepts in the name of it. As a result, you have to hire (or be one of) these functional guys who may be better in general, but also spend half their cognitive abilities on things that you couldn't explain to an accountant.
So yes and no, immutability changes everything, but it’s not the only way to change it. It is just the easiest manual way to feel safe if you start naked in the woods.
> Facts arrive at random times and are then entered on a schedule. Corrections wait on someone to resolve analytical issues. Different accounting units have different timelines, and within a unit there are many threads of an unordered fact flow. It is all synced together only at specific points, usually when it's time to report officially or internally. If you ask your accountant and the only thing they drop is "busy", it's one of those weeks of the year.
I fail to see how any of this implies the data is mutable.
An accountant loads raw payment data from a bank. Every payment has to be recognized and related to some analytics (a project, an expense/income class, etc). If it's not obvious, then the corresponding analytics are left empty or "_Unrecognized", which rings at the top of all related reports. Later the info arrives and they simply go to that day and fill in the fields. The change history may be useful for data-debug reasons, but the model that a regular accountant uses is of a mutable nature at their level.
Of course one could design a system that made changesets append-only and explicit for them. But they would object, the same way you’d object if instead of updating a working tree, `git pull` simply appended a set of patches to every file in it and expected you to modify source code only by appending patches yourself. Developer’s own work model is mutable (a working tree), and you can’t expect an accountant to be surprisingly fine with what you personally are not.
A bank is a perfect example of immutable accounting.
Once a debit is posted to your account (let's say it is a fraudulent transaction, or a transaction entered by mistake), nobody will go and erase it as if it never happened.
They would actually do a reversal, by creating a new entry with the same amount and the opposite sign.
So you would end up with two transactions: +100 and -100.
The same is true for most accounting systems. Once the accounting period is closed (daily in banking systems, monthly for most other companies), you can't change the closed month. All changes go in as new adjustment transactions.
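That reversal pattern is easy to model as an append-only list of entries. A hedged Go sketch (account names and fields are made up):

```go
package main

import "fmt"

// entry is one immutable ledger line. Mistakes are never edited; they are
// corrected by appending a reversing entry and then the corrected one.
type entry struct {
	account string
	amount  int64 // cents; an opposite-signed entry reverses an earlier posting
	memo    string
}

type ledger struct {
	entries []entry // append-only
}

func (l *ledger) post(e entry) {
	l.entries = append(l.entries, e)
}

func (l *ledger) balance(account string) int64 {
	var total int64
	for _, e := range l.entries {
		if e.account == account {
			total += e.amount
		}
	}
	return total
}

func main() {
	var l ledger
	l.post(entry{"acct-1", -10000, "debit posted in error"})
	l.post(entry{"acct-1", +10000, "reversal of erroneous debit"})
	l.post(entry{"acct-1", -2500, "correct debit"})

	// The balance reflects the correction, but the full history is still there.
	fmt.Println(l.balance("acct-1"), len(l.entries)) // -2500 3
}
```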
That's what I'm talking about! You don't have to go all-immutable, only at the "commit and set in stone" level (you may have many). Obsessing over completely immutable state management at all levels does more damage to your workflow than the problems it solves are worth. It's yet another silver bullet that everyone tries to bite these days.