What are software bugs that can be avoided by choosing data structures like these?
I'm making a broad, high-level presentation about immutability in technology. At my company we have folks who have heard of it in the context of ransomware-resilient backups, others who have heard of it in the context of infrastructure as code, and very few who have heard of it in terms of data structures (distributed and non-distributed). My goal is to showcase the concept in various contexts so that people can better understand its role as a key design choice in technology.
Personally I have no experience working on software that utilizes these, so if others here do, I would appreciate your input on how these make your software more reliable.
The emphasis on software reliability and bugs-avoided is because the audience works under the company's risk-management division.
Purely functional data structures are a means, not an end in themselves.
Programming with immutable values and the more declarative style they support does design out at least one infamous source of bugs: shared mutable state. That has become increasingly important in recent years, for example because modern chips support ever increasing numbers of concurrent threads in hardware and because a lot of the software we write is now part of distributed systems.
That programming style has other advantages in terms of readability and reasoning about code. If you can be confident that a particular name in the code always refers to the same value once it’s been defined, tracing the flow of data through a function or through an entire system often becomes much easier. Having more understandable code naturally tends to improve reliability, among other advantages.
If we adopt that programming style then we still need to represent collections of values and different relationships within our data, but we can’t always use “traditional” data structures to do that any more because many of them rely on mutability, which we have excluded. So we need other ways to represent structured data that are compatible with immutability and declarative style, and that is what purely functional data structures give us.
_Immutable_ data structures (not 100% the same thing, but a lot of overlap) avoid all kinds of concurrency problems, because you can safely pass them around and you'll never get a data race. You don't even need any locking (just to make things complicated, _lock-free_ data structures are another closely related but not identical concept).
Once you're running a distributed system, this kind of stuff comes into its own.
Careful with this wording. They avoid shared memory mutations. They don't necessarily change data races. Rather, they just change them since, by definition, every edit is now creating stale data.
At large, in distributed software, this is a distraction. Since most passing of messages around from one distributed piece to the other was already doing a copy across mediums. Such that sent data was already immutable from the senders perspective. (Granted, the preparation step can have you step on your own feet.)
As the peer comment points out, functional data structures force you to be deliberate about where/when state changes take place. In particular, they force you to be cognizant of which version of a data structure a particular piece of code is referencing.
Immutable structures can help significantly in reducing race conditions in parallel code, as it is impossible for one thread to modify part of a data structure while another thread is accessing it. Each thread sees a consistent 'version' of the data structure until the code explicitly replaces it with a new version.
Immutability can also help with memory optimizations: A node of one data structure can be safely re-used in another data structure. For example, if you need to retain older versions of a tree, you can share common nodes (this is how GIT encodes version changes).
Purely functional data structures are very common in purely functional languages like Haskell but are also used in non functional languages via libraries like immutable.js.
At a high level, immutability forces you to be extremely deliberate about state changes in your application. This improves reasoning/understanding, reduces bugs, and eases debugging.
An example of immutability that you might be familiar with would be react props/state. You don’t modify your state. This makes reasoning about state much more simple.
Disclaimer: Not a programmer - I do not have a CS degree. One of my minors as an undergrad was MIS and while I took a database programming course as a senior, my knowledge is limited.
The high-level explanation I recall reading years ago somewhere (Joel on Software, I think??) was that a) Functional programs have no side effects, leading to the notion that b) They can, without much effort, be adapted for parallel tasks.
The example here was MapReduce, which was the original guts of the Google Search algorithm.
E.g, Things are fast because we can send your query to many, many computers to find an answer, and chances are good that that machine is (relatively) close to you, which means low wait times to get your answer.
Modern consensus is done by constructing a replicated log - an append only data-structure shared across multiple workers. It's what you use Paxos for, and it's what Raft streamlines.
I'm making a broad, high-level presentation about immutability in technology. At my company we have folks who have heard of it in the context of ransomware-resilient backups, others who have heard of it in the context of infrastructure as code, and very few who have heard of it in terms of data structures (distributed and non-distributed). My goal is to showcase the concept in various contexts so that people can better understand its role as a key design choice in technology.
Personally I have no experience working on software that utilizes these, so if others here do, I would appreciate your input on how these make your software more reliable.
The emphasis on software reliability and bugs-avoided is because the audience works under the company's risk-management division.