Excellently written. One thing that jumped out at me is this in the discussion of the parallel arrays:
> The compiler can’t tell which arrays a given index is or isn’t valid for, and neither can the debugger or garbage collector...
(I know people are getting tired of hearing about the awesomeness of this, but) in Rust I have grown very fond of the 'zip' function on 'Iterator', which actually does guard against index issues across parallel arrays/iterators. This is a big deal to me: selecting out ordered parallel arrays of data from databases, tables, or partitions could often speed up access in the DB by orders of magnitude. In Java there was no guard against an index violation, and there's no simple way to create a merged view of the returned parallel arrays. But in Rust you can zip them all together and not worry about access violations (though you should still probably detect length mismatches so that you know you don't have any data gaps).
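Something like this minimal Rust sketch (made-up names) shows what I mean:

```rust
// zip stops at the shortest iterator, so a length mismatch can never
// become an out-of-bounds access.
fn main() {
    let user_ids = vec![101, 102, 103];
    let balances = vec![250.0, 17.5]; // one element short

    // Runs twice, never panics.
    for (id, balance) in user_ids.iter().zip(balances.iter()) {
        println!("user {}: {}", id, balance);
    }

    // But zip hides the mismatch, so detect data gaps explicitly:
    if user_ids.len() != balances.len() {
        eprintln!("data gap: {} ids vs {} balances",
                  user_ids.len(), balances.len());
    }
}
```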
This was a great article. And made me start thinking of immutable memory in a different way when you consider punch cards!
`zip` solves the problem when you're iterating over an entire class of objects, which is relatively often; it doesn't help when you are dealing with some subset or relationship, so that your index is being fetched from some other array somewhere.
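A sketch of that case, with hypothetical names: the index value comes out of a different array, so zip has nothing to pair it against.

```rust
fn main() {
    let employee_name = vec!["alice", "bob"];
    // manager_of[e] is an index into employee_name.
    let manager_of: Vec<usize> = vec![1, 0];

    for (name, &m) in employee_name.iter().zip(manager_of.iter()) {
        // zip safely pairs the two parallel arrays, but `m` is still a
        // bare index fetched from another array; nothing validates it
        // until the run-time bounds check (if it's out of range at all).
        println!("{} reports to {}", name, employee_name[m]);
    }
}
```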
This is a beautiful essay, and I've really enjoyed it. However, while interesting and inspirational it doesn't all ring true. Lisps from the years before generational garbage collection might spend five percent of their time in GC; in particularly awkward cases it might rise to ten percent. I'm not saying this was desirable: it was not, particularly in interactive processes. But it was nothing like 'usually spent between a third and half of their time running the garbage collector'.
I've seen some very long GCs indeed. I'm sure I've seen fifteen minute GCs. But I've never seen GC eat a third of the processor cycles. I don't believe it.
When I was in college in the early 90s, unix systems were nonstop-paging quite often, and lisp systems would frequently pause while you were interacting with them, for nontrivial amounts of time, like multiple seconds.
And that was the 90s, after Lisp had been around for decades...
That's why my university had a site license of Allegro CL for its SUN SPARC cluster. Thus every student had access to a useful Allegro CL installation.
Above, the 90s were mentioned. SPARC systems shipped in the late 80s, around 1987. By the 90s, Lisp systems on stock hardware already had quite usable GCs, especially the big three commercial CL implementations on Unix (Lucid, Allegro and LispWorks). Even MCL on Mac OS had a useful ephemeral GC.
It's hard to say how much time was spent on GC in a Lisp program in the 70s. The installations were different, and the programs were different. How much time did a Multics Emacs written in Maclisp spend in GC? I have no idea. A Macsyma running for some hours? An early rule-based system? Some benchmark numbers were probably taken, but one would need to check the literature to find them.
There are also pathological cases where memory is mostly full and time is spent mostly in GC, or where a lot of virtual memory is used and much time is spent paging memory in and out.
Note also that the invention of generational GCs was triggered by the availability of larger virtual-memory-based systems, and of programs using them, outside of the original research. Before that, such systems were quite a bit smaller or very rare. Machines like the Symbolics 3600 were extremely rare and expensive, and when they first shipped, their GCs were copying and compacting, but not generational. Systems like that did not exist on the market before.
The claims being debated here are, as I understand them, my half-remembered claim that it was common to spend ⅓ to ½ of your execution time in the garbage collector until generational GC, and Simon Brooke's claim that actually 5% was normal and 10% was in pathologically bad cases. I think Gabriel's book shows conclusively that allocation-heavy programs could easily spend ⅓ to ½ of their time in GC, or 80% to 90% in pathological cases. But Brooke could still be right about the typical case, since when allocation is expensive, people avoid it in performance-critical code when possible.
I agree that the problem mostly went away once generational (aka ephemeral) GC was shipped. In the essay, I dated that to Lieberman and Hewitt in 1980, but in a more recent discussion with Allen Wirfs-Brock, I learned that actual generational GC didn't ship until 1986. (And Lieberman and Hewitt's paper, though submitted in 1980, wasn't published until 1983.)
Given the question of whether non-generational GC would cost more like 5% or more like 25% of your performance, the performance of generational GCs is somewhat of a false trail. Hopefully nobody was claiming generational GCs would eat double-digit fractions of your CPU time. I wasn't, at least.
See also ftp://publications.ai.mit.edu/ai-publications/pdf/AITR-1417.pdf for an early perspective on generational GC.
Ephemeral GC, btw, means something slightly different: an ephemeral GC is mostly concerned with the detection and collection of short-lifetime objects in main memory.
The main problem is that architectures, implementations, actual computers, and programs were so different that you could find all kinds of numbers. Even a generational GC can get extremely busy, and may not prevent full GCs over large virtual memory from time to time. It also makes a huge difference whether the GC was incremental (the work spread over the whole computation) or whether the system stopped for much of the GC work.
But come on, I have used emacs for 25 years, and on a daily basis it stalls for annoying amounts of time while I am just doing something simple like editing a .cpp file. Today. In the year 2017.
The performance claim is from my memory of reading LISPCraft, the 1984 edition written before generational GC, where Wilensky claims that something like a third to half of your program runtime is typically spent in GC, so consing is much more expensive than it appears. But I don't have a copy of LISPCraft here, and I read it in probably 1990 or so, so I might be misremembering.
How can we verify the truth about these conflicting claims about LISP systems from the early 1980s? Maybe Gabriel's book has a more definitive answer?
(A possible confounding factor is that, at the time, the frequently-executed parts of Lisp programs were written in a more imperative style to reduce the load on the GC.)
Looking at Gabriel's 1985 book on the performance of Lisp systems (lispm posted a link in another thread: http://rpgpoet.com/Files/Timrep.pdf) we see many benchmarks using 0 GC time. His "Boyer" theorem-proving benchmark (p.133, 142/294) very commonly uses as much time in the "GC" column as in the "CPU" column (½ of the total), twice as much on SAIL (⅔ of the total), and almost never less than about half as much (⅓ of the total).
I'd like to claim that one of the four configurations of VAX Franz Lisp shone here, since it spent only about ⅐ of its total execution time in the GC. However, it spent roughly as much absolute time in the GC as the other three Franz Lisp configurations; it's just that it took 6× as long as normal to do the non-GC part of the benchmark; "normal", for context, was several times slower than SAIL. It's not clear that this counts as a performance improvement in any sense!
But, since many of his benchmarks use 0 GC time, I don't think I can justify the statement that GC just before the introduction of generational GC normally used 50% of a program's total run time; most programs did not consist primarily of evaluating tree-rewriting rules, which obviously are allocation-intensive. (Boyer isn't the worst, though. On Vaughan Pratt's Deriv symbolic differentiation benchmark, SAIL spends almost 90% of its total CPU time in the GC, though some other collectors handle it better.)
Reading through some of the other benchmarks, Franz Lisp commonly comes out rather GC-heavy, though less so than SAIL; on div2, for example, Franz spends 80% of its time in the GC. It may be significant that Wilensky was using Franz to write the book I got my original impression from.
I'm somewhat surprised that the FFT benchmark is written to use GC time, since normally I think of FFTs as not requiring any allocation at all, much less significant numbers of allocations of separate objects...
So, in conclusion, I do think it's justifiable to say that there were significant classes of programs for which all Lisp systems continued to spend ⅓ to ½ of their time in the GC, or 80% to 90% in bad cases, right up until the introduction of generational GC shortly after Gabriel's book. But it's probably not justifiable to apply that statement generally to all Lisp programs. Your 5%–10% number sounds quite reasonable as an overall average.
> I'm somewhat surprised that the FFT benchmark is written to use GC time, since normally I think of FFTs as not requiring any allocation at all, much less significant numbers of allocations of separate objects...
Keep in mind:
a) not every machine had FP hardware, so FP instructions might be emulated in software.
b) dealing with floating-point numbers may mean allocating float objects. For example, not every Lisp had a direct array of floats; the array would then be a general array pointing to float objects on the heap.
c) not every compiler/interpreter was clever enough to keep floats in registers (or similar) while doing float computations.
Even today, allocating floats can be a problem in Lisp applications.
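To make (b) concrete, here is a rough Rust analogue (an illustration only; the Lisp details varied by implementation) of a direct array of floats versus a general array pointing to heap-allocated float objects:

```rust
fn main() {
    // Direct array: 8 contiguous bytes per element, no per-element
    // allocation.
    let direct: Vec<f64> = vec![1.0, 2.0, 3.0];

    // "General array": each element is a separate heap allocation and
    // the vector holds pointers, the way a boxed-float Lisp array did.
    let boxed: Vec<Box<f64>> = vec![Box::new(1.0), Box::new(2.0), Box::new(3.0)];

    let sum_direct: f64 = direct.iter().sum();
    let sum_boxed: f64 = boxed.iter().map(|b| **b).sum();
    assert_eq!(sum_direct, sum_boxed); // same result, very different GC load
}
```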
About the Erlang memory model: the answer is that Erlang has two different memory models, because it has two paradigms.
Inside an Actor, you are running whatever you want. That sequential part of Erlang has a graph memory model, but you could write it in any other language and memory model, as long as you write an Actor.
Outside an Actor, you are in the non-graph society of actors. That side is closer to general system architecture and is rarely talked about, which probably explains why it is rare to find a memory model for it.
One omission is the spreadsheet memory model. One could argue that the spreadsheet is just a multi-dimensional array as in Fortran, but it's not. Both data of primitive types and computation declarations are stored in the table cells. Data dependencies and computations are declared; the computation order is not specified, and the whole table runs as a dependency graph. It's kind of unique.
I think spreadsheets are kind of at a different level of abstraction from what I was talking about in this essay, a higher level. They're more like a data model or a programming paradigm than a memory model.
You should separate spreadsheet computation (which for these languages is written in the language) from the spreadsheet data.
Spreadsheets slightly generalize the SQL table by allowing cells to have varying data types, but without loss of generality any spreadsheet can be broken down into a set of SQL tables.
It's similar in that it reaches back to the 60's to examine some things we take for granted in a programming language. What is a name, a value, an L-value, an R-value, how do closures capture, how does evaluation work, etc.?
A lot of popular languages have settled on the same choices, but it's good to see the universe of possibilities.
This paper of Strachey's looks very interesting; thank you! Essentials of Programming Languages argues that we can understand many of the important differences between programming languages by way of what kinds of different lvalues (approximately EOPL's "denoted values") and rvalues (approximately EOPL's "expressed values") the language has. For example, in C, you can have a variable that denotes an array, while in Python, your variable can only denote a pointer to an array ("list").
This obviously ties in rather closely with the theme of the essay.
Learning about the parallel array memory model, I immediately thought of its appropriateness for machine learning. I've just recently started working with TensorFlow, and the syntax for memory allocation suggests that the values operated upon during a session are in parallel arrays. Of course TensorFlow is in Python, which would suggest a graph model, but it appears as if a parallel array memory model has been grafted onto a graph model.
Does anyone know if this is correct? And if so, is this memory allocated upon session start?
The internal representations of Python's core types are indeed pointer-heavy ("graph model"), and Python also encourages making references. But Python is not used like C, nor is it used exactly like Lisp; one does not usually increment loop variables, nor base everything on linked lists. If you are writing programs which use NumPy or TensorFlow heavily, you might actually be relying more on parallel array representations.
It seems your comment implies that Lisp bases everything on linked lists. That is a misconception. As early as Lisp 1.5 (1962), arrays were available (see section 4.4 of [1]).
Moreover, the submitted article itself makes it clear that when it says "LISP’s only data-structuring mechanism is something called a cons, which consists of two pointers" it is talking about Lisp 1.0. It also acknowledges that LISP is called Lisp now, too.
I'm flattered that people seem to have enjoyed this essay of mine. I'm a little embarrassed that this happened even though I hadn't gotten around to updating the text to refer to the examples that are in the actual diagrams. On the other hand, nobody seems to have noticed.
I had promised to take a one-year break from commenting on HN because it had turned into a place where people were comparing one another to garbage https://news.ycombinator.com/item?id=12598071 but I am breaking that promise now to answer these comments.
I’m calling these six conceptualizations “memory models”, even though that term used to mean a specific thing that is related to 8086 addressing modes in C but not really related to this. (I’d love to have a better term for this.)
I would just call them "data models". The relational model is clearly a data model; it's based on tables with homogeneous columns. C++ uses the data model of typed memory; Python and Java use the object graph data model, etc.
Another possible name would be "meta-object model". (That term shouldn't require "classes")
Although I'm not sure that they all go together. The first 3 are definitely related (records, object graph, parallel arrays), and the last 2 are related to each other (relational vs. hierarchical file system).
I don't see how pipes really fit in (and I'm writing a shell [1]). A pipe is a fixed-size buffer inside the kernel, outside the address space of any process. That distinction is orthogonal to how you actually organize your memory within a process, which is what the first 3 things are about.
And I think that persistence is an orthogonal concern as well. It applies to all of them except pipes. SQL and hierarchical file systems are persisted in practice, and you will have no problem with COBOL-style records or parallel arrays.
Object graphs have the problem with serialization that he mentions: JSON and Protocol Buffers can't represent cyclic data structures; they represent trees.
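A minimal Rust sketch of why (hypothetical Node type): merely constructing a cycle forces you into shared references, and once you have one, no finite JSON tree can encode it without some explicit node-id scheme.

```rust
use std::cell::RefCell;
use std::rc::Rc;

struct Node {
    label: String,
    next: RefCell<Option<Rc<Node>>>,
}

fn main() {
    let a = Rc::new(Node { label: "a".into(), next: RefCell::new(None) });
    let b = Rc::new(Node { label: "b".into(), next: RefCell::new(Some(a.clone())) });
    *a.next.borrow_mut() = Some(b.clone()); // a -> b -> a: a cycle

    // A naive recursive serializer would loop forever on this graph;
    // a tree format like JSON simply has no representation for it.
    println!("{} -> {}", a.label, b.label);
}
```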
The analogy is that Zephyr ASDL is to ML as protocol buffers are to C++ (although protobufs now have unions which could be represented as sum types I think).
Maybe the desire to call them "data models" is an argument that the relational model doesn't really belong in this essay, because it really is much more of a data model (abstractly describing an ontology of mathematical objects) than a memory model (describing how to map that ontology onto bytes in memory).
Yes I agree about the relational model. It is higher level than the rest -- one of the key ideas in Codd's paper was to abstract data away from concrete storage. In contrast, the records, parallel arrays, and to some degree the object graph model are pretty closely tied to concrete storage. C/C++/Go all explicitly specify the memory layout and allow the programmer to control it by design.
And as mentioned, I think the relational model and file system are interesting but orthogonal topics.
I do think this pattern of taking the memory model/data model and "externalizing" into a DSL is interesting (JSON, protobufs and many other schemes, ASDL). That makes it clear that persistence is an orthogonal concern.
One thing I've been thinking about, and which your article helped me home in on, is that scripting languages almost use the object graph model, but that model is inefficient on modern computers. Pointers are huge and they lead to scattered data.
This seems like a pretty enormous amount of overhead... there is more "metadata" than there is data!
Another point: Someone else mentioned R and pandas. I've been meaning to write a blog post called "R is the only language without the ORM problem". There's no mismatch, because R's data model is the same as SQL -- tables with homogeneous columns (this is in the logical sense). It's meant for "measurements and observations" rather than "business data", but I don't see any fundamental reason why these are different. It's more about R's implementation quirks than the logical model.
So that is another argument that persistence is a separate concern. R has non-persistent tables, but SQL has persistent tables.
Another example is Redis. Redis is persistent (although it didn't start off that way), but it doesn't use the relational model. I haven't used it too much, but as far as I know it has dictionaries, sets, and lists. So it looks like a database server but has a different data model.
So I think these concerns should be represented in the taxonomy:
- logical vs physical model (logical is what the user sees; physical is concrete storage). You can have an SQL database that is row-oriented or column-oriented. And I noticed that the Jai programming language has this structure-of-arrays vs array-of-structures duality built in.
- Persistence -- each model can be dealt with in-memory or on disk. I didn't know that COBOL dealt with records on disk, which is interesting. A B-Tree is a data structure with pointers, but it's designed to be serialized.
A great article addressing a difficult idea. The article slightly overgeneralizes for simplicity.
One omission is that the author doesn't mention the "data table" concept when he talks about SQL tables. Data frames in R or Python/pandas map directly onto the SQL table idea.
And thus the article explains, without stating it explicitly, why Python is such a good data analysis language. You get the LISP "everything is an object pointer" data model in the core language[1], and FORTRAN arrays (numpy) and SQL tables (pandas) are for practical purposes part of the language. (One could also argue that COBOL records are dealt with by pandas, but in practice people use both namedtuples and pandas tables for large named records, and both are not perfect for this purpose.)
Having those "Big Three" data models at your fingertips in Python gives great power, and even though python is not fast, the big three data models make Python useful for almost all forms of data analysis.
[1]. When python says "everything is an object", it really means "everything is a pointer to an object". You can tell because everything is passed by reference, not value. Even "print(genericObject)" gives you the address of the object.
Whether numpy and pandas are part of the language depends greatly on how you use Python. There are still probably lots of people writing Django webapps without importing either of them.
Go has both value types and pointer types, like C and C++, but unlike Java/Python/Ruby/JS/etc.
My reading of the COBOL part was that it's basically like C and C++ without pointers. Records are contiguous and nested inside each other rather than using pointers. So you can't have cyclic data structures.
> My reading of the COBOL part was that it's basically like C and C++ without pointers.
But the author also says C is one of the COBOL derived languages: "COBOL-derived languages like C and Golang give the garbage collector an easier time, because they perform fewer, larger allocations; they tend to mutate the allocated objects instead of allocating modified versions; and programmers tend to nest records rather than link them with pointers where possible..."
Actually there are a few things about this passage that are odd, since C does not have a garbage collector.
Thank you for pointing out how that part is confusing! It's certainly true that C usually does not have a garbage collector.
As chubot said, what I was saying is that C and similar languages take (the simplified ur-form of) the COBOL memory model and add pointers to it, and programmers in these languages mostly use pointers when they want one of the things you can't get in the COBOL memory model: aliasing, nullability, cyclic relationships, or dynamic memory allocation.
You could very reasonably argue that the distinction between GC'd and non-GC'd languages is more profound than that, since in a GC'd language you can make all your data immutable and toss around entire subgraphs of the object graph as if they were integers, simply by referring to them by a pointer, not worrying when or if they get deallocated. While I agree that this is a big difference, I think the differences I'm focusing on in the essay are actually even more pervasive — so pervasive that we rarely even take note of them at all. That's why I feel justified in lumping Golang with C instead of Lisp here.
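To make that concrete, a small sketch in Rust (hypothetical types): the COBOL-style record embeds its parts inline, while adding a pointer is exactly what buys you nullability, dynamic allocation, and linking.

```rust
struct Date { year: u16, month: u8, day: u8 }

// COBOL-ish: the Date is stored inline, inside the record.
struct Employee {
    id: u32,
    hired: Date, // no pointer, no null, no sharing
}

// Pointer-ish: Box adds dynamic allocation, Option adds nullability,
// and the indirection makes linked structures possible.
struct Node {
    value: u32,
    next: Option<Box<Node>>,
}

fn main() {
    let e = Employee { id: 7, hired: Date { year: 1984, month: 5, day: 1 } };
    let list = Node { value: 1, next: Some(Box::new(Node { value: 2, next: None })) };
    println!("{} hired in {}", e.id, e.hired.year);
    println!("{} -> {:?}", list.value, list.next.map(|n| n.value));
}
```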
Yeah, I think he was basically saying that the records come from COBOL, and the pointers come from Lisp. So then C/C++/Go are derived from BOTH COBOL and Lisp.
I think he is being a little imprecise there but the meaning gets through. The point is that Go programmers can structure their data so as to minimize garbage collection overhead, but Java programmers can't. I'm pretty sure this was mentioned in some of the recent Go garbage collector design docs, so it's a pretty relevant point.
I always thought of C/C++/Go as ALGOL-like languages, but I guess that refers to the syntax (block structure) and not semantics (the data model, which I prefer over "memory model").
> The point is that Go programmers can structure their data so as to minimize garbage collection overhead, but Java programmers can't.
There is an anecdote that Minecraft, in the old days, used to pass around block coordinates as "int x, int y, int z" rather than as an instance of a BlockCoordinate class in order to reduce GC overhead. The BlockCoordinate class was introduced by a refactoring later on, thus turning Minecraft into the memory hog it now is.
So you can certainly optimize for GC overhead in Minecraft, though probably at the expense of readability.
> I think he is being a little imprecise there but the meaning gets through.
It's one thing not to go into details. It's quite another to misrepresent them. To describe C and Go as somehow alike in their memory allocation strategy, as though C were garbage-collected, is to foster a misconception.
What is the distinction between Nested records and Parallel arrays?
Nested records support an "array of a specific number of data objects of the same type stored one after the other". That is what is needed for parallel arrays. The parallel arrays examples just happen to have more numerical types.
The distinguishing features listed seem to mostly be about the implementation supporting fast numerical computation better ("used binary instead of decimal", "supported floating-point math"). These might distinguish the languages Fortran and Cobol, but I think this is orthogonal to the memory model.
Is it because Nested records in Cobol does not have dynamic memory allocation and pointers?
What if an object graph was statically allocated - is allocation also orthogonal?
Is the distinction having a storage layout which is row-oriented vs column-oriented?
With parallel arrays in their pure form, the things in your arrays are always primitive data types like characters, floating-point numbers, or integers, and each array is homogeneous. With nested records, you have heterogeneous data collections (records) and maybe even arrays of them. So, yes, column-oriented vs. row-oriented — or potentially some more ramified structure of rows containing rows containing columns containing rows...
I agree that what kind of arithmetic your computer uses is mostly irrelevant to the memory model.
I'll see if I can clarify these things in the essay. Thank you!
There's something else parallel arrays give you: control over the visibility of fields based on subsystem that needs those fields.
Because the index into the arrays is the identifier for an entity, subsystems can keep attributes related to the entity in their own private arrays. They don't need to modify record definition or use some complicated attribute extension mechanism or associative lookup mechanism. Their access will be just as efficient as other attributes.
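A small sketch of what I mean, with hypothetical subsystem names: each subsystem owns its own attribute arrays, all indexed by the same entity id.

```rust
// The entity id is just an index, so each subsystem can keep private
// attribute arrays without touching any shared record definition.
struct Physics { x: Vec<f64>, y: Vec<f64> }
struct Rendering { sprite_id: Vec<u32> } // private to the renderer

fn main() {
    let physics = Physics { x: vec![0.0, 1.5], y: vec![0.0, 2.5] };
    let rendering = Rendering { sprite_id: vec![7, 9] };

    let entity: usize = 1; // the same index is valid in every subsystem
    println!("entity {} at ({}, {}) drawn with sprite {}",
             entity, physics.x[entity], physics.y[entity],
             rendering.sprite_id[entity]);
}
```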
It is indeed about row-oriented vs column-oriented, but the distinction isn't merely "storage layout", like how you'd choose the order of dimensions in a multidimensional array.
Rather, it affects how the software is designed, developed, and debugged. The referenced PDF with a defence of the approach by Adam Rosenberg is quite interesting: http://www.the-adam.com/adam/rantrave/st02.pdf - particularly pages 16 and onwards, until you get the point.
Adam Rosenberg has some real points. The billions of dollars lost to insecure programs written in idiomatic C style wouldn't have happened, at least not in the same way, with the parallel array approach. For my part though, I think that parallel arrays represent a local maximum. I found Adam fairly convincing, and I think the approach is probably superior to pointer soup for programs that can fit inside a single person's head. But I don't think the approach scales; you need better encapsulation and composition tools to scale up, and those come with more indirections.
I find Adam's book very thought-provoking, but I'm not at all persuaded that his approach is safer. He advocates run-time bounds-checking on every memory access, but of course that only works to convert one kind of program failure into another, and then only potentially — if you're indexing the points "table" with a circle index, you're only ever going to get a detected error if there are more circles than points. The corresponding bug in a struct-based program is a compile-time error (although, in C, only since V7 UNIX; in V6 all struct fields were in a single namespace).
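A small sketch of that failure mode (made-up tables): the bounds check passes whenever the circle index happens to fall inside the points table, so the wrong-table read goes undetected.

```rust
fn main() {
    let point_x = [1.0, 2.0, 3.0];  // indexed by point id
    let circle_radius = [5.0, 6.0]; // indexed by circle id

    let circle = 1; // a valid circle index
    let _r = circle_radius[circle];

    // BUG: a circle index used against the points table. The run-time
    // bounds check passes (1 < 3), so we silently read the wrong datum;
    // it would only panic if there were more circles than points.
    let wrong = point_x[circle];
    println!("{}", wrong); // prints 2.0: no error, just a wrong answer
}
```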
Parallel arrays typically don't store non-identically typed things one after the other. Unless I am mistaken, of course.
I think of it this way. Say you have two pieces of related data A and B. If you did this with an array of records, say Foo{A, B}, then you would have an array that was A_1, B_1, A_2, B_2 in memory. With parallel arrays, you would have A_1, A_2 in one array, and B_1, B_2 in another.
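In Rust terms (a sketch, not from the article), the two layouts look like this:

```rust
// Array-of-records: A_1, B_1, A_2, B_2, ... interleaved in memory.
struct Foo { a: f32, b: f32 }

fn main() {
    let aos: Vec<Foo> = vec![Foo { a: 1.0, b: 10.0 }, Foo { a: 2.0, b: 20.0 }];

    // Parallel arrays: A_1, A_2 contiguous in one array, B_1, B_2 in
    // another.
    let a: Vec<f32> = vec![1.0, 2.0];
    let b: Vec<f32> = vec![10.0, 20.0];

    assert_eq!(aos[1].a, a[1]);
    assert_eq!(aos[1].b, b[1]);
}
```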