What is the distinction between Nested records and Parallel arrays? Nested recor...

kragen · on Jan 2, 2017

With parallel arrays in their pure form, the things in your arrays are always primitive data types like characters, floating-point numbers, or integers, and each array is homogeneous. With nested records, you have heterogeneous data collections (records) and maybe even arrays of them. So, yes, column-oriented vs. row-oriented — or potentially some more ramified structure of rows containing rows containing columns containing rows...

I agree that what kind of arithmetic your computer uses is mostly irrelevant to the memory model.

I'll see if I can clarify these things in the essay. Thank you!

barrkel · on Jan 1, 2017

There's something else parallel arrays give you: control over the visibility of fields based on the subsystem that needs those fields.

Because the index into the arrays is the identifier for an entity, subsystems can keep attributes related to the entity in their own private arrays. They don't need to modify the record definition or use some complicated attribute extension mechanism or associative lookup mechanism. Their access will be just as efficient as access to any other attribute.

It is indeed about row-oriented vs column-oriented, but the distinction isn't merely "storage layout", like how you'd choose the order of dimensions in a multidimensional array.

Rather, it affects how the software is designed, developed and debugged. The referenced PDF with a defence of the approach by Adam Rosenberg is quite interesting: http://www.the-adam.com/adam/rantrave/st02.pdf - particularly pages 16 and onwards until you get the point.

Adam Rosenberg has some real points. The billions of dollars lost to insecure programs written in idiomatic C style wouldn't have happened, at least not in the same way, with the parallel array approach. For my part though, I think that parallel arrays represent a local maximum. I found Adam fairly convincing, and I think the approach is probably superior to pointer soup for programs that can fit inside a single person's head. But I don't think the approach scales; you need better encapsulation and composition tools to scale up, and those come with more indirections.

kragen · on Jan 2, 2017

I find Adam's book very thought-provoking, but I'm not at all persuaded that his approach is safer. He advocates run-time bounds-checking on every memory access, but of course that only works to convert one kind of program failure into another, and then only potentially — if you're indexing the points "table" with a circle index, you're only ever going to get a detected error if there are more circles than points. The corresponding bug in a struct-based program is a compile-time error (although, in C, only since V7 UNIX; in V6 all struct fields were in a single namespace).
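A rough Python sketch of the failure mode, assuming a made-up pair of tables (two points, three circles); none of these names come from Adam's book. The explicit check stands in for the run-time bounds check.

```python
# Parallel "tables": two points, three circles. All names are hypothetical.
point_x = [0.0, 4.0]
point_y = [0.0, 3.0]
circle_center = [1, 0, 1]        # each entry is an index into the point table
circle_radius = [1.0, 2.5, 0.5]

def point_x_at(i):
    """Bounds-checked read of the point table."""
    if not 0 <= i < len(point_x):
        raise IndexError(f"point index {i} out of range")
    return point_x[i]

def correct_center_x(circle):
    # Look up the circle's center point first, then read that point.
    return point_x_at(circle_center[circle])

def buggy_center_x(circle):
    # BUG: indexes the point table with the circle index itself.
    return point_x_at(circle)
```

For circles 0 and 1 the buggy version passes the bounds check and returns a plausible but wrong coordinate (0.0 instead of 4.0 for circle 0). Only circle 2 raises IndexError, and only because there happen to be more circles than points.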

taeric · on Jan 1, 2017

Parallel arrays typically don't store non-identically typed things one after the other. Unless I am mistaken, of course.

I think of it this way. Say you have two pieces of related data A and B. If you did this with nested arrays, say Foo{A, B}, then you would have an array that was A_1, B_1, A_2, B_2 in memory. With parallel arrays, you would have A_1, A_2 in one array, and B_1, B_2 in another.
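To make the two layouts concrete, here is a small Python sketch. It assumes A and B are both plain integers and uses the standard struct and array modules as stand-ins for raw memory; the variable names are invented for the example.

```python
import struct
from array import array

a = [1, 2, 3]       # field A of each item
b = [10, 20, 30]    # field B of each item

# Array of records Foo{A, B}: the fields of one record sit next to each other.
record = struct.Struct("<ii")    # little-endian, two 32-bit ints, no padding
records = b"".join(record.pack(ai, bi) for ai, bi in zip(a, b))

# Parallel arrays: one homogeneous array per field.
col_a = array("i", a)
col_b = array("i", b)

# Walking the record buffer front to back yields A_1, B_1, A_2, B_2, ...
interleaved = [v for rec in record.iter_unpack(records) for v in rec]
```

Here interleaved comes out as [1, 10, 2, 20, 3, 30], while col_a and col_b each hold a single field's values contiguously.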

So... is that not accurate?

kragen · on Jan 2, 2017

Right. I tried to draw this in the diagrams.