> So, we have a single array in which every entry has the key and the value paired together. But, during lookups, we only care about the keys. The way the data is structured, we keep loading the values into the cache, wasting cache space and cycles. One way to improve that is to split this into two arrays: one for the keys and one for the values.
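A minimal sketch of the split the quote describes (the type names and the linear probe are mine, for illustration - DenseMap's actual probing is more involved):

```cpp
#include <cstdint>
#include <string>
#include <vector>

// Array-of-structs: every probe drags the (possibly large) value
// into the cache alongside the key it actually wants to compare.
struct PairTable {
    struct Entry { uint64_t key; std::string value; };
    std::vector<Entry> entries;

    const std::string* find(uint64_t k) const {
        for (const auto& e : entries)
            if (e.key == k) return &e.value;
        return nullptr;
    }
};

// Split layout: the scan touches only the densely packed key array;
// the value array is read once, on a hit.
struct SplitTable {
    std::vector<uint64_t> keys;
    std::vector<std::string> values;  // values[i] belongs to keys[i]

    void insert(uint64_t k, std::string v) {
        keys.push_back(k);
        values.push_back(std::move(v));
    }

    const std::string* find(uint64_t k) const {
        for (size_t i = 0; i < keys.size(); ++i)
            if (keys[i] == k) return &values[i];
        return nullptr;
    }
};
```

In the split version, a failed probe never pulls a value into the cache, so far more keys fit per cache line during the lookup.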
Also, I think what you meant by data locality here is really optimizing data layout, which, as you also mentioned, is a hard problem. But if it's just optimizing (cache's) locality, I think the classic loop interchange also qualifies. Though it's not enabled by default in LLVM, despite being there for quite a while.
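For reference, loop interchange is exactly the transformation that trades a cache-hostile access order for a cache-friendly one. A sketch over a row-major matrix (function names are mine):

```cpp
#include <cstddef>
#include <vector>

// m is an n-by-n matrix stored row-major in a flat vector.

// Before interchange: the inner loop walks a column, so successive
// accesses are n doubles apart and hit a new cache line almost every time.
double sum_cols_first(const std::vector<double>& m, std::size_t n) {
    double s = 0.0;
    for (std::size_t j = 0; j < n; ++j)
        for (std::size_t i = 0; i < n; ++i)
            s += m[i * n + j];
    return s;
}

// After interchange: the inner loop walks a row, so accesses are
// contiguous and each fetched cache line is fully used.
double sum_rows_first(const std::vector<double>& m, std::size_t n) {
    double s = 0.0;
    for (std::size_t i = 0; i < n; ++i)
        for (std::size_t j = 0; j < n; ++j)
            s += m[i * n + j];
    return s;
}
```

Both compute the same sum; only the traversal order, and hence the cache behavior, differs - which is why the transformation is legal here.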
Well, to be honest, it's not an extraordinary idea, and it's not even mine. Mike Acton used this exact example in his CppCon talk (https://youtu.be/rX0ItVEVjHc?t=1558, at 25:58). I just happened to know that LLVM's DenseMap has this problem.
On the other comment, data locality is a property of a program (well... it's not really a binary thing, i.e., an execution has it or it doesn't, but rather a spectrum; but ok, it will do). For example, a simple definition which I like is from the paper "Improving Data Locality with Loop Transformations" (https://dl.acm.org/doi/pdf/10.1145/233561.233564):
> Data locality is the property that references to the same memory location or adjacent locations are reused within a short period of time.
Changing the layout is just a transformation to achieve this property. So, the two things are different in the sense that one is a property while the other is a transformation, but that doesn't mean data layout transformations are irrelevant to data locality.
Regarding cache locality, that is also a property, which we could define, say, based on this (https://doi.org/10.1109/12.752657), as: "The property that data brought into the cache should be reused as much as possible before it is replaced". Whether a program has data locality depends on the hardware it runs on. Since our hardware uses caches, to have data locality you usually need to have cache locality, so the two coincide.
Finally, yes, you're absolutely right (well, last time I checked...). LLVM has a bunch of loop transformations implemented---some relevant to data locality---but they're not turned on by default. One reason is that LLVM loops are often too unstructured, while these transformations usually need nice for-loop nests. They're unstructured on the one hand because they come from languages like C, and OTOH because LLVM IR doesn't have a first-class loop construct, something Chris Lattner has said he regretted.
Tangent: cases like these make me think we're often working on too low level of abstraction, reimplementing relational databases poorly without realizing it.
Like, if you're thinking, "Hey, I need a key-value map so I can efficiently look up Bars given some Foos", then this isn't equivalent to "I need std::map<Foo,Bar>" - it's much closer to "I need a table with Foos and Bars, where Foo is a key, and I need an index on it". Now that may translate to more than one object - e.g. two arrays, one with Foos, the other with pointers to Bars. That achieves the intended data locality.
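That two-object version could be sketched like this (Foo, Bar, and the class are placeholders of mine):

```cpp
#include <algorithm>
#include <memory>
#include <vector>

// Hypothetical stand-ins for the Foo/Bar types in the comment above.
using Foo = int;
struct Bar { double payload; };

// "A table with an index on Foo": a sorted, densely packed key array
// (good locality for the search) plus a parallel array of Bar pointers.
class FooBarTable {
    std::vector<Foo> keys_;                   // sorted index on Foo
    std::vector<std::unique_ptr<Bar>> vals_;  // vals_[i] belongs to keys_[i]
public:
    void insert(Foo k, Bar v) {
        auto it = std::lower_bound(keys_.begin(), keys_.end(), k);
        auto pos = it - keys_.begin();
        keys_.insert(it, k);
        vals_.insert(vals_.begin() + pos, std::make_unique<Bar>(v));
    }
    Bar* find(Foo k) {
        auto it = std::lower_bound(keys_.begin(), keys_.end(), k);
        if (it == keys_.end() || *it != k) return nullptr;
        return vals_[it - keys_.begin()].get();
    }
};
```

The binary search only ever touches the contiguous key array; the Bars (which may be large) stay out of the cache until a key actually matches.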
Now, imagine a new requirement: "I also want to efficiently look up Foos corresponding to a given Bar". Should you add another array to support the reverse mapping? Well, in database terms, you've said, "I also need an index on Bar". So yes, add one. And add the standard boilerplate to keep all the arrays in sync.
It feels tedious and arbitrary, and people may call it "premature optimization", but the real problem is that we don't have those basic database concepts built into our languages. "I need to map Foos to Bars, primarily for efficient lookup, but also for efficient back-referencing" is a lot of work, so we're tempted to skip the efficiency bits to keep the code simple. However, all of that just translates to something like `auto m_bars = Table<Foo, Bar> index(Foo) index(Bar)`; if we could express that directly, we'd skip the boilerplate, compilers could give us very efficient implementations, and we couldn't accidentally break pieces of them out of sync.
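A hand-written sketch of what that hypothetical `Table<Foo, Bar> index(Foo) index(Bar)` might expand to (all the names here are mine):

```cpp
#include <map>
#include <utility>
#include <vector>

using Foo = int;  // placeholder types
using Bar = int;

// One row store plus two indices, with insert() as the single place
// that keeps them in sync: exactly the boilerplate a compiler could
// generate from a declarative table description.
class Table {
    std::vector<std::pair<Foo, Bar>> rows_;
    std::multimap<Foo, std::size_t> by_foo_;  // index(Foo)
    std::multimap<Bar, std::size_t> by_bar_;  // index(Bar)
public:
    void insert(Foo f, Bar b) {
        std::size_t row = rows_.size();
        rows_.emplace_back(f, b);
        by_foo_.emplace(f, row);
        by_bar_.emplace(b, row);
    }
    std::vector<Bar> bars_for(Foo f) const {  // forward lookup
        std::vector<Bar> out;
        auto [lo, hi] = by_foo_.equal_range(f);
        for (auto it = lo; it != hi; ++it) out.push_back(rows_[it->second].second);
        return out;
    }
    std::vector<Foo> foos_for(Bar b) const {  // back-reference lookup
        std::vector<Foo> out;
        auto [lo, hi] = by_bar_.equal_range(b);
        for (auto it = lo; it != hi; ++it) out.push_back(rows_[it->second].first);
        return out;
    }
};
```

Because insert() is the only mutation point, the indices can't drift out of sync - the invariant a generated implementation would enforce for free.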
I encourage everyone to try and look out for this pattern. I find it shows up frequently, especially in more complex cases, possibly across object boundaries. I first realized this when I was making a roguelike game with an Entity Component System and realized I needed fast lookup of entities based on X/Y coordinates in the game world. I started adding some data structures to support that, and then it hit me - I'm just adding a spatial index to the entity list. This isn't a complex idea, it's an atomic concept - if you think in database terms instead of the usual programming terms.
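A sketch of that realization, with the spatial index as just another index over the entity table (the names are mine, and a real game would also update the index when entities move):

```cpp
#include <cstdint>
#include <unordered_map>
#include <vector>

struct Entity { int x, y; /* components... */ };

class World {
    std::vector<Entity> entities_;  // the "entity table"
    // Spatial index: packed (x, y) cell -> ids of entities in that cell.
    std::unordered_map<uint64_t, std::vector<std::size_t>> by_cell_;

    static uint64_t cell_key(int x, int y) {
        return (static_cast<uint64_t>(static_cast<uint32_t>(x)) << 32) |
               static_cast<uint32_t>(y);
    }
public:
    std::size_t spawn(int x, int y) {
        std::size_t id = entities_.size();
        entities_.push_back({x, y});
        by_cell_[cell_key(x, y)].push_back(id);  // keep the index in sync
        return id;
    }
    const std::vector<std::size_t>& at(int x, int y) const {
        static const std::vector<std::size_t> empty;
        auto it = by_cell_.find(cell_key(x, y));
        return it == by_cell_.end() ? empty : it->second;
    }
};
```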
Boost.MultiIndex is close to what you want. You define your table schema (as a struct definition), then ask for indices (of various kinds) on specific subsets of your schema. IIRC it doesn't support anything other than row-major layout, but an extension to it is not too far-fetched.
I dream of a language with first class support for relational tables as the primary data structure.
Regarding fast string lookup: you really need a short string optimization here, if not static perfect hashes. Short strings can then be read directly from a word already in the cache.
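A sketch of that idea, assuming keys of at most 8 bytes (the general out-of-line fallback a full SSO implementation would need is omitted):

```cpp
#include <cstdint>
#include <cstring>
#include <string_view>

// Pack a short key into a single 64-bit word, so a lookup compares one
// register-sized value already sitting in the cache line, with no
// pointer chase into a heap-allocated string buffer.
inline uint64_t pack_short(std::string_view s) {
    uint64_t w = 0;
    std::memcpy(&w, s.data(), s.size() < 8 ? s.size() : 8);
    return w;
}

// Only valid for keys of at most 8 bytes; longer keys would take the
// (omitted) out-of-line comparison path.
inline bool short_keys_equal(std::string_view a, std::string_view b) {
    return a.size() == b.size() && pack_short(a) == pack_short(b);
}
```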
> So, we have a single array in which every entry has the key and the value paired together. But, during lookups, we only care about the keys. The way the data is structured, we keep loading the values into the cache, wasting cache space and cycles. One way to improve that is to split this into two arrays: one for the keys and one for the values.
Recently someone proposed this on LLVM: https://discourse.llvm.org/t/rfc-add-a-new-structure-layout-...