"Imagine a world where data layout doesn’t matter, where apps are optimized for sub-millisecond storage, where 100 byte I/Os are faster and just as efficient as 8KB I/Os. The architectural implications are huge and would take a decade or more to get our heads around."
Umm.. I've worked with in memory data structures (who hasn't?) and yeah, layout DOES matter. Especially if your data structure is larger than a cache line.
>"Imagine a world where data layout doesn’t matter, where apps are optimized for sub-millisecond storage, where 100 byte I/Os are faster and just as efficient as 8KB I/Os."
With latency not being exactly zero, 100-byte I/Os will always be less efficient than 8K I/Os. For example, with 300MB/s of bandwidth and 0.01ms latency, the throughput would be about 10MB/s vs 220MB/s.
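To make the napkin math explicit, here is a minimal sketch of that calculation. It assumes I/Os are issued strictly one at a time, that the 300 figure means decimal megabytes per second, and that "8K" means 8192 bytes; the function name `effective_throughput` is just for illustration.

```python
def effective_throughput(io_size_bytes, bandwidth_bytes_per_s, latency_s):
    """Bytes per second when I/Os are issued one at a time, each paying full latency."""
    transfer_time_s = io_size_bytes / bandwidth_bytes_per_s
    return io_size_bytes / (latency_s + transfer_time_s)

BANDWIDTH = 300e6   # assumption: 300 MB/s, decimal megabytes
LATENCY = 0.01e-3   # 0.01 ms expressed in seconds

small = effective_throughput(100, BANDWIDTH, LATENCY)
large = effective_throughput(8192, BANDWIDTH, LATENCY)

print(f"100 B I/Os: {small / 1e6:.1f} MB/s")   # 9.7 MB/s
print(f"8 KiB I/Os: {large / 1e6:.1f} MB/s")   # 219.6 MB/s
```

That rounds to the 10 vs 220 quoted above. Note the model has no queueing or concurrency in it, so it is only the single-stream case.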
"100-byte I/Os will always be less efficient than 8K I/Os"
Not if only 200 bytes of that 8K are relevant. It depends on the RPC overhead, but the small I/Os may be more efficient. Which is why you see them researching changes to the networking stack.
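A rough sketch of that point, counting only the bytes you actually wanted. It reuses the 300 MB/s and 0.01 ms figures from the earlier comment as assumptions, treats the latency as the whole per-request overhead, and issues requests serially; `request_time` and `goodput` are hypothetical helper names.

```python
def request_time(io_size_bytes, bandwidth_bytes_per_s=300e6, latency_s=0.01e-3):
    """Seconds for one request: fixed latency plus transfer time."""
    return latency_s + io_size_bytes / bandwidth_bytes_per_s

def goodput(useful_bytes, io_size_bytes, num_ios):
    """Useful bytes per second for num_ios serialized requests of io_size_bytes each."""
    return useful_bytes / (num_ios * request_time(io_size_bytes))

# 200 relevant bytes fetched as one 8 KiB read vs. two 100-byte reads.
one_big = goodput(200, 8192, 1)
two_small = goodput(200, 100, 2)

print(f"one 8 KiB read:  {one_big / 1e6:.1f} MB/s of useful data")
print(f"two 100 B reads: {two_small / 1e6:.1f} MB/s of useful data")
```

With these numbers the 8 KiB read takes about 37 µs and each 100-byte read about 10.3 µs, so the small reads come out ahead as long as you need three or fewer of them. Raise the per-request overhead and the balance shifts back toward the big read.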
"300MB/s and 0.01ms latency, the throughput would be about 10MB/s vs 220MB/s."
When you write numbers on a napkin this is true. When you're talking about real systems that are both concurrent and scheduled in quantized slices you'll see much more complex behavior.
Note, this is from a database literature perspective. The server doesn't need a complex storage management layer like most RDBMSs. If the read/write latency is low enough, all it needs is object framing, and clients can pick their own more complex data models atop this.
"Umm.. I've worked with in memory data structures (who hasn't?) and yeah, layout DOES matter. Especially if your data structure is larger than a cache line."
That only matters when you compare one memory layout against another.
But compared to hard disks or solid state disks (which is the whole point here), the difference is insignificant.
It just hits a nerve to hear "It's 1000x faster already! Data layout doesn't matter!" when you are thinking back "Well, it's also 1000x more expensive and it could have been 50,000x faster if you had put a bit of thought into your data layout."
I'm a crusty old console game dev. The explanation I give the new kids is: "Remember the PlayStation2? It ran at 300MHz and had a memory latency of 50 cycles. But, the PS3 is faster, right? It runs at 3000MHz and has a memory latency of 500 cycles. You know what happens when you don't think about memory layout? The PS3 runs at the same speed as the PS2!"
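The arithmetic behind that line, as a small sketch. The clock speeds and cycle counts are the round numbers from the explanation above, not verified hardware specs, and `miss_latency_ns` is a made-up helper name.

```python
import math

def miss_latency_ns(clock_mhz, latency_cycles):
    """Wall-clock cost of one memory access, in nanoseconds."""
    # cycles / MHz gives microseconds; multiply by 1000 for nanoseconds.
    return latency_cycles / clock_mhz * 1000.0

ps2 = miss_latency_ns(300, 50)     # "PS2": 300 MHz, 50-cycle latency
ps3 = miss_latency_ns(3000, 500)   # "PS3": 3000 MHz, 500-cycle latency

print(f"PS2 miss: {ps2:.1f} ns")
print(f"PS3 miss: {ps3:.1f} ns")
print("same stall per miss:", math.isclose(ps2, ps3))
```

Both come out to about 167 ns per miss. If the code spends its time waiting on memory, the 10x clock buys nothing, which is the whole point of the story.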