Both Dynamo and Cassandra are written in Java. Are the replacements still Java?
Google's counterparts, such as BigTable and GFS, at least seem to be written in C++. Presumably Spanner is C++ as well.
Besides C++, and especially considering its weaker safety/security record, Rust, Golang and Nim should be interesting, safer alternatives as implementation languages. In contrast to idiomatic Java, those languages can achieve a significantly higher CPU cache hit rate for internal data structures, because values are not boxed [1] and because they make it possible to reliably reduce problems like false sharing [2].
1: http://www4.di.uminho.pt/~jls/pdp2013pub.pdf (These issues can be worked around in Java by abandoning object orientation and using one object that holds multiple arrays (SoA, structure of arrays). In other words, not a List or array of Point objects, but class Points { int[] x; int[] y; ... } that contains all points.)
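To make the SoA workaround concrete, here is a minimal sketch of the two layouts. Only the shape of Points comes from the comment above; Point, LayoutDemo, sumX and the sizes are hypothetical names picked for illustration, and this is not a benchmark.

```java
// Hypothetical names throughout; a sketch of the two layouts, not a benchmark.
final class Point {                 // AoS element: every instance is its own heap object
    final int x;
    final int y;
    Point(int x, int y) { this.x = x; this.y = y; }
}

final class Points {                // SoA: one object, one primitive array per field
    final int[] x;
    final int[] y;
    Points(int n) { x = new int[n]; y = new int[n]; }

    long sumX() {
        long sum = 0;
        for (int v : x) sum += v;   // sequential walk over contiguous ints
        return sum;
    }
}

final class LayoutDemo {
    static long sumX(Point[] pts) {
        long sum = 0;
        for (Point p : pts) sum += p.x;   // one reference to follow per element
        return sum;
    }

    public static void main(String[] args) {
        int n = 1_000;
        Point[] aos = new Point[n];
        Points soa = new Points(n);
        for (int i = 0; i < n; i++) {
            aos[i] = new Point(i, -i);
            soa.x[i] = i;
            soa.y[i] = -i;
        }
        // Both layouts hold the same data and give the same sum.
        System.out.println(sumX(aos) + " " + soa.sumX());   // prints "499500 499500"
    }
}
```

The results are identical; the difference is only where the ints live. In the Points version every x sits next to the following x in one array, whereas the Point[] version stores references to separately allocated objects.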
In what way are Rust, Golang and Nim safer than Java? I understand that Rust provides some nice semantics for thread safety, but these exist in Java as well.
And I don't understand why anybody should care about CPU cache hit rate. The bottleneck is always going to be in the I/O pipeline. And Java is faster than C++ and vice versa in various situations.
Rust, Golang and Nim are safer than C++. Sorry that I didn't express it clearly enough; I considered the safety issues to be rather obvious.
> Rust I understand provides some nice semantics for thread safety but these exist in Java as well.
How do you get Java to fail compiling if thread safety constraints are not met? I'd be interested to try it out!
One should definitely care about cache hit rate, because it significantly affects runtime performance. A typical 32 KB L1D cache with 64-byte lines holds just 512 cache lines per CPU core.
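The 512 figure is simple arithmetic. A small sketch, assuming a 32 KiB L1D cache and 64-byte lines (common on x86 cores, but check your own CPU); CacheLineMath and its method names are made up for this example.

```java
final class CacheLineMath {
    // Assumed sizes: 32 KiB L1D and 64-byte lines; not universal.
    static final int L1D_BYTES = 32 * 1024;
    static final int LINE_BYTES = 64;

    static int lineCount(int cacheBytes, int lineBytes) {
        return cacheBytes / lineBytes;
    }

    static int elementsPerLine(int lineBytes, int elementBytes) {
        return lineBytes / elementBytes;
    }

    public static void main(String[] args) {
        System.out.println(lineCount(L1D_BYTES, LINE_BYTES));            // 512 lines
        System.out.println(elementsPerLine(LINE_BYTES, Integer.BYTES));  // 16 ints per line
        System.out.println(L1D_BYTES / Integer.BYTES);                   // 8192 ints fill the whole L1D
    }
}
```

So a densely packed int[] gets 16 useful values out of every line fetched, while any layout that spreads the same values across separate objects gets fewer.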
I/O is the bottleneck? It is becoming less so; this is one of the few areas where some nice progress is actually happening. PCIe SSDs reach up to 1.5 - 2 GB/s (roughly 12 - 16 Gbps). More and more servers have 10 Gbps or 40 Gbps networking.
Sure, Java is faster when the JIT can prune an excessive if-jungle, aggressively simplify and inline, and adapt to the CPU it is running on. But memory layout control is where Java is rather weak. The problem is only getting worse, because the gap between CPU and memory performance widens year by year. Memory bandwidth is increasing slowly and latency hasn't improved for a decade.
C++ is always going to be faster, especially when specialized for a certain machine and use case. C++ is also going to win by a large margin when the code is auto-vectorizable or makes heavy use of SIMD intrinsics. 10x is not unusual if the problem maps well to the AVX2 instruction set.
2: http://mechanical-sympathy.blogspot.com/2011/08/false-sharin... (False sharing performance issues)
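For readers who have not run into false sharing, a rough sketch of the effect: two threads each update their own counter, once with the counters in adjacent array slots and once spaced 64 bytes apart. The 64-byte line size, the class name FalseSharingDemo and the iteration count are assumptions for illustration. Timings vary by machine and JVM, so treat this as a demonstration rather than a proper benchmark.

```java
import java.util.concurrent.atomic.AtomicLongArray;

final class FalseSharingDemo {
    // Assumption: 64-byte cache lines, so 8 longs apart always means different lines.
    static final int PADDED_STRIDE = 8;

    /** Returns {counterA, counterB, elapsedNanos} for two threads bumping slots 0 and stride. */
    static long[] run(int stride, long iterations) {
        AtomicLongArray counters = new AtomicLongArray(stride + 1);
        Thread a = new Thread(() -> {
            for (long i = 0; i < iterations; i++) counters.incrementAndGet(0);
        });
        Thread b = new Thread(() -> {
            for (long i = 0; i < iterations; i++) counters.incrementAndGet(stride);
        });
        long start = System.nanoTime();
        a.start();
        b.start();
        try {
            a.join();
            b.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting for workers", e);
        }
        long elapsed = System.nanoTime() - start;
        return new long[] { counters.get(0), counters.get(stride), elapsed };
    }

    public static void main(String[] args) {
        long n = 20_000_000L;
        long[] adjacent = run(1, n);              // slots 0 and 1: usually on the same line
        long[] padded = run(PADDED_STRIDE, n);    // slots 0 and 8: 64 bytes apart
        System.out.println("adjacent: " + adjacent[2] / 1_000_000 + " ms");
        System.out.println("padded:   " + padded[2] / 1_000_000 + " ms");
    }
}
```

The threads never touch each other's counter, yet in the adjacent case they usually contend for the same cache line. Spacing the slots out is a workaround that depends on knowing the line size, which is part of why layout control matters here.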