When working in a higher-level language, it's easy to forget that much of the performance penalty comes from using data types that are far removed from the hardware and that require unboxing, RTTI, and so on. Those conveniences don't just eat CPU; they also eat memory, which compounds the problem by pushing data out of cache, no matter how well the language implementation optimizes.
Hence, I really appreciate it when a language offers some way to drop down to byte-level constructs and build space-efficient implementations where they're necessary. Improvements in memory compactness can often save you the effort of dropping all the way down to C.
Indeed. I recently built a trie for prefix search and found the per-object overhead and pointer cost so expensive (a whole 8 bytes per instance on 32-bit, which really adds up, and it's worse on x64) that I dropped down to storing all the data in parallel integer arrays. Even then, I was able to squeeze out more space by switching to a custom bit-packed array class.
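The commenter's bit-packed class isn't shown, so here is a hypothetical Java sketch of the general idea (Java is an assumption; the thread doesn't name a language, and `BitPackedArray` and `bitsPerValue` are illustrative names). Each element occupies exactly `bitsPerValue` bits inside a `long[]`, so a value that needs 20 bits costs 20 bits rather than the 32 of an `int`. Elements may straddle two adjacent 64-bit words, which `get` and `set` handle by splitting the value.

```java
/**
 * Hypothetical sketch: a fixed-width bit-packed array of unsigned values.
 * Each element uses exactly bitsPerValue bits inside a long[] backing store,
 * so an element may straddle two adjacent 64-bit words.
 */
final class BitPackedArray {
    private final long[] words;      // backing storage, 64 bits per word
    private final int size;          // number of logical elements
    private final int bitsPerValue;  // width of each element, 1..63
    private final long mask;         // low bitsPerValue bits set

    BitPackedArray(int size, int bitsPerValue) {
        if (size < 0) throw new IllegalArgumentException("size must be >= 0");
        if (bitsPerValue < 1 || bitsPerValue > 63)
            throw new IllegalArgumentException("bitsPerValue must be in 1..63");
        this.size = size;
        this.bitsPerValue = bitsPerValue;
        this.mask = (1L << bitsPerValue) - 1;
        long totalBits = (long) size * bitsPerValue;
        this.words = new long[(int) ((totalBits + 63) >>> 6)]; // round up to whole words
    }

    int size() { return size; }

    long get(int index) {
        checkIndex(index);
        long bitPos = (long) index * bitsPerValue;
        int w = (int) (bitPos >>> 6);     // word the element starts in
        int offset = (int) (bitPos & 63); // bit offset within that word
        long value = words[w] >>> offset;
        if (offset + bitsPerValue > 64) {
            // Element spills into the next word: pull in its high bits.
            value |= words[w + 1] << (64 - offset);
        }
        return value & mask;
    }

    void set(int index, long value) {
        checkIndex(index);
        if (value < 0 || value > mask)
            throw new IllegalArgumentException("value does not fit in " + bitsPerValue + " bits");
        long bitPos = (long) index * bitsPerValue;
        int w = (int) (bitPos >>> 6);
        int offset = (int) (bitPos & 63);
        // Clear the element's bits in the first word, then write its low part.
        words[w] = (words[w] & ~(mask << offset)) | (value << offset);
        int spill = offset + bitsPerValue - 64;
        if (spill > 0) {
            // Clear and write the remaining high bits at the bottom of the next word.
            long spillMask = (1L << spill) - 1;
            words[w + 1] = (words[w + 1] & ~spillMask) | (value >>> (64 - offset));
        }
    }

    private void checkIndex(int index) {
        if (index < 0 || index >= size)
            throw new IndexOutOfBoundsException("index " + index + " out of range [0, " + size + ")");
    }
}
```

For example, if every node index in a trie fits below 2^20, each child pointer could be stored in 20 bits instead of 32; for a million entries that is roughly 2.5 MB instead of 4 MB for a plain `int[]`, at the cost of a few shifts and masks on every access.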