As Yoric mentioned, that would require some custom Qemu patch to corrupt memory,...

inetknght · on April 7, 2021

Writing defensively is great! There's nothing wrong with that.

If you're writing code to defend against "dodgy RAM" whose symptom is a corrupt SHA then it's super trivial to test that code: have a valid JAR file and adjust its content so that it will still "load" but isn't the file you expect. Then verify that the program refuses to load the JAR.

KMag · on April 7, 2021

> have a valid JAR file and adjust its content so that it will still "load" but isn't the file you expect. Then verify that the program refuses to load the JAR.

Sorry for being a bit slow today. but what's the difference between "load"ing and loading? You clearly understand that you're using two different definitions of load, otherwise you wouldn't have put one in quotes. (Also, it would be self-contradictory if the two were identical.) However, I can't for the life of me figure out two different definitions for "load" which both involve CRC-32 and bytecode verification checks, and yet aren't identical.

If I'm correct, we saw a dramatic reduction in the "impossible" errors not because we were detecting the actual problem that caused the errors (corruption of internal JVM buffers after bytecode validation), but instead because our SHA-1 failures (loading chunks of files into our own buffers and checksumming those) were highly correlated with the "impossible" failures because they had the same underlying cause.

In our case, the JVM loaded the code from the JAR, which, unless I misunderstand Sun/Oracle's classloader, means that both CRC-32 checksums and bytecode verification pass, even for unsigned JARs. The chances of random bit flips passing CRC-32 checks are 4 billion to one. The odds of random bit flips passing both a CRC-32 check and bytecode verification are probably at least quadrillions to one. I'm pretty sure the corruption was happening minutes to hours after the classes actually got loaded by the JVM.

In other words, I think we were getting saved by SHA-1 summing a bunch of large files being a poor-man's memtest86: writing semi-random-ism patterns to big chunks of memory and verifying you can read back what you wrote. Failing this correlates heavily with seeing "impossible" errors at runtime, but we weren't directly examining the classloader's internal buffers that were the actual cause of the errors.

On a side note, modern OSes should really have a low rate memory checker built into the kernel's memory manager. It's a crutch, but cheap consumer IoT devices and WiFi APs aren't getting ECC any time soon.

inetknght · on April 7, 2021

The difference is that "load" means that the file is otherwise acceptable (valid header, valid content) but that it's been adjusted so that the checksum isn't what's described in the manifest.

That is:

1) the JAR file successfully loads by the JVM -- as in it will "load"

2) the JAR file doesn't pass the SHA checksum requirement you've added -- as in it isn't the file you expect.