
Bit flips can happen, but regardless of whether ECC can repair them, the OS is notified, iirc. It will signal the corruption to the process that has the faulty address mapped. I suppose that if the memory contains code, the process is killed (if ECC correction failed).



> I suppose that if the memory contains code, the process is killed (if ECC correction failed).

Generally, it would make the most sense to kill the process if the corrupted page is data; if it's code, the kernel could instead reload that page from the executable file on non-volatile storage. (You might also be able to rescue some data pages from swap space this way.)


If you go that route, you should be able to avoid the code/data distinction entirely, as data pages can also be completely backed by files. I believe the kernel already keeps track of which pages are clean copies of data from the filesystem, so I would think it would be a simple matter of essentially paging out the corrupted data.
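A rough user-space analogy for the "clean file-backed pages can simply be dropped and re-read" idea, assuming Linux and Python 3.8+ (where `mmap.madvise` is available). The helper name `reload_clean_pages` is made up for illustration, and `MADV_DONTNEED` only drops this process's mapping of the pages; it is not what the kernel does on a real memory error, just a sketch of why a clean page loses nothing when it is thrown away.

```python
import mmap
import tempfile


def reload_clean_pages(payload: bytes) -> bytes:
    """Map a file read-only, drop the mapped pages, and read them back.

    Hypothetical helper: it only illustrates that clean, file-backed pages
    can be discarded and faulted back in without losing any data.
    """
    with tempfile.NamedTemporaryFile() as f:
        f.write(payload)
        f.flush()
        with mmap.mmap(f.fileno(), len(payload), prot=mmap.PROT_READ) as m:
            before = m[:]
            # Tell the kernel this process no longer needs the pages.
            # Skipped on platforms where the constant is not exposed.
            if hasattr(mmap, "MADV_DONTNEED"):
                m.madvise(mmap.MADV_DONTNEED)
            # The next access faults the data back in from the file.
            after = m[:]
    if before != after:
        raise RuntimeError("file-backed contents changed after the drop")
    return after
```

The mapping is never written to, so its pages stay clean and the file remains the only authoritative copy. A dirty page would have no such backup.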

What would be interesting is if userspace could mark a region of memory as recomputable. If the kernel is notified of memory corruption there, it triggers a handler in the userspace process to rebuild the data. Granted, given the current state of hardware, I can't imagine that it's anywhere near worth the effort to implement.
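To make the "recomputable region" idea concrete, here is a purely user-space simulation. Nothing here is a real kernel interface: the `RecomputableRegion` class, its `rebuild` callback, and the `squares_table` example are all hypothetical, and a CRC mismatch stands in for the kernel's corruption notification.

```python
import zlib
from typing import Callable


class RecomputableRegion:
    """Hypothetical stand-in for a memory region marked as recomputable."""

    def __init__(self, rebuild: Callable[[], bytes]) -> None:
        self._rebuild = rebuild
        self.data = bytearray(rebuild())
        self._crc = zlib.crc32(self.data)
        self.rebuilds = 0

    def read(self) -> bytes:
        # A checksum mismatch plays the role of the corruption notification;
        # the rebuild callback plays the role of the user-space handler.
        if zlib.crc32(self.data) != self._crc:
            self.data = bytearray(self._rebuild())
            self._crc = zlib.crc32(self.data)
            self.rebuilds += 1
        return bytes(self.data)


def squares_table() -> bytes:
    """Example of derived data that is cheap to regenerate."""
    return bytes((i * i) % 256 for i in range(256))
```

Flipping a bit with `region.data[10] ^= 0x01` and then calling `region.read()` triggers one rebuild. This only works for data that is a pure function of something still intact, which is the same restriction the proposal itself would have.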


> What would be interesting is if userspace could mark a region of memory as recomputable.

I believe there's already some support for things like this, but it's intended as a mechanism to gracefully handle memory pressure rather than corruption. Apple has a Purgeable Memory mechanism, but it's handled through higher-level interfaces rather than something like madvise().
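For comparison, the madvise()-style version of this on Linux is `MADV_FREE` (kernel 4.5+), which lets the kernel reclaim private anonymous pages under memory pressure. A minimal sketch, assuming Linux and Python 3.8+. The helper name `mark_purgeable` is made up, and this is not Apple's mechanism. After the call, the page reads back as either the old contents or zeros, depending on whether the kernel reclaimed it.

```python
import mmap

PAGE = mmap.PAGESIZE


def mark_purgeable(fill: bytes = b"\xAB") -> bytes:
    """Fill one private anonymous page, offer it to the kernel, read it back.

    Hypothetical helper; `fill` must be a single byte.
    """
    with mmap.mmap(-1, PAGE, flags=mmap.MAP_PRIVATE | mmap.MAP_ANONYMOUS) as m:
        m[:] = fill * PAGE
        advice = getattr(mmap, "MADV_FREE", None)
        if advice is not None:
            try:
                # The kernel may now discard the page whenever it needs memory.
                m.madvise(advice)
            except OSError:
                # Older kernels or sandboxes may reject the advice.
                pass
        return m[:]
```

The caller has to be prepared for the zeroed case and regenerate the data, so this only suits caches and other reconstructible contents.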



