ECC is good, and I genuinely wish it were more common. Thankfully, Ryzen CPUs support ECC by default (except for pre-7000 series with integrated graphics that aren't "Pro" versions), so long as the motherboard does, too (like all ASRock that I've seen). I'm running several Ryzen servers with ECC.
On the other hand, there are many, many systems out there that don't have ECC, nor do they have the option to have ECC. While every video on YouTube wants us to believe that the difference between 580 and 585 frames per second in some silly game or another makes all the difference in the world, for me the difference between a system that runs 10% slower and one that crashes in the middle of the night is actually significant. I test all my systems at a certain memory frequency, then back off to the next slower frequency just to be sure.
That doesn't stop memory errors from happening, but most systems have lived their entire lives without having random crashes or random segfaulting. I consider that worthwhile.
Crashes in the middle of the night are not what worries me. Who cares. It's silent data loss that can go unnoticed for a very long time. And not just a single bit. If the flip hits file system structures or file layout you can have massive silent data loss.
Yep. It's why ZFS, BTRFS, Ceph and Gluster matter. Being able to detect that data at rest has gone wrong, and being able to reconstruct the original state is a big deal.
I'd like to think that as NAND continues to scale up in capacity and drop in cost, we'll see some real shakeup to filesystems and storage where self-healing mass storage can be genuinely commoditized -- not something that's only accessible to businesses (and computing enthusiasts) due to cost and complexity.
Absolutely, but these are only half the solution. You still have to be sure that the data you're passing to the filesystem is not already corrupted in memory.
Indeed. I have a very low power storage server with tons of ECC, and the better consumer grade NASes also tend to have it. Again, the issue here is that it's functionally limited to enthusiasts today because of cost and the average person being completely unaware of the impact.
My hope as we move into more advanced fabrication nodes is the increasing shift to HBM in the data center space starts to at least create an HBM option in the consumer side of things. I expect at least AMD to try that push in 2027 and beyond, and I’m sure Intel is looking hard at it too. Granted, Apple is already there with its higher end silicon.
Granted, I still expect there to be product line segmentation with ECC, as it’s a good lever to push a buyer into a higher end product. Though when it’s done on package, you at least eliminate the need for a main board to actually have the traces, and the external modules to have the extra memory. So it might be the easier route to get to more ECC in the consumer space, at least for mid-range and up personal computers.
My most precious personal data are my family pictures. I protect them with par2. These are basically checksums for your files, but they provide so much added information that you can also repair your files if they are damaged.
Once I coded a shell script that verified all my photos, but I don’t bother with that anymore. I just back everything up, and if there’s ever a problem, the parity files provide an additional safety net.
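For anyone who does want the verification pass, here is a minimal sketch of what such a script could look like in Python rather than shell. It is not par2: it only detects damage with SHA-256 checksums and cannot repair anything, so the parity files are still what you would restore from. The function names (`sha256_of`, `build_manifest`, `verify`) are made up for this example.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash a file in chunks so large photos never sit fully in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(root: Path) -> dict[str, str]:
    """Map each file's path (relative to root) to its SHA-256 digest."""
    return {
        p.relative_to(root).as_posix(): sha256_of(p)
        for p in sorted(root.rglob("*"))
        if p.is_file()
    }


def verify(root: Path, manifest: dict[str, str]) -> list[str]:
    """Return the relative paths that are missing or whose content changed."""
    bad = []
    for rel, expected in manifest.items():
        target = root / rel
        if not target.is_file() or sha256_of(target) != expected:
            bad.append(rel)
    return bad
```

You would build the manifest once when the photos are archived, store it next to the backup, and run `verify` against it later. Anything it returns is a candidate for a par2 repair or a restore from backup.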
> so long as the motherboard does, too (like all ASRock that I've seen).
I built a home server last year with an ASRock X570M Pro4 [0] with a Ryzen 4750 PRO (which I had to source OEM from Aliexpress as it's not sold direct). I'm not sure what the current situation is, but the only RAM I could find for it was the Kingston Server Premier KSM32ED8 [1], and the ECC premium was not fun to pay.
It's not just system support. Availability of modules is bad.
I got an HP that has both an AMD Pro APU and DDR5 slots, with no soldered RAM, i.e. all the requirements.
It was $500 to $1500 depending on configuration. Then 16 or 32 GB of ECC SODIMM runs over $2000 for regular consumers! And that's if you can find them in stock!
I think you overstate the problem here. Chances are, unless you’ve addressed other more pertinent issues, simply using ECC memory isn’t going to stop systems from crashing in the middle of the night.
It's pretty easy for those to be race conditions, too. Plenty of one time crashes in a fleet of thousands of machines with ECC. ECC lets you know it's almost certainly not a memory issue.
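As a rough illustration of how you check that on a Linux box: the kernel's EDAC subsystem exposes corrected and uncorrected error counters per memory controller under sysfs. The sketch below just reads them. It assumes an EDAC driver for your platform is loaded; if the directory is missing you get an empty result, which tells you nothing either way. The function name `read_edac_counts` is made up for this example.

```python
from pathlib import Path

# Standard EDAC sysfs location on Linux; only present when an EDAC driver is loaded.
EDAC_ROOT = Path("/sys/devices/system/edac/mc")


def read_edac_counts(root: Path = EDAC_ROOT) -> dict[str, dict[str, int]]:
    """Return {controller: {"ce_count": n, "ue_count": m}} for each memory controller."""
    counts: dict[str, dict[str, int]] = {}
    if not root.is_dir():
        return counts  # no EDAC support visible; cannot say anything about errors
    for mc in sorted(root.glob("mc[0-9]*")):
        entry = {}
        for name in ("ce_count", "ue_count"):  # corrected / uncorrected errors
            counter = mc / name
            if counter.is_file():
                entry[name] = int(counter.read_text().strip())
        if entry:
            counts[mc.name] = entry
    return counts


if __name__ == "__main__":
    result = read_edac_counts()
    if not result:
        print("No EDAC memory controllers found")
    for controller, entry in result.items():
        print(controller, entry)
```

If the counters are still zero after a crash, memory is a much less likely suspect, which is the point made above.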
10% more fps doesn't matter at 150 fps, but it's nice when your FPS is lower. 60 -> 66 might mean you don't dip below 60 as often. 55 -> 60.5 is pretty nice too. Maybe less of a deal if you've got VRR etc.
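To put numbers on that, it helps to convert FPS to frame time, since that is what you actually feel. A quick sketch using the FPS pairs mentioned in this thread (the helper names are made up):

```python
def frame_time_ms(fps: float) -> float:
    """Time spent on one frame, in milliseconds."""
    return 1000.0 / fps


def saved_ms(before_fps: float, after_fps: float) -> float:
    """Per-frame time saved when going from before_fps to after_fps."""
    return frame_time_ms(before_fps) - frame_time_ms(after_fps)


# FPS pairs taken from the comments above
for before, after in [(580, 585), (150, 165), (60, 66), (55, 60.5)]:
    print(f"{before} -> {after} fps saves {saved_ms(before, after):.3f} ms per frame")
```

Going from 580 to 585 saves roughly 0.015 ms per frame, while 60 to 66 saves roughly 1.5 ms, about a hundred times more. The same relative gain is worth far more at low frame rates.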
What game runs at only 60 FPS because of a RAM issue? I know I only have a 3600, but if a game is running at 60 FPS it's 99% because of my GPU, not the RAM.
Most games still run at 90+ FPS, I would love to have ECC RAM to prevent a potential one-off crash or just to know that the RAM didn't report an error when it happened. I would pay money for this!
Better yet, the 3d cache CPUs don't care about RAM speed as much, according to benchmarks
> you can't tell the difference with 150 FPS and 165 FPS
First byte latency makes cache misses significantly slower which in turn makes 99%ile latency (which is perceived as microstutter by humans) significantly higher even if it doesn't affect throughput (fps). This was well documented way back when the first DDR5 sticks came out and they performed like crap compared to overclocked B-Die DDR4.
On-chip ECC is an improvement, but full ECC memory, which you can get for DDR5, also protects your data in transit at 6400 MT/s.
Additionally, the on-chip ECC of DDR5 won't report the errors to your OS. With ECC memory, corrected errors can be handled by the OS, and you'll even be informed of the uncorrectable 2-bit errors.
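To make the "correct one bit, detect two" behaviour concrete, here is a toy sketch of a SECDED code: Hamming(7,4) plus one overall parity bit, protecting 4 data bits with 4 check bits. Real ECC DIMMs use a much wider code (typically 8 check bits per 64 data bits) and do this in hardware, so treat this purely as an illustration of the principle. `encode` and `decode` are names made up for this example.

```python
from typing import Optional


def encode(nibble: int) -> list[int]:
    """Encode 4 data bits into an 8-bit codeword; index 0 is the overall parity bit."""
    d = [(nibble >> i) & 1 for i in range(4)]
    code = [0] * 8
    code[3], code[5], code[6], code[7] = d    # data bits sit at non-power-of-two positions
    code[1] = code[3] ^ code[5] ^ code[7]     # parity over positions 1, 3, 5, 7
    code[2] = code[3] ^ code[6] ^ code[7]     # parity over positions 2, 3, 6, 7
    code[4] = code[5] ^ code[6] ^ code[7]     # parity over positions 4, 5, 6, 7
    code[0] = sum(code[1:]) % 2               # overall parity makes the total even
    return code


def decode(code: list[int]) -> tuple[str, Optional[int]]:
    """Return (status, nibble); status is 'ok', 'corrected' or 'uncorrectable'."""
    code = list(code)
    syndrome = 0
    for pos in range(1, 8):
        if code[pos]:
            syndrome ^= pos                   # XOR of set positions is 0 for a valid word
    overall_odd = sum(code) % 2 == 1
    if syndrome == 0 and not overall_odd:
        status = "ok"
    elif overall_odd:
        code[syndrome] ^= 1                   # single flip; syndrome 0 means bit 0 itself
        status = "corrected"
    else:
        return "uncorrectable", None          # even parity but nonzero syndrome: two flips
    nibble = code[3] | code[5] << 1 | code[6] << 2 | code[7] << 3
    return status, nibble
```

Flipping any one bit of `encode(n)` decodes as `("corrected", n)`, and flipping any two bits decodes as `("uncorrectable", None)`. That second case is what the OS gets told about with real ECC, and what on-chip ECC alone never reports.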
Want to protect against 2-bit errors? Make sure your platform has support for ECC-chipkill.
While it's technically true that DDR5 comes with "on-die ECC", that is only because the memory is so unreliable that it would not work properly without it.
However, even then, it is not the same as true ECC, which has an extra chip on the memory module for the correction data and also protects against transmission errors on the way to the CPU.
10% performance difference in exchange for maybe crashing slightly more often would be huge for people who only really use their PCs for gaming.
HN readers seem to have a skewed idea of how useful ECC is while pretending the downsides don't exist. Not everyone is primarily using their system as a workstation.