
There are two kinds of SMART errors. Soft errors, like read retries, occur in normal operation and tell you more about the environment the drive is operating in than about the health of the drive. Hard errors, like reallocated sectors, never occur in normal operation, so the first time the drive logs one, it means the drive is failing and you need to replace it. Monitoring for the first kind of error is pointless, but closely monitoring for the second kind will improve your chances of replacing a drive before it loses your data. It's just unfortunate that the error thresholds built into drives won't trigger a SMART failure warning until the drive is completely dead.
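A minimal sketch of that monitoring rule, assuming the raw attribute values have already been read into a dict (for example, parsed from smartctl output). The attribute names follow smartctl's naming. Only reallocated sectors comes from the comment above; treating pending and offline-uncorrectable sectors as hard errors too is an assumption, and the function names are hypothetical.

```python
# Assumption: keys are smartctl-style attribute names mapped to raw values.
# Only Reallocated_Sector_Ct is named in the comment above; the other two
# entries are an illustrative guess at what else counts as a "hard" error.
HARD_ERROR_ATTRIBUTES = {
    "Reallocated_Sector_Ct",
    "Current_Pending_Sector",
    "Offline_Uncorrectable",
}


def hard_errors(raw_values):
    """Return the hard-error attributes whose raw value is non-zero."""
    return {
        name: value
        for name, value in raw_values.items()
        if name in HARD_ERROR_ATTRIBUTES and value > 0
    }


def should_replace(raw_values):
    """Flag the drive on the first hard error; soft errors are ignored."""
    return bool(hard_errors(raw_values))
```

The point of the design is that there is no threshold to tune: any non-zero hard-error count flags the drive, and soft counters such as read error rates never do, however large they get.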



It's a very different story for SSDs. Those, you definitely do want to monitor in case your workload is burning through the rated write endurance faster than you planned for. And on SSDs, reallocated sectors are usually not an urgent problem: a handful early in the life of the drive can be a normal consequence of vendors not aggressively testing (and thereby excessively wearing out) a drive before it leaves the factory, and a steadily increasing number as a drive approaches end of life is expected behavior.

But SMART errors usually won't help you know when you're about to lose an SSD to a catastrophic firmware bug, which for many use cases is the more likely cause of death.
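The write-endurance check for SSDs described above comes down to simple arithmetic. A rough sketch, assuming you know the drive's rated endurance in terabytes written (from the vendor's spec sheet) and can read the total written so far from the drive; the function and parameter names are hypothetical, and it assumes the write rate stays at its average to date.

```python
def days_until_endurance_exhausted(rated_tbw, written_tb, days_in_service):
    """Project how many days remain before rated write endurance is used up.

    Assumes the future write rate matches the average rate so far.
    Returns None when there is not enough history to estimate a rate.
    """
    if days_in_service <= 0 or written_tb <= 0:
        return None
    daily_rate_tb = written_tb / days_in_service
    remaining_tb = max(rated_tbw - written_tb, 0.0)
    return remaining_tb / daily_rate_tb


def wearing_faster_than_planned(rated_tbw, written_tb, days_in_service,
                                planned_days_remaining):
    """True if the projection falls short of the planned remaining service life."""
    projected = days_until_endurance_exhausted(
        rated_tbw, written_tb, days_in_service
    )
    return projected is not None and projected < planned_days_remaining
```

For example, a drive rated for 600 TB that has written 150 TB in its first year projects to about three more years at the same rate. As the comment notes, none of this helps with a sudden firmware failure.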



