
What I don't understand about the URE comes from [2]. Say you have a 12TB RAID 5 array and you need to rebuild. If a 10^14 rating means you approach a URE at around 12TB of data, as the article says, what causes it to hit 12TB? Each disk has 10^14, which is 100TB. If you had 12TB from 4x3TB disks, it should have a lot to go through.



10^14 is bits. When you divide 10^14 by 8 you get 12.5 trillion bytes, or 12.5 TB.

If you have 4x 4TB drives in RAID5, and one fails, then in order to rebuild with a replacement drive, you have to read all data from all surviving drives (3 x 4TB = 12TB).
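To make the arithmetic concrete, here is a quick sketch in Python. It assumes decimal terabytes (10^12 bytes, the way drive vendors count) and the 1-per-10^14-bits rating discussed in this thread; the variable names are just for illustration.

```python
BITS_PER_BYTE = 8
TB = 10**12  # decimal terabyte (assumption: vendor-style units)

# Rating under discussion: one unrecoverable read error per 10^14 bits read.
ure_interval_bits = 10**14
ure_interval_tb = ure_interval_bits / BITS_PER_BYTE / TB

# RAID5 rebuild: every surviving drive has to be read in full.
surviving_drives = 3
drive_size_tb = 4
rebuild_read_tb = surviving_drives * drive_size_tb
rebuild_read_bits = rebuild_read_tb * TB * BITS_PER_BYTE

# Average number of UREs expected over the whole rebuild read.
expected_ures = rebuild_read_bits / ure_interval_bits

print(ure_interval_tb)   # 12.5
print(rebuild_read_tb)   # 12
print(expected_ures)     # 0.96
```

So the rebuild reads almost exactly one "URE interval" worth of bits, counted across all the surviving drives together.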

Here is an example from a manufacturer:

http://www.seagate.com/files/www-content/product-content/con...

They call it "Non-recoverable Read Errors per Bits Read" and then list "1 sector per 10^15". So for every 10^15 bits read they expect 1 sector to be unreadable.


OK, I see now, but if it needs to read 12TB from all drives together, then that's still far from the 12.5TB per drive, which is the limit. That's where I am confused.


Say a department of a company has 36 employees and one pair of dice. We decide that if any person in the department rolls a 12, everyone in the department is fired. The chance of rolling a 12 on any single roll is 1/36. It doesn't matter whether one person keeps rolling the dice or they all take turns: if each of the 36 employees gets one roll, the chance of everyone being fired is about 64%, and it approaches 100% as the rolling continues.
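If you want to check the dice numbers yourself, a minimal sketch follows. It assumes fair dice and independent rolls; the function name is made up for this example. With 36 rolls in total the chance of at least one 12 comes out to roughly 64%, and it keeps climbing toward 100% with more rolls.

```python
# Chance of rolling a 12 (two sixes) with one throw of two fair dice.
p_twelve = 1 / 36

def p_at_least_one_twelve(rolls):
    """Probability that at least one of `rolls` independent throws is a 12."""
    return 1 - (1 - p_twelve) ** rolls

# Who does the rolling never enters the formula, only the total roll count.
for rolls in (1, 36, 100, 200):
    print(rolls, round(p_at_least_one_twelve(rolls), 3))
```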

The same is true for a disk array. Each read operation is an independent event (for the purpose of doing this math). The chance of a URE happening is 1/(10^14) for every bit read. It doesn't matter which disk it happens on. When it happens, the entire array has failed.

Also 12.5 TB is not a hard limit, just an average. The URE could happen on the very first read operation, or you might read 100 TB without a URE.
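Here is a rough sketch of that math for the 3 x 4TB rebuild, under the simplifying assumptions already made here: every bit read is an independent event with a 1 in 10^14 chance of a URE, and terabytes are decimal. The helper names are hypothetical. It shows that reading 4TB from each of three drives gives the same failure chance as reading 12TB from one drive, about 62%, which is why the per-drive figure is not the one that matters.

```python
import math

P_BIT = 1e-14  # assumed chance of a URE on any single bit read

def tb_to_bits(tb):
    """Decimal terabytes to bits."""
    return tb * 10**12 * 8

def p_no_ure(bits_read, p_bit=P_BIT):
    """(1 - p_bit) ** bits_read, computed via log1p to stay accurate for tiny p_bit."""
    return math.exp(bits_read * math.log1p(-p_bit))

# Three surviving 4TB drives, each read in full, all must come back clean.
per_drive_clean = p_no_ure(tb_to_bits(4))
split_clean = per_drive_clean ** 3

# The same 12TB treated as a single long read.
combined_clean = p_no_ure(tb_to_bits(12))

print(f"P(URE during rebuild), three 4TB drives: {1 - split_clean:.3f}")
print(f"P(URE during rebuild), one 12TB read:    {1 - combined_clean:.3f}")
```

Real drives do not fail this uniformly, so treat the result as a back-of-the-envelope figure rather than a prediction.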



