Hacker News
A mathematical trick allows people to scatter their computer files (economist.com)
58 points by eru on Sept 10, 2008 | hide | past | favorite | 23 comments


Reed-Solomon coding is an old fella. Whenever you have a link which is unreliable and you can't afford to retransmit packets on the link when errors are introduced, RS is your friend. Mobile phones are among the prime users of this.

If I remember correctly, the PAR/PAR2 formats used on Usenet use RS encoding as well.

An alternative would be to plot the file in N-dimensional space and define a set of vectors to pinpoint it. With enough vectors you can pinpoint it exactly; additional vectors give you error-correction capability. Some Microsoft guys played with this idea for BitTorrent-like networks a while back. But there is a disadvantage in the time it takes to decode the data, and it probably doesn't help the swarm that much :/
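The "vectors" idea described above is essentially random linear (network) coding, the technique behind Microsoft's Avalanche experiments: each packet carries a random linear combination of the file's blocks, and any k linearly independent packets recover the k original blocks by Gaussian elimination. A toy sketch over the small prime field GF(257) (illustrative only; real systems work over GF(2^8)):

```python
import random

P = 257  # small prime field for illustration; real systems use GF(2^8)

def random_combination(blocks):
    """One encoded packet: random coefficients plus the combined block."""
    coeffs = [random.randrange(P) for _ in blocks]
    combo = [sum(c * b[i] for c, b in zip(coeffs, blocks)) % P
             for i in range(len(blocks[0]))]
    return coeffs, combo

def recover(packets, k):
    """Gauss-Jordan elimination over GF(P) on k independent packets."""
    rows = [list(c) + list(v) for c, v in packets[:k]]  # [coeffs | data]
    for col in range(k):
        piv = next(r for r in range(col, k) if rows[r][col])  # find a pivot
        rows[col], rows[piv] = rows[piv], rows[col]
        inv = pow(rows[col][col], P - 2, P)  # modular inverse via Fermat
        rows[col] = [x * inv % P for x in rows[col]]
        for r in range(k):
            if r != col and rows[r][col]:
                f = rows[r][col]
                rows[r] = [(a - f * b) % P for a, b in zip(rows[r], rows[col])]
    return [row[k:] for row in rows]  # the original blocks
```

Any k packets with independent coefficient vectors suffice, regardless of which peers sent them; the cubic cost of the elimination step is exactly the decoding-time drawback mentioned above.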

Another interesting viewpoint: we might need RS encoding on local hard disks soon (implemented in hardware or software), as it would circumvent the bit-error-rate problem with those disks.



Hard disks already use error correction, but you can add more if the uncorrectable error rate isn't low enough for you.


What I am wondering about is how much faster TCP could be with RS for recovery instead of the current resend-packet technique.


I would guess it would be slower, and you would break the internet. You would have to introduce enough redundancy to cope with the worst tolerable loss rate, which would increase the number of bits to transmit. Worse, it is the noticing of dropped packets that tells TCP to slow down and decongest a link. If enough senders fail to decongest, then packet loss on the congested links skyrockets, wasting bandwidth elsewhere and doing silly things like favoring the sender with the biggest pipe.


Obviously you can't just eliminate congestion control, and the coding rate should be adaptive to reduce overhead.

At least one startup has gone broke on this idea already, but maybe it's possible to do it right.


The Economist doing error-correction codes?? Are you guys sure the LHC didn't do anything to the universe?


I'm not sure whether I'm excited that this was in The Economist, or pissed off that they reduced error-correction codes to "a mathematical trick".


We have an open source project http://allmydata.org that has been doing this for quite a while. I'm also involved in the commercial side, which does online storage, and we've been running a business on a P2P backend (nice low costs) with non-peer clients. We tried a business model with a full peer grid, and users were extremely uncomfortable storing "data" from other people on their computers. Possibly the market is better educated now and/or more used to this idea, but it may be a hard sell.


Erm... I see tons of comments about the 'maths trick' behind the tech, but have any of you tried out the app? Cos it's really amazing! A great idea, great execution. If this gets the news coverage it deserves then this could be huge, I think.


We learned about Hamming distance at university, but I could never figure out when, what, or why it should be used. It seemed to be either predicting the future, or just sending more bits to compensate for errors.

But what if you get errors in the new bits? It's daft.


Beyond a certain error rate, you will definitely end up with bad data. The point is, with error detecting or correcting codes, you're introducing redundancy by encoding the information into more bits than minimally required to represent that information.

The simplest form is adding a parity bit, which allows you to detect (not correct) up to one bad bit. (So, say, one extra bit per eight, or 12.5% overhead, if you store a byte of information in 9 bits.)
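The parity-bit scheme above can be sketched in a few lines (a toy example: an even-parity bit appended to an 8-bit byte; function names are made up here):

```python
def add_parity(byte):
    """Append an even-parity bit so the 9-bit word has an even number of 1s."""
    parity = bin(byte).count("1") % 2
    return (byte << 1) | parity

def parity_ok(word):
    """True if no error is detected (holds only for an even number of flips)."""
    return bin(word).count("1") % 2 == 0

word = add_parity(0b10110010)       # four 1s, so the parity bit is 0
assert parity_ok(word)              # a clean word passes
assert not parity_ok(word ^ 0b10)   # any single flipped bit is detected
```

Note that flipping two bits cancels out, which is exactly why parity detects at most one error and corrects none.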

Using R-S codes you can crank up the number of bits used for encoding, which also drives up your error tolerance. Plus, in addition to detecting errors, you can even correct them. So it doesn't matter if some bits come up bad (or missing) - the redundancy is spread equally across all of the transmitted/stored bits, so it's irrelevant which bits suffer from the failure. There aren't any "old" or "new" bits.
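The "redundancy spread equally across all bits" property comes from the polynomial view that RS codes are built on: treat k data symbols as coefficients of a degree-(k-1) polynomial, evaluate it at n > k points, and any k of those values determine the polynomial again by Lagrange interpolation. A toy erasure-coding sketch over the prime field GF(257) (real RS codes use GF(2^8) and can also correct corrupted symbols, not just missing ones):

```python
P = 257  # small prime field for illustration; real RS codes use GF(2^8)

def encode(data, n):
    """Evaluate the polynomial with coefficients `data` at x = 1..n."""
    return [(x, sum(c * pow(x, i, P) for i, c in enumerate(data)) % P)
            for x in range(1, n + 1)]

def decode(shares, k):
    """Lagrange-interpolate any k (x, y) shares back to the k coefficients."""
    pts = shares[:k]
    coeffs = [0] * k
    for j, (xj, yj) in enumerate(pts):
        num = [1]    # basis-polynomial numerator, constant term first
        denom = 1
        for m, (xm, _) in enumerate(pts):
            if m != j:
                # multiply num by (x - xm)
                num = [(b - xm * a) % P
                       for a, b in zip(num + [0], [0] + num)]
                denom = denom * (xj - xm) % P
        scale = yj * pow(denom, P - 2, P) % P   # modular inverse of denom
        for i, a in enumerate(num):
            coeffs[i] = (coeffs[i] + a * scale) % P
    return coeffs
```

With n shares of k data symbols, any n-k of them can go missing and it genuinely doesn't matter which, which is the property the comment describes.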


Even if this is all worked out to be amazingly effective.. how are you going to convince regular users to put their data on other peoples' computers?

Yes, I realize that it's all put into chunks so people won't be able to snoop on them, but just try getting that concept past my mom.

It's neat, but I'd rather have my data in my encrypted and fast S3 account.


The marketing angle is that your data is stored in "the cloud", where sometimes the cloud is a data center and sometimes it's P2P. People won't fear what they don't know.


Which happens to be on Amazon's computers ;) Face it: more and more of your data will be living on other people's computers from here on out. The best we can hope for is encryption everywhere.


Why is S3 different? "My data is on other people's machines" is still the case there. Encrypt it before you send out the blocks, and you're exactly where S3 is.


You know, most people seem JUST FINE with their files scattered on other people's computers.

Just look at webmail!


Trusting a company you know with your data != trusting strangers. For example, if I suspect Google is mishandling my email, I can take them to court. What do I do when I don't even know the people that have my data?


I think I'm being misunderstood here.

People don't seem to understand the concept of WHERE data is housed very well. My mom probably doesn't know that the email she reads is stored on Yahoo's servers. To her there is just a screen that she types into and her email is there.


That seems fairly odd to me. Most people at least understand that, since they have to go to Yahoo to access their email, Yahoo is storing a copy of it. The exact details of how that email is being stored might not be relevant. People have traditionally entrusted companies with lots of data, and there is usually the expectation that a company will not allow it to be misused.

A good analogy might be a bank. I know my money is in the bank. I don't know exactly where or how the bank keeps my money, but I would not like it very much if I found out they actually let their employees take a small chunk of my money home for safe keeping in order to reduce storage costs.


Seriously, go talk to some real people. You're way overestimating the average user. Many of them (cough my family cough) don't know the difference between Windows and Office. Many of them don't understand that the internet doesn't consist of that big blue 'e'. Etc. They're not stupid, they just don't care.

Most people consider a computer a magic box. To them it's about as mysterious as the internal workings of a processor are to the average programmer. (Explain why you need to increase voltage as a processor shrinks, all other things being equal, for example. hint: it has to do with quantum effects.)

Also, banks not only give out the money that you stored, they don't even give it to employees. They invest it in loans and such. In other words, they give it to complete strangers who they suspect have a good chance of bringing it back with a bit extra.


All of this is very interesting to me. I've talked to my parents and family friends about computers, none of whom are computer savvy at all, but I've yet to encounter anyone that thinks computers are a magic box, thinks that the IE icon == the internet, etc. (even though they don't understand, for example, the difference between the URL bar and the Google search box).

I guess it just shows just how little I understand of the "average" user (I'm glad I don't have to design applications for them).


The anonymous P2P project Freenet does similar forward error correction, and has for ages.




