
It could be made to work on Git, but you'd need to make a collision that included the git blob header. The resulting files would not have the same SHA-1 hash until the header was added though, so they wouldn't be useful except for testing Git itself.
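
For concreteness, here's a minimal sketch in Python (standard library only; not Git's actual implementation, which is C) of how Git derives a blob's object ID. Because the "blob <size>\0" header is hashed along with the content, two files that collide under raw SHA-1 stop colliding once the header is prepended:

    import hashlib

    def git_blob_sha1(content: bytes) -> str:
        # Git hashes "blob <size>\0" + content, not the raw file bytes,
        # so a raw SHA-1 collision between two files does not
        # automatically collide as Git blob objects.
        header = b"blob %d\x00" % len(content)
        return hashlib.sha1(header + content).hexdigest()

    # Matches `echo hello | git hash-object --stdin`:
    print(git_blob_sha1(b"hello\n"))  # ce013625030ba8dba906f756967f9e9ca394464a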

My guess is that Git wouldn't be 'hosed' like SVN, since it currently doesn't have a secondary hash to detect the corruption. It would simply restore the wrong file without noticing anything was amiss.


> It would simply restore the wrong file without noticing anything was amiss.

Why hasn't Git switched to SHA-2? People have been warning that SHA-1 is vulnerable for over a decade, but that vulnerability was dismissed with a lot of hand-waving over at Git. Is it a very difficult technical problem to switch, or just a problem of backward compatibility for existing repos (i.e., it would be expensive to change everything over)?


> Is it a very difficult technical problem to switch, or just a problem of backward compatibility for existing repos (i.e., it would be expensive to change everything over)?

A bit of both. Git has an ongoing effort to replace every place in the source code that passes SHA-1s around as fixed-size byte arrays with a proper data structure. That will make it possible to swap out the hash. But even with that work, git will still need to support reading and writing repositories in the existing format, plus numerous interoperability measures.
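
To illustrate the idea (a sketch in Python; Git's actual code is C and uses a struct for this), the refactor amounts to moving from a bare 20-byte value to a structure that also records which algorithm produced it:

    from dataclasses import dataclass

    @dataclass(frozen=True)
    class ObjectId:
        # Rather than assuming a fixed 20-byte SHA-1 everywhere, carry
        # the algorithm alongside the raw digest so the same code paths
        # can handle a replacement hash later.
        algo: str      # e.g. "sha1" or "sha256"
        digest: bytes  # 20 bytes for SHA-1, 32 for SHA-256

        def hex(self) -> str:
            return self.digest.hex()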


Mercurial hasn't switched either, for similar reasons, although its format has reserved space for a bigger hash (32 bytes versus SHA-1's 20) since 2006, less than a year into hg's existence.


Worth noting, though, that while Mercurial's local storage can accommodate larger hash digests, the exchange protocols don't.


Linus suggested that changing the hash algorithm doesn't need these changes: just take the first 160 bits of SHA-2 (or whatever) and use that. The chance of collisions would still be lower than with SHA-1.
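
A minimal sketch of that suggestion, assuming plain truncation of SHA-256 (this shows the mechanics only, not an endorsement of the scheme):

    import hashlib

    def truncated_sha256(data: bytes) -> bytes:
        # Keep the first 160 bits (20 bytes) of SHA-256 so the digest
        # still fits everywhere a 20-byte SHA-1 digest currently fits.
        return hashlib.sha256(data).digest()[:20]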


You'd still have to do the second half of the work, namely building the interoperability and migration mechanisms.


That's the far larger half too.


... and policy. Because now you can have collisions between sha-1 and whatever else you pick.


Not yet you can't.


You can find a collision in any 160-bit hash in about 2^80 time (the birthday bound), though.


I think the more important collision to worry about in 2^80 time is the Earth colliding with the Sun.

Our interstellar successors, if any, will probably have found something better than Git to use.

EDIT: I should be clear that I'm not making the usual silly claim that we don't need to worry about hashes being broken because they take forever to brute force. I'm saying that hashes will be broken, but not by brute forcing the entire hash space. A decade or so of cryptographic research will save you eons of compute time.


2^80 time is not as much as you think it is. The bitcoin network is currently calculating about 3 * 2^60 hashes per second. It can do 2^80 hashes in under a week.

The 2^80 space cost of doing a birthday attack is a lot more notable, but it's not infeasible either. Yearly hard drive production is somewhere around 2^70 bytes. You're not pulling that attack off in the year 2017, but with a big budget you could probably get there in a few decades.
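
Back-of-the-envelope arithmetic for those figures (using the 3 * 2^60 hashes/second and 2^70 bytes/year estimates above):

    # Time for the network to compute 2^80 hashes:
    seconds = 2**80 / (3 * 2**60)
    print(seconds / 86400)       # ~4.05 days, i.e. under a week

    # Space for a naive birthday attack: 2^80 twenty-byte digests is
    # ~2^84 bytes, vs ~2^70 bytes/year of hard drive production, a
    # factor of ~2^14 -- hence "a few decades" of capacity growth.
    print((20 * 2**80) / 2**70)  # ~20480 years of 2017-level production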


Checking 2^80 hashes is indeed faster than I thought, thanks for the stats.

But checking 2^80 hashes and writing them to long-term storage is still ridiculous. That budget should still go to hiring cryptographers, not buying hard drives.


> Why hasn't Git switched to SHA-2? People have been warning that SHA-1 is vulnerable for over a decade

Back then Linus shot this down in his typical abrasive fashion:

http://www.gelato.unsw.edu.au/archives/git/0504/0885.html


His style is a super-abrasive, unnecessary power trip, but this key point is relevant:

> It is simply NOT TRUE that you can generate an object that looks halfway sane and still gets you the sha1 you want.

The key phrase being "looks halfway sane". Git doesn't just look at the hash; it looks at the object structure too (headers), and that makes it highly resistant to weaknesses in the crypto alone. His point, essentially, is that you should design expecting crypto/hash vulnerabilities, and that's a smart stance, since they are discovered every few years.


> It looks at the object structure too (headers)

Linus was not talking about the object headers, but about the object contents. It's harder to make the colliding objects look like sane C code, without some strange noise in the middle (which wouldn't be accepted by the project maintainers).

Yes, it's a "C project"-centric view, but consider the date: it was the early days of git. The main way of receiving changes was emailed patches, not pull requests. Binary junk would have a hard time getting in. And even if it did get in, the earliest copy of the object wins, as long as the maintainers added "--ignore-existing" to the rsync command in their pull scripts (yeah, this thread seems to be from before the git fetch protocol), as mentioned earlier in the thread.


Honestly, this isn't nearly as abrasive as some things Linus has said and it has some cogent generalized engineering advice mixed in. Certainly not the worst thing he's said. Also, he was correct at the time and left open the possibility of something changing in the future.


It's not pretty; relying more on forcefulness than on good arguments is a bully's style.


It hasn't switched because Linus (1) doesn't think anyone would do that, and (2) sees hash collisions only as an accident vector, not an intentional attack vector.

> You are _literally_ arguing for the equivalent of "what if a meteorite hit my plane while it was in flight - maybe I should add three inches of high-tension armored steel around the plane, so that my passengers would be protected".

> That's not engineering. That's five-year-olds discussing building their imaginary forts ("I want gun-turrets and a mechanical horse one mile high, and my command center is 5 miles under-ground and totally encased in 5 meters of lead").

> If we want to have any kind of confidence that the hash is really unbreakable, we should make it not just longer than 160 bits, we should make sure that it's two or more hashes, and that they are based on totally different principles.

> And we should all digitally sign every single object too, and we should use 4096-bit PGP keys and unguessable passphrases that are at least 20 words in length. And we should then build a bunker 5 miles underground, encased in lead, so that somebody cannot flip a few bits with a ray-gun, and make us believe that the sha1's match when they don't. Oh, and we need to all wear aluminum propeller beanies to make sure that they don't use that ray-gun to make us do the modification _ourselves_.

> So please stop with the theoretical sha1 attacks. It is simply NOT TRUE that you can generate an object that looks halfway sane and still gets you the sha1 you want. Even the "breakage" doesn't actually do that. And if it ever _does_ become true, it will quite possibly be thanks to some technology that breaks other hashes too.

> I worry about accidental hashes, and in 160 bits of good hashing, that just isn't an issue.


I think it is worth noting that the quotes in your comment are from 12 years ago: http://www.gelato.unsw.edu.au/archives/git/0504/0885.html

I don't mean this to say that you are being inaccurate, just that his position now seems a little different:

"Again, I'm not arguing that people shouldn't work on extending git to a new (and bigger) hash. I think that's a no-brainer, and we do want to have a path to eventually move towards SHA3-256 or whatever" http://marc.info/?l=git&m=148787457024610&w=2


True.

I was just answering the question "Why hasn't Git switched...People have been warning that SHA-1 is vulnerable for over a decade"

Linus' 12-year-old opinions are the relevant thing for why it hadn't changed. A decade from now, things may be different.


He seems to have changed his tune now that he can't hide behind the "that's only an imaginary possibility" cover: http://marc.info/?l=git&m=148787047422954

> Do we want to migrate to another hash? Yes.


His original attitude seems to be more like "this is sufficiently unlikely in practice that I consider attempting to mitigate it in advance to be over-engineering, with a higher opportunity cost than it's worth". Which, well, I think when it comes to security stuff he has a nasty tendency to underestimate the risks and thereby pick the wrong side of the trade-off, but to me it's clearly a trade-off rather than something to hide behind.


I think it's a reasonable assumption that, as computing power increases, hash functions will be broken. Not that they have to be, but it's reasonable to assume they will, and I think it's beyond short-sighted for Torvalds to have failed to build a mechanism for hash-function migration into git from the very start.


Cryptanalytic research is the fundamental thing that broke SHA-1, not simply the increase in available computing power. So that's not really a "reasonable assumption"; if it were, we could "reasonably" assume SHA-512 will never be broken.


The point remains: since computing power increases and cryptanalytic research advances, we really should make sure software that depends on cryptographic hashes has a reasonable way to move to different algorithms. At the very least, we could prefix the stored hash with the name of the algorithm that generated it.
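
A minimal sketch of that kind of self-describing digest (the labeled_digest helper is hypothetical, in the spirit of multihash-style encodings):

    import hashlib

    def labeled_digest(algo: str, data: bytes) -> str:
        # Prefix the digest with the algorithm name so stored hashes
        # stay unambiguous across a future migration.
        h = hashlib.new(algo)
        h.update(data)
        return f"{algo}:{h.hexdigest()}"

    print(labeled_digest("sha256", b"hello\n"))
    # sha256:5891b5b522d5df086d0ff0b110fbd9d21bb4fc7163af34d08286a2e846f6be03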


Advances in research and computing power are vastly outpaced by basic things like digest size. Even if you came up with a complexity reduction against SHA-256 of the same order as the one developed against SHA-1, you still wouldn't be able to find any SHA-256 collisions.
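
Rough numbers to make that concrete (treating SHAttered's reported ~2^63 cost as a ~2^17 speedup over SHA-1's generic 2^80 birthday bound, and assuming, generously, the same speedup against SHA-256):

    # Generic birthday bounds: 2^80 for SHA-1, 2^128 for SHA-256.
    speedup = 2**80 // 2**63          # ~2^17, the SHA-1 reduction
    sha256_equiv = 2**128 // speedup  # 2^111 -- still far out of reach
    print(sha256_equiv == 2**111)     # True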



