
It could be made to work on Git, but you'd need to make a collision that included the git blob header. The resulting files would not have the same SHA-1 hash until the header was added though, so they wouldn't be useful except for testing Git itself.
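
For concreteness, here's a minimal sketch in Python (standard library only; not Git's actual implementation, which is C) of how Git derives a blob's object ID. Because the "blob <size>\0" header is hashed along with the content, two files that collide under raw SHA-1 stop colliding once the header is prepended:

    import hashlib

    def git_blob_sha1(content: bytes) -> str:
        # Git hashes "blob <size>\0" + content, not the raw file bytes,
        # so a raw SHA-1 collision between two files does not
        # automatically collide as Git blob objects.
        header = b"blob %d\x00" % len(content)
        return hashlib.sha1(header + content).hexdigest()

    # Matches `echo hello | git hash-object --stdin`:
    print(git_blob_sha1(b"hello\n"))  # ce013625030ba8dba906f756967f9e9ca394464a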

My guess is that Git wouldn't be 'hosed' like SVN, since it currently doesn't have a secondary hash to detect the corruption. It would simply restore the wrong file without noticing anything was amiss.


> It would simply restore the wrong file without noticing anything was amiss.

Why hasn't Git switched to SHA-2? People have been warning that SHA-1 is vulnerable for over a decade, but that vulnerability was dismissed with a lot of hand-waving over at Git. Is it a very difficult technical problem to switch, or just a problem of backward compatibility for existing repos (i.e., it would be expensive to change everything over)?


> Is it a very difficult technical problem to switch, or just a problem of backward compatibility for existing repos (i.e., it would be expensive to change everything over)?

A bit of both. Git has an ongoing effort to replace every place in the source code that passes SHA-1s around as fixed-size byte arrays with a proper data structure. That will make it possible to swap out the hash. But even with that work, git will still need to support reading and writing repositories in the existing format, plus numerous interoperability measures.
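
To illustrate the idea (a sketch in Python; Git's actual code is C and uses a struct for this), the refactor amounts to moving from a bare 20-byte value to a structure that also records which algorithm produced it:

    from dataclasses import dataclass

    @dataclass(frozen=True)
    class ObjectId:
        # Rather than assuming a fixed 20-byte SHA-1 everywhere, carry
        # the algorithm alongside the raw digest so the same code paths
        # can handle a replacement hash later.
        algo: str      # e.g. "sha1" or "sha256"
        digest: bytes  # 20 bytes for SHA-1, 32 for SHA-256

        def hex(self) -> str:
            return self.digest.hex()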


Mercurial hasn't switched either, for similar reasons, although its format has reserved space for a bigger hash (32 bytes versus SHA-1's 20) since 2006, less than a year into hg's existence.


Worth noting, though, that while Mercurial's local storage can accommodate larger hash digests, the exchange protocols don't.


Linus suggested that changing the hash algorithm doesn't need these changes: just take the first 160 bits of SHA-2 (or whatever) and use that. The chance of collisions would still be lower than with SHA-1.
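
A minimal sketch of that suggestion, assuming plain truncation of SHA-256 (this shows the mechanics only, not an endorsement of the scheme):

    import hashlib

    def truncated_sha256(data: bytes) -> bytes:
        # Keep the first 160 bits (20 bytes) of SHA-256 so the digest
        # still fits everywhere a 20-byte SHA-1 digest currently fits.
        return hashlib.sha256(data).digest()[:20]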


You'd still have to do the second half of the work, namely building the interoperability and migration mechanisms.


That's the far larger half too.


... and policy. Because now you can have collisions between sha-1 and whatever else you pick.


Not yet you can't.


You can find a collision in any 160-bit hash in about 2^80 time (the birthday bound), though.


I think the more important collision to worry about in 2^80 time is the Earth colliding with the Sun.

Our interstellar successors, if any, will probably have found something better than Git to use.

EDIT: I should be clear that I'm not making the usual silly claim that we don't need to worry about hashes being broken because they take forever to brute force. I'm saying that hashes will be broken, but not by brute forcing the entire hash space. A decade or so of cryptographic research will save you eons of compute time.


2^80 time is not as much as you think it is. The bitcoin network is currently calculating about 3 * 2^60 hashes per second. It can do 2^80 hashes in under a week.

The 2^80 space cost of doing a birthday attack is a lot more notable, but it's not infeasible either. Yearly hard drive production is somewhere around 2^70 bytes. You're not pulling that attack off in the year 2017, but with a big budget you could probably get there in a few decades.
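
Back-of-the-envelope arithmetic for those figures (using the 3 * 2^60 hashes/second and 2^70 bytes/year estimates above):

    # Time for the network to compute 2^80 hashes:
    seconds = 2**80 / (3 * 2**60)
    print(seconds / 86400)       # ~4.05 days, i.e. under a week

    # Space for a naive birthday attack: 2^80 twenty-byte digests is
    # ~2^84 bytes, vs ~2^70 bytes/year of hard drive production, a
    # factor of ~2^14 -- hence "a few decades" of capacity growth.
    print((20 * 2**80) / 2**70)  # ~20480 years of 2017-level production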


Checking 2^80 hashes is indeed faster than I thought, thanks for the stats.

But checking 2^80 hashes and writing them to long-term storage is still ridiculous. That budget should still go to hiring cryptographers, not buying hard drives.


> Why hasn't Git switched to SHA-2? People have been warning that SHA-1 is vulnerable for over a decade

Back then Linus shot this down in his typical abrasive fashion:

http://www.gelato.unsw.edu.au/archives/git/0504/0885.html


His style is a super-abrasive, unnecessary power trip, but this key point is relevant:

> It is simply NOT TRUE that you can generate an object that looks halfway sane and still gets you the sha1 you want.

The key phrase being "looks halfway sane". Git doesn't just look at the hash; it looks at the object structure too (headers), and that makes it highly resistant to weaknesses in the crypto alone. His point, essentially, is that you should design expecting crypto/hash vulnerabilities, and that's a smart stance, since they are discovered every few years.


> It looks at the object structure too (headers)

Linus was not talking about the object headers, but about the object contents. It's harder to make the colliding objects look like sane C code, without some strange noise in the middle (which wouldn't be accepted by the project maintainers).

Yes, it's a "C project"-centric view, but consider the date: it was the early days of git. The main way of receiving changes was emailed patches, not pull requests. Binary junk would have a hard time getting in. And even if it did get in, the earliest copy of the object wins, as long as the maintainers added "--ignore-existing" to the rsync command in their pull scripts (yeah, this thread seems to be from before the git fetch protocol), as mentioned earlier in the thread.


Honestly, this isn't nearly as abrasive as some things Linus has said and it has some cogent generalized engineering advice mixed in. Certainly not the worst thing he's said. Also, he was correct at the time and left open the possibility of something changing in the future.


It's not pretty; relying more on forcefulness than on good arguments is a bully's style.


It hasn't switched because Linus (1) doesn't think anyone would do that, and (2) sees hash collisions only as an accident vector, not an intentional attack vector.

> You are _literally_ arguing for the equivalent of "what if a meteorite hit my plane while it was in flight - maybe I should add three inches of high-tension armored steel around the plane, so that my passengers would be protected".

> That's not engineering. That's five-year-olds discussing building their imaginary forts ("I want gun-turrets and a mechanical horse one mile high, and my command center is 5 miles under-ground and totally encased in 5 meters of lead").

> If we want to have any kind of confidence that the hash is really unbreakable, we should make it not just longer than 160 bits, we should make sure that it's two or more hashes, and that they are based on totally different principles.

> And we should all digitally sign every single object too, and we should use 4096-bit PGP keys and unguessable passphrases that are at least 20 words in length. And we should then build a bunker 5 miles underground, encased in lead, so that somebody cannot flip a few bits with a ray-gun, and make us believe that the sha1's match when they don't. Oh, and we need to all wear aluminum propeller beanies to make sure that they don't use that ray-gun to make us do the modification _ourselves_.

> So please stop with the theoretical sha1 attacks. It is simply NOT TRUE that you can generate an object that looks halfway sane and still gets you the sha1 you want. Even the "breakage" doesn't actually do that. And if it ever _does_ become true, it will quite possibly be thanks to some technology that breaks other hashes too.

> I worry about accidental hashes, and in 160 bits of good hashing, that just isn't an issue.


I think it is worth noting that the quotes in your comment are from 12 years ago: http://www.gelato.unsw.edu.au/archives/git/0504/0885.html

I don't mean this to say that you are being inaccurate, just that his position now seems a little different:

"Again, I'm not arguing that people shouldn't work on extending git to a new (and bigger) hash. I think that's a no-brainer, and we do want to have a path to eventually move towards SHA3-256 or whatever" http://marc.info/?l=git&m=148787457024610&w=2


True.

I was just answering the question "Why hasn't Git switched...People have been warning that SHA-1 is vulnerable for over a decade"

Linus' 12-year-old opinions are the relevant thing for why it hadn't changed. A decade from now, things may be different.


He seems to have changed his tune now that he can't hide behind the "that's only an imaginary possibility" cover: http://marc.info/?l=git&m=148787047422954

> Do we want to migrate to another hash? Yes.


His original attitude seems to be more like "this is sufficiently unlikely in practice that I consider attempting to mitigate it in advance to be over-engineering, with a higher opportunity cost than it's worth". Which, well, I think when it comes to security stuff he has a nasty tendency to underestimate the risks and thereby pick the wrong side of the trade-off, but to me it's clearly a trade-off rather than something to hide behind.


I think it's a reasonable assumption that, as computing power increases, hash functions will be broken. Not that they have to be, but it's reasonable to assume they will, and I think it's beyond short-sighted for Torvalds to have failed to build a mechanism for hash-function migration into git from the very start.


Cryptanalytic research is the fundamental thing that broke SHA-1, not simply the increase in available computing power. So that's not really a "reasonable assumption"; if it were, we could "reasonably" assume SHA-512 will never be broken.


The point remains: since computing power increases and cryptanalytic research advances, we really should make sure software that depends on cryptographic hashes has a reasonable way to move to different algorithms. At the very least, we could prefix the stored hash with the name of the algorithm that generated it.
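
A minimal sketch of that kind of self-describing digest (the labeled_digest helper is hypothetical, in the spirit of multihash-style encodings):

    import hashlib

    def labeled_digest(algo: str, data: bytes) -> str:
        # Prefix the digest with the algorithm name so stored hashes
        # stay unambiguous across a future migration.
        h = hashlib.new(algo)
        h.update(data)
        return f"{algo}:{h.hexdigest()}"

    print(labeled_digest("sha256", b"hello\n"))
    # sha256:5891b5b522d5df086d0ff0b110fbd9d21bb4fc7163af34d08286a2e846f6be03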


Advances in research and computing power are vastly outpaced by basic things like digest size. Even if you came up with a complexity reduction against SHA-256 of the same order as the one developed against SHA-1, you still wouldn't be able to find any SHA-256 collisions.
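
Rough numbers to make that concrete (treating SHAttered's reported ~2^63 cost as a ~2^17 speedup over SHA-1's generic 2^80 birthday bound, and assuming, generously, the same speedup against SHA-256):

    # Generic birthday bounds: 2^80 for SHA-1, 2^128 for SHA-256.
    speedup = 2**80 // 2**63          # ~2^17, the SHA-1 reduction
    sha256_equiv = 2**128 // speedup  # 2^111 -- still far out of reach
    print(sha256_equiv == 2**111)     # True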



