I implemented the Skein hash in a crypto class at my uni. What is remarkable about that hash is that it has a "hash tree" mode, which provides an interesting opportunity for parallelization and for hashing partial data. In contrast, many traditional hash algorithms are inherently sequential.
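A minimal sketch of the tree idea (this is an illustrative Merkle-style construction over SHA-256, not Skein's actual tree mode; the leaf size and domain-separation bytes are made up):

    import hashlib

    LEAF_SIZE = 4096  # hypothetical chunk size

    def leaf_hash(chunk: bytes) -> bytes:
        # Leaves can be hashed independently, hence in parallel, and a changed
        # leaf only forces rehashing its path back up to the root.
        return hashlib.sha256(b"\x00" + chunk).digest()

    def node_hash(left: bytes, right: bytes) -> bytes:
        return hashlib.sha256(b"\x01" + left + right).digest()

    def tree_hash(data: bytes) -> bytes:
        level = [leaf_hash(data[i:i + LEAF_SIZE])
                 for i in range(0, len(data), LEAF_SIZE)] or [leaf_hash(b"")]
        while len(level) > 1:
            if len(level) % 2:
                level.append(level[-1])  # duplicate the last node on odd-sized levels
            level = [node_hash(level[i], level[i + 1]) for i in range(0, len(level), 2)]
        return level[0]

    print(tree_hash(b"data" * 10000).hex())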
On the other hand, as Mr. Schneier points out in the article, the Skein hash utilizes a modified Threefish block cipher, while many of the SHA-3 contestants were AES-based (edit: seems like none of the finalists are). Now we have a hardware AES implementation shipping in mainstream processors, so it gives an edge to the AES-based hash functions out there.
edit: I went through the list of finalists and it seems none of them actually use the whole AES block cipher, although several of them use AES S-boxes or other parts of AES.
I believe they worked towards speed mostly, and whilst AES is in a lot of hardware, the basis of their design was to be hardware independent; with that in mind you can see why no design bias was placed on what is or isn't currently enabled in some hardware. In a world where we still have no standard with regard to big or little endian, you can see they took the right approach.
The only aspect that Mr Schneier has not highlighted, and which you touched upon, is that some of the hashing finalists are faster, and in brute force comparisons that makes them not as good. That is another consideration, albeit one of limited time: as you already said, AES is now in some hardware as instructions, and look how long that took after AES became a standard.
But the whole ability to work on partial blocks (which is what you need to gain from parallelisation) does open up a whole new ballgame in modifying hashed code and returning the same hash. Now, if you can pick a partial block size, do a brute force test modifying each byte of the block incrementally from 0x00 to 0xFF for every permutation, and only get a match on the original, then that by design would be the one I'd be picking; I'm sure they will be testing each parallel finalist in that way. But personally, if they all pass that test, then why pick one winner? Have them all as part of the standard SHA-3.[1-5]. After all, if they all work well, then having the choice can only add to the entropy we call security these days.
Again: the "brute force" thing you're mentioning here is a non sequitur. It's akin to suggesting that Intel should manufacture slower chips to make passwords hard to crack. Hash functions should be as fast as possible; it's the job of a KDF construction to slow the process down to resist brute forcing.
It's the aspect of being able to change part of the data and only have to brute force that part of the data: if you have a block of data that you can modify and rehash-check until you get the same hash, you have a whole new issue with hashing, by design of being able to work on partial blocks and parallelise things. No, I'm not suggesting Intel should make slower chips. I am suggesting that any hashing design that allows you to modify a small part of the entire lump of data/code and only recheck the hash of that small block, instead of the entire lump of data/code, will by design be a weaker system, open to easier attacks than any system where you have to rehash the entire block of data sequentially, even if hashing X bytes took the same time as, or even longer than, SHA-2. Nothing to do with KDFs!
I can't wrap my head around the point you're trying to make so I'm just going to repeat my previous statement with slightly different words:
In the field of cryptography, holding all else constant, faster hash functions are better.
It is not the job of a hash function to resist incremental brute force attacks on the input domain of the function; that is a job left to higher-level constructions.
It is exactly misconceptions like this one that are the reason we lobby for "high level" cryptography interfaces that don't expose things like hash function cores to programmers.
The point I was making is: if you have 100 bytes, you block that data every 10 bytes, hash those blocks, and use those 10 block hashes to make the final hash value for the 100 bytes, then if you only change data in one block you would only have to modify that block until you got the same hash for that block, knowing that the final hash will be the same.
Now, having thought this through, I concluded that if you blocked in a striped fashion, taking every Nth byte, then you could still have the ability to parallelise the hashing without compromising security. So in the same example of 100 bytes you would have bytes 1, 11, 21, 31, ... in block 1, bytes 2, 12, 22, 32, 42, ... in block 2, and so on. In this way, the ability to modify the data/program in a meaningful way while maintaining the overall hash would be as hard as rehashing the entire 100 bytes.
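A quick sketch of the two layouts being described, purely as a data-layout illustration (0-indexed, so "byte 1" above is index 0 here):

    data = bytes(range(100))

    # contiguous: block 1 is bytes 0-9, block 2 is bytes 10-19, ...
    contiguous = [data[i * 10:(i + 1) * 10] for i in range(10)]

    # striped: block 1 is bytes 0, 10, 20, ..., block 2 is bytes 1, 11, 21, ...
    striped = [data[i::10] for i in range(10)]

    print(contiguous[0].hex())
    print(striped[0].hex())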
However, having said that, from a quick look at the Skein hash it appears it uses Threefish for its blocking, and from what I can tell whether the blocks are striped or contiguous is not defined and is down to how it is implemented. I suspect they don't stripe the data blocks, as that approach would not be conducive to hashing small communications or rolling data like streamed data, where you only know the block of data you have and, without padding, have little scope for breaking it up into processable blocks, as it is already the block size. But that is an issue that stands currently; it is only for, say, a fat Linux ISO with a nice hash check that striping the data blocks would be beneficial, to negate the ability to work on just the blocks you are changing to maintain the same overall hash value for the entire file.
I hope that explains the point I was trying to make; it has nothing to do with cryptanalysts vs programmers at all. But if programmers don't know about such issues, then they will feed the data a block at a time contiguously. I agree that data in and data out is all a programmer should need at the API level, as anything else is adding the role of cryptographer to a job which is paid at programmer rates.
If I understand what you're saying, it's that a hash defined as e.g. (with b1, b2, ... standing for blocks) H(b1, b2, b3) = H(b1) ^ H(b2) ^ H(b3) is not very secure because it allows one to recover b2 from H(b1, b2, b3), b1 and b3 in time 2^(|b2|) ("2 to the power length-of-b2"). This is obviously true, but no sensible hash function is defined in this way, and I don't think any of the SHA candidates use blocks of a size that can easily be bruteforced.
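A toy demonstration of exactly that weakness, with SHA-256 standing in for H and a deliberately tiny one-byte b2 (no real candidate uses blocks anywhere near this small):

    import hashlib

    H = lambda block: hashlib.sha256(block).digest()
    xor = lambda a, b: bytes(x ^ y for x, y in zip(a, b))

    b1, b2, b3 = b"known prefix", b"\x2a", b"known suffix"   # b2 is one unknown byte
    combined = xor(xor(H(b1), H(b2)), H(b3))                 # the (bad) overall hash

    # Knowing b1, b3 and the combined value, recover b2 in at most 2**8 guesses.
    target = xor(xor(combined, H(b1)), H(b3))
    recovered = next(bytes([g]) for g in range(256) if H(bytes([g])) == target)
    print(recovered == b2)   # True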
Just to put this in perspective: SHA1 has a 512 bit block. Nobody is brute forcing 2^512. MD5 has an output size of 128 bits; nobody is brute forcing 2^128 either.
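For a sense of scale, a back-of-the-envelope figure, assuming a very generous guessing rate:

    GUESSES_PER_SECOND = 10**12          # assumed, and generous
    SECONDS_PER_YEAR = 60 * 60 * 24 * 365

    years_for_2_128 = 2**128 / (GUESSES_PER_SECOND * SECONDS_PER_YEAR)
    print(f"{years_for_2_128:.2e} years")   # roughly 1e19 years just for 2^128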
It is trivial to create constructions where changing a single bit of the data requires recomputing the hash over all of the data. A really simple option to that end is to feed the hash back into itself multiple times. Please go read a paper on PBKDF2 and/or scrypt - they are designed to work with an arbitrary hash function and to be impossible to parallelize.
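A minimal sketch of both ideas using Python's standard library (the iteration counts here are arbitrary; in practice you tune them to a target cost):

    import hashlib

    # Feeding the hash back into itself: each round depends on the previous
    # one, so the work is inherently sequential.
    def iterated_sha256(data: bytes, rounds: int = 100_000) -> bytes:
        digest = hashlib.sha256(data).digest()
        for _ in range(rounds):
            digest = hashlib.sha256(digest).digest()
        return digest

    # The standard construction for low-entropy inputs: PBKDF2 with a per-user salt.
    derived = hashlib.pbkdf2_hmac("sha256", b"password", b"per-user-salt", 100_000)

    print(iterated_sha256(b"some data").hex())
    print(derived.hex())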
OK, I read about PBKDF2; what you're describing is key stretching, or hash feedback cycles, and it is a great approach for what it does. Though for what I was on about, it would still be a case of working out the hash for a smaller part than the sum of the entire data. But, as I have thought through: if you stripe the data blocks across the sum of the data, then modifying a small part of the data would still mean taking just as long to work out the hash as for the entire sum of data. It is when blocking is done in contiguous blocks (with 100 bytes of data, block 1 is bytes 1-10, block 2 is bytes 11-20, etc.) that you have the issue I describe. And yes, what you say about hash feedback is correct in that for any block it will add an extra time factor, but it will still be less than for the entire lot of data; if you can rehash 10 bytes instead of the entire 100 bytes, it can only be quicker. But if those blocks are striped, then you would get the best of both worlds and have the ability to parallelise your hashing without fear of being any weaker by design; that only holds if you stripe the data blocks you parallelise, not if you use contiguous blocks. I'll have a good read of these finalists over the weekend, though far better brains than mine wrote them, so I expect this is just me learning what they already know.
Zenst - it sounds like you have a lot of interest in cryptography, and your lack of familiarity with PBKDF2 and friends suggests that you have just entered this space.
Thank you and signed up. I have interests in too many things that cross over, but this is one area I do need to step back a bit and learn the lingo a bit more. Looking forward to this course now.
You can speed up the first iteration, but after that there is NO similarity in the output so you can't re-use previous computations.
Also, as other posters pointed out, faster hashes are good for everything except password hashing, which is not the biggest use. In the case of password hashing, if the hash algorithm is twice as fast you can just run it twice as many times, so it doesn't hurt for it to be faster.
Actually, if the data is split such that every Nth byte is used to make up a block (i.e. with 100 bytes of data and 10-byte blocks, you use bytes 1, 11, 21, ... in block one and 2, 12, 22, ... in block two, etc.), that would negate the issue I have with this completely. It's not a case of feeding the hash back into itself; it's a case of entropy within the block, and if that entropy contains non-contiguous bytes via striping, then the ability to modify the code or data in any nefarious way would be as hard as modifying the entire hash, so I think I've eliminated my own concerns. Off to read up on what you reference, but if they refer to adding salts, then that is a separate issue to do with small amounts of data. Thank you for putting up with my brain farts.
> But personally, if they all pass that test, then why pick one winner? Have them all as part of the standard SHA-3.[1-5]
That would be horrible for security and ease of implementation. Anyone implementing hashes would then need to implement 10 different hashes, some way to specify hash used, keep them all performant and patched against risks like DPA, ... . It is exactly that kind of kitchen-sink approach which is responsible for many flaws in SSL, PGP, etc. to date, and which makes implementing real-world systems such a pain.
OTOH if you don't specify which hash is used, you'll be stuck using SHA1 for all time, or with a messy transition. Nobody knows yet what to do about git as SHA1 becomes more broken..
The right way to do this IMO is big version upgrades. Like V1 of protocol uses a specific hash, message format, etc., V2 might use another, etc. It's when you allow arbitrary combinations of pieces that complexity becomes absurd.
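A sketch of what that looks like in code terms (a hypothetical format; the point is one algorithm per version, with no negotiation of arbitrary combinations):

    import hashlib

    VERSIONS = {1: hashlib.sha256, 2: hashlib.sha512}   # hypothetical protocol versions

    def tagged_digest(version: int, data: bytes) -> str:
        return f"v{version}${VERSIONS[version](data).hexdigest()}"

    def verify(tag: str, data: bytes) -> bool:
        version = int(tag.split("$", 1)[0][1:])
        return tagged_digest(version, data) == tag

    print(verify(tagged_digest(2, b"payload"), b"payload"))   # True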
A lot of this happened due to the ITAR export crap where you needed to offer pluggable keylengths and algorithms for compliance, and then people had the insane idea of supporting arbitrary algorithms in every protocol.
Brute forcing a partial block is just as hard as bruteforcing the whole thing - you still need to create a second string which hashes to the same as the first one.
Simply trying all the values from 0x00 to 0xFF will (statistically) never result in 2 blocks with the same value since you are only bruteforcing 8 bits and the output of the hash is 512. The chance of two arbitrary blocks matching, regardless of length, is 1/2^512.
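To make that concrete, a quick check: modifying one byte through all 256 values never reproduces the original 512-bit digest.

    import hashlib

    block = bytes(64)                          # a fixed 64-byte block of zeroes
    original = hashlib.sha512(block).digest()

    collisions = 0
    for value in range(1, 256):                # every non-original value of byte 10
        candidate = block[:10] + bytes([value]) + block[11:]
        if hashlib.sha512(candidate).digest() == original:
            collisions += 1
    print(collisions)                          # 0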
I'm a bit surprised that Schneier is advocating for "no award". Even if the SHA-3 candidates are not fundamentally better than SHA-512, we really do need a standardized algorithm that has built in protection from length extension attacks.
A hash function immune to length extension attacks is more fool-proof than HMAC because there is no additional construct required. It's also faster because it requires no additional passes over data.
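For reference, the two constructions side by side in Python (the naive keyed hash is what length extension breaks when the underlying hash is Merkle-Damgard; HMAC is the extra wrapper being discussed):

    import hashlib
    import hmac

    key, message = b"secret key", b"amount=10&to=alice"

    # Naive MAC: with SHA-256 the digest is the full internal state, so an
    # attacker can extend the message without knowing the key.
    naive_tag = hashlib.sha256(key + message).hexdigest()

    # HMAC: the standard wrapper that closes the length-extension hole.
    hmac_tag = hmac.new(key, message, hashlib.sha256).hexdigest()

    print(naive_tag)
    print(hmac_tag)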
If I remember correctly (I may not), two SHA-2 functions (SHA-224 and SHA-384?) aren't vulnerable to length extension attacks.
While I agree with you that this is an immediately important feature, I don't think Bruce's premise (that SHA-2 is still pretty good) is invalid.
Perhaps like you though, I don't understand why a new standard can't be incremental. I think it's silly to wait until something major happens to change.
The problem with a new standard is that it may induce many to start using it on the grounds that "newer is better", which may not actually be the case: the SHA-2 algorithms have withstood more scrutiny so far.
> If I remember correctly (I may not), two SHA-2 functions (SHA-224 and SHA-384?) aren't vulnerable to length extension attacks.
Interesting, is that because they only return part of the final state (by slicing SHA-256 and SHA-512), whereas the unsliced SHA-256 and SHA-512 return all of the algorithm's running state as the result?
That's the only reason I can think of why they would be immune to length extension attacks. With SHA-224 one could just brute force the missing 32 bits of state, though.
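The truncation is easy to see from the digest sizes (SHA-224 and SHA-384 are truncations of SHA-256 and SHA-512, with different initial values):

    import hashlib

    for name in ("sha224", "sha256", "sha384", "sha512"):
        print(name, hashlib.new(name, b"msg").digest_size * 8, "bits")
    # sha224 exposes 224 of 256 state bits (32 bits hidden)
    # sha384 exposes 384 of 512 state bits (128 bits hidden)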
I can envision a great Samsung-ish commercial about waiting in line to get the new hash - one of you creative types could have fun with that spoof (of a spoof)
Interesting that the reason for SHA-3 has been missed, in that the finalists offer no better way to hash, the main difference being that some are faster and some slower than the best SHA-2 variations.
What does this mean? Well, in effect no extra value is being directly offered. Sure, some have extra abilities by design, like being better able to exploit parallel processing by splitting the data to be hashed into chunks, working on partial blocks of the final data, and using the results to get the final hash result. That is nice.
But when it comes to brute forcing, being faster works against you. Also, the ability to work on partial chunks of the data allows you to modify the code and recheck the partial hash for the part you're changing until you get the same result. This allows you to do nasty things to code and still get the official hash answer a lot more easily than having to rehash the end result every time (usually you have an area you jump over, all NOPs, and modify that to influence the hash, but there are saner ways to do this; off topic).
So in essence, any hash that can be run faster in any way will be weaker in terms of brute forcing (yes, I know people assume their password will be the last one on the list to be checked by brute forcing, and assume that if it takes 10 years to test all variations then their password is 10 years strong; you can see the flaw in that mentality).
Now NIST still have an opportunity here, and it is a simple, tried and tested approach: make all the finalists winners and have them all in the standard as variations. This then allows end users/admins to pick their variation of choice, or even, perish the thought, allow mixed usage, so that, say, your /etc/passwd file could have some users using one variation and others using another. Whilst it adds no obvious extra benefit, it allows more variations and therefore fallbacks/choice, and that is what n-bit encryption/hashing is all about, each bit being a choice in a way.
So in summary, I believe NIST should let them all win and have SHA-3.n, with n being the finalist variation. Let them all win; choice is good, and that is what n-bit encryption is, after all: extra choices.
Ugh. Being faster does not work against secure hash functions! Holding all else equal, faster is invariably better.
What you're thinking of are password hashes, which are a variant/application of key derivation functions (KDFs). KDFs often use secure hash functions, which is where the confusion comes from.
You want your core crypto to be as fast as it conceivably can be, because you want to be making progress towards a state where all communications are encrypted by default.
In some instances, yes, faster is better: take AES; whilst it was initially slower, now that it is enabled in hardware it is a lot faster.
The point I was making is that it is a consideration, and the general mentality is that the larger the version number, the better it is. A point the original article was making is that none of them are any better than what is on offer with regards to security. It is the aspect of being able to take a large hashed file, modify part of that file, and recheck just that partial hash without having to rehash the whole thing. For communications, this starts to open up a faster way to modify encrypted communications, as by changing a small part you only have to rehash that part and know the final block is still OK. This is an area where any hash function that can work with partial blocks is, by design, less secure.
So fast is good, but it often comes as a compromise against security, and any new standard should at least be better than what it is designed to replace, not open up whole new avenues of attack.
> none of them are any better than what is on offer with regards to security
In this case, possibly. It is quite clear by now that SHA-1 and MD5 are flawed, so the 'higher version' SHA-2 variants (especially the bigger ones) should be preferred.
> So fast is good, but it often comes as a compromise against security, and any new standard should at least be better than what it is designed to replace, not open up whole new avenues of attack.
Brute force attacks against 512 bit hashes are not practical today, and won't be practical for a long time. The concern with password storage is seldom pure brute force attacks, but rather attacks against a dictionary. This is because, for password storage, the input entropy is typically much less than 512 bits (or even 128 bits). It's a completely different use case.
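A toy illustration of that difference (made-up wordlist; the attacker only has to cover the candidate passwords, not the hash's output space):

    import hashlib

    wordlist = [b"letmein", b"password", b"hunter2", b"correcthorse"]
    stolen = hashlib.sha512(b"hunter2").hexdigest()        # unsalted, fast hash

    cracked = next((w for w in wordlist
                    if hashlib.sha512(w).hexdigest() == stolen), None)
    print(cracked)                                         # b'hunter2'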
> It is the aspect of being able to take a large hashed file, modify part of that file, and recheck just that partial hash without having to rehash the whole thing.
Is this an argument against hash trees? Can you explain more about this potential attack? It seems to me to be equivalent in difficulty to finding preimages.
>Is this an argument against hash trees? Can you explain more about this potential attack? It seems to me to be equivalent in difficulty to finding preimages.
If you only have to rehash a branch, as opposed to the entire tree, and match hashes, then you have an easier time of it, as it is a lot faster by design.
Now, if the way the hashing works is that only, say, 1234567 will get the hash value 11 and no other input will, then I will have no issues and will welcome this as a great achievement and pure brilliance. But I don't feel this is the case, nor could it be, since reducing a large amount of entropy into a shorter definition is what a hash function does, after all, and one hash value will match more than just the original data.
Actually, if you stripe the blocks you parallelise (i.e. with 100 bytes, block 1 is every 10th byte, so bytes 1, 11, 21, 31, ... and block 2 is bytes 2, 12, 22, 32, ...), then the ability to modify the code/data by any nefarious means would be as hard as (if not harder than) having to rehash the entire lot.
The only issue with this approach to blocking is that it works on all the data and as such would, for example, be no use for streaming; a terrible example, but you get the idea.
Pulling off a preimage attack on a partial block is just as hard as pulling off a preimage attack on the whole thing - you still need to find a second input which hashes to the same output.
Can you recommend a concise taxonomy of crypto that spells out these kinds of distinctions clearly? I think it would be particularly interesting to see these "X uses Y" (ie. KDF uses secure hash) relationships spelled out, to better understand the "primitives" of crypto and how more specialized functions are built on top of them.
I would never have thought of a password hash as a KDF, because you don't use the key for anything except to compare equality. I also wouldn't have thought that an important property of a KDF is for it to be slow. In the case that you're using a KDF to compute an actual encryption/decryption key, this property does not seem important, because the output (the key) is just as sensitive as the input (the password).
Password hashes and key derivation functions can be totally different - key derivation functions only need to be slow if they're intended for low-entropy input, while password hashes in no way need to maximize entropy (e.g. "bcrypt, then 128 zeroes" is a perfectly fine password hash, but I wouldn't want to use the result as e.g. an AES key.)
In practice, though, it's desirable for password hashes to maximize entropy, which makes them usable as key derivation functions; and the key derivation functions that you usually need take passwords as input.
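A sketch of that overlap using scrypt from Python's hashlib (the parameters are the common interactive-use values; the salt is made up):

    import hashlib

    password, salt = b"correct horse battery staple", b"per-user-16B-salt"

    # The same slow, salted derivation serves both roles: store it (with the
    # salt) as a password verifier, or use it directly as a 256-bit key.
    derived = hashlib.scrypt(password, salt=salt, n=2**14, r=8, p=1, dklen=32)

    stored_verifier = salt + derived
    aes_256_key = derived
    print(len(aes_256_key), stored_verifier.hex())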
> So in essence, any hash that can be run faster in any way will be weaker in terms of brute forcing
Hash functions have many, many uses beyond password storage. For most of those uses they become more, not less, useful with speed. Fast hash functions are good for everybody. If you explicitly want a slow, expensive construct, then choose one properly designed to be slow and expensive. Don't just choose a crappy hash function.
> So in summary, I believe NIST should let them all win and have SHA-3.n, with n being the finalist variation. Let them all win; choice is good, and that is what n-bit encryption is, after all: extra choices.
I completely disagree. An organization like NIST has a responsibility to have an opinion on what the 'best' hash function is. They then need to track the state of the art of research that might invalidate that decision, and clearly communicate changes in the decision. While there is a defense-in-depth argument to be made for multiple options, the pattern seems to have been that there are a lot more systems broken because they chose a poorly studied or known bad algorithm than by breaks being found in previously known-good algorithms. We have a lot to lose from everybody making it up as they go along.
If NIST pick just one, and research later finds an issue in that one solution, then the whole standard is dust and SHA-3 is no more. If they pick more than one (all of them, as I said, for example), then if one of those variations is found to be flawed later on, that subset can be dust while the SHA-3 standard and implementations carry on, without the issue of suddenly having nothing to fall back upon. Sure, there are other standards that can currently be fallen back upon, but if some kit supports SHA-3 only, then an SHA-3 standard with variations can only be a better thing than one without. These finalists have been tested a lot already, more so than previously, so it is not a case of a make-it-up-as-you-go approach at all; they would have had to pass a lot of standards/hurdles to get this far. But what I'm saying is that if they all pass all the tests, then picking a winner comes down to other nuances, and having more than one winner adds more robustness to the standard, in that any single solution later found to have a flaw would negate only a subsection of it rather than the whole standard; hence the SHA-3.n.
That all said, if everybody agreed on everything then life as we know it would be boring, and we would all be fighting over the same woman at some stage, which would not work out well.
"That all said if everybody agreed on everything then life as we know it would be boring and we would all be fighting over the same women at some stage, which would not work out well."
This would be especially awkward, since apparently she would also be fighting over herself and presumably would just elope with herself.
You do not want standards that have good parts and bad parts. If SHA3 blows up, then they pick something else and call it SHA4. And then everybody can look at the label on the box and know whether it's good or bad without having to read the entire list of ingredients.
> But when it comes to brute forcing, being faster works against you
We are probably still talking less than an order of magnitude. So that slowness isn't going to save the day in theory. It might in practice but if it comes that close, the implementation will be deemed broken and something else will be advocated.
However a slow function means a lot of cumulative power and time wasted in the years to come to execute this new hash function. So I'd opt for a faster one.
Schneier picked a very misleading headline here. I was wary when I saw that the NIST page he links regarding the timeline still hasn't been updated since June, and then I saw him reply in the comments:
"> When will SHA3 be announced? Were you given special information the rest of us don't have access to?
I have no inside information on when SHA-3 will be announced. My guess is that they've made the decision, and are going over the final rationale again and again.
My guess is that it won't be Skein."
Even though this is the original title, I'd prefer the HN title be edited to something about Schneier hoping NIST will pick no SHA-3.
Out of interest, these hash functions can be implemented in very few bytes, with around 100 being mooted for this Skein hash. With that in mind, when it comes to brute forcing, I do wonder whether it would be easier to just brute force a better solution than to brute force a hash; I say that in jest.
But it does make you realise how much impressive stuff you can do in just a few bytes, and what else is out there.