Hacker News new | past | comments | ask | show | jobs | submit login

Thanks, this is fascinating.

Re-reading your comment above, I had missed that you were assigning the output of the hash function back into the state. I.e., something like:

    state = initial_seed
    def random():
        state = hash(state)
        return state
Whereas I was proposing something more along the lines of:

    seq_num = 0
    def random():
        result = hash([initial_seed, seq_num])
        seq_num += 1
        return result
If I understand correctly, hash collisions (which are inevitable) will cause the former to loop around in a cycle shorter than the size of the theoretical state space. Whereas the latter (I don't think?) suffers from this. But it may still have bad statistical properties, depending on the hash function.

For what it's worth, I did a (very informal) comparison of MurmurHash3 vs SipHash when I started my approach and found that MurmurHash3 (despite being advertised as passing the avalanche test) gave very statistically biased results. Something that should have generated a uniform distribution definitely did not. Whereas when I tested SipHash the output looked (at least to an untrained eye) essentially indistinguishable from a true random source.

Your parallel comment seems to indicate that there isn't a lot of great practical reading on this topic; I don't suppose you've seen a discussion anywhere going through anything like the approaches above, and whether there is a way to do it "properly"?




Yeah, you got it. The construction you used isn't susceptible to this particular flaw, though it still has others (e.g. predictability of the state updates, related-input attacks, etc). If the state size is reasonably close to the output size, you might still have statistical issues for similar reasons that the iterative construction fails. Whether any of these are relevant depends on the context.

I've been meaning to write some kind of public exploration on this stuff so people can tell me some other source explains it better, but haven't made the time to do more than write code like the blake2b demo I had handy to spit out numbers.




Join us for AI Startup School this June 16-17 in San Francisco!

Guidelines | FAQ | Lists | API | Security | Legal | Apply to YC | Contact

Search: