> Since the number of times it would be able to flip the bit changes due to random fluctuations in time due to context switching of processes, this generates an arguably truly random bit (I would love to see a PoC that shows otherwise, however).
This wouldn't be true random. It's just using the time jitter introduced by context switching to introduce entropy. Similar to many other pRNGs that use system entropy, mouse movements, etc in order to seed the pRNG.
A pRNG is 'p' because if you know the conditions used, you can deterministically predict the outcome. The difficulty of recreating those conditions has nothing to do with being truly random.
CSPRNGs require random seeds. They're the obvious right thing when you can generate 256 (or whatever your security level is) bits that are truly unpredictable, independent, and of equal probability 0 and 1, but when generating significantly more is hard. If you can't generate 256 truly unpredictable bits, the CSPRNG doesn't help you much. And if you can generate an unbounded number of them, it's not clear that there's a downside to just using the TRNG data (but there's certainly the argument that it's better-studied and more robust to use a CSPRNG anyway).
Whether this particular method of random bit generation is in fact sound is a different question entirely (and worth asking of all ways to generate those seeds, including the Linux kernel's entropy divination code).
It's expensive; it spins for a millisecond to obtain one bit.
Suppose this is running on a quiescent system with no interrupts or other tasks running. I can see it degenerating into deterministic behavior. Suppose the RTC counter and the CPU clock are from the same master clock, and the code is executing cleanly to the clock from on-chip caches.
Ultimately, the question is: is there really one bit of entropy from each call to get_bit(), and under what conditions?
It spins for a millisecond to have a chance of obtaining half a bit actually. It takes two clock calls to create a chance of getting a bit, and only if the bit pattern changed during that period.
It can create a whole bit per spin, but my guess is that the output was predictable enough for the author to notice, so they added a single round of the most basic random scrubber. 'Hey it looks random now, ship it!'
From that perspective, I think we can use the methodology in this code as an input into SHA3 / Keccak or Skein (A SHA3 finalist).
I dunno Keccak, so I'll talk from a Skein perspective. If every millisecond you added the "nanoseconds" field with the Skein cryptofunction salt = mix(nanoseconds + salt + current seed + other Skein stuff), it'd be pretty darn random. (Apparently Keccak can 'sponge up' entropy somehow, but I don't know the mechanism that it does it with)
The output would at least be a crypto-secure PRNG. All that needs to be proven after that fact is whether or not enough entropy was being gathered from the real time clock.
Analysis of the output can only disprove randomness. For example, not knowing the key it is impossible to condemn the deterministic output of AES-CTR (assuming the properties of AES hold).
The author really needs to cut back the grandiose claims. Projects like this are more part of the learning process, than something useful to others.
The big concern I have is how reliable this is on virtual machines. Just about all physical machines I'd want to use have high-quality, trustworthy-within-my-threat-model (i.e., "if the NSA wanted to attack my silicon, there's easier silicon for them to attack") hardware random number generators, and all physical machines I'd want to use pick up sufficient randomness from the kernel's entropy magic thing. But virtual machines often don't have access to the hardware RNG, and they don't have access to enough other hardware to populate entropy. It seems like this technique would be particularly risky there ... although I don't think anyone's published an attack on DakaRand yet, so maybe it's fine!
It's definitely based on my approach, but it's missing the concept a bit. The only way this approach gets entropy is if you cross two clocks at very different speeds, and get randomness from the mismatched tolerances. For example, using a computer's microsecond accurate clock to measure a human's 100 millisecond scale behavior yields bits, because we can't be microsecond accurate even if we try.
The bitflipping I was exploring involved matching the CPU clock (nanosecond scale) with the real time clock (millisecond scale). This of course has some risk because the OS can easily implement the latter with the former. And in fact, in this implementation, that's actually what happens -- he's measuring the number of bit flips at nanosecond accuracy. Output is distinguishable from PRNG, as seen elsewhere.
If I remember right somebody did break my "Defcon Challenge" with Firefox.
Exactly. Having a root, high-quality RNG whose output can be input into other systems directly or seeding a CRNG is a classic, design pattern of mine. It can be used for dedicated, simple machines too predictable for TRNG (eg RTOS's), virtual machines, and probably other things I haven't thought of.
I'm really hesitant to even consider this without it being run through the Diehard Tests[0], since from my understanding, "True Randomness" should be cryptographically secure should this be used in a CSPRNG.
rng-tools includes a command for testing random number generator output. I tried this with a FreeBSD port from https://github.com/waitman/rngtest
Results aren't great:
rngtest: bits received from input: 300032
rngtest: FIPS 140-2 successes: 0
rngtest: FIPS 140-2 failures: 15
rngtest: FIPS 140-2(2001-10-10) Monobit: 15
rngtest: FIPS 140-2(2001-10-10) Poker: 15
rngtest: FIPS 140-2(2001-10-10) Runs: 15
rngtest: FIPS 140-2(2001-10-10) Long run: 0
rngtest: FIPS 140-2(2001-10-10) Continuous run: 0
rngtest: input channel speed: (min=236.690; avg=245.396; max=251.742)bits/s
rngtest: bits received from input: 6762248
rngtest: FIPS 140-2 successes: 156
rngtest: FIPS 140-2 failures: 182
rngtest: FIPS 140-2(2001-10-10) Monobit: 182
rngtest: FIPS 140-2(2001-10-10) Poker: 174
rngtest: FIPS 140-2(2001-10-10) Runs: 160
rngtest: FIPS 140-2(2001-10-10) Long run: 0
rngtest: FIPS 140-2(2001-10-10) Continuous run: 0
A relatively recent (2013) library implementation of the diehard tests plus additional tests.
Some things to note:
The OP implementation is going to be VERY dependent on OS/language/etc. For example even if it works in C it won't necessarily work in JVM. It probably will not work on a microcontroller.
This assumes the bias can't change over time. This works for a coin because you're using the same coin for all flips, you just don't know the bias. This falls apart in this library because in the computer the source can be affected differently over time -- an attacker can effectively switch out your coin on every flip, rendering this whole process useless.
Yep, I agree. I do think the claims made in the readme are way too strong for what they are. This is only an explanation for what that small part of it does.
EDIT: And you're right about the bias changing making this useless. I hadn't thought of that.
I'm not convinced Von Neumann debiasing is vulnerable like you think. You're dropping runs (00's and 11's) so you don't care if it's 90% 0 or 90% 1, you care if there's an even number of a run or an odd number of a run.
In other words:
0000000000000001
000000001
001
...are all interchangeable. So biases can change all day.
There are _other_ issues I've seen in real world data, but not this one.
global count
count+=1
if (count%2)==0:
return random()<0.9
else:
return False
def get_fair_bit():
while 1:
a=get_bit()
b=get_bit()
if a!=b: return a
while 1:
print get_fair_bit()
I'm wondering how often this one shows up the field -- my interest here comes from the ~1% failure rate of RSA keys in the field because manufacturers actually will not install hardware RNG or externally inject key material. Field vulnerability trumps religion. It looks like you need a really stateful vuln -- something akin to "a 1 is always followed by a 0 due to power drain" wouldn't be enough because the number of 0's is still acceptably unbound.
I know there are debiasers that look at statistics across a group. Wonder what else is out there.
Assume a fixed bias: A coin flips 90% heads/ 10% tails. To make it 50/50, you can demand either a HT to make a heads, or TH to make a tails. Since HT and TH have the same odds of occurring, you get a 50/50 chance.
What if get_bit happens to be deterministic on a particular machine? Then get_fair_bit would be stuck in an infinite loop. This can potentially happen when CPU is so overloaded that executing the bit-flipping instruction takes longer than a millisecond.
Wouldn't an example of a truly random number generator be to open up a pseudo random webpage (say something like www.engadget.com/page/<pseudo_random_number>) and do a modulo count of the number of whitespace separated words/tokens?
typical round-robin times are on the order of 10ms, so you would have a significant amount of non-context-switched 'random' numbers, which when combined with analysis using the cycle counter.... namely a counter running at the clock-speed of your processor would yield a very non-random value.
The values presented are "N bits per character".
8 is the maximum, 0 is the minimum.
The less value is the worse.
The ugly test results show that "true"-random tool does not perform better than openssl / /dev/urandom. What makes it even worse is that the entropy decreases with the amount of iterations. These are simple bash tests, you can try on your own, playing with the values. Though, I suggest to understand some theory in the first place:
I've asked the "true"-random tool to give me 3000 of the really "true" numbers (as it claims) and out of 3000 it has thrown to me ~9% of char(0) and ~9% of char(255) values (see the fractions below), whilst the others <0.01% per char between char(1-254).
Running it without the char(0)/char(255) neither did outperform /dev/urandom, but only running terribly slow (~5mins on i7) and using 100% of a CPU core:
$ ./generate_constant_stream | cat - | sed -u 's/\x00//g;s/\xff//g' |head -c3000 |ent -c
Entropy = 7.737683 bits per byte.
$ cat /dev/urandom | sed -u 's/\x00//g;s/\xff//g' |head -c3000 |ent -c
Entropy = 7.924493 bits per byte.
Ego has infected this thread. We can't just say that this project is interesting. We have to talk about "expensive" computation. Why? What does "ambition" have to do with computer science?
This wouldn't be true random. It's just using the time jitter introduced by context switching to introduce entropy. Similar to many other pRNGs that use system entropy, mouse movements, etc in order to seed the pRNG.
A pRNG is 'p' because if you know the conditions used, you can deterministically predict the outcome. The difficulty of recreating those conditions has nothing to do with being truly random.