Hacker Newsnew | past | comments | ask | show | jobs | submitlogin
Show HN: True-Random – Generate “truly” random numbers (github.com/jaybosamiya)
31 points by jaybosamiya on May 10, 2016 | hide | past | favorite | 48 comments


> Since the number of times it would be able to flip the bit changes due to random fluctuations in time due to context switching of processes, this generates an arguably truly random bit (I would love to see a PoC that shows otherwise, however).

This wouldn't be true random. It's just using the time jitter introduced by context switching to introduce entropy. Similar to many other pRNGs that use system entropy, mouse movements, etc in order to seed the pRNG.

A pRNG is 'p' because if you know the conditions used, you can deterministically predict the outcome. The difficulty of recreating those conditions has nothing to do with being truly random.


'fuck, not again' - said the cryptographer. The title of the project is potentially very misleading.

I know most of you take this stuff seriously in your codes and rely on the well know cryptographically secure random number generators: https://en.wikipedia.org/wiki/Cryptographically_secure_pseud...


CSPRNGs require random seeds. They're the obvious right thing when you can generate 256 (or whatever your security level is) bits that are truly unpredictable, independent, and of equal probability 0 and 1, but when generating significantly more is hard. If you can't generate 256 truly unpredictable bits, the CSPRNG doesn't help you much. And if you can generate an unbounded number of them, it's not clear that there's a downside to just using the TRNG data (but there's certainly the argument that it's better-studied and more robust to use a CSPRNG anyway).

Whether this particular method of random bit generation is in fact sound is a different question entirely (and worth asking of all ways to generate those seeds, including the Linux kernel's entropy divination code).


It seems a bit ambitious to call this true random without any analysis of randomness quality or predictability.

I find it very unlikely this will be shown to be better than existing RNG solutions.

It's clever, but clever in the way that sleep sort is clever, at least until proven to be of actual benefit.


It's expensive; it spins for a millisecond to obtain one bit.

Suppose this is running on a quiescent system with no interrupts or other tasks running. I can see it degenerating into deterministic behavior. Suppose the RTC counter and the CPU clock are from the same master clock, and the code is executing cleanly to the clock from on-chip caches.

Ultimately, the question is: is there really one bit of entropy from each call to get_bit(), and under what conditions?


It spins for a millisecond to have a chance of obtaining half a bit actually. It takes two clock calls to create a chance of getting a bit, and only if the bit pattern changed during that period.


It can create a whole bit per spin, but my guess is that the output was predictable enough for the author to notice, so they added a single round of the most basic random scrubber. 'Hey it looks random now, ship it!'


From that perspective, I think we can use the methodology in this code as an input into SHA3 / Keccak or Skein (A SHA3 finalist).

I dunno Keccak, so I'll talk from a Skein perspective. If every millisecond you added the "nanoseconds" field with the Skein cryptofunction salt = mix(nanoseconds + salt + current seed + other Skein stuff), it'd be pretty darn random. (Apparently Keccak can 'sponge up' entropy somehow, but I don't know the mechanism that it does it with)

The output would at least be a crypto-secure PRNG. All that needs to be proven after that fact is whether or not enough entropy was being gathered from the real time clock.


Analysis of the output can only disprove randomness. For example, not knowing the key it is impossible to condemn the deterministic output of AES-CTR (assuming the properties of AES hold).

The author really needs to cut back the grandiose claims. Projects like this are more part of the learning process, than something useful to others.


This looks to be the same technique as Dan Kaminsky's DakaRand, including the debiasing: https://dankaminsky.com/2012/08/15/dakarand/

See also Kaminsky's implementation of the same approach in pure JS: https://gist.github.com/PaulCapestany/6148566

and Ryan Finnie's implementation in Perl: http://www.finnie.org/2012/08/14/twuewand-2-0-released/

The big concern I have is how reliable this is on virtual machines. Just about all physical machines I'd want to use have high-quality, trustworthy-within-my-threat-model (i.e., "if the NSA wanted to attack my silicon, there's easier silicon for them to attack") hardware random number generators, and all physical machines I'd want to use pick up sufficient randomness from the kernel's entropy magic thing. But virtual machines often don't have access to the hardware RNG, and they don't have access to enough other hardware to populate entropy. It seems like this technique would be particularly risky there ... although I don't think anyone's published an attack on DakaRand yet, so maybe it's fine!


It's definitely based on my approach, but it's missing the concept a bit. The only way this approach gets entropy is if you cross two clocks at very different speeds, and get randomness from the mismatched tolerances. For example, using a computer's microsecond accurate clock to measure a human's 100 millisecond scale behavior yields bits, because we can't be microsecond accurate even if we try.

The bitflipping I was exploring involved matching the CPU clock (nanosecond scale) with the real time clock (millisecond scale). This of course has some risk because the OS can easily implement the latter with the former. And in fact, in this implementation, that's actually what happens -- he's measuring the number of bit flips at nanosecond accuracy. Output is distinguishable from PRNG, as seen elsewhere.

If I remember right somebody did break my "Defcon Challenge" with Firefox.


The answer for VMs is guest OS drivers that taps into the host's system RNG (with support for doing so in the VM software).


Exactly. Having a root, high-quality RNG whose output can be input into other systems directly or seeding a CRNG is a classic, design pattern of mine. It can be used for dedicated, simple machines too predictable for TRNG (eg RTOS's), virtual machines, and probably other things I haven't thought of.


Do the common VM hosting services (AWS, DigitalOcean, etc.) offer such a thing?

Unfortunately, a superstitious RNG is more useful to a production service than a theoretical one.


I'm really hesitant to even consider this without it being run through the Diehard Tests[0], since from my understanding, "True Randomness" should be cryptographically secure should this be used in a CSPRNG.

[0]: https://en.wikipedia.org/wiki/Diehard_tests


rng-tools includes a command for testing random number generator output. I tried this with a FreeBSD port from https://github.com/waitman/rngtest

Results aren't great:

    rngtest: bits received from input: 300032
    rngtest: FIPS 140-2 successes: 0
    rngtest: FIPS 140-2 failures: 15
    rngtest: FIPS 140-2(2001-10-10) Monobit: 15
    rngtest: FIPS 140-2(2001-10-10) Poker: 15
    rngtest: FIPS 140-2(2001-10-10) Runs: 15
    rngtest: FIPS 140-2(2001-10-10) Long run: 0
    rngtest: FIPS 140-2(2001-10-10) Continuous run: 0
    rngtest: input channel speed: (min=236.690; avg=245.396; max=251.742)bits/s


And with more data:

After nearly 8 hours:

    rngtest: bits received from input: 6762248
    rngtest: FIPS 140-2 successes: 156
    rngtest: FIPS 140-2 failures: 182
    rngtest: FIPS 140-2(2001-10-10) Monobit: 182
    rngtest: FIPS 140-2(2001-10-10) Poker: 174
    rngtest: FIPS 140-2(2001-10-10) Runs: 160
    rngtest: FIPS 140-2(2001-10-10) Long run: 0
    rngtest: FIPS 140-2(2001-10-10) Continuous run: 0


Yikes.


A reference from the diehard tests page. The dieharder tests:

http://www.phy.duke.edu/~rgb/General/dieharder.php

A relatively recent (2013) library implementation of the diehard tests plus additional tests.

Some things to note:

The OP implementation is going to be VERY dependent on OS/language/etc. For example even if it works in C it won't necessarily work in JVM. It probably will not work on a microcontroller.


This is almost certainly not more random than this: http://www.fourmilab.ch/hotbits/

It is a cool idea.


Or than

https://www.random.org/

Which even includes real-time statistics on the quality of the numbers.


Or

https://qrng.anu.edu.au/

That takes his number from random fluctuations from quantic processes.


The debiasing idea is due to von Neumann himself:

https://en.wikipedia.org/wiki/Fair_coin


How does checking `get_fair_bit(0)' that the next bit isn't the same as the current help fairness?


This is a known method for a simple way of getting an unbiased bit from a series of (possibly biased) coin flips.

Stated another way, the algorithm for getting an unbiased bit from a biased coin is:

1. Flip the coin twice.

2. If it's both heads or both tails, discard result and go back to 1.

3. If it's heads then tails, result = 0

4. If it's tails then heads, result = 1


This assumes the bias can't change over time. This works for a coin because you're using the same coin for all flips, you just don't know the bias. This falls apart in this library because in the computer the source can be affected differently over time -- an attacker can effectively switch out your coin on every flip, rendering this whole process useless.


Yep, I agree. I do think the claims made in the readme are way too strong for what they are. This is only an explanation for what that small part of it does.

EDIT: And you're right about the bias changing making this useless. I hadn't thought of that.


I'm not convinced Von Neumann debiasing is vulnerable like you think. You're dropping runs (00's and 11's) so you don't care if it's 90% 0 or 90% 1, you care if there's an even number of a run or an odd number of a run.

In other words:

0000000000000001 000000001 001

...are all interchangeable. So biases can change all day.

There are _other_ issues I've seen in real world data, but not this one.


So I have the ability to arbitrarily change the bias on every flip, and you don't think debiasing is affected?

How about this: for the next string of generated bits, for every even bit the bias is 90% towards 1, and every odd bit the bias is 90% towards 0.

raw: 0101010101010111010101010001010101010101

'debiased': 1111111111111111


Ah, there you go. Repro code:

#/usr/bin/python

from random import *

count=0

def get_bit():

   global count

   count+=1

   if (count%2)==0:

      return random()<0.9

   else:

      return False
def get_fair_bit():

   while 1:

      a=get_bit()

      b=get_bit()

      if a!=b: return a
while 1:

   print get_fair_bit()
I'm wondering how often this one shows up the field -- my interest here comes from the ~1% failure rate of RSA keys in the field because manufacturers actually will not install hardware RNG or externally inject key material. Field vulnerability trumps religion. It looks like you need a really stateful vuln -- something akin to "a 1 is always followed by a 0 due to power drain" wouldn't be enough because the number of 0's is still acceptably unbound.

I know there are debiasers that look at statistics across a group. Wonder what else is out there.


i.e. an edge detect algorithm.


Assume a fixed bias: A coin flips 90% heads/ 10% tails. To make it 50/50, you can demand either a HT to make a heads, or TH to make a tails. Since HT and TH have the same odds of occurring, you get a 50/50 chance.


What if get_bit happens to be deterministic on a particular machine? Then get_fair_bit would be stuck in an infinite loop. This can potentially happen when CPU is so overloaded that executing the bit-flipping instruction takes longer than a millisecond.


Could you not build a "truly" random number generator using a quantum computer?


Wouldn't an example of a truly random number generator be to open up a pseudo random webpage (say something like www.engadget.com/page/<pseudo_random_number>) and do a modulo count of the number of whitespace separated words/tokens?


This post reminded me of this other project: https://github.com/dasmithii/RCRand (a similarly silly but fun race condition based RNG)


The only thing in the world that might be classified as truly random is wavefunction collapse under observation, and even then we're not sure.


Or radiation.


this is not random.... it is deterministic.

typical round-robin times are on the order of 10ms, so you would have a significant amount of non-context-switched 'random' numbers, which when combined with analysis using the cycle counter.... namely a counter running at the clock-speed of your processor would yield a very non-random value.


The periodic interrupt frequency of the system this is run on will have an impact on the numbers produced.


You have to ship the hardware with it.


ugly tests

1. ("true"-random) 10 iterations; 32 random bytes | Result Avg Entropy: 4.82

  $ echo $(echo -n "("; for i in $(seq 1 10); do echo -n $(./generate_constant_stream |head -c 32 |ent |head -1 |awk '{print $3}')"+"; done; echo -n "0)/10") |bc -l
  4.81945500000000000000
2. ("true"-random) 100 iterations; 32 random bytes | Result Avg Entropy: 4.37

  $ echo $(echo -n "("; for i in $(seq 1 100); do echo -n $(./generate_constant_stream |head -c 32 |ent |head -1 |awk '{print $3}')"+"; done; echo -n "0)/100") |bc -l
  4.37395291000000000000
3. ("true"-random) 200 iterations; 32 random bytes | Result Avg Entropy: 4.35

  $ echo $(echo -n "("; for i in $(seq 1 200); do echo -n $(./generate_constant_stream |head -c 32 |ent |head -1 |awk '{print $3}')"+"; done; echo -n "0)/200") |bc -l
  4.34563333000000000000
1. (openssl) 10 iterations; 32 random bytes | Result Avg Entropy: 4.88

  $ echo $(echo -n "("; for i in $(seq 1 10); do echo -n $(openssl rand 32 |ent |head -1 |awk '{print $3}')"+"; done; echo -n "0)/10") |bc -l
  4.88125000000000000000
2. (openssl) 100 iterations; 32 random bytes | Result Avg Entropy: 4.87

  $ echo $(echo -n "("; for i in $(seq 1 100); do echo -n $(openssl rand 32 |ent |head -1 |awk '{print $3}')"+"; done; echo -n "0)/100") |bc -l
  4.87404420000000000000
3. (openssl) 200 iterations; 32 random bytes | Result Avg Entropy: 4.88

  $ echo $(echo -n "("; for i in $(seq 1 200); do echo -n $(openssl rand 32 |ent |head -1 |awk '{print $3}')"+"; done; echo -n "0)/200") |bc -l
  4.87885575000000000000
1. (/dev/urandom) 10 iterations; 32 random bytes | Result Avg Entropy: 4.82

  $ echo $(echo -n "("; for i in $(seq 1 10); do echo -n $(head -c32 < /dev/urandom |ent |head -1 |awk '{print $3}')"+"; done; echo -n "0)/10") |bc -l
  4.82264100000000000000
2. (/dev/urandom) 100 iterations; 32 random bytes | Result Avg Entropy: 4.89

  $ echo $(echo -n "("; for i in $(seq 1 100); do echo -n $(head -c32 < /dev/urandom |ent |head -1 |awk '{print $3}')"+"; done; echo -n "0)/100") |bc -l
  4.88655640000000000000
3. (/dev/urandom) 200 iterations; 32 random bytes | Result Avg Entropy: 4.88

  $ echo $(echo -n "("; for i in $(seq 1 200); do echo -n $(head -c32 < /dev/urandom |ent |head -1 |awk '{print $3}')"+"; done; echo -n "0)/200") |bc -l
  4.88061280000000000000


Can you explain what do these results mean?


The values presented are "N bits per character". 8 is the maximum, 0 is the minimum. The less value is the worse. The ugly test results show that "true"-random tool does not perform better than openssl / /dev/urandom. What makes it even worse is that the entropy decreases with the amount of iterations. These are simple bash tests, you can try on your own, playing with the values. Though, I suggest to understand some theory in the first place:

- randomness https://en.wikipedia.org/wiki/Randomness

- entropy https://en.wikipedia.org/wiki/Entropy_(computing)

- ent http://www.fourmilab.ch/random/


I think it is even worse than I thought...

I've asked the "true"-random tool to give me 3000 of the really "true" numbers (as it claims) and out of 3000 it has thrown to me ~9% of char(0) and ~9% of char(255) values (see the fractions below), whilst the others <0.01% per char between char(1-254).

  ./generate_constant_stream |head -c3000 |ent -c
  Value Char Occurrences Fraction
  0              309   0.103000
  1               33   0.011000
  ...
  ...
  253   �           28   0.009333
  254   �           29   0.009667
  255   �          312   0.104000
  Total:          3000   1.000000

  Entropy = 6.861709 bits per byte.

Update

Running it without the char(0)/char(255) neither did outperform /dev/urandom, but only running terribly slow (~5mins on i7) and using 100% of a CPU core:

  $ ./generate_constant_stream | cat - | sed -u 's/\x00//g;s/\xff//g' |head -c3000 |ent -c
  Entropy = 7.737683 bits per byte.

  $ cat /dev/urandom | sed -u 's/\x00//g;s/\xff//g' |head -c3000 |ent -c
  Entropy = 7.924493 bits per byte.


Ego has infected this thread. We can't just say that this project is interesting. We have to talk about "expensive" computation. Why? What does "ambition" have to do with computer science?


We detached this subthread from https://news.ycombinator.com/item?id=11669106 and marked it off-topic.


You are ego. The comment you replied to stated only a fact.




Guidelines | FAQ | Lists | API | Security | Legal | Apply to YC | Contact

Search: