Monday 5 February 2024

Exploring the 'randomness' of /dev/urandom, and xored derivatives

 For quite a lot of cryptography, you need a good source of random numbers, particularly for one-time pads. It's difficult to get fast sources of natural event based numbers, so some sort of compromise is common.

Using /dev/urandom as found on linux and other Unix systems, like macOS, is generally considered quite a good source.

*** Update  - 19th March 2024 ***

The article stands, but more investigation has showed me that the major problem with /dev/urandom is that it is highly biased against repeated numbers. This is not representative of a good pseudo-random generator.

Compared to 𝛑, /dev/urandom gives very few repeated numbers ( 00, 66, 99 etc.), and hardly any occurences of multiple repeats (00000, 66666, 99999). This is a weakness to cryptanalysis.

So, if using /dev/urandom, it would be wise to process all sequences to add ('at random') repeats of numbers, allowing even long strings of repeats, maybe up to 8 repeated numbers, inserted at pseudo-random intervals.

*** End of Update ***

The initial statistics

To get a good characterisation of a sequence of 'random' numbers, you need quite a lot of them, so I generated 152,256,840,500 four digit hex numbers, like this:

od -x -An  </dev/urandom >fill_with_urandom

The 1.52 * 10^11, or just over a tenth of a billion, numbers take up 709Gb, and took 10 hours, 3 minutes and 59 seconds to generate on a Mac mini M2. 

To get an idea of the shape, I counted the number of times each hex number, from 0000 to ffff occurred. That's 65536 numbers. 

So the expected frequency would be 2323255 for each number, if they turned up equally often. The statistics give:

Min:              2316807  

Max:     2329592 

Mean:     2323255 

Standard Deviation:    1529.303 

Median:     2323255 

Fivenum:             2316807 2322219 2323255 2324284 2329592 

Skewness:     0.0004868033  

Kurtosis:             3.004057


As you can see, the mean is exactly that, and the standard deviation only 1529, which is only .0658%. The mean and median are the same, which also shows a very even spread of frequencies.

If the figures were distributed exactly in a normal distribution, the Skewness would be 0 and the Kurtosis 3. So it is extremely close.

Using XOR to produce a new sequence of numbers

In theory, if you xor one random sequence with another, you ought to get another, quite different random sequence. I thought it would be interesting to do just this.

First I produced a new sequence by xoring each number with its neighbour, doing it in groups of eight - making up the eighth number by xoring the first with the eighth. Using the same Mac mini M2, this took 34 hours, or  1 day, 10 hours, 13 minutes and 44 seconds.

The same statistics as above, gives:

Min: 2317159 

Max: 2330445 

Mean: 2323255 

Standard Deviation: 1530.588 

Median: 2323262 

Fivenum: 2317159 2322224 2323262 2324284 2330445 

Skewness: 0.001143814 

Kurtosis: 3.001915 

This is impressively different. The mean and median are different. However, the spread is almost exactly the same, 1530, instead of 1529 for the standard deviation.

What I found interesting were the last two.

The Skewness is   0.001143814  compared to 0.0004868033 the first being 2.35 times bigger! Though they are tiny numbers, that is a big difference.

The Kurtosis is 3.001915 compared to 3.004057, again, small, but the difference is about double.


Using XOR a second time

I now performed exactly the same operation as above, but using the new sequence as a starting point. This gave:

Min: 2316959 

Max: 2330963 

Mean: 2323255 

Standard Deviation: 1523.088 

Median: 2323254 

Fivenum: 2316959 2322234 2323254 2324272 2330963 

Skewness: 0.005033493 

Kurtosis: 3.03291 

The standard deviation at 1523 is similar to the previous two  1530, 1529 but different enough to show a different shape.

This is confirmed, with:

                       1st                         2nd                    3rd

Skewness:  0.001143814   0.0004868033.     0.005033493 

Kurtosis:    3.001915          3.004057            3.03291 

This time both are very different to the previous two - the skewness five times the first set, and the kurtosis difference ten times greater. All three showing very different shapes.


The final test - xoring the previous sequence

 Instead of over each set of eight numbers, taking five lines at a time, and xoring sets of 40 numbers at a time (with the 40th result being the 1st xored with the 40th). This gives:

Min: 2317132 

Max: 2330652 

Mean: 2323255 

Standard Deviation: 1523.587 

Median: 2323256 

Fivenum: 2317132 2322226 2323256 2324275 2330652 

Skewness: 0.00861003 

Kurtosis: 3.021 

The standard deviation is very similar to the previous run,  1523.587 vs  1523.088. The mean and median again differ by only one, but in the opposite direction.

Then: 

                       1st                         2nd                    3rd                4th

Skewness:  0.001143814   0.0004868033.     0.005033493.  0.00861003 

Kurtosis:    3.001915          3.004057            3.03291           3.021

Conclusions 

These results, using 152 milliard four digit hexadecimal numbers give /dev/urandom a fairly thorough test, showing it is, indeed, highly stochastic.

The experiments using xor, show that xoring a sequence of numbers, with itself, produces another random sequence that is significantly different, including having a different shape, but is still a good pseudo-random sequence.

The latter two sequences do show an increased skewness, and a greater distance from a standard deviation.  In a sense, suggesting they have become more stochastic.

It looks as if it would be interesting to try a final experiment, with, the 4th sequence, but, maybe, taking it in batches of 80 numbers.