Wilderness is Paradise enow: February 2024

For quite a lot of cryptography, you need a good source of random numbers, particularly for one-time pads. It's difficult to get fast sources of natural event based numbers, so some sort of compromise is common.

Using /dev/urandom as found on linux and other Unix systems, like macOS, is generally considered quite a good source.

*** Update - 19th March 2024 ***

The article stands, but more investigation has showed me that the major problem with /dev/urandom is that it is highly biased against repeated numbers. This is not representative of a good pseudo-random generator.

Compared to 𝛑, /dev/urandom gives very few repeated numbers ( 00, 66, 99 etc.), and hardly any occurences of multiple repeats (00000, 66666, 99999). This is a weakness to cryptanalysis.

So, if using /dev/urandom, it would be wise to process all sequences to add ('at random') repeats of numbers, allowing even long strings of repeats, maybe up to 8 repeated numbers, inserted at pseudo-random intervals.

*** End of Update ***

The initial statistics

To get a good characterisation of a sequence of 'random' numbers, you need quite a lot of them, so I generated 152,256,840,500 four digit hex numbers, like this:

od -x -An </dev/urandom >fill_with_urandom

The 1.52 * 10^11, or just over a tenth of a billion, numbers take up 709Gb, and took 10 hours, 3 minutes and 59 seconds to generate on a Mac mini M2.

To get an idea of the shape, I counted the number of times each hex number, from 0000 to ffff occurred. That's 65536 numbers.

So the expected frequency would be 2323255 for each number, if they turned up equally often. The statistics give:

Min: 2316807

Max:    2329592
Mean:    2323255
Standard Deviation:   1529.303
Median:    2323255
Fivenum:            2316807 2322219 2323255 2324284 2329592
Skewness:    0.0004868033

Kurtosis: 3.004057

As you can see, the mean is exactly that, and the standard deviation only 1529, which is only .0658%. The mean and median are the same, which also shows a very even spread of frequencies.

If the figures were distributed exactly in a normal distribution, the Skewness would be 0 and the Kurtosis 3. So it is extremely close.

Using XOR to produce a new sequence of numbers

In theory, if you xor one random sequence with another, you ought to get another, quite different random sequence. I thought it would be interesting to do just this.

First I produced a new sequence by xoring each number with its neighbour, doing it in groups of eight - making up the eighth number by xoring the first with the eighth. Using the same Mac mini M2, this took 34 hours, or 1 day, 10 hours, 13 minutes and 44 seconds.

The same statistics as above, gives:

Min: 2317159
Max: 2330445
Mean: 2323255
Standard Deviation: 1530.588
Median: 2323262
Fivenum: 2317159 2322224 2323262 2324284 2330445
Skewness: 0.001143814
Kurtosis: 3.001915

This is impressively different. The mean and median are different. However, the spread is almost exactly the same, 1530, instead of 1529 for the standard deviation.

What I found interesting were the last two.

The Skewness is 0.001143814 compared to 0.0004868033 the first being 2.35 times bigger! Though they are tiny numbers, that is a big difference.

The Kurtosis is 3.001915 compared to 3.004057, again, small, but the difference is about double.

Using XOR a second time

I now performed exactly the same operation as above, but using the new sequence as a starting point. This gave:

Min: 2316959
Max: 2330963
Mean: 2323255
Standard Deviation: 1523.088
Median: 2323254
Fivenum: 2316959 2322234 2323254 2324272 2330963
Skewness: 0.005033493
Kurtosis: 3.03291

The standard deviation at 1523 is similar to the previous two 1530, 1529 but different enough to show a different shape.

This is confirmed, with:

1st 2nd 3rd

Skewness: 0.001143814 0.0004868033. 0.005033493

Kurtosis: 3.001915 3.004057 3.03291

This time both are very different to the previous two - the skewness five times the first set, and the kurtosis difference ten times greater. All three showing very different shapes.

The final test - xoring the previous sequence

Instead of over each set of eight numbers, taking five lines at a time, and xoring sets of 40 numbers at a time (with the 40th result being the 1st xored with the 40th). This gives:

Min: 2317132
Max: 2330652
Mean: 2323255
Standard Deviation: 1523.587
Median: 2323256
Fivenum: 2317132 2322226 2323256 2324275 2330652
Skewness: 0.00861003
Kurtosis: 3.021

The standard deviation is very similar to the previous run, 1523.587 vs 1523.088. The mean and median again differ by only one, but in the opposite direction.

Then:

1st 2nd 3rd 4th

Skewness: 0.001143814 0.0004868033. 0.005033493. 0.00861003

Kurtosis: 3.001915 3.004057 3.03291 3.021

Conclusions

These results, using 152 milliard four digit hexadecimal numbers give /dev/urandom a fairly thorough test, showing it is, indeed, highly stochastic.

The experiments using xor, show that xoring a sequence of numbers, with itself, produces another random sequence that is significantly different, including having a different shape, but is still a good pseudo-random sequence.

The latter two sequences do show an increased skewness, and a greater distance from a standard deviation. In a sense, suggesting they have become more stochastic.

It looks as if it would be interesting to try a final experiment, with, the 4th sequence, but, maybe, taking it in batches of 80 numbers.

Wilderness is Paradise enow

Monday 5 February 2024

Exploring the 'randomness' of /dev/urandom, and xored derivatives