Article

Building Test Batteries Based on Analyzing Random Number Generator Tests within the Framework of Algorithmic Information Theory

Boris Ryabko 1,2
1 Federal Research Center for Information and Computational Technologies, Novosibirsk 630090, Russia
2 Institute of Informatics and Computer Engineering, Siberian State University of Telecommunications and Informatics, Novosibirsk 630102, Russia
Entropy 2024, 26(6), 513; https://doi.org/10.3390/e26060513
Submission received: 10 April 2024 / Revised: 8 June 2024 / Accepted: 11 June 2024 / Published: 14 June 2024
(This article belongs to the Special Issue Complexity, Entropy and the Physics of Information II)

Abstract:
The problem of testing random number generators is considered, and a new method for comparing the power of different statistical tests is proposed. It is based on the definition of a random sequence developed in the framework of algorithmic information theory and makes it possible to compare the power of different tests in some cases where the available methods of mathematical statistics do not distinguish between tests. In particular, it is shown that tests based on data compression methods using dictionaries should be included in test batteries.

1. Introduction

Random numbers play an important role in cryptography, gambling, Monte Carlo methods and many other applications. Nowadays, random numbers are generated using so-called random number generators (RNGs), and the “quality” of the generated numbers is evaluated using special statistical tests [1]. This problem is so important for applications that there are special standards for RNGs and for so-called test batteries, that is, sets of tests. The current practice for using an RNG is to verify the sequences it generates with tests from some battery (such as those recommended by [2,3] or other standards).
Many statistical tests are designed to test some deviations from randomness described as classes of random processes (e.g., Bernoulli process with unequal probabilities 0 and 1, Markov chains with some unknown parameters, stationary ergodic processes, etc.) [1,2,3,4,5].
A natural question is: how do we compare different tests and, in particular, create a suitable battery of tests? Currently, this question is mostly addressed experimentally: possible candidate tests are applied to a set of known RNGs and the tests that reject more (“bad”) RNGs are suitable candidates for the battery. In addition, researchers try to choose independent tests (i.e., those that reject different RNGs) and take into account other natural properties (e.g., testing speed, etc.) [1,2,3,4]. Obviously, such an approach depends significantly on the set of selected tests and RNGs pre-selected for consideration. It is worth noting that at present there are dozens of RNGs and tests, and their number is growing fast, so the recommended batteries of tests are rather unstable (see [4]).
It is clear that increasing the number of tests in a battery increases the total testing time; conversely, if the testing time is limited, increasing the number of tests reduces the length of the binary sequence that can be examined by each test and therefore reduces the power of every test in the battery. Hence, it is highly desirable to include in the battery powerful tests designed for different deviations from randomness.
The goal of this paper is to develop a theoretical framework for test comparison and to illustrate it by comparing some popular tests. The main idea of the proposed approach is based on the definition of randomness developed in algorithmic information theory (AIT). It is natural to use this theory, since it is the only mathematically rigorous theory that formally defines what a random binary sequence is, and by definition any RNG should generate such sequences. Following AIT, we associate with any statistical test T its set of "random sequences" and then compare the "size" of the sets of random sequences corresponding to different tests. More precisely, let $R_{T_1}$ and $R_{T_2}$ be the sets of sequences that are random according to $T_1$ and $T_2$, respectively. Then, if $\dim(R_{T_1} \setminus R_{T_2}) > 0$, $T_1$ accepts a large set of sequences as random, whereas $T_2$ rejects these sequences as non-random. So, in this sense, $T_1$ cannot replace $T_2$ in a battery of tests (here dim is the Hausdorff dimension).
Based on this approach, we give some practical recommendations for building test batteries. In particular, we recommend including in test batteries a test based on a dictionary data compressor, such as the Lempel–Ziv codes [6], grammar-based codes [7] and some others.
The rest of this paper is organized as follows. The next section contains definitions and preliminary information, the third section compares the performance of tests on Markov processes with different memories and on general stationary processes, and the fourth section investigates tests based on Lempel–Ziv data compressors. The fifth section is a brief conclusion; some of the concepts used in this paper are described in Appendix A.

2. Definitions and Preliminaries

2.1. Hypothesis Testing

In hypothesis testing, there is a main hypothesis $H_0 = \{$the sequence x is random$\}$ and an alternative hypothesis $H_1 = \neg H_0$. (In the probabilistic approach, $H_0$ states that the sequence is generated by a Bernoulli source with equal probabilities of 0 and 1.) A test is an algorithm whose input is the prefix $x_1 \dots x_n$ (of the infinite sequence $x_1 x_2 \dots$) and whose output is one of two possible words: random or non-random (meaning that the sequence is considered random or non-random, respectively).
Let there be a hypothesis $H_0$ and some alternative $H_1$, let T be a test, and let τ be a statistic, that is, a function on $\{0,1\}^n$ applied to a binary sequence $x = x_1 \dots x_n$. Here and below, $\{0,1\}^n$ is the set of all n-bit binary words and $\{0,1\}^{\infty}$ is the set of all infinite words $x_1 x_2 \dots$, $x_i \in \{0,1\}$.
By definition, a Type I error occurs if $H_0$ is true and $H_0$ is rejected; the significance level is defined as the probability of a Type I error. Denote the critical region of the test T for the significance level α by $\bar{C}_T(\alpha, n)$ and let $C_T(\alpha, n) = \{0,1\}^n \setminus \bar{C}_T(\alpha, n)$. Recall that, by definition, $H_0$ is rejected if and only if $x \in \bar{C}_T(\alpha, n)$ and, hence,
$$|\bar{C}_T(\alpha, n)| \le 2^n \alpha,$$
see [8]. We also apply another natural limitation: we consider only tests T such that $\bar{C}_T(\alpha_1, n) \subset \bar{C}_T(\alpha_2, n)$ for all n and $\alpha_1 < \alpha_2$. (Here and below, |X| is the number of elements of X if X is a set, and the length of X if X is a word.)
A finite sequence x 1 x n is considered random for a given test T and the significance level α if it belongs to C T ( α , n ) .

2.2. Batteries of Tests

Let us consider a situation where the randomness testing is performed by conducting a battery of statistical tests for randomness. Suppose that the battery T ^ contains a finite or countable set of tests T 1 , T 2 , and α i is the significance level of i-th test, i = 1 , 2 , . If the battery is applied in such a way that the hypothesis H 0 is rejected when at least one test in the battery rejects it, then the significance level α of this battery satisfies the following inequality:
$$\alpha \le \sum_{i=1}^{\infty} \alpha_i,$$
because $P(A \cup B) \le P(A) + P(B)$ for any events A and B (this inequality is a simple extension of the so-called Bonferroni correction, see [9]).
It will be convenient to formulate this inequality in a different way. Suppose there is some $\alpha \in (0,1)$ and a sequence ω of non-negative numbers $\omega_i$ such that $\sum_{i=1}^{\infty} \omega_i \le 1$. For example, we can define the following sequence $\omega^*$:
$$\omega_i^* = 1/(i(i+1)), \quad i = 1, 2, \dots$$
If the significance level of $T_i$ equals $\alpha \omega_i$, then the significance level of the battery $\hat{T}$ is not greater than α. (Indeed, from (2) we obtain $\sum_{i=1}^{\infty} \alpha_i = \sum_{i=1}^{\infty} (\alpha \omega_i) = \alpha \sum_{i=1}^{\infty} \omega_i \le \alpha$.) Note that this simple observation makes it possible to treat a test battery as a single test.
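To illustrate this observation, here is a minimal Python sketch of such a battery (an illustration only, not the implementation used in the cited standards; the component tests are placeholders supplied by the user):

```python
# A minimal sketch of a test battery with overall significance level <= alpha.
# Each component test is a function test(x, level) returning True if it rejects
# H0 ("x is random") at the given significance level.

def omega(i: int) -> float:
    """Weights omega_i = 1/(i(i+1)), i = 1, 2, ...; they sum to 1."""
    return 1.0 / (i * (i + 1))

def run_battery(tests, x, alpha: float = 0.01) -> str:
    """Run the i-th test at level alpha * omega(i); reject H0 if any test rejects.

    By the Bonferroni-type bound above, the battery's overall significance
    level does not exceed alpha.
    """
    for i, test in enumerate(tests, start=1):
        if test(x, alpha * omega(i)):
            return "non-random"
    return "random"
```

For three tests and α = 0.01, the individual significance levels are α/2, α/6 and α/12.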

2.2.1. Random and Non-Random Infinite Sequences

Kolmogorov complexity is one of the central notions of algorithmic information theory (AIT), see [10,11,12,13,14,15,16,17,18]. We will consider the so-called prefix-free Kolmogorov complexity K(u), which is defined on finite binary words u and is closely related to the notion of randomness. More precisely, an infinite binary sequence $x = x_1 x_2 \dots$ is random if there exists a constant C such that
$$n - K(x_1 \dots x_n) < C$$
for all n, see [19]. Conversely, the sequence x is non-random if
$$\forall C > 0 \ \exists n_C : \quad n_C - K(x_1 \dots x_{n_C}) \ge C.$$
In some sense, Kolmogorov complexity is the length of the shortest lossless prefix-free code: for any (algorithmically realisable) code f there exists a constant $c_f$ for which $K(u) \le |f(u)| + c_f$ [10,11,12,13,14,15,16]. Recall that a code f is lossless if there is a mapping $f^{-1}$ such that $f^{-1}(f(u)) = u$ for any word u, and f is prefix-free if, for any words u, v, f(u) is not a prefix of f(v) and f(v) is not a prefix of f(u).
Let f be a lossless prefix-free code defined for all finite words. Similarly to (4), we call a sequence x random with respect to f if there is a constant $C_f$ such that
$$n - |f(x_1 \dots x_n)| < C_f$$
for all n. We call this difference the statistic corresponding to f and define
$$\tau_f(x_1 \dots x_n) = n - |f(x_1 \dots x_n)|.$$
Similarly, the sequence x is non-random with respect to f if
$$\forall C > 0 \ \exists n_C : \quad n_C - |f(x_1 \dots x_{n_C})| \ge C.$$
Informally, x is random with respect to f if the statistic τ f is bounded by some constant on all prefixes x 1 x n and, conversely, x is non-random if τ f is unbounded when the prefix length grows.
Based on these definitions, we can reformulate the concepts of randomness and non-randomness in a manner similar to what is customary in mathematical statistics. Namely, for any $\alpha \in (0,1)$ we define the set $\{ y = y_1 \dots y_n : \tau_f(y) \ge \log(1/\alpha) \}$. It is easy to see that (1) is valid and, therefore, this set can serve as the critical region $\bar{C}_T(\alpha, n)$ of a test $T_f$ that declares $x_1 \dots x_n$ random if and only if $\tau_f(x_1 \dots x_n) < \log(1/\alpha)$.
Based on these considerations, (6), and the definitions of randomness (4) and (5), we give the following definition of randomness and non-randomness for the statistic $\tau_f$ and the corresponding test $T_f$. An infinite sequence $x = x_1 x_2 \dots$ is random according to the test $T_f$ if there exists an $\alpha > 0$ such that, for this α, the word $x_1 \dots x_n$ is random (according to $T_f$) for every integer n. Otherwise, the sequence x is non-random.
Note that we can use the statistic
$$\tau_f = n - |f(x_1 \dots x_n)|$$
with the critical value $t_\alpha = n - \log(1/\alpha) - 1$, $\alpha \in (0,1)$, see [20,21]. So, there is no need to use the density distribution formula; this greatly simplifies the use of the test and makes it possible to apply it with any data compressor f.
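For illustration, the following Python sketch implements a test of this kind with a general-purpose compressor standing in for the code f (an assumption: zlib is not exactly a prefix-free code, and the rejection threshold $\log_2(1/\alpha)$ used below is the one implied by (1); see [20,21] for the precise critical value):

```python
import math
import zlib

def compression_randomness_test(bits: str, alpha: float = 0.01) -> str:
    """Sketch of a compression-based randomness test.

    Statistic: tau_f = n - |f(x_1 ... x_n)|, where |f(.)| is the length of the
    compressed sequence in bits. H0 ("bits are random") is rejected when the
    compressor saves at least log2(1/alpha) bits.
    """
    n = len(bits)
    # Pack the 0/1 characters into bytes so the compressor sees n bits, not n chars.
    packed = int(bits, 2).to_bytes((n + 7) // 8, "big") if n else b""
    code_len = 8 * len(zlib.compress(packed, 9))     # |f(x)| in bits (approximate)
    tau = n - code_len
    return "non-random" if tau >= math.log2(1.0 / alpha) else "random"
```

For example, `compression_randomness_test("01" * 5000)` rejects $H_0$, since the periodic sequence compresses to far fewer than 10,000 bits.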
It is important to note that there are tests developed within AIT that can be used to test RNGs [22,23].

2.2.2. Test Performance Comparison

For test T, let us define the set R T of all infinite sequences that are random for T.
We use this definition to compare the "effectiveness" of different tests as follows. The test $T_1$ is more efficient than $T_2$ if the size of the difference $R_{T_2} \setminus R_{T_1}$ is not equal to zero, where the size is measured by the Hausdorff dimension.
Informally, the "smallest" set of random sequences corresponds to a test based on Kolmogorov complexity (4) (the corresponding set $R_K$ contains the "truly" random sequences). For a given test $T_1$ we cannot calculate the difference $R_{T_1} \setminus R_K$ because the statistic (4) is noncomputable, but for two tests $T_1$ and $T_2$ with $\dim(R_{T_2} \setminus R_{T_1}) > 0$ we can say that the set of sequences random according to $T_2$ contains clearly non-random sequences. So, in some sense, $T_1$ is more efficient than $T_2$. (Recall that we consider only computable tests.)
The definition of the Hausdorff dimension is given in Appendix A; here we briefly note how we use it: for any binary sequence $x = x_1 x_2 \dots$ we define a real number $\sigma(x) = 0.x_1 x_2 \dots$, and for any set S of infinite binary sequences we denote the Hausdorff dimension of σ(S) by dim S. So, a test $T_1$ is more efficient than $T_2$ (formally, $T_1 \succ T_2$) if $\dim(R_{T_2} \setminus R_{T_1}) > 0$. Obviously, information about a test's effectiveness can be useful to developers of test batteries.
Also note that the Hausdorff dimension is widely used in information theory. Perhaps the first such use was due to Eggleston [24] (see also [25,26]), and later the Hausdorff dimension found numerous applications in AIT [27,28,29].

2.2.3. Shannon Entropy

In RNG testing, one of the popular alternative hypotheses ($H_1$) is that the considered sequence is generated by a Markov process of memory (or connectivity) m, m > 0 (the class $S_m$), whose transition probabilities are unknown ($S_0$, i.e., m = 0, corresponds to the Bernoulli process). Another popular, and perhaps the most general, $H_1$ is that the sequence is generated by a stationary ergodic process (the class S), excluding $H_0$.
Let us consider the Bernoulli process $\mu \in S_0$ for which $\mu(0) = p$, $\mu(1) = q$ ($p + q = 1$). By definition, the Shannon entropy $h(\mu)$ of this process is $h(\mu) = -(p \log p + q \log q)$ [30]. For any stationary ergodic process $\nu \in S$, the entropy of order k is defined as follows:
$$h_k(\nu) = E_\nu \Big( -\big( \nu(0/u) \log \nu(0/u) + \nu(1/u) \log \nu(1/u) \big) \Big),$$
where $E_\nu$ is the mathematical expectation over words $u \in \{0,1\}^k$ according to ν, and $\nu(z/u)$ is the conditional probability $\nu(x_{i+1} = z \mid x_{i-k+1} \dots x_i = u)$, which does not depend on i due to stationarity [30].
It is known in information theory that for stationary ergodic processes (including S and $S_m$, $m \ge 0$) $h_k \ge h_{k+1}$ for $k \ge 0$, and there exists the limit Shannon entropy $h_\infty(\nu) = \lim_{k \to \infty} h_k(\nu)$. Besides, for $\nu \in S_m$, $h_\infty = h_m$ [30].
Shannon entropy plays an important role in data compression because, for any lossless prefix-free code, the average codeword length (per letter) is at least as large as the entropy, and this bound can be approached. More precisely, let ϕ be a lossless, prefix-free code defined on $\{0,1\}^n$, n > 0, and let $\nu \in S$. Then, for any ϕ and ν, the average codeword length
$$E_n(\phi, \nu) = \frac{1}{n} \sum_{u \in \{0,1\}^n} \nu(u) |\phi(u)|$$
satisfies $E_n(\phi, \nu) \ge h_\infty(\nu)$. In addition, there are codes $\phi_1, \phi_2, \dots$ such that $\lim_{n \to \infty} E_n(\phi_n, \nu) = h_\infty(\nu)$ [30].
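For illustration, the following sketch (an empirical estimate from a sample, assuming the sample is long; it is not part of the paper) computes the order-k entropy of a binary string from observed frequencies:

```python
from collections import Counter
from math import log2

def empirical_hk(x: str, k: int) -> float:
    """Empirical k-order conditional entropy (bits per letter) of a 0/1 string.

    Estimates h_k = -sum_u P(u) * sum_z P(z|u) * log2 P(z|u) from the
    frequencies of (k+1)-blocks; a long sample is assumed.
    """
    blocks = Counter(x[i:i + k + 1] for i in range(len(x) - k))
    contexts = Counter(x[i:i + k] for i in range(len(x) - k))
    total = sum(blocks.values())
    h = 0.0
    for block, n_block in blocks.items():
        u = block[:k]
        p_block = n_block / total            # empirical P(u z)
        p_cond = n_block / contexts[u]       # empirical P(z | u)
        h -= p_block * log2(p_cond)
    return h
```

For a long sample of an unbiased Bernoulli sequence the estimate is close to 1 bit per letter, whereas for the two-faced processes of Section 2.2.5 it stays close to 1 only for k smaller than the order s.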

2.2.4. Typical Sequences and Universal Codes

The sequence $x = x_1 x_2 \dots$ is typical for a measure $\mu \in S$ if, for any word $y_1 \dots y_r$, $\lim_{t \to \infty} N_{x_1 \dots x_t}(y_1 \dots y_r)/t = \mu(y_1 \dots y_r)$, where $N_{x_1 \dots x_t}(y_1 \dots y_r)$ is the number of occurrences of the word $y_1 \dots y_r$ in the word $x_1 \dots x_t$.
Let us denote the set of all typical sequences by $T_\mu$ and note that $\mu(T_\mu) = 1$ [30]. This notion is deeply related to information theory. Thus, Eggleston proved the equality $\dim T_\mu = h(\mu)$ for Bernoulli processes ($\mu \in S_0$) [24], and later this was generalized to $\mu \in S$ [26,28].
By definition, a code ϕ is universal for a set of processes S if for any $\mu \in S$ and any $x \in T_\mu$
$$\lim_{n \to \infty} |\phi(x_1 \dots x_n)|/n = h_\infty(\mu).$$
In 1968, R. Krichevsky [31] proposed a code $\kappa_m^t(x_1 \dots x_t)$, $m \ge 0$, t an integer, whose redundancy, i.e., the average difference between the code length and the Shannon entropy, is asymptotically minimal. This code and its generalisations are described in Appendix A, but here we note the following main property: for any stationary ergodic process μ, that is, $\mu \in S$, and any typical $x \in T_\mu$,
$$\lim_{t \to \infty} |\kappa_m^t(x_1 \dots x_t)|/t = h_m(\mu),$$
see [32].
Currently there are many universal codes which are based on different ideas and approaches, among which we note the PPM universal code [33], the arithmetic code [34], the Burrows–Wheeler transform [35], which is used along with the book-stack (or MTF) code [36,37,38], and some others [39,40,41].
The most interesting for us is the class of grammar-based codes suggested by Kieffer and Yang [7,42] which includes the Lempel–Ziv (LZ) codes [6] (note that perhaps the first grammar-based code was described in [43]).
The point is that all of them are universal codes and hence they "compress" stationary processes asymptotically down to the entropy; therefore, they cannot be distinguished on S. On the other hand, we show that grammar-based codes can distinguish "large" sets of sequences as non-random beyond S.

2.2.5. Two-Faced Processes

The so-called two-faced processes are described in [20,21] and their definitions will be given in Appendix A. Here, we note some of their properties: the set of two-faced processes $\Lambda_s(p)$ of order s, $s \ge 1$, and probability p, $p \in (0,1)$, contains the measures λ from $S_s$ such that
$$h_0(\lambda) = h_1(\lambda) = \dots = h_{s-1}(\lambda) = 1,$$
$$h_s(\lambda) = h_\infty(\lambda) = -(p \log p + (1-p) \log(1-p)).$$
Note that they are called two-faced because they appear to be truly random if we look at the frequencies of words of length less than s, but are "completely" non-random if the word length is equal to or greater than s (and p is far from 1/2).

3. Comparison of the Efficiency of Tests for Markov Processes with Different Memories and General Stationary Processes

We now describe the statistical tests for Markov processes and stationary ergodic processes. Following (6), the corresponding statistics are defined as follows:
$$\tau_{K_m^t}(x_1 \dots x_n) = n - |\hat{\kappa}_m^t(x_1 \dots x_n)|,$$
$$\tau_{R^t}(x_1 \dots x_n) = n - |\hat{\rho}^t(x_1 \dots x_n)|,$$
where $\hat{\kappa}_m^t$ and $\hat{\rho}^t$ are the universal codes for $S_m$ and S defined in Appendix A, see (A4) and (A5). We also denote the corresponding tests by $T_{K_m^t}$ and $T_{R^t}$. The following statement compares the performance of these tests.
Theorem 1.
For any integers m, s and t = ms,
$$T_{K_m^t} \prec T_{K_{m+1}^t}, \qquad T_{K_m^t} \prec T_{R^t}.$$
Moreover, $\dim(R_{T_{K_m^t}} \setminus R_{T_{K_{m+1}^t}}) = 1$.
Proof. 
First, let us say a few words about the scheme of the proof. If we apply the test $T_{K_m^t}$ to typical sequences of a two-faced process $\lambda \in \Lambda_{m+1}(p)$, $p \neq 1/2$, they will appear random, since $h_m(\lambda) = 1$. So, the set $R_{T_{K_m^t}}$ (i.e., the set of sequences that are random according to the test $T_{K_m^t}$) contains the set $T_{\Lambda_{m+1}(p)}$ of typical sequences, whose dimension $\dim(T_{\Lambda_{m+1}(p)})$ equals the limit Shannon entropy $-(p \log p + (1-p)\log(1-p))$. Hence, $\dim(R_{T_{K_m^t}}) \ge \dim(T_{\Lambda_{m+1}(p)}) = -(p \log p + (1-p)\log(1-p))$.
On the other hand, typical sequences of a two-faced process $\lambda \in \Lambda_{m+1}(p)$, $p \neq 1/2$, are not random according to $T_{K_{m+1}^t}$, since $h_{m+1}(\lambda) = -(p \log p + (1-p)\log(1-p)) < 1$ by (11), and the test $T_{K_{m+1}^t}$ "compresses" them down to the Shannon entropy $-(p \log p + (1-p)\log(1-p))$. So, $\dim(R_{T_{K_m^t}} \setminus R_{T_{K_{m+1}^t}}) \ge \dim(T_{\Lambda_{m+1}(p)}) = -(p \log p + (1-p)\log(1-p))$, and therefore $\sup_{p \in (0,1/2)} \dim(R_{T_{K_m^t}} \setminus R_{T_{K_{m+1}^t}}) = 1$.
More formally, consider a sequence x typical for $\Lambda_{m+1}(p)$, $p \neq 1/2$. By typicality, the frequencies of all words $u \in \{0,1\}^{m+1}$ converge, $\lim_{t \to \infty} N_{x_1 \dots x_t}(u)/t = \lambda(u)$, and by the property (11) of two-faced processes the empirical entropy of order m therefore tends to $h_m(\lambda) = 1$.
From here and (A1), (A4) we obtain $E_\lambda\big( |\hat{\kappa}_m^t(x_1 \dots x_n)|/n \big) = 1 + \epsilon$, where $\epsilon > 0$. From this and typicality we can see that $\lim_{n \to \infty} |\hat{\kappa}_m^t(x_1 \dots x_n)|/n = 1 + \epsilon$. Hence, there exists $n_\delta$ such that $1 + \epsilon - \delta < |\hat{\kappa}_m^t(x_1 \dots x_n)|/n < 1 + \epsilon + \delta$ for $n > n_\delta$. So $n - |\hat{\kappa}_m^t(x_1 \dots x_n)| < n - n(1 + \epsilon - \delta)$ and, taking $\delta = \epsilon/2$, we see that $n - |\hat{\kappa}_m^t(x_1 \dots x_n)|$ is negative for $n > n_\delta$. From this and the definition of randomness (5), typical sequences from $T_{\Lambda_{m+1}(p)}$ are random according to $\hat{\kappa}_m^t(x_1 \dots x_n)$, i.e., according to $T_{K_m^t}$. From this and (A6), we obtain $T_{K_{m+1}^t} \prec T_{R^t}$. □

4. Effectiveness of Tests Based on Lempel-Ziv Data Compressors

In this part we will describe a test that is more effective than T R t and T K m t for any m.
First, we will briefly describe the LZ77 code based on the definition in [44]. Suppose there is a binary string σ * that is encoded using the code LZ77. This string is represented by a list of pairs ( p 1 ; l 1 ) ( p s ; l s ) . Each pair ( p i ; l i ) represents a string, and the concatenation of these strings is σ * . In particular, if p i = 0 , then the pair represents the string l i , which is a single terminal. If p i 0 , then the pair represents a portion of the prefix of σ * that is represented by the preceding i 1 pairs; namely, the l i terminals beginning at position p i in σ * ; see ([44] part 3.1). The length of the codeword depends on the encoding of the sub-words p i , l i which are integers. For this purpose we will use a prefix code C for integers, for which for any integer m
$$|C(m)| = \log m + 2 \log \log (m+1) + O(1).$$
Such codes are known in information theory; see, for example, ([30], Part 7.2). Note that C is a prefix code and, hence, for any $r \ge 1$ the codeword $C(p_1) C(l_1) \dots C(p_r) C(l_r)$ can be decoded back into $(p_1; l_1) \dots (p_r; l_r)$. There is the following upper bound on the length of the LZ77 code [30,44]: for any word $w_1 w_2 \dots w_m$,
$$|code_{LZ}(w_1 w_2 \dots w_m)| \le m (1 + o(1))$$
as $m \to \infty$.
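The following sketch illustrates this kind of parsing (a simplified greedy variant for illustration only; the constant terms of the integer code C are dropped, a one-bit flag per pair is assumed, and the quadratic search is acceptable only for short strings):

```python
from math import log2

def c_len(m: int) -> float:
    """Approximate codeword length of a prefix code for the integer m >= 1:
    |C(m)| ~ log m + 2 log log (m + 1); the O(1) term is dropped."""
    return log2(m) + 2 * log2(log2(m + 1))

def lz77_pairs(s: str):
    """Greedy LZ77 parsing into pairs (p, l): p = 0 means the pair encodes the
    single terminal l; p > 0 means "copy l symbols starting at (1-based)
    position p of the already parsed prefix"."""
    pairs, i = [], 0
    while i < len(s):
        best_p = best_l = 0
        for p in range(i):                       # search the parsed prefix s[:i]
            l = 0
            while p + l < i and i + l < len(s) and s[p + l] == s[i + l]:
                l += 1
            if l > best_l:
                best_p, best_l = p + 1, l
        if best_l >= 2:
            pairs.append((best_p, best_l))
            i += best_l
        else:
            pairs.append((0, s[i]))
            i += 1
    return pairs

def lz77_code_length(s: str) -> float:
    """Rough code length in bits: one flag bit per pair plus the integer codes."""
    total = 0.0
    for p, l in lz77_pairs(s):
        total += 1.0                              # flag: literal or copy
        total += 1.0 if p == 0 else c_len(p) + c_len(l)
    return total
```

For example, `lz77_code_length("01" * 500)` is far below 1000 bits, whereas for an incompressible string the estimate stays close to the string's length, in agreement with the bound above.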
We will now describe sequences that, on the one hand, are not typical for any stationary ergodic measure and, on the other hand, are recognized as non-random by the suggested test. Thus, the proposed approach allows us to detect non-random sequences that are not typical for any stationary process but are recognized as non-random by tests based on LZ77. To do this, we take any random sequence $x = x_1 x_2 \dots$ (that is, one for which (4) is valid) and define a new sequence $y(x) = y_1 y_2 \dots$ as follows. For $k = 0, 1, 2, \dots$ let
$$u_k = x_{2^{2^k}-1}\, x_{2^{2^k}}\, x_{2^{2^k}+1} \dots x_{2^{2^{k+1}}-2},$$
$$y(x) = u_0 u_0 u_1 u_1 u_2 u_2 u_3 u_3 \dots$$
For example, $u_0 = x_1 x_2$, $u_1 = x_3 x_4 \dots x_{14}$, $u_2 = x_{15} \dots x_{254}$, and $y(x) = x_1 x_2\, x_1 x_2\, x_3 x_4 \dots x_{14}\, x_3 x_4 \dots x_{14}\, x_{15} \dots x_{254}\, x_{15} \dots x_{254} \dots$
The idea behind this sequence is quite clear. Firstly, it is obvious that the word y cannot be typical for any stationary ergodic source; secondly, when $u_0 u_0 u_1 u_1 \dots u_k u_k$ is encoded, the second copy of $u_k$ will be encoded by a very short word (of length about $O(\log |u_k|)$), since it coincides with the preceding word $u_k$. So, for large k, the length of the encoded word $LZ(u_0 u_0 u_1 u_1 \dots u_k u_k)$ will be about $|u_0 u_0 u_1 u_1 \dots u_k u_k| \, (1/2 + o(1))$, and hence $\liminf_{n \to \infty} |LZ(y_1 y_2 \dots y_n)|/n = 1/2$. It follows that
$$\dim(\{ y(x) : x \ \mathrm{is\ random} \}) = 1/2.$$
(Here, we took into account that x is random and $\dim\{ x : x \ \mathrm{is\ random} \} = 1$, see [28].) So, taking into account the definitions of non-randomness (6) and (7), we can see that y(x) is non-random according to the statistic $\tau = n - |LZ(y_1 \dots y_n)|$. Denote this test by $T_{LZ}$.
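To make the construction concrete, here is a small sketch (an illustration under several assumptions: a pseudo-random byte string stands in for the truly random x, the letters are bytes rather than bits, the construction is truncated at k = 3, and the lzma dictionary compressor stands in for the LZ code):

```python
import lzma
import os

def build_y(k_max: int = 3) -> bytes:
    """Build a finite prefix of y(x) = u0 u0 u1 u1 u2 u2 ... where
    |u_k| = 2^(2^(k+1)) - 2^(2^k); each u_k is pseudo-random here."""
    y = b""
    for k in range(k_max + 1):
        length = 2 ** (2 ** (k + 1)) - 2 ** (2 ** k)   # 2, 12, 240, 65280, ...
        u = os.urandom(length)
        y += u + u                                     # the duplication u_k u_k
    return y

y = build_y()
n, compressed = len(y), len(lzma.compress(y))
print(n, compressed, compressed / n)                   # the ratio is close to 1/2
```

The statistic $\tau = n - |LZ(y_1 \dots y_n)|$ is therefore large and positive, so the test $T_{LZ}$ rejects $H_0$ for y(x), while frequency-based statistics of small order see nothing suspicious.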
Let us now consider the test $T_{K_m^t}$, where m and t are integers. Taking into account that the sequence x is random, we can see that $\lim_{t \to \infty} |\kappa_m^t(x_i x_{i+1} \dots x_{i+t})|/t = 1$ for any i. Since y(x) consists of subwords of x, it follows from (A4) that $|\hat{\kappa}_m^t(y_1 \dots y_n)|/n = 1 + o(1)$ for any n. The same reasoning is true for the code $\hat{\rho}^t$.
We can now compare the sets of random sequences of the different tests as follows:
$$R_{T_{K_m^t}} \setminus R_{T_{LZ}} \supseteq \{ y(x) : x \ \mathrm{is\ random} \}.$$
Taking into account (15), we can see that
$$\dim(R_{T_{K_m^t}} \setminus R_{T_{LZ}}) \ge 1/2.$$
Likewise, the same is true for the test $T_{R^t}$. From the latter inequality we obtain the following.
Theorem 2.
For any random (according to (4)) sequence x, the sequence y(x) is non-random for the test $T_{LZ}$, whereas this sequence is random for the tests $T_{R^t}$ and $T_{K_m^t}$. Moreover, $T_{R^t} \prec T_{LZ}$ and $T_{K_m^t} \prec T_{LZ}$ for any m, t.
Comment. The sequence y(x) is constructed by duplicating parts of x. This construction can be slightly modified as follows: instead of the duplication $u_i u_i$, we can use $u_i u_i^{\gamma}$, where $u_i^{\gamma}$ consists of the first $\gamma |u_i|$ letters of $u_i$, $\gamma < 1/2$. In this case, $\dim(R_{T_{K_m^t}} \setminus R_{T_{LZ}}) \ge 1 - \gamma$ and, therefore,
$$\sup_{\gamma \in (0,1/2)} \dim(R_{T_{K_m^t}} \setminus R_{T_{LZ}}) = 1.$$

5. Conclusions

Here, we give some recommendations for the practical testing of RNGs, based on the proposed method of comparing the power of different statistical tests. Based on Theorem 1, we recommend using several tests $T_{K_s^t}$ based on the analysis of occurrence frequencies of words of different lengths s. In addition, we recommend using tests for which s depends on the length n of the sequence under consideration, for example, $s_1 = O(\log \log n)$, $s_2 = O(\log n)$, etc. They can be included in the test battery directly or as the "mixture" $T_R$ with several non-zero β coefficients, see (A2) in Appendix A.
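As a sketch of this recommendation (illustrative only; the particular word lengths below and the reuse of the battery combinator from Section 2.2 are assumptions, not prescriptions of any standard):

```python
from math import floor, log2

def recommended_word_lengths(n: int):
    """Word lengths s for frequency-based tests T_{K_s^t}, growing with the
    length n of the tested sequence: s1 ~ log log n, s2 ~ log n."""
    s1 = max(1, floor(log2(max(2.0, log2(n)))))
    s2 = max(s1 + 1, floor(log2(n)))
    return [s1, s2]

# e.g., for n = 10**6: s1 = 4 (log2 log2 n ~ 4.3) and s2 = 19 (log2 n ~ 19.9)
```

The resulting frequency tests can then be combined with a dictionary-compressor test (Theorem 2) into a single battery, as in the sketch of Section 2.2.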
Theorem 2 shows that it is useful to include tests based on dictionary data compressors such as the Lempel–Ziv code. In such a case we can use the statistic
$$\tau_{LZ} = n - |LZ(y_1 \dots y_n)|$$
with the critical value $t_\alpha = n - \log(1/\alpha) - 1$, $\alpha \in (0,1)$, see [20,21]. Note that in this case there is no need to use the density distribution formula, which greatly simplifies the use of the test and makes it possible to use a similar test for any grammar-based data compressor.
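For example, for a sequence of length $n = 10^6$ and $\alpha = 0.001$, the critical value is $t_\alpha = 10^6 - \log_2(1/0.001) - 1 \approx 10^6 - 11$; in other words, $H_0$ is rejected roughly as soon as the dictionary compressor shortens the sequence by 11 bits or more.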

Funding

This study was completed with the support of the state assignment of SibSUTIS (reg. no. 071-03-2024-008).

Data Availability Statement

Data is contained within the article.

Conflicts of Interest

The author declares no conflicts of interest.

Appendix A

Appendix A.1. Hausdorff Dimension

Let $A \subset [0,1]$ and $\rho > 0$. A family of sets S is called a ρ-covering of A if
(i) S is finite or countable, (ii) any $\sigma \in S$ is a subinterval of [0,1] whose length is not greater than ρ, and (iii) $\bigcup_{\sigma \in S} \sigma \supset A$. Let
$$l(\alpha, A, \rho) = \inf \sum_{\sigma \in S} (\mathrm{diam}\, \sigma)^{\alpha},$$
where the infimum is taken over all ρ-coverings of A. Then, the Hausdorff dimension dim(A) is determined by the equality
$$\dim(A) = \inf \{ \alpha : \lim_{\rho \to 0} l(\alpha, A, \rho) = 0 \} = \sup \{ \alpha : \lim_{\rho \to 0} l(\alpha, A, \rho) = \infty \}.$$
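For example, the set of reals $\sigma(x) = 0.x_1 x_2 \dots$ corresponding to binary sequences x in which every second letter equals 0 has Hausdorff dimension 1/2.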

Appendix A.2. Krichevsky Universal Code and Twice-Universal Code

Krichevsky in [31] described the following measure $K_0$ and universal code $\kappa_0$ for Bernoulli processes, which in the case of the binary alphabet look as follows:
$$K_0^t(x_1 x_2 \dots x_t) = \prod_{i=0}^{t-1} \frac{N_{x_1 \dots x_i}(x_{i+1}) + 1/2}{i+1},$$
$$\kappa_0^t(x_1 x_2 \dots x_t) = -\log K_0^t(x_1 x_2 \dots x_t).$$
Then, he generalized them for Markov chains of memory m , m > 0 [32], as follows:
$$K_m^t(x_1 \dots x_t) = \begin{cases} 2^{-t} & \text{if } t \le m, \\[4pt] 2^{-m} \displaystyle\prod_{i=m}^{t-1} \frac{N_{x_1 \dots x_i}(x_{i+1-m} \dots x_{i+1}) + 1/2}{N_{x_1 \dots x_{i-1}}(x_{i+1-m} \dots x_i) + 1} & \text{if } t > m, \end{cases}$$
$\kappa_m^t(x_1 \dots x_t) = -\log K_m^t(x_1 \dots x_t)$, see [32]. For example,
$$K_0^5(01010) = \frac{1/2}{1} \cdot \frac{1/2}{2} \cdot \frac{3/2}{3} \cdot \frac{3/2}{4} \cdot \frac{5/2}{5},$$
$$K_1^5(01010) = \frac{1}{2} \cdot \frac{1/2}{1} \cdot \frac{1/2}{1} \cdot \frac{3/2}{2} \cdot \frac{3/2}{2}.$$
The code $\kappa_m^t$ is universal for the set of processes $S_m$ and, for any $\nu \in S_m$,
$$h_m(\nu) < E_t(\kappa_m^t, \nu) \le h_m(\nu) + 2^m \log t / (2t) + O(1/t),$$
see [31,32]. (This code is optimal in the sense that its redundancy, that is, $2^m \log t/(2t) + O(1/t)$, is asymptotically minimal [31,32].)
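The following Python sketch (a direct transcription of the formula above using exact arithmetic; it is only an illustration, not the original implementation) computes $K_m^t$ and reproduces the two worked examples; the corresponding code length is about $-\log_2 K_m^t$:

```python
from fractions import Fraction

def krichevsky_measure(x: str, m: int) -> Fraction:
    """K_m^t(x_1 ... x_t) for a binary string x and memory m (exact arithmetic)."""
    t = len(x)
    if t <= m:
        return Fraction(1, 2 ** t)
    k = Fraction(1, 2 ** m)                     # the first m letters
    word_counts, ctx_counts = {}, {}
    for i in range(m, t):
        u, z = x[i - m:i], x[i]                 # context of length m and next letter
        n_uz = word_counts.get((u, z), 0)
        n_u = ctx_counts.get(u, 0)
        k *= Fraction(2 * n_uz + 1, 2 * (n_u + 1))   # (n_uz + 1/2) / (n_u + 1)
        word_counts[(u, z)] = n_uz + 1
        ctx_counts[u] = n_u + 1
    return k

assert krichevsky_measure("01010", 0) == Fraction(3, 256)   # the example for m = 0
assert krichevsky_measure("01010", 1) == Fraction(9, 128)   # the example for m = 1
```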
One of the first universal codes for the set of all stationary ergodic processes S was proposed in [45]. For this code, the measure R and the code length ρ are defined as follows:
$$R^t(x_1 \dots x_t) = \sum_{i=0}^{\infty} \beta_i K_i^t(x_1 x_2 \dots x_t),$$
$$\rho^t(x_1 \dots x_t) = -\log R^t(x_1 x_2 \dots x_t),$$
where $\sum_{i=0}^{\infty} \beta_i = 1$ and $\beta_i > 0$ for all i. Obviously, for any j,
$$\log \sum_{i=0}^{\infty} \beta_i K_i^t(x_1 x_2 \dots x_t) = \log \big( \beta_j K_j^t(x_1 x_2 \dots x_t) \big) + \log \Big( 1 + \sum_{i=0, i \neq j}^{\infty} \beta_i K_i^t(x_1 x_2 \dots x_t) \big/ \big( \beta_j K_j^t(x_1 x_2 \dots x_t) \big) \Big) \ge \log \big( \beta_j K_j^t(x_1 x_2 \dots x_t) \big).$$
Hence,
$$\rho^t(x_1 \dots x_t) \le -\log \beta_j - \log K_j^t(x_1 x_2 \dots x_t) = -\log \beta_j + |\kappa_j^t(x_1 x_2 \dots x_t)|.$$
This code is called twice universal [45] because it can be used to compress data when both the process memory and the probability distribution are unknown.
Usually, when using universal codes, the sequence x 1 x n is encoded in parts as follows:
$$\hat{\kappa}_m^t(x_1 \dots x_n) = \kappa_m^t(x_1 \dots x_t)\, \kappa_m^t(x_{t+1} \dots x_{2t}) \dots \kappa_m^t(x_{n-t+1} \dots x_n)$$
(for brevity, we assume that n / t is an integer). Let us similarly define
$$\hat{\rho}^t(x_1 \dots x_n) = \rho^t(x_1 \dots x_t)\, \rho^t(x_{t+1} \dots x_{2t}) \dots \rho^t(x_{n-t+1} \dots x_n).$$
Taking into account the definition of $\kappa_j^t(x_1 x_2 \dots x_t)$ and Equations (4), (A4) and (A5), we obtain that for any integer j,
$$|\hat{\rho}^t(x_1 \dots x_n)| \le |\hat{\kappa}_j^t(x_1 \dots x_n)| + O(n/t).$$

Appendix A.3. Two-Faced Processes

Let us first consider several examples of two-faced Markov chains. Let a matrix of transition probabilities T 1 be as follows:
$$T_1 = \begin{pmatrix} \nu & 1 - \nu \\ 1 - \nu & \nu \end{pmatrix},$$
where $\nu \in (0,1)$ (i.e., $P\{x_{i+1} = 0 \mid x_i = 0\} = \nu$, $P\{x_{i+1} = 0 \mid x_i = 1\} = 1 - \nu$, etc.). "Typical" sequences for ν = 0.9 and ν = 0.1, respectively, can look as follows:
0000000000 111111111 0000000000 1111111 0 ,
01010101 1010101010 010101010101010101 1010 .
(Here, the gaps correspond to state transitions.) Of course, these sequences are not truly random. On the other hand, the frequencies of 1s and 0s go to 1 / 2 due to the symmetry of the matrix T 1 .
Define the second-order transition matrices $T_2$ and $\bar{T}_2$ analogously [the matrices are shown as images in the published article]; here $P\{x_{i+1} = 0 \mid x_i = 0, x_{i-1} = 0\} = \nu$, $P\{x_{i+1} = 0 \mid x_i = 0, x_{i-1} = 1\} = 1 - \nu$, etc.
Now, we can define a transition matrix with two-faced Markov chains with different memory as follows.
The $(k+1)$-order transition matrices are defined by $T_{k+1} = T_k \bar{T}_k$ and $\bar{T}_{k+1} = \bar{T}_k T_k$, $k = 2, 3, \dots$. In order to define the process $x_1 x_2 \dots$, the initial probability distribution needs to be specified. We define the initial distribution of the processes $T_k$ and $\bar{T}_k$, $k = 1, 2, \dots$, to be uniform on $\{0,1\}^k$, i.e., $P\{x_1 \dots x_k = u\} = 2^{-k}$ for any $u \in \{0,1\}^k$.
The following statement from [20,21] describes the main properties of the processes defined above.
Claim. Let a sequence $x_1 x_2 \dots$ be generated by the process $T_k$ (or $\bar{T}_k$), $k \ge 1$, and let u be a binary word of length k. Then, if the initial state obeys the uniform distribution over $\{0,1\}^k$, the following holds:
(i)
For any $j \ge 0$,
$$P(x_{j+1} \dots x_{j+k} = u) = 2^{-|u|}.$$
(ii)
For each $\nu \in (0,1)$, the k-order Shannon entropy ($h_k$) of the processes $T_k$ and $\bar{T}_k$ equals 1 bit per letter, whereas the limit Shannon entropy ($h_\infty$) equals $-(\nu \log_2 \nu + (1 - \nu) \log_2 (1 - \nu))$.
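As a quick illustration of the claim, the following simulation sketch (not part of the paper) samples the order-1 process with the matrix $T_1$ given above:

```python
import random
from collections import Counter

def two_faced_order1(n: int, nu: float = 0.9, seed: int = 0) -> str:
    """Sample n letters from the order-1 two-faced chain with matrix T_1:
    P(0|0) = nu, P(0|1) = 1 - nu; the initial letter is uniform."""
    rng = random.Random(seed)
    x = [rng.randint(0, 1)]
    for _ in range(n - 1):
        p0 = nu if x[-1] == 0 else 1.0 - nu
        x.append(0 if rng.random() < p0 else 1)
    return "".join(map(str, x))

s = two_faced_order1(100000)
print(Counter(s))                                    # zeros and ones: ~50/50
print(Counter(s[i:i + 2] for i in range(len(s) - 1)))  # pair frequencies: skewed
```

The letter frequencies are close to 1/2 (consistent with (i)), whereas the pair frequencies are far from uniform, reflecting the limit entropy in (ii).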

References

  1. L’Ecuyer, P. Random Number Generation; Springer: Berlin/Heidelberg, Germany, 2012. [Google Scholar]
  2. L’Ecuyer, P.; Simard, R. TestU01: A C library for empirical testing of random number generators. ACM Trans. Math. Softw. 2007, 33, 22. Available online: http://simul.iro.umontreal.ca/testu01/tu01.html (accessed on 10 June 2024). [CrossRef]
  3. Rukhin, A.; Soto, J.; Nechvatal, J.; Smid, M.; Barker, E.; Leigh, S.; Levenson, M.; Vangel, M.; Banks, D.; Heckert, A.; et al. A Statistical Test Suite for Random and Pseudorandom Number Generators for Cryptographic Applications; National Institute of Standards and Technology: Gaithersburg, MD, USA, 2010. [Google Scholar]
  4. Hurley-Smith, D.; Patsakis, C.; Hernandez-Castro, J. On the unbearable lightness of FIPS 140-2 randomness tests. IEEE Trans. Inf. Forensics Secur. 2020, 17, 3946–3958. [Google Scholar] [CrossRef]
  5. Ryabko, B. Asymptotically most powerful tests for random number generators. J. Stat. Plan. Inference 2022, 217, 1–7. [Google Scholar] [CrossRef]
  6. Ziv, J.; Lempel, A. A universal algorithm for sequential data compression. IEEE Trans. Inf. Theory 1977, 23, 337–343. [Google Scholar] [CrossRef]
  7. Yang, E.-H.; Kieffer, J.C. Efficient universal lossless data compression algorithms based on a greedy sequential grammar transform. i. without context models. IEEE Trans. Inf. Theory 2000, 46, 755–777. [Google Scholar] [CrossRef]
  8. Kendall, M.; Stuart, A. The Advanced Theory of Statistics; Volume 2: Inference and Relationship; Hafner Publishing Company: New York, NY, USA, 1961. [Google Scholar]
  9. Mittelhammer, R.C.; Judge, G.G.; Miller, D.J. Econometric Foundations; Cambridge University Press: Cambridge, UK, 2000; pp. 73–74. [Google Scholar]
  10. Hutter, M. Algorithmic information theory. Scholarpedia 2007, 2, 2519. [Google Scholar] [CrossRef]
  11. Li, M.; Vitányi, P. An Introduction to Kolmogorov Complexity and Its Applications; Springer: New York, NY, USA, 2008. [Google Scholar]
  12. Calude, C.S. Information: The algorithmic paradigm. In Formal Theories of Information: From Shannon to Semantic Information Theory and General Concepts of Information; Springer: Berlin/Heidelberg, Germany, 2009; pp. 79–94. [Google Scholar]
  13. Downey, R.; Hirschfeldt, D.R.; Nies, A.; Terwijn, S.A. Calibrating randomness. Bull. Symb. Log. 2006, 12, 411–491. [Google Scholar] [CrossRef]
  14. Merkle, W.; Miller, J.S.; Nies, A.; Reimann, J.; Stephan, F. Kolmogorov–loveland randomness and stochasticity. Ann. Pure Appl. Log. 2006, 138, 183–210. [Google Scholar] [CrossRef]
  15. V’yugin, V. On Nonstochastic Objects. Probl. Peredachi Inf. 1985, 21, 3–9. [Google Scholar]
  16. Vereshchagin, N. Algorithmic Minimal Sufficient Statistics: A New Approach. Theory Comput. Syst. 2015, 58, 463–481. [Google Scholar] [CrossRef]
  17. Zenil, H. A review of methods for estimating algorithmic complexity: Options, challenges, and new directions. Entropy 2020, 22, 612. [Google Scholar] [CrossRef]
  18. Zenil, H.; Kiani, N.A.; Tegnér, J. Low-algorithmic-complexity entropy-deceiving graphs. Phys. Rev. 2017, 96, 012308. [Google Scholar]
  19. Downey, R.G.; Reimann, J. Algorithmic randomness. Scholarpedia 2007, 2, 2574. [Google Scholar] [CrossRef]
  20. Ryabko, B.Y.; Monarev, V.A. Using information theory approach to randomness testing. J. Stat. Plan. Inference 2005, 133, 95–110. [Google Scholar] [CrossRef]
  21. Ryabko, B.; Fionov, A. Cryptography in the Information Society; World Scientific Publishing: Hackensack, NJ, USA, 2020; p. 280. [Google Scholar]
  22. Soler-Toscano, F.; Zenil, H.; Delahaye, J.P.; Gauvrit, N. Calculating Kolmogorov complexity from the output frequency distributions of small Turing machines. PLoS ONE 2014, 9, e96223. [Google Scholar] [CrossRef]
  23. Zenil, H.; Hernández-Orozco, S.; Kiani, N.A.; Soler-Toscano, F.; Rueda-Toicen, A.; Tegnér, J. A decomposition method for global evaluation of Shannon entropy and local estimations of algorithmic complexity. Entropy 2018, 20, 605. [Google Scholar] [CrossRef]
  24. Eggleston, H.G. The fractional dimension of a set defined by decimal properties. Q. J. Math. 1949, 1, 31–36. [Google Scholar] [CrossRef]
  25. Billingsley, P. Hausdorff dimension in probability theory. Ill. J. Math. 1960, 4, 187–209. [Google Scholar] [CrossRef]
  26. Billingsley, P. Ergodic Theory and Information; Wiley: New York, NY, USA, 1965. [Google Scholar]
  27. Lutz, J.H. The dimensions of individual strings and sequences. Inform. Comput. 2003, 187, 49–79. [Google Scholar] [CrossRef]
  28. Reimann, J. Information vs. Dimension: An Algorithmic Perspective. In Structure and Randomness in Computability and Set Theory; World Scientific: Singapore, 2020; pp. 111–151. [Google Scholar]
  29. Tadaki, K. Partial randomness and dimension of recursively enumerable reals. In International Symposium on Mathematical Foundations of Computer Science; Springer: Berlin/Heidelberg, Germany, 2009; pp. 687–699. [Google Scholar]
  30. Cover, T.M.; Thomas, J.A. Elements of Information Theory; Wiley-Interscience: New York, NY, USA, 2006. [Google Scholar]
  31. Krichevsky, R. A relation between the plausibility of information about a source and encoding redundancy. Probl. Inform. Transm. 1968, 4, 48–57. [Google Scholar]
  32. Krichevsky, R. Universal Compression and Retrieval; Kluwer Academic Publishers: Norwell, MA, USA, 1993. [Google Scholar]
  33. Cleary, J.; Witten, I. Data compression using adaptive coding and partial string matching. IEEE Trans. Commun. 1984, 32, 396–402. [Google Scholar] [CrossRef]
  34. Rissanen, J.; Langdon, G.G. Arithmetic coding. IBM J. Res. Dev. 1979, 23, 149–162. [Google Scholar] [CrossRef]
  35. Burrows, M.; Wheeler, D.J. A Block-Sorting Lossless Data Compression Algorithm. 1994. Available online: http://www.eecs.harvard.edu/~michaelm/CS222/burrows-wheeler.pdf (accessed on 10 June 2024).
  36. Ryabko, B.Y. Data compression by means of a “book stack”. Probl. Inf. Transm. 1980, 16, 265–269. [Google Scholar]
  37. Bentley, J.; Sleator, D.; Tarjan, R.; Wei, V. A locally adaptive data compression scheme. Commun. ACM 1986, 29, 320–330. [Google Scholar] [CrossRef]
  38. Ryabko, B.; Horspool, N.R.; Cormack, G.V.; Sekar, S.; Ahuja, S.B. Technical correspondence. Commun. ACM 1987, 30, 792–797. [Google Scholar]
  39. Louchard, G.; Szpankowski, W. Average profile and limiting distribution for a phrase size in the Lempel-Ziv parsing algorithm. IEEE Trans. Inf. Theory 1995, 41, 478–488. [Google Scholar] [CrossRef]
  40. Drmota, M.; Reznik, Y.; Szpankowski, W. Tunstall code, Khodak variations, and random walks. IEEE Trans. Inf. Theory 2010, 56, 2928–2937. [Google Scholar] [CrossRef]
  41. Reznik, Y.A. Coding of Sets of Words. In Proceedings of the 2011 Data Compression Conference, Snowbird, UT, USA, 29–31 March 2011; IEEE: Piscataway, NJ, USA, 2011. [Google Scholar]
  42. Kieffer, J.C.; Yang, E.-H. Grammar-based codes: A new class of universal lossless source codes. IEEE Trans. Inf. Theory 2000, 46, 737–754. [Google Scholar] [CrossRef]
  43. Kurapova, E.V.; Ryabko, B.Y. Application of Formal Grammars for Encoding Information Sources. Probl. Inform. Transm. 1995, 31, 23–26. [Google Scholar]
  44. Charikar, M.; Lehman, E.; Liu, D.; Panigrahy, R.; Prabhakaran, M.; Rasala, A.; Sahai, A.; Shelat, A. Approximating the smallest grammar: Kolmogorov complexity in natural models. In Proceedings of the Thiry-Fourth Annual ACM Symposium on Theory of Computing, Montreal, QC, Canada, 19–21 May 2002; pp. 792–801. [Google Scholar]
  45. Ryabko, B. Twice-universal coding. Probl. Inf. Transm. 1984, 3, 173–177. [Google Scholar]