1. Introduction
Important secret values, such as encryption keys, salts, primes, and stack canaries, are generated by a random number generator (RNG). A flawed RNG can leak this information to attackers. In recent decades, this problem has been revealed on various platforms, ranging from early versions of Netscape's secure sockets layer (SSL) [1] to smartphone environments (e.g., Android). If weak random numbers are generated, the private key of a public-key cryptosystem can be recovered [2,3,4]. Furthermore, predictable random numbers can be generated if the RNG has insufficient noise sources, for example at boot time.
For example, several articles [5,6,7,8] pointed out that important parameters (e.g., PreMasterSecret) can be exposed in embedded systems, Android, and OpenSSL through predictable random numbers generated at boot time, because the collection of noise sources is limited. In addition, Bitcoin wallets were attacked through the elliptic curve digital signature algorithm (ECDSA) process because the Java-based RNG (SecureRandom class) was vulnerable [9]. Recoverable random numbers were also leveraged in a backdoor in the internationally standardized dual elliptic curve deterministic random bit generator (Dual EC DRBG) [10]. Checkoway et al. [11] showed that exploiting the backdoor is practical. A systematic analysis of the security of the Linux random number generator (LRNG) was initiated by Gutterman et al. [12]. In their work, the concrete structure of LRNG and its process of collecting entropy from noise sources were investigated. It was additionally shown that
complexity is required to restore the state of the entropy pool from generated random numbers. Recently, Heninger et al. [3] studied the internet-wide vulnerability of RNGs. Numerous certificates used in transport layer security (TLS) and secure shell (SSH) were collected, several identical keys were found, and common primes were extracted using the greatest common divisor (GCD) algorithm. Moreover, Kim et al. [7] showed that PreMasterSecret can be recovered by
complexity in the handshake process of OpenSSL on Android. Their attack is possible because predictable random numbers are generated from insufficient entropy at boot time. Kaplan et al. [8] attacked the Android KeyStore using a stack buffer-overflow vulnerability (CVE-2014-3100) [13] by leveraging the above-mentioned RNG problem. For an exploit of CVE-2014-3100 to succeed, it must bypass the stack canary that defends against stack buffer-overflow attacks. However, the stack canary becomes predictable if it is generated by an RNG that suffers from the problem described above.
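The common-prime extraction mentioned above can be illustrated with a minimal sketch. It assumes toy moduli built from small primes and uses a naive pairwise comparison; the function name `find_shared_primes` is hypothetical, and a survey over many keys would need a faster batch method than this quadratic loop.

```python
from itertools import combinations
from math import gcd


def find_shared_primes(moduli):
    """Return {(i, j): p} for each pair of moduli that share a nontrivial factor p."""
    shared = {}
    for (i, n1), (j, n2) in combinations(enumerate(moduli), 2):
        p = gcd(n1, n2)
        # p == 1 means no common factor; p == n1 == n2 means identical moduli.
        if 1 < p < min(n1, n2):
            shared[(i, j)] = p
    return shared


# Toy RSA-like moduli (illustrative values only); the first two reuse the prime 101.
moduli = [101 * 103, 101 * 107, 109 * 113]
shared = find_shared_primes(moduli)

# Once a shared prime is known, the cofactor of each modulus follows by division.
factors_of_first = (shared[(0, 1)], moduli[0] // shared[(0, 1)])
```

If two keys were generated from the same low-entropy RNG state for one prime but not the other, a single GCD computation reveals both private keys.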
The underlying cause of these results can be divided into two features. First, when random numbers are requested from the non-blocking pool via /dev/urandom or get_random_bytes(), no entropy is transferred from the input pool to the non-blocking pool unless the entropy counter of the input pool reaches the threshold (192 bits). Without this entropy transfer [6], the process of generating random numbers in the non-blocking pool is deterministic. Second, the entropy estimation of LRNG is conservative. Consequently, noise sources are used inefficiently, and it takes a long time to harvest entropy from noise sources until the entropy counter reaches the transfer threshold. Because noise sources are limited in embedded environments, this feature is closely related to LRNG security there. In addition, as observed in [7,8], random numbers generated at the initial boot time can be predicted because it is not easy to collect sufficient noise during booting.
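The effect of the transfer threshold can be sketched with a toy model. This is not the kernel's mixing or extraction code: the class `ToyNonBlockingPool`, its SHA-1 based state update, and the choice of `>=` for the threshold test are assumptions made only to show why outputs repeat when no transfer takes place.

```python
import hashlib

TRANSFER_THRESHOLD_BITS = 192  # threshold quoted in the text


class ToyNonBlockingPool:
    """Toy stand-in for the non-blocking pool; NOT the kernel's real mixing."""

    def __init__(self):
        self.state = bytes(20)  # all-zero state at the start of boot

    def maybe_reseed(self, input_entropy_bits, fresh_bytes):
        """Pull from the input pool only if its counter reached the threshold."""
        if input_entropy_bits >= TRANSFER_THRESHOLD_BITS:
            self.state = hashlib.sha1(self.state + fresh_bytes).digest()
            return True
        return False

    def read(self, n=8):
        """Return n output bytes (n <= 20) and advance the state deterministically."""
        digest = hashlib.sha1(self.state).digest()
        self.state = hashlib.sha1(self.state + digest).digest()
        return digest[:n]


def boot_output(input_entropy_bits, fresh_bytes):
    pool = ToyNonBlockingPool()
    pool.maybe_reseed(input_entropy_bits, fresh_bytes)
    return pool.read()


# Below the threshold, different noise in the input pool changes nothing.
starved_a = boot_output(50, b"noise from boot A")
starved_b = boot_output(50, b"noise from boot B")
# At the threshold, the transfer happens and the outputs diverge.
seeded_a = boot_output(192, b"noise from boot A")
seeded_b = boot_output(192, b"noise from boot B")
```

In this model two boots with a starved input pool produce the same first output, which is the deterministic behavior described above.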
In November 2015, Google released the source code for an Internet of Things (IoT)-specific operating system called Brillo [14]. The source code shows that Brillo is based on the Linux kernel (version 3.10) and that the structure of its RNG is identical to that of the previous version. This implies that Brillo still has the same problems as LRNG. To verify this point, we analyze the behavior of LRNG from the start of boot. According to the results, we conclude that the entropy counter in the input pool at boot time does not exceed the threshold (192 bits), and that the sequence of collecting noise sources (input) and generating random numbers (output) is consistent. Experimentally, we determine that the 700 bytes of random numbers generated during boot time can be recovered with a probability of 90% within the cost of
. Some of the 700 bytes are used as a stack canary to prevent a stack buffer-overflow attack, which is a potential vulnerability.
3. Analyzing Features of LRNG during Boot Process
3.1. Strategy and Environments for Analysis
The goal of this analysis is to identify the behavior of LRNG during the boot time of Brillo. Through this analysis, we collect the necessary data, such as the values of noise sources, entropy counters, and the sequences of input and output. The analysis process is divided into seven steps (Figure 4). The analysis is conducted on the Edison board (Arduino Kit) [18], which was the only released board when we started our analysis. As of 2017, four development boards are available (http://elinux.org/Android_Brillo_Internals). The kernel is compiled with GCC 4.9 on Ubuntu 14.04 Long Term Support (LTS). The files created by compiling the kernel are written to the board using Intel Flash Phone Tool Lite (5.3.2.0) (Table 1). Other necessary settings follow the instructions of [19].
The source code of LRNG is located in drivers/char/random.c. First, we insert several logging statements (e.g., calls to the printk() function) into random.c to check the behavior of LRNG in the boot process. Through these statements, the necessary information is recorded in the boot log.
In Linux and Android, a logging system records the whole boot process, and all boot records are stored in a file (syslog). However, Brillo does not currently support this logging system. Instead, the boot log is kept in the dmesg ring buffer of 64 KB. Therefore, the log must be moved to non-volatile storage before the power is turned off. We connect to Brillo using Android Debug Bridge (ADB). By booting repeatedly, we obtain many boot logs and extract the necessary information.
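Extracting the inserted records from a saved boot log could look like the following sketch. The `lrng: entropy_count=` tag is a hypothetical message format, since the exact printk() strings are not given here, and the sample log lines are made-up placeholders rather than measured values.

```python
import re

# dmesg-style timestamp followed by a hypothetical tag written by the added printk().
LINE_RE = re.compile(r"^\[\s*(\d+\.\d+)\]\s+lrng: entropy_count=(\d+)$")


def parse_entropy_log(text):
    """Return (timestamp_seconds, entropy_count) pairs found in a boot log."""
    samples = []
    for line in text.splitlines():
        match = LINE_RE.match(line.strip())
        if match:
            samples.append((float(match.group(1)), int(match.group(2))))
    return samples


# Made-up log excerpt for illustration only.
sample_log = """\
[    0.512345] lrng: entropy_count=0
[    1.250000] usb 1-1: new high-speed USB device
[    2.750123] lrng: entropy_count=37
[    4.600000] lrng: entropy_count=58
"""

samples = parse_entropy_log(sample_log)
peak = max(count for _, count in samples)
stays_below_threshold = peak < 192
```

Running such a parser over every saved log gives one entropy-counter series per boot, which can then be plotted or compared across boots.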
3.2. Two Features of Generating Random Numbers in Brillo
By analyzing the behavior of LRNG during the boot process in Brillo, we found two features for generating random numbers. Using these two features, it is possible to recover random numbers generated at boot time within a practical time frame. The first feature is that the entropy counter of the input pool is less than the transfer threshold (192 bits). When this condition is satisfied, the process of generating random numbers is almost deterministic because there is no entropy transfer. The second feature is that the pattern of input and output is consistent for each boot process. Owing to this feature, the cost of recovery is considerably reduced.
3.2.1. First Feature: Entropy Counter of the Input Pool at Boot Time is Less than 192 Bits (Insufficient Entropy)
We insert printk() calls into the kernel source code to record the entropy counter before noise sources are collected and after random numbers are produced. Then, the modified kernel is booted 10,000 times, thereby producing 10,000 graphs of the entropy counter. These graphs have a similar pattern. Booting takes 28 s on average.
Figure 5 indicates a typical graph for the entropy counter. The entropy counter is less than 192 bits during boot time. Therefore, generating random numbers by /dev/urandom or get_random_bytes() is “almost” deterministic because there is no entropy transfer from the input pool to the non-blocking pool.
The expression “almost deterministic” means that there is an exceptional noise source directly injected into the non-blocking pool even though all noise sources are accumulated in the input pool. Owing to this noise source, the state of the non-blocking pool is not fully deterministic. Slight randomness occurs from this exceptional noise source. This noise source serves to inject the device-specific information (add_device_randomness()), which is mainly inputted at boot time. This is assumed to prevent the deterministic state for the non-blocking pool during boot time.
Figure 6 indicates the code of add_device_randomness(). For direct input into the non-blocking pool, buf and time (cycles, jiffies) are used. buf is a fixed value, such as a device name or serial number. jiffies is also a fixed value at the analysis point (1). Therefore, cycles is the only noise source for analysis. If the entropy counter is less than 192 bits, the randomness of the non-blocking pool depends only on cycles. Even though add_device_randomness() injects random data directly into the non-blocking pool, this data is not reflected in the entropy counter.
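The dependence on cycles alone can be made concrete with a small sketch. It is not the kernel function: `toy_add_device_randomness` is a hypothetical stand-in that uses SHA-1 in place of the pool mixing function and assumes, for this sketch, that cycles and jiffies are combined by XOR into one time value. The device string and the jiffies value are placeholders.

```python
import hashlib
import struct


def toy_add_device_randomness(pool_state, buf, cycles, jiffies):
    """Toy model: mix fixed device data and a time value into the pool state."""
    time_value = (cycles ^ jiffies) & 0xFFFFFFFFFFFFFFFF  # assumed combination
    time_bytes = struct.pack("<Q", time_value)
    return hashlib.sha1(pool_state + buf + time_bytes).digest()


zero_pool = bytes(20)               # pool state before any input
buf = b"hypothetical-serial-0001"   # fixed per device (placeholder)
jiffies = 4294937296                # fixed at the analysis point (placeholder)

state_a = toy_add_device_randomness(zero_pool, buf, 123456, jiffies)
state_b = toy_add_device_randomness(zero_pool, buf, 123457, jiffies)
state_a_again = toy_add_device_randomness(zero_pool, buf, 123456, jiffies)
```

With buf and jiffies fixed, an attacker who guesses cycles reproduces the pool state exactly, so the search space is the set of plausible cycles values.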
Note that similar phenomena are observed in Android environments [6] as well. In [6], the authors analyzed the entropy counter at boot time on Android smartphones such as the Nexus 4, Nexus 7, and Galaxy Nexus. They found that the entropy counters increase quickly before the smartphones finish the boot sequence. The different results are caused by differences in hardware. Android mainly uses disk I/O as the entropy source. Once LRNG collects entropy from this source, its entropy estimator calculates the entropy and increases the entropy counter. As a result, the entropy counter reaches the threshold quickly and the entropy is transferred into the non-blocking pool. However, because Brillo does not have a disk, keyboard, or mouse, it only uses entropy from device information and time, which does not increase the entropy counter at all.
3.2.2. Second Feature: Order of Inputting Noise Source and Outputting Random Numbers are Consistent (Identical Pattern)
For the non-blocking pool, the identical pattern means that the sequences of noise source input and random number output are consistent. When the timing of input and output is recorded in a timeline, the time axis can be divided into several sections based on the time stamps of input. Figure 7 represents the timeline for the sequence of input and output. The following notations indicate indices for input, output, and interval.
: i-th input entropy source,
: j-th output random number,
: a section between s,
When the boot starts, the three entropy pools of LRNG are initialized to all zeros in the kernel initialization phase. The first random numbers (8 bytes) are produced from this state. Next, the first noise source (add_device_randomness()) is collected as input. Because this noise source is device information, it is directly injected into the non-blocking pool. Then, 24 bytes of random numbers are produced in three outputs (8 bytes × 3). After that, a noise source (device information) is injected again, and 668 bytes of random numbers are produced consecutively in 86 outputs (Table 2).
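As a quick consistency check of the byte counts in this sequence, the pattern can be written down as data. The tuple layout below is an illustrative encoding, not the format of Table 2.

```python
# (kind, number_of_events, total_bytes); inputs carry no output bytes.
boot_pattern = [
    ("out", 1, 8),     # first output, produced from the all-zero pools
    ("in", 1, 0),      # first add_device_randomness() input
    ("out", 3, 24),    # three outputs of 8 bytes each
    ("in", 1, 0),      # second device-information input
    ("out", 86, 668),  # 86 consecutive outputs
]

total_output_bytes = sum(size for kind, _, size in boot_pattern if kind == "out")
total_outputs = sum(count for kind, count, _ in boot_pattern if kind == "out")
total_inputs = sum(count for kind, count, _ in boot_pattern if kind == "in")
```

The outputs add up to the 700 bytes considered in the analysis, spread over 90 output events separated by two inputs.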
In the boot process, this sequence is consistent until the 101st random number is produced (identical pattern), which occurs at approximately the 4.6 s point. After this time, it is difficult to predict the in–out sequence because the operating system switches to a multi-core environment, in which accesses to LRNG race with each other. As the Edison board has a dual-core CPU, the second core starts to access LRNG from this point.
Table 2 summarizes the identical pattern in the timeline of
Figure 8. Meanwhile,
,
,
, and
become stack canaries, while the remaining values continue to be analyzed.
The first feature (insufficient entropy) is unique to LRNG. This feature is also satisfied in Brillo and makes LRNG generate random numbers without any entropy transfer. The second feature (identical pattern) is a unique one that is observed during boot time of Brillo. By combining the two features, recovering random numbers between and depends on cycles. Therefore, cycles is the most important factor to recover random numbers in these intervals.
3.3. Success Probability of Recovery
is the only noise source, which is directly injected into the non-blocking pool, to generate random numbers (e.g., /dev/urandom or get_random_bytes()) at boot time. Even though the boot process is routine, the values of is somewhat random. The randomness of comes from some frequency differences between hardware devices such as processor, cache memory, disk, etc. In the aspect of an attacker, observing these states is very difficult. However, the values of can be observable; thus, we regard as a random variable. By modeling the distribution for , we obtain the attacker’s success probability of recovery and the cost of attack. In this subsection, we focus on random numbers of 700 bytes in , , and and describe probabilistic models for and .
For and , we collected sample values of 10,000 by iterative bootings. From their histograms, the distributions are estimated as exponential distributions.
3.3.1. Probabilistic Models of and
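A random variable X whose probability density function f is given by, for some $\lambda > 0$,
$$f(x) = \begin{cases} \lambda e^{-\lambda x}, & x \ge 0, \\ 0, & x < 0, \end{cases}$$
is called an exponential random variable with parameter λ. The cumulative distribution function of X, the exponential random variable with parameter λ, is given by
$$F(x) = P(X \le x) = \begin{cases} 1 - e^{-\lambda x}, & x \ge 0, \\ 0, & x < 0. \end{cases}$$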
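The density and cumulative distribution of an exponential random variable translate directly into code. The sketch below also adds the inverse of the cumulative distribution, which gives the search bound needed to cover a chosen probability mass; the function names and the rate value used in the example are illustrative assumptions.

```python
import math


def exp_pdf(x, lam):
    """Density of an exponential random variable with rate lam."""
    return lam * math.exp(-lam * x) if x >= 0 else 0.0


def exp_cdf(x, lam):
    """P(X <= x) for an exponential random variable with rate lam."""
    return 1.0 - math.exp(-lam * x) if x >= 0 else 0.0


def exp_quantile(p, lam):
    """Smallest x with P(X <= x) = p, for 0 <= p < 1."""
    return -math.log(1.0 - p) / lam


lam = 0.5                            # illustrative rate, not a fitted value
bound_90 = exp_quantile(0.90, lam)   # range that covers 90% of the mass
covered = exp_cdf(bound_90, lam)
```

The quantile grows like ln(1/(1 - p)), so raising the covered probability from 0.90 to 0.99 doubles the range that has to be searched.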
Because an arrival time of the first event follows an exponential distribution,
also can be regarded as an arrival time when a noise source is injected into the entropy pool (non-blocking pool). The exponential distribution is used to describe the time between events in a Poisson process in which events occur continuously and independently at a constant average rate [
20]. Thus, we can reasonably assume that
is an exponential random variable for some appropriate value of
λ. To derive the value of λ, we estimate the expectation of the random variable X by the average of the sample values $x_1, \dots, x_n$, and we apply the fitting process of MATLAB [21] (version R2016b) to convert a histogram into the probability density function of X:
$$E[X] = \frac{1}{\lambda} \approx \frac{1}{n} \sum_{i=1}^{n} x_i.$$
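Because the mean of an exponential distribution is 1/λ, the rate can be estimated as the reciprocal of the sample mean. The sketch below checks this on synthetic data drawn from a known rate; the rate, the seed, and the helper name `estimate_rate` are assumptions for illustration and do not reproduce the measured values or the MATLAB fitting step.

```python
import random
import statistics


def estimate_rate(samples):
    """Estimate lambda as 1 / (sample mean), using E[X] = 1 / lambda."""
    return 1.0 / statistics.fmean(samples)


rng = random.Random(2017)   # fixed seed so the sketch is repeatable
true_rate = 0.002           # synthetic rate, chosen only for this example
samples = [rng.expovariate(true_rate) for _ in range(10_000)]

lam_hat = estimate_rate(samples)
relative_error = abs(lam_hat - true_rate) / true_rate
```

With 10,000 samples the relative standard error of this estimator is about 1%, which is why a sample of that size is enough to pin down the rate.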
Let
and
denote two random variables of
and
, respectively. From sample values collected by iterative bootings, we consider the dominant parts of supports of
and
. For a random variable
X, the support is defined by the closure of the set containing all possible values
x of
X such that
,
, where
is the probability density function of
X. Then, the supports of
and
are estimated as follows:
From
Figure 9, we can obtain the fact that
and
are clearly non-overlapped. Thus, we suppose that two random variables
and
are independent. In fact, for our 10,000 sample values of
and
, we have obtained that
, for several
. By shifting the starting points of
and
to zero, we define the corresponding random variables
and
by
and
, respectively. Then,
and
are modelized by exponential distributions with parameter
and
, respectively. We can estimate the values of
and
from the 10,000 sample values by
That is,
and
. By using distribution fitting tool of MATLAB, we confirm several properties of the distributions. Fitting results show that the sample values converge on the exponential distributions (
Figure 10).
Since
and
are independent by the reasonable assumption,
and
are also independent. Therefore, for all
, the success probability of recovery is given by
Before introducing the probabilistic models, we checked whether the supports and the parameters λ depend on the individual hardware unit within the same model. We installed Brillo on three Edison boards and tested each support and distribution on each board. Table 3 shows that the hardware dependency is negligible and that the supports of our model are convincing.
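Under the independence assumption, the probability that both shifted values fall inside chosen search bounds is the product of two exponential cumulative distributions. The sketch below assumes that product form and an equal split of the target probability between the two variables; both choices, the function names, and the rate values are illustrative assumptions and may differ from the trade-offs the analysis actually uses.

```python
import math


def success_probability(t1, t2, lam1, lam2):
    """P(Y1 <= t1 and Y2 <= t2) for independent exponential Y1, Y2."""
    return (1.0 - math.exp(-lam1 * t1)) * (1.0 - math.exp(-lam2 * t2))


def symmetric_bounds(target, lam1, lam2):
    """Bounds that give each variable probability sqrt(target)."""
    per_variable = math.sqrt(target)
    t1 = -math.log(1.0 - per_variable) / lam1
    t2 = -math.log(1.0 - per_variable) / lam2
    return t1, t2


lam1, lam2 = 0.01, 0.004   # illustrative rates, not the fitted parameters
t1, t2 = symmetric_bounds(0.90, lam1, lam2)
p_success = success_probability(t1, t2, lam1, lam2)
search_area = t1 * t2      # number of (t1, t2) pairs grows with this product
```

Other splits of the target probability give different bounds, so the attacker can trade a wider range in one variable for a narrower range in the other.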
3.3.2. Success Probability of Recovery and Cost of Attack
From Equation (1), Table 4 indicates several trade-offs between success probabilities and costs of the attack. We define the cost of an attack as the number of operations (executions of the extract function) that an attacker needs to obtain and confirm a correct random number. In fact, the values of
vary by nanosecond resolution on the Edison device. However, an attacker can observe only millisecond precision on the outside, and then the attacker has to guess the nanosecond out of
, equivalent to
.
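The gap between millisecond observation and nanosecond resolution can be quantified with a short calculation. One millisecond contains one million nanoseconds, so each observed millisecond leaves about 20 bits of uncertainty; the helper name below is hypothetical.

```python
import math

NS_PER_MS = 1_000_000  # nanoseconds in one millisecond

# Guessing work, in bits, hidden inside a single observed millisecond.
bits_per_ms = math.log2(NS_PER_MS)


def candidates_in_window(width_ms):
    """Number of nanosecond-precision candidates in a window of width_ms."""
    return width_ms * NS_PER_MS


three_ms_candidates = candidates_in_window(3)
```

When two independent values must be guessed, the candidate counts multiply, so the exponents add.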
For example, if the attacker observes a generated random number with values of (ms) and (ms) in the (e.g., ). Then, he will guess correct and by generating random numbers from and . However, before he guesses the values, he has to set a success probability of recovery and the maximum cost because he does not know how he has to operate extract unit. If the attacker selects the values of and from and , he can obtain the success probability of 0.90 (). On the other hand, he needs the maximum cost of () to search with nanosecond precision. In this case, he finds the correct values of and with () trials. In the next section, we simulate this scenario.
4. Experimental Results for Success Probabilities and Costs
In this section, we construct a scenario to demonstrate the success probability of recovery, modeled on the attack in [7]. To confirm that random numbers have been recovered successfully, a leaked random number is needed as a discriminant. Therefore, we assume that an attacker can obtain, at least once, random numbers that are exposed to the outside. To compare the leaked value with each guessed value, the attacker performs an exhaustive search using the extract unit. If the attacker finds the right value, he obtains the state of the non-blocking pool. In other words, he obtains the right values of
and
. From these values, he also recovers all of random numbers in
,
, and
.
Algorithm 1 represents the procedure to recover the random numbers. This algorithm receives some input parameters: a leak random number as a discriminant (
d), success probabilities (
), parameters of exponential distributions (
), starting points of searching space (
), and an index of the leak random number (
i). From these parameters, Algorithm 1 returns several results: values of
cycles (
), an internal state of the non-blocking pool, and an array of the random sequence in the
intervals (
), where
indicates
i-th output.
Algorithm 1: Recovery of random numbers from an output leak.
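The general shape of such a search can be sketched as follows. This is not a reproduction of Algorithm 1: it searches a single cycles value instead of two, and `toy_extract` is a hypothetical SHA-1 based stand-in for the real extract unit. The device string, the secret value, and the search range are placeholders.

```python
import hashlib
import struct


def toy_extract(buf, cycles, index):
    """Hypothetical extract unit: derive the index-th 8-byte output from cycles."""
    state = hashlib.sha1(buf + struct.pack("<Q", cycles)).digest()
    return hashlib.sha1(state + struct.pack("<I", index)).digest()[:8]


def recover_cycles(leak, index, buf, start, max_trials):
    """Try candidates start, start + 1, ... until one reproduces the leak."""
    for trial in range(max_trials):
        candidate = start + trial
        if toy_extract(buf, candidate, index) == leak:
            return candidate, trial + 1
    return None, max_trials


buf = b"hypothetical-device-serial"
secret_cycles = 1_000_123                         # unknown to the attacker
leak = toy_extract(buf, secret_cycles, index=5)   # value seen on the wire

found, trials = recover_cycles(leak, 5, buf, start=1_000_000, max_trials=1_000)
# With cycles known, every other output in the interval can be regenerated.
recovered = [toy_extract(buf, found, i) for i in range(10)]
```

The leak acts only as a stopping condition. Once a candidate reproduces it, all outputs derived from the same state follow, including ones that were never exposed.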
For example, if
is used in a nonce value of ClientHello of SSL,
becomes the PreMasterSecret. It is encrypted with the public key of the server and then transmitted from the client to the server. After this communication, both the client and server generate the shared master key. Because
is exposed in the public channel, an attacker can obtain
and compromise the state of the non-blocking pool by using the leak value as a discriminant.
Figure 11 depicts this scenario.
According to this scenario, we implemented a program to recover random numbers of 700 bytes between and using with and . As the results, it requires 12 h to find the right value with approximate trials of the extract unit (SHA-1 × 2 + mixing function × 1). It is not a real-time analysis on a single PC. However, the attack program can be drastically accelerated with parallel computing.
6. Conclusions
LRNG has several security problems while operating on a PC and smartphone. We investigated whether identical problems occur in the Linux-kernel-based operating system, Brillo. We observed two features of LRNG when operating during Brillo boot time. Random numbers of 700 bytes can be recovered with the probability of 0.90 by the cost of . In conclusion, the entropy of random numbers of 700 bytes is approximately 43 bits. This means that structural improvements and theoretical analysis of its security are required. Various methods can be proposed to improve security.
In the future, we will study LRNG from three perspectives. First, we will analyze the as yet unknown uses of the remaining random numbers. The results can be used to find vulnerabilities in the init process and in cryptographic libraries. Second, we plan to examine newer boards and their hardware properties, because these boards may exhibit features different from those of Edison. Lastly, we will study the efficient use of entropy sources from a design perspective. To overcome the inefficiency and conservatism of LRNG, efficient methods are needed that enable optimal use of the noise sources.