Side-Channel Attack of Lightweight Cryptography Based on MixColumn: Case Study of PRINCE

Xue, Jizheng; Jiang, Xiaowen; Li, Peng; Xi, Wei; Xu, Changbao; Huang, Kai

doi:10.3390/electronics12030544


by Jizheng Xue ¹, Xiaowen Jiang ¹,*, Peng Li ², Wei Xi ², Changbao Xu ³ and Kai Huang ⁴

¹ College of Information Science & Electronic Engineering, Zhejiang University, Hangzhou 310027, China
² Digital Grid Research Institute, China Southern Power Grid, Guangzhou 510670, China
³ Electric Power Research Institute, Guizhou Power Grid Co., Ltd., Guiyang 550002, China
⁴ School of Micro-Nano Electronics, Zhejiang University, Hangzhou 310027, China
* Author to whom correspondence should be addressed.

Electronics 2023, 12(3), 544; https://doi.org/10.3390/electronics12030544

Submission received: 5 December 2022 / Revised: 13 January 2023 / Accepted: 13 January 2023 / Published: 20 January 2023

(This article belongs to the Special Issue Advances of Electronics Research from Zhejiang University)


Abstract: Lightweight ciphers are generally implemented in an unrolled architecture, which offers low latency and good real-time performance but is also exposed to Side-Channel Attacks (SCA). Unlike the traditional loop architecture, the unrolled architecture requires separate SCA protection for each round, which makes its cost very sensitive to the number of rounds that must be protected. In this paper, we propose an optimized chosen-input attack that effectively increases the number of rounds of differential propagation and, for the first time, recovers the key from the fourth round of unrolled PRINCE. We also evaluate the hardware overhead and performance of two types of Threshold Implementation (TI) for PRINCE. The experimental results indicate that TI imposes substantial hardware overhead on the circuit, so the number of protected rounds must be chosen carefully.

Keywords:

PRINCE; side-channel attack; unrolled architecture; MixColumn


1. Introduction

Smart technologies such as the Internet of Things (IoT), mobile computing, big data, blockchain, robotic systems, digital forensics, industrial control systems, and connected and automated vehicles (CAVs), together with the cybersecurity they depend on [1], have attracted considerable interest in recent years. In the field of IoT, the demand for resource-limited devices such as sensor nodes, Radio Frequency Identification (RFID) tags and actuators is increasing day by day [2]. These devices generally have very few resources. For example, a standard RFID tag has roughly 10,000 gate equivalents (GE), of which about 2000 GE are used for security. CAVs need to transfer data seamlessly and in real time while keeping the data secure [3]. Because of these limits on resources and real-time responsiveness, traditional cryptography cannot meet the demand, whereas lightweight cryptography, with its low power consumption, small area and low latency, can. Lightweight ciphers include a class of low-latency ciphers such as PRINCE [4], MANTIS [5] and QARMA [6]. Owing to their low latency, these ciphers can be implemented in an unrolled architecture, which concentrates all operations into a single clock cycle and therefore gives better real-time responsiveness. According to [6], the delay-area (or delay-power) product of AES [7] is approximately 40 times larger than that of PRINCE. Under the area-optimal implementation, the area of AES is about 817% of that of PRINCE, and its power consumption is 766% of that of PRINCE. At a frequency of 100 kHz, the throughput of PRINCE is 533.3 kbps, while that of AES is only 56.64 kbps [2]. Traditional cryptography can therefore no longer meet the requirements for low latency, small area and low power consumption, and it is necessary to study lightweight cryptography.

In practical applications, cryptographic implementations need countermeasures against Side-Channel Attacks (SCA). Researchers have successfully mounted SCA on hardware implementations of traditional ciphers [8,9,10,11,12,13]. These attacks target the moments just before and after registers are updated. Traditional ciphers are generally implemented in a loop architecture because of their long path delays, which forces circuit designers to use a large number of registers to store intermediate data, and the register updates are tightly coupled to the clock. Registers consume a large amount of dynamic power when they are updated, which is easily captured and exploited by attackers. The unrolled architecture is harder to attack for two reasons. First, it lacks clocks and registers, so it is challenging to pinpoint an exact attack location on the power trace. Second, it has a long critical path with many glitches along it, which gives the collected power traces a low signal-to-noise ratio (SNR). In [14], unrolled DES [15] is shown to have a certain resistance to differential power analysis (DPA) [16] and correlation power analysis (CPA), provided that the datapath is cleared after each encryption. In [17], unrolled MAC-PHOTON [18] is shown to resist a first-order CPA attack.

Compared with the SCA on the loop architecture, there is less research in the SCA field of unrolled architecture. However, studies have shown that there are leak points in it. In [19,20], the authors successfully improved the efficiency of the CPA attack by using a t-test [21] to locate the Points-of-Interest (POI) inside the power consumption curve. In [22], the authors successfully implemented DPA on unrolled PRINCE in the first round. In [23], the authors proposed an improved correlation frequency analysis (CFA) [24] attack, which makes it feasible to extract first-order side-channel leakages from combinational logic in the initial rounds of unrolled datapaths. In [25], the authors provided a method for selecting plaintexts for recovery of the key through side-channel analysis. However, this attack method succeeds in a limited number of rounds. The difference in inputs will be masked by the algorithm, and the difference will not be observed after a certain number of rounds. In [26], the authors proposed a leakage model based on differential inputs. All the aforementioned techniques aim to increase the SNR of the power consumption or the effectiveness and precision of the analysis phase. Nevertheless, there is no study on why they cannot attack deeper rounds.

In [27], researchers also investigated the difficulty of implementing countermeasures against SCA on unrolled architecture. Different from the traditional loop architecture, the unrolled architecture has no registers. Therefore, it cannot implement the traditional Threshold Implementation (TI) in [28,29,30], and each round of computing logic is independent of each other, which means that each round of protection needs to be implemented independently. In [27], the experiment showed that the critical-path delay increases to 147%, the area increases to 441%, the throughput decreases to 68%, and the power consumption increases to 255% when TI is implemented on the first and last rounds of PRINCE. In [31], the authors implemented DPA attacks on the first four rounds of GIFT [32] and evaluated the cost of TI. According to experimental findings, TI for an unrolled GIFT causes the area to increase to 3157% of the original value and the frequency to drop to 62.8% of the original value. As can be seen, the cost associated with protection schemes is high for lightweight cryptography and requires special care in the number of rounds that need to be protected.

In [33], unrolled combinational hardware implementations of six lightweight block ciphers are compared. We choose PRINCE [4], which is specifically designed as a low-latency cipher, as our case study. To increase the number of rounds of differential propagation, we propose an optimized chosen-input attack that prevents the algorithm from masking the input difference in the first few rounds, so that the difference can still be detected in a deeper round. In the case of PRINCE, our method ensures that the difference is not enlarged and masked by the algorithm in the first round and that it is easily discernible in the fourth round, whereas the difference in [25] has been almost completely masked by the fourth round and cannot be distinguished. We also implemented PRINCE with TI. The experimental findings show the enormous area cost of TI for the unrolled architecture, highlighting the need for a thorough investigation of the maximum number of attack rounds.

The main contributions of this paper are as follows:

We propose an optimized method for a chosen-input attack that can effectively increase the number of rounds of differential propagation.
We implement CPA on the fourth round of PRINCE implemented in an unrolled architecture.
We evaluate the resource costs associated with achieving various degrees of TI for PRINCE.

The remainder of this paper is organized as follows. Section 2 reviews the research on lightweight cryptography. In particular, we introduce the PRINCE algorithm, a typical low-latency cipher implemented in unrolled architecture. Section 3 introduces the new method for the chosen-input attack considered in this paper. We introduce the power consumption model used for the attack and describe our leakage model and leakage point in detail. Then, we describe the chosen-input attack method and introduce CPA and DPA. At the end of the section, we present PRINCE countermeasures. In Section 4, we demonstrate the possibility and limitation of the above attacks through five sets of experiments. In addition, the countermeasures’ hardware overhead and protection effectiveness are examined. Section 5 summarizes our results and discusses directions for future work.

2. Related Work

In this section, we briefly describe PRINCE, a typical low-latency block cipher proposed by Borghoff et al. at ASIACRYPT 2012 [4]. It has the following characteristics:

Encryption and decryption can be realized in a single clock cycle.
The low latency of the hardware circuit allows it to operate at high clock frequencies.
Hardware cost is low (much lower than the unrolled version of AES or PRESENT [34]).
Encryption and decryption share a set of hardware circuits.

PRINCE has a very low hardware implementation cost and latency, and can be widely used in resource-constrained environments. It is a 64-bit block cipher with a 128-bit key. It has a 12-step round function that includes a key addition, a Sbox-layer, a linear layer, and the addition of a round constant in each round. The PRINCE implemented in unrolled architecture is shown in Figure 1, and the algorithm flow is shown in Table 1.

Let E and D denote the encryption and decryption operations, respectively; their definitions are found in Table 1. The following expressions apply:

C = E_{(K_{0} \| K_{0}' \| K_{1})}(P)

(1)

P = D_{(K_{0}' \| K_{0} \| K_{1}')}(C)

(2)

We find that $D_{(K_{0} \| K_{0}' \| K_{1})}(\cdot) = E_{(K_{0} \| K_{0}' \| K_{1})}(\cdot)$ by Equations (1) and (2), and $K_{1}' = K[64:127] \oplus α$, where $α$ is the 64-bit constant $α$ = 0xc0ac29b7c97c50dd. Thus, for decryption one only has to make a very cheap change to the key and afterward reuse the exact same circuit. In this paper, we only analyze the encryption process.

The data analysis methods used in this paper have been successfully applied to traditional cryptography. In [35,36], the authors successfully implemented CPA and DPA attacks on DES [15]. In [37], Lu et al. demonstrated first- and second-order differential power analysis on AES. These attacks fix the input in a particular way and then calculate the intermediate values of the inner loop, but they cannot be applied directly to the unrolled architecture. Because the unrolled architecture has no registers or clocks, the SNR of the power traces is low and their correlation with the operation sequence of the algorithm is weak, so it is difficult to mount CPA or DPA on it directly. Algebraic side-channel analysis (ASCA) [38] and soft analytical side-channel analysis (SASCA) [39] have been reported to attack the inner rounds of block ciphers. However, those attacks were carried out on a microcontroller that processes one byte at a time, whereas the unrolled architecture processes 64 bits in parallel and completes all twelve rounds in one cycle. Such an implementation does not allow an attacker to simply obtain the distribution of any independent byte function.

The unrolled architecture does more than one round of iterative operations in a single cycle. This lets the key do the deep diffusion all at once. At this time, it is necessary to make stronger assumptions about multi-bit to discover sensitive information. The experimental results show that the correlation power analysis of Hamming distance (HD) models and Hamming weight (HW) models can be resisted if the data path is cleared after each operation [14]. However, in [27], the authors performed a first-order SCA using a non-specific t-test (also known as a fixed versus random t-test) [40] and found a fairly strong first-order leak on the PRINCE implemented in unrolled architecture without countermeasures. The t-test could only detect the presence of leakage but not give any impression of whether the leak was exploitable. In [19,20], the author significantly reduces the number of power consumption traces required to achieve CPA by selecting Points-of-Interest (POI) within the power traces on unrolled architecture. In [25], the authors proposed an extended attack with partially fixed input values to improve the SNR between the first and second rounds of the power traces, but the depth of the CPA attack is also limited to the second round. In [23], the authors proposed an improved CFA [24] attack, which makes it feasible to extract first-order side-channel leakages from combinational logic in the initial rounds of unrolled datapaths. In [26], the authors were able to deepen the attack by using the intermediate values of the first round (i.e., the difference in switching), which showed up as a side channel leakage during the processing of the inner round. However, they were only able to recover all of the keys in the third round, and only 1/16 of the keys were recovered in the fourth round.

The chosen-input attack proposed in this paper can generate a difference of only 1 bit at the MixColumn output in the first round. The minimum difference in the first round decreases from 3 nibbles to 1 nibble as compared to [26], necessitating more rounds of iterations to completely mask this difference. Since only the input from the first round is used as the computational element, we are able to obtain the complete key information from the fourth round regardless of the protection of the first three rounds. We performed a total of five sets of experiments to validate our approach, and in the next section, we present a detailed case study of the PRINCE implemented in unrolled architecture to illustrate why this leakage exists in the unrolled implementation and how many rounds are affected.

3. Side Channel Leakage on PRINCE

In this section, we introduce our attack principle and experimental methods in detail. We first introduce the power consumption model, the leakage model and the leakage point. Then we describe the attack method, which is a chosen-input attack.

3.1. Leakage Model

Common power consumption models are mainly the HW model and the HD model. The HW model, denoted as $HW(X)$, is the number of bits set to “1” in the internal data structure. The HD model counts the number of differing bits between two data structures, denoted as $HD(X_{1}, X_{2}) = HW(X_{1} \oplus X_{2})$. The HD model maps well onto ASICs because their basic storage element is the register. Changing data generate large dynamic power consumption when the registers are updated, while unchanged data generate only small static power consumption. Therefore, the HD is a good approximation of the power consumption of an ASIC. Moreover, our attack method is based on the difference between the input plaintexts, not on the input plaintext itself. Therefore, we choose the HD model as the power model. The following expression is obtained:

γ_{i,j}^{(r)} = HD(p_{2i,j}^{(r)}, p_{2i+1,j}^{(r)}) = HW(p_{2i,j}^{(r)} \oplus p_{2i+1,j}^{(r)}) = HW(Δp_{2i,j}^{(r)})

(3)

in which $γ$ is the power consumption, p is the plaintext, $p_{2i}$ and $p_{2i+1}$ are the pair of Input-Differential data (the construction method is described in detail in Section 3.2), r is the round number of the observed S-box, i is the index of the plaintext pair, and j is the attack position. Equation (3) estimates the HD associated with the leakage model.
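The HW and HD models above can be illustrated with a minimal Python sketch. The function names are hypothetical, and the plaintext pair is the one quoted for Figure 3 in Section 3.2, which differs only in its lowest 12 bits.

```python
def hamming_weight(x: int) -> int:
    """Number of bits set to 1 in x (the HW model)."""
    return bin(x).count("1")


def hamming_distance(x1: int, x2: int) -> int:
    """Number of bit positions in which x1 and x2 differ (the HD model)."""
    return hamming_weight(x1 ^ x2)


# Plaintext pair from the Figure 3 example (only the lowest 12 bits differ).
p_even = 0x1867_FFC0_4CE5_2BAB
p_odd = 0x1867_FFC0_4CE5_24FB

gamma = hamming_distance(p_even, p_odd)  # HW(0x0F50)
print(gamma)  # 6
```

Here the model value is evaluated on the plaintexts themselves; in the attack it is evaluated on intermediate values of round r, which additionally depend on the key hypothesis.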

In PRINCE, 4 bits are taken as a unit and denoted as a nibble, and four consecutive nibbles form a halfword (16 bits). Therefore, there are sixteen nibbles and four halfwords in the data structure of each round, which are recorded as nibble $[n]$ and halfword $[m]$, respectively, where $n \in [0, 15]$ and $m \in [0, 3]$; see Figure 2 for details.

Because the method proposed attacks three nibbles at a time, the 64-bit data is divided into eight attack positions, see Table 2 for details.

For the attack point of the leakage model, we choose the output of the S-box in each round. The S-box is the only nonlinear component in the entire algorithm, and because of its nonlinearity it has uneven differential properties: different Input-Differential data produce clearly distinguishable Output-Differential data at the S-box, which helps the attacker's analysis.

The power consumption of the actual hardware circuit is mainly composed of dynamic power consumption and static power consumption. Static power consumption is generally related to the state maintained by the circuit: if the state of each device in the circuit remains unchanged, the static power consumption remains unchanged, while dynamic power consumption occurs when the state of a device changes. In the loop architecture, a large amount of dynamic power consumption is generated when registers are updated. In the unrolled circuit, which has no registers and no clock, part of the Input-Differential data is held fixed and only the bits in the attack position are changed, so that part of the devices in the circuit stay in a “static” state. This allows the circuit to generate only the dynamic power due to the differential bits [25]. For example, we first construct a 64-bit random plaintext $P_{0,7}$, apply it to the device under test, and keep the input data unchanged. At this time, the circuit is in a stable state and has no dynamic power consumption. Then we construct a differential input $P_{1,7}$ (assuming that only nibbles [13:15] are changed), which is fed into the device under test. Since the first twelve nibbles are not changed, the first twelve S-boxes and their associated logic in the first round do not generate dynamic power consumption, and only S-boxes No. 13 to No. 15 and their associated logic switch, generating large dynamic power consumption (see Figure 3). The difference is transmitted to the next round, but it is progressively masked by the diffusion of the algorithm in the later rounds. The differential changes in the later rounds gradually converge to the average value (32 bits), and the correlation with the input data gradually weakens (see Figure 4 and Figure 5). Figure 4 shows the HD of the S-box for each round when the difference of the Input-Differential data is 1 bit [26]. It can be seen from Figure 4 that the differential deviation provided by [26] has already converged to the mean value (32 bits) in the fourth round, so the key information cannot be obtained from it.

In PRINCE, RC$_{r}$-add (RC[r]), KEY$_{i}$-add (K$_{i}$) and Shift Row (SR) do not affect the differential characteristics. Therefore, we focus on the SubCell and MixColumn modules. The cipher uses one 4-bit S-box. The definition of the S-box in hexadecimal notation is given in Table 3.

In the entire algorithm, the S-box is the most important module for determining the differential path. From Table 3, it can be seen that the S-box of PRINCE is a bijective S-box, which means that any input is mapped to a unique output and vice versa. In [4], the authors emphasize the differential properties considered in the design of the S-box to keep the maximum differential value within 1/4, but our attack method does not depend on the goodness of the differential properties of the S-box itself but only uses the bijective mapping of the S-box. We cannot calculate the output of S-boxes in the first round accurately since we do not know the key. However, since the key is fixed, we can traverse the output of the S-box in the first round by changing the input plaintext.
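The two S-box properties used above (bijectivity and a maximum differential probability of 1/4) can be checked with a short Python sketch. The S-box values below are written from the PRINCE specification [4] as recalled here and should be compared against Table 3; the helper names are hypothetical.

```python
# PRINCE S-box as recalled from the specification [4]; verify against Table 3.
SBOX = [0xB, 0xF, 0x3, 0x2, 0xA, 0xC, 0x9, 0x1,
        0x6, 0x7, 0x8, 0x0, 0xE, 0x5, 0xD, 0x4]


def is_bijective(sbox):
    """True if every 4-bit value appears exactly once as an output."""
    return sorted(sbox) == list(range(len(sbox)))


def max_differential_count(sbox):
    """Largest difference-distribution-table entry over non-zero input differences."""
    n = len(sbox)
    best = 0
    for dx in range(1, n):
        counts = [0] * n
        for x in range(n):
            counts[sbox[x] ^ sbox[x ^ dx]] += 1
        best = max(best, max(counts))
    return best


def first_round_outputs(key_nibble, sbox=SBOX):
    """S-box outputs seen when the plaintext nibble sweeps 0..15 under a fixed key nibble."""
    return [sbox[p ^ key_nibble] for p in range(16)]


print(is_bijective(SBOX), max_differential_count(SBOX) / 16)
```

Because the S-box is bijective, sweeping the plaintext nibble under any fixed (unknown) key nibble visits all sixteen S-box outputs, which is the only property the attack relies on.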

In the MixColumn layer, the 64-bit state is multiplied by a $64 \times 64$ matrix M. We recall from the specification of PRINCE [4] that the 64-bit linear transformation M is defined as Equation (4), and the definitions of $\hat{M}^{(0)}$ and $\hat{M}^{(1)}$ are in Equation (5). In Equation (6) we find the definitions of $M_{0}$, $M_{1}$, $M_{2}$ and $M_{3}$. In hardware, this matrix multiplication is implemented with the rerouting and the XOR layer shown in Figure 2.

The MixColumn of PRINCE takes reference from the MixColumn of AES, but the MixColumn of PRINCE does not use a multiplication operation similar to AES in order to achieve lightweight blending. As a result, each bit output of the MixColumn layer is only affected by the output of the three upper S-boxes. In particular, if three outputs of S-boxes within the same halfword only have 1 bit changed at the same position, the 16-bit output of MixColumn layer within a halfword will produce only 1 bit changed after MixColumn layer. Given that the attack is carried out by scanning subsets of 12 bits in the plain-text (three S-boxes at a time), there is a chance that the outputs collide at the MixColumn layer, producing this expected 1-bit change; see Figure 2.

M = \begin{bmatrix} \hat{M}^{(0)} & 0 & 0 & 0 \\ 0 & \hat{M}^{(1)} & 0 & 0 \\ 0 & 0 & \hat{M}^{(1)} & 0 \\ 0 & 0 & 0 & \hat{M}^{(0)} \end{bmatrix}

(4)

\hat{M}^{(0)} = \begin{bmatrix} M_{0} & M_{1} & M_{2} & M_{3} \\ M_{1} & M_{2} & M_{3} & M_{0} \\ M_{2} & M_{3} & M_{0} & M_{1} \\ M_{3} & M_{0} & M_{1} & M_{2} \end{bmatrix}, \quad \hat{M}^{(1)} = \begin{bmatrix} M_{1} & M_{2} & M_{3} & M_{0} \\ M_{2} & M_{3} & M_{0} & M_{1} \\ M_{3} & M_{0} & M_{1} & M_{2} \\ M_{0} & M_{1} & M_{2} & M_{3} \end{bmatrix}

(5)

M_{0} = \begin{bmatrix} 0 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 \\ 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 1 \end{bmatrix}, \quad M_{1} = \begin{bmatrix} 1 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 \\ 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 1 \end{bmatrix}, \quad M_{2} = \begin{bmatrix} 1 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 \\ 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 1 \end{bmatrix}, \quad M_{3} = \begin{bmatrix} 1 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 \\ 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 0 \end{bmatrix}

(6)
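The 1-bit collision property described before Equation (4) follows directly from the structure of the blocks in Equations (5) and (6), and can be checked with a small Python sketch. The function names are hypothetical, and taking row 0 of each $M_{i}$ as the most significant bit of a nibble is an assumption about bit ordering; the property itself does not depend on that choice. Since MixColumn is linear, the output difference of a pair equals MixColumn applied to the input difference, so only differences are processed here.

```python
def m_small(i, nibble):
    """Apply M_i: the 4x4 identity with diagonal entry i zeroed (row 0 taken as the MSB)."""
    return nibble & (0xF ^ (0x8 >> i))


def mix_halfword(nibbles, which=0):
    """Multiply four nibbles by M-hat(0) (which=0) or M-hat(1) (which=1), per Equation (5)."""
    out = []
    for r in range(4):
        acc = 0
        for c in range(4):
            acc ^= m_small((r + c + which) % 4, nibbles[c])
        out.append(acc)
    return out


def output_difference_weight(delta, which=0):
    """Hamming weight of the MixColumn output difference for an input difference."""
    return sum(bin(n).count("1") for n in mix_halfword(delta, which))


# Three nibbles of a halfword flip the same bit; the fourth nibble is untouched.
weights = []
for which in (0, 1):
    for untouched in range(4):
        for bit in (8, 4, 2, 1):
            delta = [0 if c == untouched else bit for c in range(4)]
            weights.append(output_difference_weight(delta, which))

print(set(weights))                             # {1}
print(output_difference_weight([8, 0, 0, 0]))   # 3
```

The second line of output shows the contrast with a difference confined to a single S-box: one flipped bit in one nibble spreads to three output nibbles, whereas three aligned flips collapse to a single output bit.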

We analyze the depth of such differential propagation. As seen in Figure 5, such differences still have significant deviations in the fourth round of S-box (the mean value is about 27 bits) by controlling three inputs of S-boxes in the first round. Compared with the method of [26], the method in this paper effectively increases the depth of difference propagation.

The proposed attack may be difficult to apply to standard block ciphers, because their S-boxes are generally 8 bits wide and their MixColumn modules not only use multiplication but also operate on more elements than those of lightweight ciphers, which makes the hypothetical key space too large for the analysis when multiple S-boxes are controlled at the same time. For lightweight cryptography, however, the attack is easy. To achieve low latency and small area, the S-boxes of lightweight algorithms are generally small, and the MixColumn operation is relatively simple. Even if three S-boxes are controlled at the same time, the data to be analyzed is only 12 bits. Because the space of a single hypothetical key is only $2^{12} = 4096$, our attack method is practical.

3.2. Chosen-Input Attack

In this section, we describe our attack method in detail, including how to build a differential pair as well as the implementation of CPA and DPA. The attack method is divided into two phases: the data collection phase and the data analysis phase. In the data collection phase, only one attack position (j) needs to be controlled and the rest of the data is not concerned. However, to improve the HD observation, we suggest using random data in these positions. The data collection phase is described as follows:

1. Generate a random 64-bit plaintext $p_{2i,j}$.
2. Generate a 12-bit random differential value $x_{2i,j}$, place it in the attack position (j), and pad the remaining positions with 0 to form 64-bit data. For example, when $j = 7$, $x_{2i,j}$ = 0x0000_0000_0000_0xxx (x denotes a random hexadecimal digit).
3. Generate the corresponding differential plaintext $p_{2i+1,j} = p_{2i,j} \oplus x_{2i,j}$.
4. Input $p_{2i,j}$ to the device under test, and input $p_{2i+1,j}$ after the device has stabilized. Record the power consumption $Γ_{i,j}$ and the corresponding plaintext differential pair $p_{2i,j}$ and $p_{2i+1,j}$.
5. Repeat steps 1–4 multiple times for the same attack position j.
6. Change the attack position (j) and repeat steps 1–5.

As shown in Figure 3 ($i = 0$, $j = 7$), the random plaintext is $p_{0,7}$ = 0x1867_ffc0_4ce5_2bab and the random difference value is 0xf50; the free positions are filled with 0 to form the input difference $x_{0,7}$ = 0x0000_0000_0000_0f50, and we calculate $p_{1,7} = p_{0,7} \oplus x_{0,7}$ = 0x1867_ffc0_4ce5_24fb. Because we need to measure the circuit’s power consumption following the input of two consecutive differential plaintexts, we must first input $p_{0,7}$ and wait for the circuit to become stable before inputting $p_{1,7}$. We input $p_{0,7}$ at time $t_{0}$, followed by $p_{1,7}$ at time $t_{1}$, and then we record the power consumption during the time interval T1.
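The data collection phase can be sketched in Python as follows. This is only an illustration under stated assumptions: the attack position is passed as a bit offset `shift` (the mapping from j to a bit offset is given in Table 2 and is not reproduced here; the worked example implies an offset of 0 for $j = 7$), and `measure` is a hypothetical callback standing in for the oscilloscope acquisition of one trace.

```python
import random

MASK64 = (1 << 64) - 1


def make_pair(shift, rng=random):
    """Return (p_even, p_odd, x): a random plaintext, its partner, and the padded difference."""
    p_even = rng.getrandbits(64)
    x = rng.getrandbits(12) << shift          # 12-bit difference, zero elsewhere
    return p_even, (p_even ^ x) & MASK64, x


def collect(shift, n_pairs, measure, rng=random):
    """Steps 1-5 for one attack position; measure(p_even, p_odd) is a hypothetical
    callback that applies p_even, waits for the circuit to settle, applies p_odd,
    and returns the recorded power trace."""
    records = []
    for _ in range(n_pairs):
        p_even, p_odd, _ = make_pair(shift, rng)
        records.append((p_even, p_odd, measure(p_even, p_odd)))
    return records


# Worked example from the text: difference 0xf50 placed in the lowest 12 bits.
print(hex(0x1867_FFC0_4CE5_2BAB ^ 0x0F50))  # 0x1867ffc04ce524fb
```

A random 12-bit difference can be zero, in which case the pair carries no information; whether such pairs are discarded is a design choice that the text does not specify.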

In the data analysis stage, we focus on CPA and DPA. It is difficult to implement DPA on the unrolled architecture directly: because the unrolled architecture processes all the data in one cycle, the correlation between the power traces and the power model is very poor, which makes DPA perform unsatisfactorily. CPA makes better use of the power traces by calculating the correlation between each point on the traces and the power consumption model, which gives CPA good results on the unrolled architecture, but its computation is more complex and grows with the number of sampling points. Next, we describe how DPA and CPA are implemented in detail. DPA is described as follows:

1. Determine the distinguisher. In this paper, we select the mean value distinguisher (6 bits), since the total HD is 12 bits.
2. Determine the attack position (j).
3. Make a 12-bit hypothetical key $gk_{j}$.
4. Calculate the HD of the S-box with the hypothetical key and the input plaintext pairs in the first round. The calculation formula is as follows:

$HD_{S}(p_{2i,j}, p_{2i+1,j}) = HW(S(p_{2i,j} \oplus gk_{j}) \oplus S(p_{2i+1,j} \oplus gk_{j}))$

(7)

in which S represents SubCell. Equation (7) allows us to calculate the HD at the output of the first-round S-boxes.
5. Calculate the mean value of the corresponding single power trace; if $HD > 6$, add this mean value to the set TH, otherwise add it to the set TL.
6. Repeat steps 4–5 over all the difference pairs to obtain the sets TH and TL, calculate the difference between the mean values of the two sets, and record its absolute value as the difference value of the hypothetical key.
7. Repeat steps 3–6 over all the hypothetical keys to obtain their difference values. The key of the DPA attack at position j is the hypothetical key with the maximum difference value.
8. Repeat steps 2–7 over all attack positions and combine the results of each attack position to get the final key of the DPA.

It can be seen from the above that DPA analyzes the whole power consumption traces, but the power consumption model used is limited to HD of the first round. Therefore, the power consumption model cannot map the whole power consumption well in unrolled architecture, and in order to implement DPA, we require more power traces than CPA.
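The DPA steps for a single attack position can be sketched in Python as below. This is a simplified illustration, not the implementation used in the experiments: the S-box values are written from the PRINCE specification [4] as recalled here (check against Table 3), the attack position is assumed to be a 12-bit field at bit offset `shift`, and `simulate_records` is a hypothetical stand-in that replaces real measurements with a noiseless "power" value equal to the true first-round HD.

```python
import random

# PRINCE S-box as recalled from the specification [4]; verify against Table 3.
SBOX = [0xB, 0xF, 0x3, 0x2, 0xA, 0xC, 0x9, 0x1,
        0x6, 0x7, 0x8, 0x0, 0xE, 0x5, 0xD, 0x4]


def hd_sbox(p_even, p_odd, gk, shift=0):
    """Equation (7) restricted to the three attacked nibbles."""
    a = ((p_even >> shift) & 0xFFF) ^ gk
    b = ((p_odd >> shift) & 0xFFF) ^ gk
    hd = 0
    for n in range(3):
        sa = SBOX[(a >> (4 * n)) & 0xF]
        sb = SBOX[(b >> (4 * n)) & 0xF]
        hd += bin(sa ^ sb).count("1")
    return hd


def dpa_scores(records, candidates=range(4096), shift=0, threshold=6):
    """records: (p_even, p_odd, mean power of the trace).
    Returns {guess: |mean(TH) - mean(TL)|} (steps 3-7)."""
    scores = {}
    for gk in candidates:
        th, tl = [], []
        for p_even, p_odd, power in records:
            (th if hd_sbox(p_even, p_odd, gk, shift) > threshold else tl).append(power)
        scores[gk] = abs(sum(th) / len(th) - sum(tl) / len(tl)) if th and tl else 0.0
    return scores


def simulate_records(true_key, n_pairs, seed=0):
    """Hypothetical stand-in for the measurement: noiseless power equal to the true HD."""
    rng = random.Random(seed)
    records = []
    for _ in range(n_pairs):
        p_even = rng.getrandbits(64)
        p_odd = p_even ^ rng.getrandbits(12)
        records.append((p_even, p_odd, float(hd_sbox(p_even, p_odd, true_key))))
    return records


records = simulate_records(true_key=0xA5C, n_pairs=300)
scores = dpa_scores(records, candidates=[0xA5C, 0x123, 0xFFF])
print(max(scores, key=scores.get) == 0xA5C)
```

Passing a short candidate list keeps the demonstration fast; a full attack would use the default of all 4096 hypotheses and real trace means in place of the simulated values.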

Before introducing CPA, we need to define the gain function. There is a correlation between the output of the MixColumn in the first round and the output of the S-box in the following rounds. Since the HD of the MixColumn layer takes only twelve values (ignoring HD = 0), we have established a relationship between these twelve values of the MixColumn HD and the S-box HD in the following rounds through a large number of tests. The results are given in Figure 6 and Table 4. Equation (8) defines $η$:

η = \frac{\overline{HD}_{S}^{r}}{\overline{HD}_{M}^{1}}

(8)

in which r is the round number and $r \in [1, 12]$ (only $η$ of the first five rounds is taken into account in this paper), $\overline{HD}_{M}^{1}$ is the average HD of the MixColumn in the first round, and $\overline{HD}_{S}^{r}$ is the average HD of the S-boxes in round r.

CPA is described as follows:

1. Determine the attack position (j).
2. Make a 12-bit hypothetical key $gk_{j}$.
3. Calculate the HD of MixColumn with the hypothetical key and the input plaintext pairs in the first round. The calculation formula is as follows:

$HD_{M}(p_{2i,j}, p_{2i+1,j}) = HW(M(S(p_{2i,j} \oplus gk_{j})) \oplus M(S(p_{2i+1,j} \oplus gk_{j})))$

(9)

in which S represents SubCell and M represents MixColumn. Equation (9) allows us to calculate the HD at the output of the first MixColumn.
4. Using the gain function, calculate the HD of the S-box of each round (HD$_{S}$).
5. Calculate the correlation $ρ$ between $γ_{i,j}$ and the corresponding power consumption traces $Γ_{i,j}$, using the Pearson correlation coefficient. The calculation formula is as follows:

$ρ_{j} = \frac{\sum_{i} (γ_{i,j} - \bar{γ}_{j}) (Γ_{i,j} - \bar{Γ}_{j})}{\sqrt{\sum_{i} (γ_{i,j} - \bar{γ}_{j})^{2}} \sqrt{\sum_{i} (Γ_{i,j} - \bar{Γ}_{j})^{2}}}$

(10)
6. Repeat steps 2–5 to get the correlation coefficients of all hypothetical keys, and then find the key with the largest correlation coefficient, which is the key of CPA at position j.
7. Repeat steps 1–6 over all attack positions and combine the results of each attack position to get the final key of CPA.

In Equation (10), $\bar{γ}_{j}$ is the average value of $γ_{i, j}$ and $\bar{Γ}_{j}$ is the average value of $Γ_{i, j}$. From the above, it can be seen that CPA performs a correlation calculation for each power consumption point. Therefore, CPA has a better attack effect compared with DPA, but the computational complexity is also higher.
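The correlation step of the procedure above can be sketched in plain Python as follows. This is a minimal illustration, not the authors' implementation: `pearson` and `cpa_best_key` are hypothetical names, the predicted leakage values are assumed to have been computed beforehand for every key guess, and the key is chosen by the largest (signed) coefficient over all sample points.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    num = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    den = math.sqrt(sum((x - mean_x) ** 2 for x in xs)) * math.sqrt(
        sum((y - mean_y) ** 2 for y in ys)
    )
    # A constant sequence carries no information; report zero correlation.
    return num / den if den else 0.0


def cpa_best_key(hypotheses, traces):
    """hypotheses[k][i]: predicted leakage for key guess k and input pair i.
    traces[i][t]: power value of input pair i at sample point t.
    Returns the index of the best key guess and the peak correlation per guess."""
    n_samples = len(traces[0])
    # Reorganise the traces so that each sample point is one column.
    columns = [[trace[t] for trace in traces] for t in range(n_samples)]
    peaks = [max(pearson(h, col) for col in columns) for h in hypotheses]
    best = max(range(len(peaks)), key=peaks.__getitem__)
    return best, peaks
```

The nested loops make the cost proportional to (number of key guesses) × (number of sample points) × (number of traces), which reflects the higher computational complexity noted above; a vectorised library would normally be used for 4096 guesses.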

When CPA or DPA succeeds, what we actually obtain is $K_{0} \oplus K_{1}$. To recover the 128-bit original key K, there are two options. (1) Perform SCA on the decryption stage in the same way. This yields $(K_{0}^{'} \oplus K_{1} \oplus α)$, from which $K_{0}$ and $K_{1}$ can easily be separated, because $K_{0}^{'} = (K_{0} >>> 1) \oplus (K_{0} >> 63)$ and $α$ is known. (2) Attack $K_{1}$ in each round using the known $(K_{0} \oplus K_{1})$ to restore the original key K [25].
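Option (1) works because XORing the two recovered values with $α$ leaves $K_{0} \oplus K_{0}^{'}$, which determines $K_{0}$ uniquely. The following Python sketch shows one way to carry out the separation; it is an illustration under stated assumptions, not the authors' code. The names `k0_prime` and `split_keys` are hypothetical, keys are treated as 64-bit integers, `>>>` is taken as a rotate-right and `>>` as a logical shift.

```python
MASK64 = (1 << 64) - 1
ALPHA = 0xC0AC29B7C97C50DD  # value of alpha recommended in [4] (see Table 1 note)


def k0_prime(k0):
    """K0' = (K0 >>> 1) XOR (K0 >> 63) on a 64-bit word."""
    rotated = ((k0 >> 1) | (k0 << 63)) & MASK64
    return rotated ^ (k0 >> 63)


def split_keys(enc_leak, dec_leak, alpha=ALPHA):
    """enc_leak = K0 ^ K1 (from the encryption attack),
    dec_leak = K0' ^ K1 ^ alpha (from the decryption attack).
    Returns (K0, K1)."""
    d = enc_leak ^ dec_leak ^ alpha  # equals K0 ^ K0'

    def bit(i):
        return (d >> i) & 1

    # With k[i] the bits of K0:
    #   d[0]  = k[0] ^ k[1] ^ k[63]
    #   d[i]  = k[i] ^ k[i+1]        for 1 <= i <= 62
    #   d[63] = k[63] ^ k[0]
    # so d[0] ^ d[63] = k[1], and the remaining bits follow by chaining.
    k = [0] * 64
    k[1] = bit(0) ^ bit(63)
    for i in range(1, 63):
        k[i + 1] = k[i] ^ bit(i)
    k[0] = k[63] ^ bit(63)
    k0 = sum(b << i for i, b in enumerate(k))
    return k0, enc_leak ^ k0
```

For any pair of 64-bit keys, calling `split_keys(k0 ^ k1, k0_prime(k0) ^ k1 ^ ALPHA)` should return the original `(k0, k1)`.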

3.3. Countermeasures

This section introduces mask-based countermeasures and evaluates their hardware overhead and level of protection. In 2006, Nikova et al. proposed a countermeasure based on secret-sharing with multi-party computation, known as a TI [29]. Even when glitching is present, the TI approach has proven to be secure. For defense against higher-order DPA attacks, Bilgin et al. implemented higher-order TI in [41].

The TI scheme used in this paper for PRINCE follows [27,42]. Refs. [43,44] provide a thorough categorization of 3-bit and 4-bit S-boxes, in which the S-box of PRINCE has an algebraic degree of 3. This implies that a directly shared TI scheme of PRINCE needs at least 3 + 1 = 4 components. We have implemented two types of TI. The first one, shown in Figure 7, uses four components directly in the operation; this does not meet the uniformity property of TI, so we must add random numbers.

In [45], the authors decompose the S-box of PRESENT, lowering its algebraic degree from 3 to 2, which greatly decreases the TI cost but raises the number of shared functions from 1 to 3. Registers must be inserted between the shared functions to avoid glitching. Based on the TI approaches in [27,42,45], we built the TI protection architecture shown in Figure 8. The scheme satisfies the uniformity property of TI, so we do not need extra random numbers. For detailed information, the interested reader is referred to the original articles [27,29,45].

To evaluate whether the countermeasures are effective, we choose the non-specific t-test [46,47], a technique that was shown in [27,48,49] to be effective in observing the extent of leakage.

4. Experiment

The experiments in this paper are completed by Electronic Design Automation (EDA) tools, and the experimental architecture is shown in Figure 9 and Table 5.

To show that the leakage point proposed in this paper can still be detected when the first three rounds are protected, five sets of experiments are set up, as shown in Figure 10:

Group A: All twelve rounds calculated by hardware.
Group B: Pre-calculate the first round by software and pass the pre-calculated result to the hardware to complete the next eleven rounds.
Group C: Pre-calculate the first two rounds by software and pass the pre-calculated result to the hardware to complete the next ten rounds.
Group D: Pre-calculate the first three rounds by software and pass the pre-calculated result to the hardware to complete the next nine rounds.
Group E: Pre-calculate the first four rounds by software and pass the pre-calculated result to the hardware to complete the next eight rounds.

The side-channel information of the previous rounds can be completely shielded by software, thus proving that the leakage of the side-channel information comes from the hardware circuit without protection. The experimental steps are as follows:

Generate the netlist, Standard Delay Format (SDF) file and Synopsys Design Constraints (SDC) file from the Register Transfer Level (RTL) code with Design Compiler (DC) in SMIC 55 nm technology.
Construct differential pairs following Section 3.2.
Simulate the netlist with the differential pairs and the SDF in Verilog Compiled Simulator (VCS) to generate a Fast Signal DataBase (FSDB) waveform file.
Simulate the power consumption with the FSDB and SDC in PrimeTime PX (PTPX) to obtain the power consumption traces.
Repeat steps 2–4 to obtain a sufficient number of power consumption traces and differential pairs.
Run CPA or DPA on the power consumption traces and differential pairs to obtain the key.
Change the RTL code, repeat steps 1–6, and observe the experimental results of each RTL version.

The results of the five groups of experiments are shown in Figures 11–16. Group A was attacked with both DPA and CPA, and the remaining four groups with CPA only. In Group A, both DPA and CPA were successful, and the trace of the correct key was significantly more prominent than those of the other hypothetical keys. In Groups B, C and D, the trace of the correct key is likewise significantly higher than those of the other hypothetical keys. In Group E, the correct key did not stand out and the correlations of all hypothetical keys were below 0.2, indicating that the bias caused by the input difference is already difficult to observe in the fifth round. To verify the security of the fifth round, we performed CPA with different numbers of power consumption traces. Figure 16 shows that the ranking of the correct key has no upward trend as the number of traces increases, which suggests that the correct key cannot be obtained however many traces are used.

To demonstrate the impact of the method in this paper, we added a set of tests using the same experimental settings as Group D but with the method of [25,26]. Figure 17 shows that the approach in [25,26] cannot extract the key in the fourth round, since the ranking of the true key does not improve as the number of tests grows. From the experiments, we also observed that the number of power consumption traces required for an attack increases with the number of attacked rounds. As the attacked round gets deeper, the leakage of key information gradually decreases, which is in line with the trend in Figure 6.

We assessed the hardware overhead and performance of two types of TI schemes, as shown in Table 6, to determine the effect of the countermeasure on the unrolled architecture. Scheme 1 represents the architecture shown in Figure 7, and Scheme 2 represents the architecture shown in Figure 8.

Table 6 shows that the hardware overhead of the circuit increases significantly as the number of protected rounds climbs. Scheme 1 has a substantially greater throughput than Scheme 2, but it needs a much larger cell area and more random numbers. We prioritized area; hence, a t-test was carried out for Scheme 2. The test results are displayed in Figure 18, in which the red horizontal lines mark the bounds of the range $[-4.5, 4.5]$. The result indicates that the t-value stays within $[-4.5, 4.5]$ during the masked rounds and that later rounds produce larger t-values. However, as discussed earlier in this article, the power consumption data of rounds 5 to 10 can no longer be exploited, making protection of the first 4 rounds a superior cost-security trade-off.
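The pass/fail criterion used for Figure 18 can be illustrated with a small Welch t-test sketch in Python. This is only a sketch under assumptions: the function names are hypothetical, the two trace groups are assumed to have been collected beforehand (for instance with fixed and random inputs, as is common in leakage assessment), and the threshold of 4.5 is the one quoted in the text.

```python
import math


def welch_t(group_a, group_b):
    """Welch's t statistic for two samples with possibly unequal variances."""
    na, nb = len(group_a), len(group_b)
    mean_a = sum(group_a) / na
    mean_b = sum(group_b) / nb
    var_a = sum((x - mean_a) ** 2 for x in group_a) / (na - 1)
    var_b = sum((x - mean_b) ** 2 for x in group_b) / (nb - 1)
    den = math.sqrt(var_a / na + var_b / nb)
    if den == 0:
        # Both groups are constant: no evidence unless the means differ.
        return 0.0 if mean_a == mean_b else math.copysign(math.inf, mean_a - mean_b)
    return (mean_a - mean_b) / den


def leaky_sample_points(traces_a, traces_b, threshold=4.5):
    """Return the sample indices whose |t| exceeds the threshold,
    together with the t-value of every sample point."""
    n_samples = len(traces_a[0])
    t_values = [
        welch_t([tr[t] for tr in traces_a], [tr[t] for tr in traces_b])
        for t in range(n_samples)
    ]
    leaky = [t for t, value in enumerate(t_values) if abs(value) > threshold]
    return leaky, t_values
```

Sample points that fall inside protected rounds would be expected to stay out of the returned list, while points in unprotected rounds may appear in it.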

To summarize, Table 7 compares the SCA results on PRINCE in unrolled architecture reported in recent years.

5. Conclusions

In this paper, we presented an optimized method for the chosen-input attack on unrolled architectures based on MixColumn. The method prevents the algorithm from masking the difference in inputs in the first few rounds so that the attacker can observe the difference in deeper rounds. Using the methods in this paper, we are able to recover the key from the fourth round of PRINCE. The existence and validity of this side-channel leakage were demonstrated through five sets of experiments on PRINCE simulated with EDA tools. To demonstrate the expense of TI for the unrolled architecture, this study also compares two types of TI for the unrolled PRINCE. According to the experimental findings, each round of protection significantly increases hardware overhead. Because the leakage is generated by the S-box and MixColumn, we expect it to also be found in other low-latency block ciphers, such as MANTIS. However, the approach suggested in this article both widens the hypothesized key space and raises the computing complexity of the analysis stage.

We will continue to study SCA on deeper rounds of lightweight cryptography in the future. At the same time, the protection of the unrolled architecture remains an open problem, and research on countermeasures against SCA will also be very interesting.

Author Contributions

Conceptualization, J.X., X.J. and K.H.; Methodology, K.H.; Validation, J.X.; Formal analysis, J.X.; Investigation, J.X. and X.J.; Resources, P.L., W.X., C.X. and K.H.; Writing—original draft, J.X.; Writing—review & editing, J.X. and X.J.; Supervision, X.J., P.L., W.X., C.X. and K.H. All authors have read and agreed to the published version of the manuscript.

Funding

This work is supported by the National Key R&D Program of China (2020YFB0906000, 2020YFB0906001).

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

Not applicable.

Acknowledgments

The authors would like to thank Chunyi Hu, Jihu Liang, Jiajie Mao and Shite Zhu for their beneficial suggestions and comments. Furthermore, we would like to thank the reviewers for helping us to improve the paper.

Conflicts of Interest

The authors declare no conflict of interest.

References

Javed, A.R.; Shahzad, F.; ur Rehman, S.; Zikria, Y.B.; Razzak, I.; Jalil, Z.; Xu, G. Future smart cities requirements, emerging technologies, applications, challenges, and future aspects. Cities 2022, 129, 103794.
Thakor, V.A.; Razzaque, M.A.; Khandaker, M. Lightweight Cryptography Algorithms for Resource-Constrained IoT Devices: A Review, Comparison and Research Opportunities. IEEE Access 2021, 9, 28177–28193.
Javed, A.R.; Usman, M.; Rehman, S.U.; Khan, M.U.; Haghighi, M.S. Anomaly detection in automated vehicles using multistage attention-based convolutional neural network. IEEE Trans. Intell. Transp. Syst. 2020, 22, 4291–4300.
Borghoff, J.; Canteaut, A.; Güneysu, T.; Kavun, E.B.; Knezevic, M.; Knudsen, L.R.; Leander, G.; Nikov, V.; Paar, C.; Rechberger, C.; et al. PRINCE—A low-latency block cipher for pervasive computing applications. In Proceedings of the International Conference on the Theory and Application of Cryptology and Information Security, Beijing, China, 2–6 December 2012; Springer: Berlin/Heidelberg, Germany, 2012; pp. 208–225.
Beierle, C.; Jean, J.; Kölbl, S.; Leander, G.; Sim, S.M. The SKINNY Family of Block Ciphers and Its Low-Latency Variant MANTIS. In Proceedings of the Annual Cryptology Conference, Santa Barbara, CA, USA, 14–18 August 2016.
Avanzi, R. The QARMA block cipher family. Almost MDS matrices over rings with zero divisors, nearly symmetric even-mansour constructions with non-involutory central rounds, and search heuristics for low-latency s-boxes. IACR Trans. Symmetric Cryptol. 2017, 2017, 4–44.
NIST. Specification for the Advanced Encryption Standard (AES). In FIPS-197; NIST: Gaithersburg, MD, USA, 2001.
Quisquater, J.J.; Samyde, D. ElectroMagnetic Analysis (EMA): Measures and Counter-Measures for Smart Cards; Springer: Berlin/Heidelberg, Germany, 2001.
Gandolfi, K.; Mourtel, C.; Olivier, F. Electromagnetic Analysis: Concrete Results. In Proceedings of the Cryptographic Hardware and Embedded Systems—CHES, Paris, France, 14–16 May 2001.
Dhananjay, K.; Salman, E. Charge Based Power Side-Channel Attack Methodology for an Adiabatic Cipher. Electronics 2021, 10, 1438.
Morales Romero, J.d.J.; Reyes Barranca, M.A.; Tinoco Varela, D.; Flores Nava, L.M.; Espinosa Garcia, E.R. SCA-Safe Implementation of Modified SaMAL2R Algorithm in FPGA. Micromachines 2022, 13, 1872.
Mangard, S.; Oswald, E.; Popp, T. Power Analysis Attacks: Revealing the Secrets of Smart Cards; Springer Science & Business Media: Berlin/Heidelberg, Germany, 2008; Volume 31.
Zhou, F.; Zhang, B.; Wu, N.; Bu, X. The design of compact SM4 encryption and decryption circuits that are resistant to bypass attack. Electronics 2020, 9, 1102.
Bhasin, S.; Guilley, S.; Sauvage, L.; Danger, J.L. Unrolling cryptographic circuits: A simple countermeasure against side-channel attacks. In Proceedings of the Cryptographers’ Track at the RSA Conference, San Francisco, CA, USA, 1–5 March 2010; Springer: Berlin/Heidelberg, Germany, 2010; pp. 195–207.
Pub, F. Data encryption standard (des). In FIPS PUB; NIST: Gaithersburg, MD, USA, 1999; pp. 46–583.
Kocher, P.; Jaffe, J.; Jun, B. Differential power analysis. In Proceedings of the Annual International Cryptology Conference, Santa Barbara, CA, USA, 15–19 August 1999; Springer: Berlin/Heidelberg, Germany, 1999; pp. 388–397.
Nalla Anandakumar, N. SCA Resistance Analysis on FPGA Implementations of Sponge Based MAC-PHOTON. In Proceedings of the International Conference for Information Technology and Communications, Bucharest, Romania, 11–12 June 2015; Springer: Berlin/Heidelberg, Germany, 2015; pp. 69–86.
Guo, J.; Peyrin, T.; Poschmann, A. The PHOTON family of lightweight hash functions. In Proceedings of the Annual Cryptology Conference, Santa Barbara, CA, USA, 14–18 August 2011; Springer: Berlin/Heidelberg, Germany, 2011; pp. 222–239.
Yli-Mäyry, V.; Homma, N.; Aoki, T. Improved power analysis on unrolled architecture and its application to PRINCE block cipher. In Proceedings of the Lightweight Cryptography for Security and Privacy, Bochum, Germany, 10–11 September 2015; Springer: Berlin/Heidelberg, Germany, 2015; pp. 148–163.
Yli-Mäyry, V.; Homma, N.; Aoki, T. Power analysis on unrolled architecture with points-of-interest search and its application to PRINCE block cipher. IEICE Trans. Fundam. Electron. Commun. Comput. Sci. 2017, 100, 149–157.
Welch, B.L. The generalization of ‘Student’s’ problem when several different population variances are involved. Biometrika 1947, 34, 28–35.
Takemoto, S.; Nozaki, Y.; Yoshikawa, M. Differential power analysis using chosen-plaintext for unrolled PRINCE. In Proceedings of the 2018 International Conference on Robotics, Control and Automation Engineering, Beijing, China, 26–28 December 2018; pp. 152–155.
Chawla, N.; Singh, A.; Rahman, N.M.; Kar, M.; Mukhopadhyay, S. Extracting side-channel leakage from round unrolled implementations of lightweight ciphers. In Proceedings of the 2019 IEEE International Symposium on Hardware Oriented Security and Trust (HOST), McLean, VA, USA, 5–10 May 2019; pp. 31–40.
Schimmel, O.; Duplys, P.; Boehl, E.; Hayek, J.; Bosch, R.; Rosenstiel, W. Correlation power analysis in frequency domain. In Proceedings of the COSADE 2010 First International Workshop on Constructive Side-Channel Analysis and Secure Design, Darmstadt, Germany, 4–5 February 2010.
Yli-Mäyry, V.; Homma, N.; Aoki, T. Chosen-input side-channel analysis on unrolled light-weight cryptographic hardware. In Proceedings of the 2017 18th International Symposium on Quality Electronic Design (ISQED), Santa Clara, CA, USA, 14–15 March 2017; pp. 301–306.
Yli-Mäyry, V.; Ueno, R.; Miura, N.; Nagata, M.; Bhasin, S.; Mathieu, Y.; Graba, T.; Danger, J.L.; Homma, N. Diffusional Side-Channel Leakage From Unrolled Lightweight Block Ciphers: A Case Study of Power Analysis on PRINCE. IEEE Trans. Inf. Forensics Secur. 2020, 16, 1351–1364.
Moradi, A.; Schneider, T. Side-channel analysis protection and low-latency in action. In Proceedings of the International Conference on the Theory and Application of Cryptology and Information Security, Hanoi, Vietnam, 4–8 December 2016; Springer: Berlin/Heidelberg, Germany, 2016; pp. 517–547.
Nikova, S.; Rijmen, V.; Schläffer, M. Secure hardware implementation of nonlinear functions in the presence of glitches. J. Cryptol. 2011, 24, 292–321.
Nikova, S.; Rechberger, C.; Rijmen, V. Threshold implementations against side-channel attacks and glitches. In Proceedings of the International Conference on Information and Communications Security, Raleigh, NC, USA, 4–7 December 2006; Springer: Berlin/Heidelberg, Germany, 2006; pp. 529–545.
Bonnecaze, A.; Liardet, P.; Venelli, A. AES side-channel countermeasure using random tower field constructions. Des. Codes Cryptogr. 2013, 69, 331–349.
Satheesh, V.; Shanmugam, D. Secure realization of lightweight block cipher: A case study using GIFT. In Proceedings of the International Conference on Security, Privacy, and Applied Cryptography Engineering, Kanpur, India, 15–19 December 2018; Springer: Berlin/Heidelberg, Germany, 2018; pp. 85–103.
Banik, S.; Pandey, S.K.; Peyrin, T.; Sasaki, Y.; Sim, S.M.; Todo, Y. GIFT: A Small Present (Full version). In International Conference on Cryptographic Hardware and Embedded Systems; Springer: Cham, Switzerland, 2017; pp. 321–345.
Maene, P.; Verbauwhede, I. Single-cycle implementations of block ciphers. In Proceedings of the Lightweight Cryptography for Security and Privacy, Bochum, Germany, 10–11 September 2015; Springer: Berlin/Heidelberg, Germany, 2016; pp. 131–147.
Bogdanov, A.; Knudsen, L.R.; Leander, G.; Paar, C.; Vikkelsoe, C. PRESENT: An ultra-lightweight block cipher. In Proceedings of the Cryptographic Hardware and Embedded Systems—CHES 2007, 9th International Workshop, Vienna, Austria, 10–13 September 2007.
Lv, J. On two DES implementations secure against differential power analysis in smart-cards. Inf. Comput. 2006, 204, 1179–1193.
Ren, Y.; Wu, L.; Li, H.; Li, X.; Zhang, X.; Wang, A.; Chen, H. Key recovery against 3DES in CPU smart card based on improved correlation power analysis. Tsinghua Sci. Technol. 2016, 21, 210–220.
Lu, J.; Pan, J.; Hartog, J.d. Principles on the security of AES against first and second-order differential power analysis. In Proceedings of the International Conference on Applied Cryptography and Network Security, Beijing, China, 22–25 June 2010; Springer: Berlin/Heidelberg, Germany, 2010; pp. 168–185.
Renauld, M.; Standaert, F.X. Algebraic side-channel attacks. In Proceedings of the International Conference on Information Security and Cryptology, Beijing, China, 12–15 December 2009; Springer: Berlin/Heidelberg, Germany, 2009; pp. 393–410.
Veyrat-Charvillon, N.; Gérard, B.; Standaert, F.X. Soft analytical side-channel attacks. In Proceedings of the International Conference on the Theory and Application of Cryptology and Information Security, Kaohsiung, Taiwan, 7–11 December 2014; Springer: Berlin/Heidelberg, Germany, 2014; pp. 282–296.
Cooper, J.; DeMulder, E.; Goodwill, G.; Jaffe, J.; Kenworthy, G.; Rohatgi, P. Test vector leakage assessment (TVLA) methodology in practice. In Proceedings of the International Cryptographic Module Conference, Ottawa, ON, Canada, 20–22 September 2013; Volume 20.
Bilgin, B.; Gierlichs, B.; Nikova, S.; Nikov, V.; Rijmen, V. Higher-order threshold implementations. In Proceedings of the International Conference on the Theory and Application of Cryptology and Information Security, Kaohsiung, Taiwan, 7–11 December 2014; Springer: Berlin/Heidelberg, Germany, 2014; pp. 326–343.
Božilov, D.; Knežević, M.; Nikov, V. Optimized threshold implementations: Securing cryptographic accelerators for low-energy and low-latency applications. J. Cryptogr. Eng. 2022, 12, 15–51.
Bilgin, B.; Nikova, S.; Nikov, V.; Rijmen, V.; Stütz, G. Threshold implementations of all 3 × 3 and 4 × 4 S-boxes. In Proceedings of the International Workshop on Cryptographic Hardware and Embedded Systems, Leuven, Belgium, 9–12 September 2012; Springer: Berlin/Heidelberg, Germany, 2012; pp. 76–91.
Bilgin, B.; Nikova, S.; Nikov, V.; Rijmen, V.; Tokareva, N.; Vitkup, V. Threshold implementations of small S-boxes. Cryptogr. Commun. 2015, 7, 3–33.
Poschmann, A.; Moradi, A.; Khoo, K.; Lim, C.W.; Wang, H.; Ling, S. Side-channel resistant crypto for less than 2300 GE. J. Cryptol. 2011, 24, 322–345.
Ding, A.A.; Chen, C.; Eisenbarth, T. Simpler, faster, and more robust t-test based leakage detection. In Proceedings of the International Workshop on Constructive Side-Channel Analysis and Secure Design, Graz, Austria, 14–15 April 2016; Springer: Berlin/Heidelberg, Germany, 2016; pp. 163–183.
Standaert, F.X. How (not) to use Welch’s t-test in side-channel security evaluations. In Proceedings of the International Conference on Smart Card Research and Advanced Applications, Montpellier, France, 12–14 November 2018; Springer: Berlin/Heidelberg, Germany, 2018; pp. 65–79.
Schneider, T.; Moradi, A. Leakage assessment methodology. J. Cryptogr. Eng. 2016, 6, 85–99.
Durvaux, F.; Standaert, F.X.; Del Pozo, S.M. Towards easy leakage certification: Extended version. J. Cryptogr. Eng. 2017, 7, 129–147.

Figure 1. Unrolled architecture of PRINCE.

Figure 2. Schematic diagram of PRINCE leakage model.

Figure 3. Simulation waveform for $P_{0, 7}$ and $P_{1, 7}$.

Figure 4. The diffusion of difference by the attack method with 4-bit changed (using 200,000 sets of data).

Figure 5. The diffusion of difference by the attack method with 12-bit changed (using 200,000 sets of data).

Figure 6. Mean value diagram of HD of S-box of each round corresponding to HD of MixColumn in the first round.

Figure 7. Direct TI of PRINCE.

Figure 8. TI with decomposed S-box of PRINCE.

Figure 9. Experimental environment and experimental flow.

Figure 10. Five implementations of PRINCE in unrolled architecture.

Figure 11. Group A: CPA result of experiment with all 12 rounds calculated by hardware.

Figure 12. Group A: DPA result of experiment with all 12 rounds calculated by hardware.

Figure 13. Group B: CPA result of experiment with 1 round calculated by software.

Figure 14. Group C: CPA result of experiment with 2 rounds calculated by software.

Figure 15. Group D: CPA result of experiment with 3 rounds calculated by software.

Figure 16. Group E: CPA result of experiment with 4 rounds calculated by software.

Figure 17. CPA attack on the fourth round of PRINCE using the method in [25,26].

Figure 18. t-test for protection with Scheme 2 for 4 rounds.

Table 1. Algorithm flow of PRINCE.

Input: 64-bit plaintext (P) and 128-bit key (K)
Output: 64-bit ciphertext (C)
Key expansion: Construct 64-bit whitening keys $K_{0}$, $K_{0}^{'}$, round key $K_{1}$ and reflection key $K_{1}^{'}$
$K_{0} = K[0:63]$
$K_{0}^{'} = (K_{0} >>>^{(1)} 1) \oplus (K_{0} >>^{(2)} 63)$
$K_{1} = K[64:127]$
$K_{1}^{'} = K[64:127] \oplus α^{(3)}$
Encryption: Generate ciphertext C	Decryption: Generate plaintext P
$MD^{(4)} = P \oplus K_{0} \oplus K_{1} \oplus RC[0]^{(8)}$	$MD = C \oplus K_{0}^{'} \oplus K_{1}^{'} \oplus RC[0]$
for $r = 1; r <= 5; r++$ do	for $r = 1; r <= 5; r++$ do
$MD = S^{(5)}(MD)$	$MD = S(MD)$
$MD = M^{(6)}(MD)$	$MD = M(MD)$
$MD = SR^{(7)}(MD)$	$MD = SR(MD)$
$MD = MD \oplus RC[r] \oplus K_{1}$	$MD = MD \oplus RC[r] \oplus K_{1}^{'}$
end for	end for
$MD = S(MD)$	$MD = S(MD)$
$MD = M(MD)$	$MD = M(MD)$
$MD = {S^{-1}}^{(9)}(MD)$	$MD = S^{-1}(MD)$
for $r = 6; r <= 10; r++$ do	for $r = 6; r <= 10; r++$ do
$MD = MD \oplus RC[r] \oplus K_{1}$	$MD = MD \oplus RC[r] \oplus K_{1}^{'}$
$MD = {SR^{-1}}^{(10)}(MD)$	$MD = SR^{-1}(MD)$
$MD = M(MD)$	$MD = M(MD)$
$MD = S^{-1}(MD)$	$MD = S^{-1}(MD)$
end for	end for
$C = MD \oplus RC[11] \oplus K_{1} \oplus K_{0}^{'}$	$P = MD \oplus RC[11] \oplus K_{1}^{'} \oplus K_{0}$

(1) >>> means right rotation through 64 bits. (2) >> means right shift. (3) α means reflection coefficient, and it is optional. The authors recommend α = 0xc0ac29b7c97c50dd in [4]. (4) MD means middle data. (5) S means SubCell. (6) M means MixColumn. (7) SR means shift row. (8) RC means round constant. (9) $S^{-1}$ means invSubCell. (10) $SR^{-1}$ means inverse shift row.

Table 2. Explanation table of attack position.

Attack Position (j)	Nibble Number (n)	Halfword Number (m)	Plaintext Bit
0	nibble[0:2]	halfword[0]	p[0:11]
1	nibble[1:3]	halfword[0]	p[4:15]
2	nibble[4:6]	halfword[1]	p[16:27]
3	nibble[5:7]	halfword[1]	p[20:31]
4	nibble[8:10]	halfword[2]	p[32:43]
5	nibble[9:11]	halfword[2]	p[36:47]
6	nibble[12:14]	halfword[3]	p[48:59]
7	nibble[13:15]	halfword[3]	p[52:63]
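The mapping in Table 2 follows a regular pattern, which the short Python sketch below reproduces. The function name `attack_position` and the returned dictionary layout are hypothetical; the index ranges are inclusive, as in the table.

```python
def attack_position(j):
    """Return the halfword, nibble range and plaintext bit range
    (all inclusive) covered by attack position j, following Table 2."""
    if not 0 <= j <= 7:
        raise ValueError("attack position must be in 0..7")
    halfword = j // 2                        # two attack positions per halfword
    first_nibble = 4 * halfword + (j % 2)    # second position is shifted by one nibble
    first_bit = 4 * first_nibble             # each nibble holds 4 plaintext bits
    return {
        "halfword": halfword,
        "nibbles": (first_nibble, first_nibble + 2),   # three nibbles
        "bits": (first_bit, first_bit + 11),           # twelve bits
    }
```

Each position therefore spans three adjacent nibbles (12 bits) of one 16-bit halfword, which is why the hypothetical key in the CPA procedure is 12 bits wide.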

Table 3. Definition of S-box.

x	0	1	2	3	4	5	6	7	8	9	a	b	c	d	e	f
S[x]	b	f	3	2	a	c	9	1	6	7	8	0	e	5	d	4

Table 4. Gain table of HD of MixColumn in the first round and HD of S-box in the following rounds.

$HD_{M}$ (η per round)	Round 1	Round 2	Round 3	Round 4	Round 5
$Δ$ 1	3.00	1.75	6.83	25.15	31.60
$Δ$ 2	1.67	1.74	6.86	15.25	16.01
$Δ$ 3	1.46	1.55	5.96	10.49	10.64
$Δ$ 4	1.30	1.44	5.48	7.96	7.99
$Δ$ 5	1.11	1.31	4.92	6.38	6.40
$Δ$ 6	1.02	1.21	4.45	5.33	5.33
$Δ$ 7	0.90	1.11	4.04	4.57	4.57
$Δ$ 8	0.79	1.03	3.66	4.00	4.00
$Δ$ 9	0.71	0.95	3.35	3.56	3.56
$Δ$ 10	0.60	0.88	3.06	3.20	3.20
$Δ$ 11	0.45	0.81	2.80	2.91	2.91
$Δ$ 12	0.33	0.75	2.59	2.66	2.67

$HD_{S}^{r} = η \cdot HD_{M}$.

Table 5. Experimental environment details.

Synthesis technology	SMIC 55 nm
Sample point	500 32-bit samples/trace
Sampling frequency	40 GSa/s
Synthesis tool	syn-vL-2016.03-SP1
Functional simulation tool	vcs-mx-v0-2018.09-SP2
Power simulation tool	pt-vQ-2019.12-SP5
Data analytics tool	Python 3.7

Table 6. Countermeasure evaluation and comparison.

		Cell/GE	Frequency/MHz	Throughput/Mbps	Random Numbers
Unprotected		5005	89	5696	No need
Scheme 1	round 1	20,223	87.6	5606	Need, 192 bits
	round 2	28,344	87.6	2803.2	Need, 2 × 192 bits
	round 3	40,906	100.8	2150.4	Need, 3 × 192 bits
	round 4	49,224	101.1	1617.6	Need, 4 × 192 bits
Scheme 2	round 1	9445	84.1	1794.1	No need
	round 2	13,589	100	1066.7	No need
	round 3	16,981	101	718.2	No need
	round 4	20,446	101	538.7	No need
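The throughput column of Table 6 appears consistent with a simple relation: throughput = frequency × 64 bits / cycles per block. This is an inference from the listed numbers rather than something stated by the authors. It assumes one clock cycle per protected round in Scheme 1 and three per protected round in Scheme 2, the latter matching the three register-separated shared functions. The function name `throughput_mbps` is hypothetical.

```python
def throughput_mbps(frequency_mhz, cycles_per_block, block_bits=64):
    """Throughput in Mbps for a cipher that processes one block of
    block_bits every cycles_per_block clock cycles."""
    return frequency_mhz * block_bits / cycles_per_block


# Assumed cycle counts: Scheme 1 uses 1 cycle per protected round,
# Scheme 2 uses 3 cycles per protected round.
scheme1_round3 = throughput_mbps(100.8, 3)     # Table 6 lists 2150.4
scheme2_round4 = throughput_mbps(101, 3 * 4)   # Table 6 lists 538.7
```

Under this assumption, the throughput loss of Scheme 2 comes mainly from the extra register stages rather than from a lower clock frequency.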

Table 7. Comparison with other literature.

Literature	Power Model	Key Space	MTD of First Round	Success Rate: Round 1	Round 2	Round 3	Round 4
[19]	4-bit HD	$2^{4} = 16$	-	16/16	-	-	-
[20]	4-bit HD	$2^{4} = 16$	-	16/16	-	-	-
[22]	4-bit HD	$2^{4} = 16$	-	16/16	-	-	-
[23]	4-bit HW/HD	$2^{4} = 16$	-	16/16	-	-	-
[25]	4-bit HD	$2^{4} = 16$	20/4-bit key	16/16	16/16	-	-
[26]	4-bit HD	$2^{4} = 16$	20/4-bit key	16/16	16/16	16/16	1/16
This article	12-bit HD	$2^{12} = 4096$	150/4-bit key	16/16	16/16	16/16	16/16

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

© 2023 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
