Article

The Design of Compact SM4 Encryption and Decryption Circuits That Are Resistant to Bypass Attack

1
College of Electronic and Information Engineering, Nanjing University of Aeronautics and Astronautics, Nanjing 211106, China
2
Science and Technology on Electronic Information Control Laboratory, Chengdu 610036, China
*
Author to whom correspondence should be addressed.
Electronics 2020, 9(7), 1102; https://doi.org/10.3390/electronics9071102
Submission received: 18 May 2020 / Revised: 1 July 2020 / Accepted: 4 July 2020 / Published: 6 July 2020
(This article belongs to the Special Issue Advanced Integrated Circuits Technology)

Abstract

To defend against side-channel attacks, a compact SM4 circuit was designed based on masking and random delay techniques, and the linear transformation module was designed with randomly inserted pseudo-operations. By analyzing the glitch data generated by the SM4 S-box under different inputs, the security against glitch attacks was confirmed. DPA (Differential Power Analysis) was then performed on the designed circuit. The key could not be recovered even with 100,000 power consumption curves, which verifies the security of the SM4 circuit against DPA. Finally, the designed circuit was synthesized with Synopsys DC (Design Compiler, Synopsys, Mountain View, CA, USA). The results show that the area of the designed circuit in the SMIC 0.18 μm process is 82,734 μm², which is 48% smaller than results reported in other papers.

1. Introduction

The SM4 algorithm is a symmetric block cipher announced by the Chinese National Cipher Management Committee Office in January 2006. It has been widely used in various fields of information security in China, such as wireless local area networks (WLAN), the WLAN Authentication and Privacy Infrastructure (WAPI), storage devices, and smart card systems. As the SM4 algorithm is mostly used in high-speed and resource-constrained applications, a compact circuit implementation of SM4 is needed. As a standard cipher algorithm, SM4 is attractive for its short build time and low memory requirements [1,2,3]. However, Differential Power Analysis (DPA), which has been developed in recent years, poses great challenges to the security of SM4 circuits. DPA is a typical SCA (side-channel attack) method that performs a correlation analysis on the power consumption collected during the operation. By exploiting the correlation between sensitive information in the operation and the instantaneous power consumption of the CMOS circuit, DPA can quickly recover the key of SM4. It is simple to implement, efficient, and fast, and therefore poses a serious threat to the security of integrated circuits. The goal of this work is to study compact SM4 circuits resistant to SCA for resource-constrained applications.
Since Kocher proposed PA (Power Analysis) in 1998, research on PA and on defense measures for cryptographic circuits has increased [4]. In recent years, ML (machine learning) and PCA (principal component analysis) have also been applied to PA because of the large amount of data that needs to be statistically analyzed [5,6]. PA relies on the correlation between the power consumption of the circuit and the intermediate values of the cryptographic operation. If this correlation is weakened or shielded, the attacker cannot recover the key from the power consumption information. The defense strategies against PA include the masking method and the randomization method.

1.1. Masking Method

Because the intermediate values of cryptographic operations are correlated with power consumption, the masking method has become the most commonly used defense and is widely applied to algorithms such as AES (Advanced Encryption Standard) and SM4 [1,7,8]. The principle of the masking method is to “hide” the key by combining the plaintext and key data with a random or fixed mask value before they are processed. The intermediate value after the operation with the plaintext is then a random quantity, so the power consumption related to the intermediate value is also random. As a result, an attacker cannot define a distinguishing function, can hardly collect useful power consumption curves, and cannot analyze the key based on them.
Akkar et al. [1,8,9] proposed an AES cryptographic circuit based on the random data masking method. However, it does not really eliminate the vulnerability in the circuit, because the finite field inversion, which is the only non-linear part of the S-box, is not masked randomly. As a result, the circuit may be subjected to Differential Power Analysis or high-order Differential Power Analysis. A first-order masking-based countermeasure to defeat DPA and CPA (correlation power analysis) for SM4 was proposed in [7]. However, that work neither optimized the area of the masked S-boxes nor analyzed the area resources in its experimental results. Tan Ruineng et al. [10] used the multiplicative mask method to implement the mask S-box of SM4. Because the multiplicative mask is complicated to implement and suffers from the “zero value attack” defect, it cannot completely resist DPA attacks. In [11], the authors for the first time treated the inversion over the finite field as a whole, introduced a new quantity “∞”, and defined Inv(X) over GF(2⁸) ∪ {∞}, in which “∞” is replaced with “0”. Re-masking with this method can resist DPA. Reference [12] implemented the S-box of AES using the Boolean mask method, which successfully resisted DPA attacks but incurred excessive circuit resource overhead. Mangard et al. [13] proposed an attack on AES circuits protected by masking, namely the glitch attack. By analyzing the number of glitches in the S-box of AES, the correlation between the intermediate value and power consumption was successfully exploited. Liang et al. [14] designed a masked S-box of SM4 using the composite field masking method, which can resist first-order DPA attacks but cannot resist glitch attacks.
The paper [15] designed an SM4 cryptographic circuit with a full-mask round transform structure based on the random mask method, but the problem of excessive resource overhead remains. Therefore, how to reduce circuit resources while ensuring the security of the circuit still needs to be researched.

1.2. Randomization Method

The randomization method aims to destroy the correlation between the intermediate value and power consumption by adding redundant power consumption or random noise. Randomization methods usually include random insertion of pseudo-operations and random delay. Herbst et al. [12] designed an AES cryptographic circuit based on the randomization method. Since the AES algorithm has a width of 128 bits and the bytes can be processed in any order, the operation sequence can be randomized during execution. Inserting pseudo-operations disturbs the power consumption information of the circuit and thereby resists power consumption attacks. Kocher et al. [16] inserted random delays into the clock of the cryptographic circuit, so that the time at which the intermediate value is computed is no longer fixed and the attacker cannot locate the power consumption data corresponding to the attack point. This randomization method has the advantages of small circuit area and simple implementation, and is commonly used for resisting power consumption attacks.
In general, the masking method is strong in security and easy to implement, but it has the disadvantage of being unable to resist glitch attacks. Although the randomization method has a weak ability to defend against attacks, its resource overhead is small. In order to achieve a circuit resistant to DPA for resource-constrained applications, this work will design and implement a compact SM4 encryption and decryption circuit based on masking and randomization method.
The rest of this manuscript is organized as follows: Section 2 introduces the overall structure design of SM4. Section 3 details the sub-module design of SM4. Section 4 analyzes the security of the proposed SM4 to resist glitch attacks and DPA. In Section 5, we study the synthesized results of the SM4 circuit. Section 6 summarizes the conclusions of this work.

2. Overall Structure Design

Because the attacker can use the power consumption information of the cryptographic circuit at any time to recover the original key, it is not safe to protect only some modules of the SM4 encryption and decryption circuit. The security design should protect the SM4 circuit during every operation of the algorithm. From [1], the attack points of SM4 are the input/output of the S-box and the output of the linear transformation. Attackers usually choose the input/output of the S-box as the position for Power Analysis. References [1,10] selected the input and the output of the S-box, respectively, to attack the SM4 encryption circuit, and successfully obtained the key. This paper uses a masking method to protect the T-box. Because the composite field mask S-box is easily attacked by glitch attacks, this paper further proposes a random delay method that changes the delay of the input data of the composite field mask S-box. The linear transformation applies different cyclic shifts to the output of the S-box to achieve diffusion. Therefore, a byte-wise distinguishing function cannot be used to attack this position, and only a word-wise (32-bit) attack is possible. In this case, the attacker needs to try at least 2³² data guesses, which makes the attack very difficult. For the linear transformation, we therefore use the parallel structures in the circuit that would otherwise be idle and insert random operations to generate randomized power consumption that disrupts the overall power of the circuit. Because the key expansion and round transformation modules of the SM4 encryption and decryption circuit designed in this paper reuse the T-box, there is no need to protect the key expansion separately. Using the above design ideas, the SM4 encryption and decryption circuit based on the random mask and randomization method is designed, as shown in Figure 1.
Figure 1 includes a mask T-box, a linear transformation L1, a linear transformation L2, and a random mask generation module. Among them, the mask T-box occupies more than 40% of the combinational logic overhead of the total circuit and is an important part of the round transformation and key expansion in the SM4 encryption and decryption circuit. The composite field mask S-box is in turn an important part of the T-box module. Based on the above design ideas, the following section designs a compact SM4 encryption and decryption circuit based on the mask and randomization method.

3. Sub-Module Design

This section will study the design and optimization of the mask T-box, random linear transformation L1, random linear transformation L2, and random mask generation module based on random delay, mask, and random insertion pseudo-operation method.

3.1. Design of the Mask T-Box

The designed circuit structure of the mask T-box is shown in Figure 2. It is mainly divided into three parts: the XOR operation module, the random delay module, and the mask S-box. At the beginning of the T-box operation, the mask MI is combined with the key rk or the fixed parameter CK to mask the intermediate result. The mask is then input into the random delay module along with the round data D obtained by the XOR operation, and the data path is delayed. The delayed data enters the mask S-box for calculation. Finally, the mask S-box outputs the non-linearly transformed data S and the updated mask MS.
Since the composite field mask S-box cannot resist glitch attacks, we use a composite field mask method based on random delay to improve the security of the S-box. According to the principle of the glitch attack, the number of glitches in the circuit is related to the power consumption of the circuit. When different values are input to the S-box, the number of glitches in the S-box operation circuit differs. Thus, the attacker can establish a connection between the input value of the S-box and the power consumption, and the key of the circuit can be recovered through Differential Power Analysis [13]. Therefore, as long as the correlation between the number of glitches and the input of the S-box is destroyed, glitch attacks can be resisted. Based on the above analysis, the structure of the designed random delay module is shown in Figure 3.
In Figure 3, each triangle represents a buffer unit made up of four identical NOT gates connected in series. According to the mask input of the random delay module, the selector determines by how many buffer units the data path is delayed before the data is output. Because the round data D sent to the S-box is 32 bits wide, 32 random delay modules of the kind shown in Figure 3 are needed. Since each selector requires a 3-bit control signal, the 32-bit mask MI can directly control only 10 selectors. However, the glitch attack targets a single S-box, so even if the delays are the same across S-boxes, the security of each S-box against the glitch attack is not affected. We therefore use the lower 24 bits of the mask MI, which provide eight 3-bit control signals, to control the outputs of the 32 selectors of the data path.
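The sharing of the 24 control bits among the 32 selectors can be sketched as follows. This is a minimal illustration, not the authors' netlist: the assignment of control field j to bit j of every S-box byte, and the function names `delay_selects` and `per_bit_delays`, are assumptions made here for illustration.

```python
def delay_selects(mask_mi):
    """Split the lower 24 bits of the 32-bit mask MI into eight 3-bit selects.

    Assumption: field j occupies bits [3j+2 : 3j] of MI.
    """
    low24 = mask_mi & 0xFFFFFF
    return [(low24 >> (3 * j)) & 0b111 for j in range(8)]


def per_bit_delays(mask_mi):
    """Number of buffer units (0..7) inserted on each of the 32 data bits.

    Assumption: the same eight selects are reused by all four S-boxes,
    so data bit i uses select i mod 8.
    """
    sel = delay_selects(mask_mi)
    return [sel[bit % 8] for bit in range(32)]
```

Under this assumed mapping, every S-box sees eight independently chosen delays on its input byte, while the four S-boxes share the same delay pattern, which matches the observation that equal delays across S-boxes do not weaken a single S-box.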
The most complicated part of the T-box is the mask S-box. Based on the composite field mask method, the mask S-box module structure is shown in Figure 4. Because the round data to S-box is 32-bit data, it needs 4 mask S-box modules, as shown in Figure 4.
In Figure 4, MA indicates the 8-bit mask input, A indicates the 8-bit input data, and E indicates the 8-bit output data. After the mask affine transformation module, mask field mapping module, composite field mask inversion module, and mask composite affine transformation module of the SM4 S-box, the masks are MB, MC, MD, and ME, respectively. With the irreducible polynomials over GF((2⁴)²) and GF(2⁴) fixed, the mask design and optimization of each module in Figure 4 are studied next.

3.1.1. Optimization Design of Mask Affine Transformation Module

Suppose A is the input of the mask affine transformation module, B is the output of the mask affine transformation module, and MA and MB are the mask input and output of the mask affine transformation module, respectively. B and MB can be expressed in the forms shown in Equation (1).
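$$
\begin{cases}
B = T A + V = T (M_A + X) + V \\
M_B = T M_A
\end{cases}
\tag{1}
$$
where X is the input data without mask operation.
$$
T = \begin{bmatrix}
1 & 1 & 0 & 1 & 0 & 0 & 1 & 1 \\
1 & 1 & 1 & 0 & 1 & 0 & 0 & 1 \\
1 & 1 & 1 & 1 & 0 & 1 & 0 & 0 \\
0 & 1 & 1 & 1 & 1 & 0 & 1 & 0 \\
0 & 0 & 1 & 1 & 1 & 1 & 0 & 1 \\
1 & 0 & 0 & 1 & 1 & 1 & 1 & 0 \\
0 & 1 & 0 & 0 & 1 & 1 & 1 & 1 \\
1 & 0 & 1 & 0 & 0 & 1 & 1 & 1
\end{bmatrix},
\quad
V = \begin{bmatrix} 1 & 1 & 0 & 1 & 0 & 0 & 1 & 1 \end{bmatrix}^{T}
\tag{2}
$$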
The DACSE (delay-aware common sub-expression elimination) optimization method is used to substitute Equation (2) into Equation (1), and the circuit logic expressions of the mask output and output data of the optimized mask affine transformation module are shown in (3) and (4), respectively.
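$$
M_B = \begin{cases}
m_{B7} = (m_1 \oplus m_0) \oplus z_7 \\
m_{B6} = (m_3 \oplus z_0) \oplus z_4 \\
m_{B5} = (m_5 \oplus m_2) \oplus z_7 \\
m_{B4} = z_2 \oplus z_6 \\
m_{B3} = z_5 \oplus z_6 \\
m_{B2} = z_3 \oplus z_8 \\
m_{B1} = (m_3 \oplus z_5) \oplus z_2 \\
m_{B0} = z_4 \oplus z_8
\end{cases},
\quad
\begin{cases}
z_0 = m_7 \oplus m_6 \\
z_1 = m_7 \oplus m_2 \\
z_2 = m_6 \oplus m_1 \\
z_3 = m_4 \oplus m_3 \\
z_4 = m_5 \oplus m_0 \\
z_5 = m_2 \oplus m_0
\end{cases},
\quad
\begin{cases}
z_6 = m_5 \oplus z_3 \\
z_7 = m_4 \oplus z_0 \\
z_8 = m_1 \oplus z_1
\end{cases}
\tag{3}
$$
$$
B = \begin{cases}
b_7 = (x_1 \oplus x_0) \odot p_7 \\
b_6 = (x_3 \oplus p_0) \odot p_4 \\
b_5 = (x_5 \oplus x_2) \oplus p_7 \\
b_4 = p_2 \odot p_6 \\
b_3 = p_5 \oplus p_6 \\
b_2 = p_3 \oplus p_8 \\
b_1 = (x_3 \oplus p_5) \odot p_2 \\
b_0 = p_4 \odot p_8
\end{cases},
\quad
\begin{cases}
p_0 = x_7 \oplus x_6 \\
p_1 = x_7 \oplus x_2 \\
p_2 = x_6 \oplus x_1 \\
p_3 = x_4 \oplus x_3 \\
p_4 = x_5 \oplus x_0 \\
p_5 = x_2 \oplus x_0 \\
p_6 = x_5 \oplus p_3 \\
p_7 = x_4 \oplus p_0 \\
p_8 = x_1 \oplus p_1
\end{cases}
\tag{4}
$$
where the symbol ⊕ represents the XOR operation and the symbol ⊙ represents the XNOR operation. Each XNOR in (4) absorbs a bit of V that equals 1, so (4) differs from (3) only in the outputs b7, b6, b4, b1, and b0.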
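As a cross-check of the mask path, the shared-subexpression network of Equation (3) can be compared with the plain matrix product MB = T·MA from Equations (1) and (2). The sketch below assumes that every operator in the mask network is an XOR (the mask path has no constant term) and that the rows of T and the bits of MA are ordered from bit 7 down to bit 0. The function names are illustrative only.

```python
# Rows of T from Equation (2), written MSB first (columns m7 ... m0).
T_ROWS = [0b11010011, 0b11101001, 0b11110100, 0b01111010,
          0b00111101, 0b10011110, 0b01001111, 0b10100111]


def parity(v):
    """XOR of all bits of v."""
    return bin(v).count("1") & 1


def affine_mask_matrix(ma):
    """M_B = T * M_A over GF(2): one parity per matrix row, row 0 gives bit 7."""
    out = 0
    for row in T_ROWS:
        out = (out << 1) | parity(row & ma)
    return out


def affine_mask_network(ma):
    """M_B computed with the shared XOR terms z0..z8 of Equation (3)."""
    m = [(ma >> i) & 1 for i in range(8)]  # m[i] is bit m_i
    z0 = m[7] ^ m[6]
    z1 = m[7] ^ m[2]
    z2 = m[6] ^ m[1]
    z3 = m[4] ^ m[3]
    z4 = m[5] ^ m[0]
    z5 = m[2] ^ m[0]
    z6 = m[5] ^ z3
    z7 = m[4] ^ z0
    z8 = m[1] ^ z1
    bits = [
        z4 ^ z8,              # m_B0
        (m[3] ^ z5) ^ z2,     # m_B1
        z3 ^ z8,              # m_B2
        z5 ^ z6,              # m_B3
        z2 ^ z6,              # m_B4
        (m[5] ^ m[2]) ^ z7,   # m_B5
        (m[3] ^ z0) ^ z4,     # m_B6
        (m[1] ^ m[0]) ^ z7,   # m_B7
    ]
    return sum(bit << i for i, bit in enumerate(bits))
```

The network uses 21 two-input XOR gates, whereas evaluating the eight rows of T independently would need 32, which illustrates the saving from common sub-expression elimination.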

3.1.2. Optimized Design of Mask Field Mapping Module

Assume that B is the input of the mask field mapping module, C is its output, and MB and MC are its mask input and output, respectively. From Equation (5), C can be expressed as in Equation (6).
$$
\begin{cases}
B = M_B + X \\
C = M B = M (M_B + X) \\
M_C = M M_B
\end{cases}
\tag{5}
$$
$$
C = \begin{cases}
c_7 = b_3 \oplus p_6 \\
c_6 = p_3 \oplus p_0 \\
c_5 = b_7 \oplus p_4 \\
c_4 = p_4 \\
c_3 = p_0 \oplus p_1 \\
c_2 = p_6 \\
c_1 = p_5 \oplus b_2 \\
c_0 = b_0 \oplus p_5
\end{cases},
\quad
\begin{cases}
p_0 = b_7 \oplus b_1 \\
p_1 = b_6 \oplus b_5 \\
p_2 = b_4 \oplus b_2 \\
p_3 = b_2 \oplus b_3
\end{cases},
\quad
\begin{cases}
p_4 = b_5 \oplus p_3 \\
p_5 = b_1 \oplus p_1 \\
p_6 = p_1 \oplus p_2
\end{cases}
\tag{6}
$$
The mask field mapping module performs the same field mapping operation on the masked data B and the mask MB, so it can be directly implemented using two field mapping modules.

3.1.3. Optimized Design of Mask Composite Affine Transformation Module

Similar to the mask affine transformation module, suppose MD and ME are the mask input and output of the mask composite affine transformation module, respectively, and D and E are its data input and output, respectively. The mask output ME and output data E of the mask composite affine transformation module can be deduced, as shown in Equation (7).
$$
\begin{cases}
M_E = C M_D \\
E = C D + V
\end{cases}
\tag{7}
$$
$$
C = T M^{-1} = \begin{bmatrix}
0 & 1 & 0 & 1 & 0 & 1 & 1 & 1 \\
0 & 1 & 1 & 1 & 1 & 0 & 1 & 1 \\
0 & 0 & 1 & 1 & 0 & 1 & 0 & 0 \\
0 & 1 & 1 & 1 & 0 & 1 & 0 & 0 \\
1 & 1 & 1 & 0 & 0 & 0 & 0 & 1 \\
1 & 0 & 0 & 0 & 1 & 0 & 0 & 0 \\
0 & 0 & 0 & 1 & 0 & 0 & 0 & 1 \\
0 & 1 & 0 & 1 & 0 & 0 & 1 & 1
\end{bmatrix}
\tag{8}
$$
Substituting (8) into (7) and optimizing with the DACSE optimization method, the circuit logic expressions of the mask operation and data operation of the composite affine transformation module are shown in (9) and (10), respectively.
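$$
M_E = \begin{cases}
m_{E7} = z_4 \oplus m_2 \\
m_{E6} = z_4 \oplus (m_3 \oplus m_5) \\
m_{E5} = z_3 \oplus m_5 \\
m_{E4} = z_3 \oplus z_1 \\
m_{E3} = (m_7 \oplus m_0) \oplus z_1 \\
m_{E2} = (m_7 \oplus m_3) \\
m_{E1} = z_2 \\
m_{E0} = z_4
\end{cases},
\quad
\begin{cases}
z_0 = m_6 \oplus m_1 \\
z_1 = m_6 \oplus m_5 \\
z_2 = m_4 \oplus m_0 \\
z_3 = m_2 \oplus m_4
\end{cases},
\quad
\begin{cases}
z_4 = z_0 \oplus z_2
\end{cases}
\tag{9}
$$
$$
Z = \begin{cases}
z_7 = p_4 \oplus n_2 \\
z_6 = p_4 \oplus (n_3 \oplus n_5) \\
z_5 = p_3 \oplus n_5 \\
z_4 = p_3 \odot p_1 \\
z_3 = (n_7 \oplus n_0) \oplus p_1 \\
z_2 = (n_7 \oplus n_3) \\
z_1 = p_2 \\
z_0 = p_4
\end{cases},
\quad
\begin{cases}
p_0 = n_6 \oplus n_1 \\
p_1 = n_6 \oplus n_5 \\
p_2 = n_4 \odot n_0 \\
p_3 = n_2 \oplus n_4 \\
p_4 = p_0 \oplus p_2
\end{cases}
\tag{10}
$$
In (10), the XNOR operations (⊙) absorb the bits of V that equal 1, so the data path differs from the mask path (9) only in the outputs z7, z6, z4, z1, and z0.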
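The mask network of Equation (9) can be checked against the matrix C of Equation (8) in the same way as for the input affine transformation, and the linearity of C explains why the constant V appears only in the data path. The sketch below assumes MSB-first ordering of rows and bits and an all-XOR mask network. The names `mat_vec` and `composite_mask_network` are illustrative, not taken from the source.

```python
# Rows of C = T * M^-1 from Equation (8), MSB first, and the constant V.
C_ROWS = [0b01010111, 0b01111011, 0b00110100, 0b01110100,
          0b11100001, 0b10001000, 0b00010001, 0b01010011]
V = 0b11010011


def mat_vec(rows, v):
    """Matrix-vector product over GF(2); the first row produces bit 7."""
    out = 0
    for row in rows:
        out = (out << 1) | (bin(row & v).count("1") & 1)
    return out


def composite_mask_network(md):
    """M_E computed with the shared XOR terms z0..z4 of Equation (9)."""
    m = [(md >> i) & 1 for i in range(8)]  # m[i] is bit m_i
    z0 = m[6] ^ m[1]
    z1 = m[6] ^ m[5]
    z2 = m[4] ^ m[0]
    z3 = m[2] ^ m[4]
    z4 = z0 ^ z2
    bits = [z4, z2, m[7] ^ m[3], (m[7] ^ m[0]) ^ z1,
            z3 ^ z1, z3 ^ m[5], z4 ^ (m[3] ^ m[5]), z4 ^ m[2]]  # m_E0 .. m_E7
    return sum(bit << i for i, bit in enumerate(bits))


def masked_composite_affine(d, md):
    """Data path E = C*D + V and mask path M_E = C*M_D from Equation (7)."""
    return mat_vec(C_ROWS, d) ^ V, mat_vec(C_ROWS, md)
```

Because C is linear, XORing the two outputs of `masked_composite_affine` gives C·(D ⊕ MD) ⊕ V, the unmasked result, without the constant ever being applied to the mask.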

3.1.4. Optimal Design of the Inverse Operation of the Mask GF((2⁴)²)

The constituent modules of the GF((2⁴)²) inversion circuit are all operations over GF(2⁴). The four mask operations over GF(2⁴) are designed in detail below.
(a) Mask GF(2⁴) Add Operation
Since the mask operation is identical to the masked data operation, the masked GF(2⁴) add operation requires only two GF(2⁴) add operations. The output data and mask of the masked GF(2⁴) add operation are shown in (11) and (12), respectively.
$$
Z = X + Y = \begin{cases}
z_3 = x_3 \oplus y_3 \\
z_2 = x_2 \oplus y_2 \\
z_1 = x_1 \oplus y_1 \\
z_0 = x_0 \oplus y_0
\end{cases},
\quad
X = (x_3, x_2, x_1, x_0),\ Y = (y_3, y_2, y_1, y_0),\ Z = (z_3, z_2, z_1, z_0).
\tag{11}
$$
$$
M_Z = M_X + M_Y = \begin{cases}
m_{Z3} = m_{X3} \oplus m_{Y3} \\
m_{Z2} = m_{X2} \oplus m_{Y2} \\
m_{Z1} = m_{X1} \oplus m_{Y1} \\
m_{Z0} = m_{X0} \oplus m_{Y0}
\end{cases},
\quad
M_X = (m_{X3}, m_{X2}, m_{X1}, m_{X0}),\ M_Y = (m_{Y3}, m_{Y2}, m_{Y1}, m_{Y0}),\ M_Z = (m_{Z3}, m_{Z2}, m_{Z1}, m_{Z0}).
\tag{12}
$$
Here, X and Y represent the input data of the mask GF(2⁴) add operation, and MX and MY represent its input masks. Z and MZ represent the output data and output mask of the masked GF(2⁴) add operation.
(b) Mask GF(2⁴) Multiplication
Suppose that X′ and Y′ represent the input data of the mask GF(2⁴) multiplication operation after demasking, and Z′ is the output data of the mask GF(2⁴) multiplication operation after demasking. Equation (13) can then be obtained.
$$
Z' = X' Y' = (X + M_X)(Y + M_Y) = X Y + X M_Y + Y M_X + M_X M_Y
\tag{13}
$$
If the output mask of the module is MZ and the output data is Z, then MZ = Z + Z′. To avoid exposing intermediate results through direct operations between the input data and the mask, the data operation equation and mask operation equation of the mask GF(2⁴) multiplication operation are obtained, as shown in (14).
$$
\begin{cases}
Z = X ((Y + M_X) + M_Y) \\
M_Z = M_X ((X + M_Y) + Y)
\end{cases}
\tag{14}
$$
Adding the two lines of (14) gives XY + XMY + YMX + MXMY, because the two XMX terms cancel, which agrees with (13).
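The consistency of Equations (13) and (14) can be checked exhaustively over GF(2⁴). The sketch below is illustrative only: the irreducible polynomial x⁴ + x³ + x² + x + 1 is an assumption (this part of the text does not state it, and the identity holds for any irreducible polynomial), and `gf16_mul` and `masked_mul` are names introduced here.

```python
POLY = 0b11111  # x^4 + x^3 + x^2 + x + 1 (assumed irreducible polynomial)


def gf16_mul(a, b):
    """Shift-and-add multiplication in GF(2^4) modulo POLY."""
    r = 0
    for _ in range(4):
        if b & 1:
            r ^= a
        b >>= 1
        a <<= 1
        if a & 0x10:      # degree-4 term appeared, reduce
            a ^= POLY
    return r


def masked_mul(x, y, mx, my):
    """Equation (14): masked data output Z and output mask M_Z.

    x and y are the masked inputs (x = x' ^ mx, y = y' ^ my).
    The grouping of the XORs follows the parentheses of Equation (14).
    """
    z = gf16_mul(x, (y ^ mx) ^ my)
    mz = gf16_mul(mx, (x ^ my) ^ y)
    return z, mz
```

Unmasking the result with `z ^ mz` returns the product of the unmasked operands for every choice of masks, as the test below confirms for all 65,536 combinations.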
(c) Mask GF(2⁴) Squared Constant Operation
Suppose that X and X′ represent the masked and unmasked input data of the masked GF(2⁴) squared constant operation, and MX is the input mask. The calculation of Z′, which is the output data of the mask GF(2⁴) squared constant operation, is shown in (15).
$$
Z' = X'^{2} \times k = (X + M_X)^{2} \times k = X^{2} \times k + M_X^{2} \times k
\tag{15}
$$
Equation (15) shows that this operation is composed of two GF(2⁴) squared constant operations, one on the data and one on the mask. When a constant matrix is used, the GF(2⁴) squared constant operation has no area overhead, so the mask GF(2⁴) squared constant operation also requires no resource overhead.
(d) Mask GF(2⁴) Inversion Operation
Suppose that X and X′ represent the masked and unmasked input data of the masked GF(2⁴) inversion operation, and MX is the input mask. The calculation of Z′, which is the output data of the mask GF(2⁴) inversion operation, is shown in (16).
$$
\begin{aligned}
Z' = X'^{14} &= (X^{2} + M_X^{2})\,(X^{2} + M_X^{2})^{2}\,\big((X^{2} + M_X^{2})^{2}\big)^{2} \\
&= (X^{2} + M_X^{2})(X^{4} + M_X^{4})(X^{8} + M_X^{8}) \\
&= X^{14} + X^{12} M_X^{2} + X^{10} M_X^{4} + X^{8} M_X^{6} + X^{6} M_X^{8} + X^{4} M_X^{10} + X^{2} M_X^{12} + M_X^{14}
\end{aligned}
\tag{16}
$$
If the operation is implemented directly in the manner of Equation (16), multiple GF(2⁴) multiplication and addition modules are consumed. This not only causes a huge area overhead, but also increases the critical path delay of the circuit. If the mask output of the module is MZ and the data output is Z, then MZ = Z + Z′ holds. To avoid exposing intermediate results through direct operations between the input data and the input mask, the data operation formula and mask operation formula of the mask GF(2⁴) inversion are obtained, as shown in Equation (17).
$$
\begin{cases}
Z = X^{8} Q \\
M_Z = M_X^{8} Q
\end{cases},
\quad
Q = X^{6} + X^{4} M_X^{2} + X^{2} M_X^{4} + M_X^{6}
\tag{17}
$$
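Equation (17) can be checked against the definition of the GF(2⁴) inverse, X′⁻¹ = X′¹⁴, by exhaustive evaluation. The sketch below is a field-level illustration only. It assumes the irreducible polynomial x⁴ + x³ + x² + x + 1 (not stated in this part of the text; the identity itself holds in any field of characteristic 2), and the function names are hypothetical.

```python
POLY = 0b11111  # x^4 + x^3 + x^2 + x + 1 (assumed irreducible polynomial)


def gf16_mul(a, b):
    """Shift-and-add multiplication in GF(2^4) modulo POLY."""
    r = 0
    for _ in range(4):
        if b & 1:
            r ^= a
        b >>= 1
        a <<= 1
        if a & 0x10:
            a ^= POLY
    return r


def gf16_pow(a, e):
    """a raised to the power e (e >= 1) by repeated multiplication."""
    r = 1
    for _ in range(e):
        r = gf16_mul(r, a)
    return r


def masked_inv(x, mx):
    """Equation (17): Z = X^8 * Q and M_Z = M_X^8 * Q for masked input x."""
    q = (gf16_pow(x, 6)
         ^ gf16_mul(gf16_pow(x, 4), gf16_pow(mx, 2))
         ^ gf16_mul(gf16_pow(x, 2), gf16_pow(mx, 4))
         ^ gf16_pow(mx, 6))
    return gf16_mul(gf16_pow(x, 8), q), gf16_mul(gf16_pow(mx, 8), q)
```

For every unmasked value and every mask, `z ^ mz` equals the 14th power of the unmasked input, which is its multiplicative inverse when the input is nonzero and 0 otherwise.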
Since the expression in (17) still requires multiple GF(2⁴) multiplication operations, it can be expanded and simplified bitwise. With X = (a3, a2, a1, a0) and MX = (m3, m2, m1, m0), the simplified expression is shown in (18).
$$
Q = \begin{cases}
\begin{aligned}
q_3 ={}& a_2(m_1 + m_3) + (a_1 + m_3)(m_2 + m_3) + a_3(m_2 + m_1) + m_1(a_0 + m_3) \\
      & + a_1(a_2 + a_0) + m_1(m_2 + m_0) + a_3(a_1 + a_2) + (a_3 + a_1 m_0) \\
q_2 ={}& a_2(m_0 + m_3) + a_3(m_2 + m_1) + a_0(m_2 + m_3) + (a_1 m_3 + a_3 m_0) \\
      & + a_2(a_2 + a_0) + (a_3 a_1 + m_1 m_3) + a_3(a_2 + a_0) + (m_2 + m_3)(m_2 + m_0) \\
q_1 ={}& a_0(m_2 + m_3) + (a_0 m_1 + a_2 m_0) + a_1(m_0 + m_3) + a_3(m_0 + m_1) \\
      & + a_1(a_3 + a_1) + a_0(a_1 + a_2) + (a_3 a_0 + m_0 m_2) + (m_0 + m_1)(m_1 + m_3) \\
q_0 ={}& a_0(m_2 + m_1) + (a_1 + m_0)(m_2 + m_0) + a_2(m_1 + m_3) + (a_2 m_0 + a_3 m_2) \\
      & + a_2(a_3 + a_1) + a_0(a_1 + a_2) + m_2(m_1 + m_3) + (a_0 + m_1 m_0)
\end{aligned}
\end{cases}
\tag{18}
$$
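Let T = X⁸ and MT = MX⁸. The optimized Q output is then obtained, as shown in Equation (19), where ⊕ denotes XOR, ⊙ denotes XNOR, and juxtaposition denotes AND.
$$
Q = \begin{cases}
\begin{aligned}
q_3 ={}& F_{amm213} \oplus X_{am13} X_{m23} \oplus F_{amm312} \oplus (A_{am01} \oplus A_{m13}) \\
      & \oplus a_1 X_{a02} \oplus m_1 X_{m02} \oplus a_3 X_{Na12} \oplus a_1 m_0 \\
q_2 ={}& a_2 X_{m03} \oplus F_{amm312} \oplus F_{amm023} \oplus (a_1 m_3 \oplus a_3 m_0) \\
      & \oplus X_{a23} X_{a02} \oplus (a_1 a_3 \oplus A_{m13}) \oplus X_{m23} X_{m02} \\
q_1 ={}& F_{amm023} \oplus (A_{am01} \oplus A_{am20}) \oplus a_1 X_{m03} \oplus a_3 X_{m01} \\
      & \oplus a_1 X_{a13} \oplus a_0 X_{a12} \oplus (a_0 a_3 \oplus m_0 m_2) \oplus X_{m01} X_{m13} \\
q_0 ={}& a_0 X_{m12} \oplus X_{am10} X_{m02} \oplus F_{amm213} \oplus (A_{am20} \oplus a_3 m_2) \\
      & \oplus a_2 X_{a13} \oplus a_0 X_{Na12} \oplus m_2 X_{m13} \oplus m_0 m_1
\end{aligned}
\end{cases}
$$
$$
T = X^{8} = \begin{cases}
t_3 = X_{a13} \\ t_2 = a_3 \\ t_1 = X_{a23} \\ t_0 = X_{a03}
\end{cases},
\quad
M_T = M_X^{8} = \begin{cases}
m_{T3} = X_{m13} \\ m_{T2} = m_3 \\ m_{T1} = X_{m23} \\ m_{T0} = X_{m03}
\end{cases}
$$
$$
\begin{cases}
X_{m13} = m_1 \oplus m_3 \\ X_{m23} = m_2 \oplus m_3 \\ X_{m03} = m_0 \oplus m_3 \\
X_{m12} = m_1 \oplus m_2 \\ X_{m02} = m_0 \oplus m_2 \\ X_{m01} = m_0 \oplus m_1
\end{cases},
\quad
\begin{cases}
X_{a13} = a_1 \oplus a_3 \\ X_{a03} = a_0 \oplus a_3 \\ X_{a23} = a_2 \oplus a_3 \\
X_{a02} = a_0 \oplus a_2 \\ X_{Na12} = a_1 \odot a_2 \\ X_{a12} = a_1 \oplus a_2
\end{cases},
\quad
\begin{cases}
X_{am13} = a_1 \oplus m_3 \\ X_{am03} = a_0 \oplus m_3 \\ X_{am10} = a_1 \oplus m_0 \\
A_{am01} = a_0 m_1 \\ A_{am20} = a_2 m_0 \\ A_{m13} = m_1 m_3
\end{cases},
\quad
\begin{cases}
F_{amm213} = a_2 X_{m13} \\ F_{amm023} = a_0 X_{m23} \\ F_{amm312} = a_3 X_{m12}
\end{cases}
\tag{19}
$$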
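The bit-level expression for T = X⁸ in Equation (19) depends on the irreducible polynomial chosen for GF(2⁴), which is not stated in this section. The sketch below assumes the polynomial x⁴ + x³ + x² + x + 1 because, under that assumption, three successive squarings reproduce exactly the bits t3 = a1 ⊕ a3, t2 = a3, t1 = a2 ⊕ a3, t0 = a0 ⊕ a3. The function names are illustrative.

```python
POLY = 0b11111  # x^4 + x^3 + x^2 + x + 1 (assumption, see the lead-in)


def gf16_mul(a, b):
    """Shift-and-add multiplication in GF(2^4) modulo POLY."""
    r = 0
    for _ in range(4):
        if b & 1:
            r ^= a
        b >>= 1
        a <<= 1
        if a & 0x10:
            a ^= POLY
    return r


def pow8_field(x):
    """X^8 computed as three successive squarings in the field."""
    for _ in range(3):
        x = gf16_mul(x, x)
    return x


def pow8_bits(x):
    """X^8 computed with the XOR-only expressions for T in Equation (19)."""
    a0, a1, a2, a3 = [(x >> i) & 1 for i in range(4)]
    t3 = a1 ^ a3
    t2 = a3
    t1 = a2 ^ a3
    t0 = a0 ^ a3
    return (t3 << 3) | (t2 << 2) | (t1 << 1) | t0
```

Since X⁸ is a linear map over GF(2), it costs only three XOR gates per operand in this form, and the same wiring serves for the mask MT = MX⁸.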
Equation (19) shows that, in addition to the above operations, the mask GF(2⁴) inversion operation requires two GF(2⁴) multiplication operations, namely T·Q and MT·Q from Equation (17).
The mask GF((2⁴)²) inversion circuit mainly includes two mask GF(2⁴) adds, three mask GF(2⁴) multiplications, one mask GF(2⁴) squared constant, and one mask GF(2⁴) inversion. Based on the SMIC 0.18 μm process, the comparison before and after optimization of the mask GF((2⁴)²) inversion operation circuit is shown in Table 1.
After the optimized design, the area overhead of the mask GF((2⁴)²) inversion circuit implemented in this section is 231A_XOR + 159A_AND + 1A_XNOR, which is 8298.81 μm², and its critical path delay is 19T_XOR + 4T_AND. Before optimization, 414 XOR gates and 322 AND gates are required, with an area of 15,302.36 μm² and a critical path delay of 21T_XOR + 4T_AND. Compared with the situation before optimization, the critical path delay of the circuit is reduced by 8.3% and the circuit area is reduced by about 45.8%.
Based on the above analysis and the SMIC 0.18 μm process, the mask S-box implemented in this section is synthesized. The comparison between before and after the optimization is shown in Table 2.
The SM4 S-box based on the composite field mask implemented in this paper consumes 322 AND gates and 580 XOR gates before optimization, with an area of 19,719.62 μm² and a critical path delay of 30T_XOR + 4T_AND. After optimization, the critical path delay is reduced by 12.1% and the circuit area is 7A_XNOR + 159A_AND + 322A_XOR, which equals 10,870.98 μm², a reduction of 44.87%.

3.2. Design of Random Linear Transformation

According to the analysis above, the attack point of the SM4 cryptographic circuit is generally input or output of the S-box. Since the linear transformation part needs to be in the form of a word attack, the attack is difficult. Therefore, we use the method of randomly inserting pseudo operations to defend against power consumption attacks on the module. Since the round transformation and key expansion are performed serially, the linear transformation L2 of the round transformation is idle when the key expansion part of the calculation is performed. When the round transformation module is performed, the key expansion linear transformation L1 is also idle. Considering the idle modules, the structure of the designed random linear transformation module is shown in Figure 5.
C1 and C2 come from the random mask generation module. When the round transformation operation is being performed, C1 is 0 and C2 is 32′hffffffff. At the same time, the linear transformation with mask data is performed in the random linear transformation L1, which disturbs the power consumption generated by the random linear transformation L2 of the round transformation being performed. Similarly, when the key expansion operation is performed to the random linear transformation operation, the random linear transformation module of the round transformation will also generate corresponding random power consumption, thereby increasing the difficulty of power consumption attack.
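The idea of keeping the idle linear transformation busy can be sketched in software. The two functions below are the standard SM4 linear transformations: the round transformation L (called L2 in this paper) and the key expansion transformation L′ (called L1 here). The wrapper `random_linear_step`, its arguments, and the way the random word is fed to the idle unit are assumptions for illustration; the actual gating by C1 and C2 is defined by Figure 5.

```python
MASK32 = 0xFFFFFFFF


def rotl32(x, n):
    """Rotate a 32-bit word left by n bits."""
    return ((x << n) | (x >> (32 - n))) & MASK32


def l2_round(b):
    """SM4 round linear transformation L (L2 in this paper)."""
    return b ^ rotl32(b, 2) ^ rotl32(b, 10) ^ rotl32(b, 18) ^ rotl32(b, 24)


def l1_key(b):
    """SM4 key-expansion linear transformation L' (L1 in this paper)."""
    return b ^ rotl32(b, 13) ^ rotl32(b, 23)


def random_linear_step(data, rand_word, round_active):
    """Return (real_result, dummy_result) for one cycle.

    Hypothetical model: the active unit processes the real data while the
    idle unit processes a random word, so both units switch in every cycle.
    """
    if round_active:
        return l2_round(data), l1_key(rand_word)
    return l1_key(data), l2_round(rand_word)
```

Both transformations are linear over GF(2), so a Boolean mask passes through them unchanged in form: L(x ⊕ m) = L(x) ⊕ L(m). This is why the mask path of the linear layer can reuse the same logic as the data path.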

3.3. Design of Random Mask Generation Module

The random mask generation module generates the random mask required by the mask T-box and the operands C1 and C2 required by the mask linear transformation module. The random mask generation module designed in this section is shown in Figure 6.
Because an LFSR (linear feedback shift register) is a common method for generating pseudo-random numbers, a 32-bit LFSR is used in the random mask generation module. Figure 6 contains the ring oscillator operation unit, metastable processing unit, 32-bit LFSR, and selection unit. The ring oscillator operation unit generates random numbers based on circuit characteristics. The metastable processing unit is made up of two D flip-flops connected in series. It synchronizes the data generated by the ring oscillator operation unit to the clock domain of the SM4 encryption and decryption circuit and eliminates metastability. The LFSR is fed with this single-bit random data derived from the characteristics of the circuit, so that the output random number tends toward true randomness, which ensures the security of the mask output. When the Key_valid signal is 0, C1 outputs the mask and C2 outputs 0; otherwise, C1 outputs 0 and C2 outputs the mask.
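A behavioral sketch of the mask generator is given below. The feedback polynomial is not specified in the text, so the tap mask 0x80200003 (taps 32, 22, 2, 1, a tap set commonly tabulated for 32-bit LFSRs) is an assumption, as is the choice of XORing the synchronized ring-oscillator bit into bit 0 of the state. The selection function follows the Key_valid behavior described above. All names are hypothetical.

```python
TAP_MASK = 0x80200003  # assumed taps 32, 22, 2, 1 (not given in the source)


def lfsr_step(state, entropy_bit):
    """One step of a 32-bit Galois LFSR with an injected entropy bit.

    entropy_bit models the synchronized ring-oscillator output (assumption:
    it is XORed into bit 0 after the shift).
    """
    lsb = state & 1
    state >>= 1
    if lsb:
        state ^= TAP_MASK
    return (state ^ (entropy_bit & 1)) & 0xFFFFFFFF


def select_operands(mask, key_valid):
    """Return (C1, C2): C1 carries the mask when Key_valid is 0, else C2."""
    if key_valid == 0:
        return mask, 0
    return 0, mask
```

Without the injected bit, the sequence is fully determined by the seed. The extra bit from the ring oscillator is what makes the state depend on physical circuit behavior.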

4. Security Analysis

This section first performs a security analysis on the random delay mask S-box to verify its ability to resist glitch attacks, and then verifies the anti-DPA attack performance of the compact SM4 encryption and decryption circuit designed in this paper.

4.1. Security Analysis of Random Delay Mask S-box

The principle of the glitch attack is to attack the key based on the correlation between the number of glitches in the S-box operation and the input of the S-box. Therefore, as long as the number of glitches in the S-box operation is random, the glitch attack can be resisted. Table 3 lists the relationship between the input value of the random delay mask S-box and the number of glitches in the case of some mask inputs.
In Table 3, SboxIn represents the input of the random delay mask S-box. MaskIn is used to control the random delay module to generate different delays in different data paths. It can be seen from Table 3 that, under different MaskIn, for the same SboxIn, the number of glitches generated by the S-box operation is different and tends to be random. The random delay mask S-box can effectively resist glitch attacks.

4.2. Security Analysis of SM4 Encryption and Decryption Circuit

To better evaluate the security of SM4 encryption and decryption circuit, there are two sets of tests performed on FPGA and ASIC, separately.
First, the circuit designed in this paper is implemented on an FPGA. The Differential Power Analysis platform designed in [17] is used for data collection and Power Analysis. The intermediate value selected for the attack is the high 8-bit output of the byte substitution operation in the first round, and the corresponding round key rk0 is 32′h15263748. After the 100,000 collected power consumption curves are analyzed in MATLAB, the Power Analysis results are shown in Figure 7.
At the time when the intermediate value is computed, the key guess Kguess corresponding to the DPA curve with the highest peak is 8′hf7, which differs from the high byte 8′h15 of rk0, so the guessed key is wrong. Therefore, even with 100,000 collected power consumption curves, Differential Power Analysis cannot recover the key of the circuit.
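For context, the way a DPA distinguisher selects Kguess can be sketched on an unprotected toy target. The example below is not the platform of [17] and does not model the designed circuit: it uses a seeded random permutation as a stand-in S-box (not the SM4 S-box), noise-free single-sample "traces" equal to the Hamming weight of the S-box output, and the difference-of-means statistic on one output bit. All names are hypothetical.

```python
import random

_rng = random.Random(0)
TOY_SBOX = list(range(256))
_rng.shuffle(TOY_SBOX)  # stand-in permutation, NOT the SM4 S-box


def hamming_weight(v):
    return bin(v).count("1")


def simulate_traces(key_byte, plaintexts):
    """Unprotected leakage model: Hamming weight of the S-box output."""
    return [hamming_weight(TOY_SBOX[p ^ key_byte]) for p in plaintexts]


def dpa_best_guess(plaintexts, traces):
    """Difference-of-means DPA on bit 0 of the predicted S-box output.

    For each key guess, traces are split by the predicted bit; the guess
    with the largest absolute difference of the two means is returned.
    """
    best_guess, best_peak = None, -1.0
    for guess in range(256):
        ones, zeros = [], []
        for p, t in zip(plaintexts, traces):
            if TOY_SBOX[p ^ guess] & 1:
                ones.append(t)
            else:
                zeros.append(t)
        peak = abs(sum(ones) / len(ones) - sum(zeros) / len(zeros))
        if peak > best_peak:
            best_guess, best_peak = guess, peak
    return best_guess
```

On this unprotected model the correct key byte produces the highest peak. The countermeasures in this paper aim to remove exactly this dependence, which is why the highest peak in Figure 7 belongs to a wrong guess.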
Second, Synopsys VCS, DC, PrimeTime PX, and other EDA software are used to simulate the operation of the cryptographic circuit and to calculate the corresponding power consumption data from the toggle rate of the circuit during simulation. We take the low 8-bit output of the byte substitution operation in the second round as an example. The second round key rk1 is 32′h2937ac24. After the 100,000 collected power consumption curves are analyzed, the Power Analysis results are shown in Figure 8.
At the time when the intermediate value is computed, the key guess Kguess corresponding to the DPA curve with the highest peak is 8′h43, which differs from the low byte 8′h24 of rk1, so the guessed key is again wrong. This again shows that the random mask and randomization scheme proposed in this paper protects the compact SM4 encryption and decryption circuit against DPA attacks.

5. Synthesis Results

The circuit designed in this paper is synthesized for the Xilinx Zynq-7000 (XC7Z020CLG484) FPGA platform in Vivado 2017.4 and implemented after adding constraints. Table 4 shows the resource consumption and performance of the circuit on the FPGA. The SM4 encryption and decryption circuit based on the random mask and randomization method achieves a throughput of 99.56 Mbps with a resource overhead of 968 LUTs and 536 FFs.
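The FPGA figures in Table 4 can be cross-checked with a few lines of arithmetic. The sketch below recomputes the maximum clock frequency from the critical path delay. It then derives the number of clock cycles per 128-bit block implied by the reported throughput. The cycle count is inferred here and is not a value stated by the authors.

```python
BLOCK_BITS = 128                 # SM4 block size in bits

critical_path_ns = 16.482        # Table 4
throughput_mbps = 99.56          # Table 4

# Maximum clock frequency is the reciprocal of the critical path delay.
fmax_mhz = 1000.0 / critical_path_ns

# Derived here, not reported by the authors: clock cycles spent per block.
implied_cycles_per_block = BLOCK_BITS * fmax_mhz / throughput_mbps

print(f"fmax = {fmax_mhz:.2f} MHz")
print(f"implied cycles per block = {implied_cycles_per_block:.1f}")
```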
In addition, Synopsys DC and PrimeTime PX were used to synthesize the compact SM4 encryption and decryption circuit designed in this paper and to analyze its timing. In the SMIC 0.18 μm process, the circuit area is 82,734 μm². Because the area of the same circuit differs between SMIC processes, we take the NAND gate (9.9792 μm²) as the standard gate and convert the circuit area to an equivalent number of NAND gates. The circuit occupies 8290 equivalent NAND gates, and the critical path delay is 8.93 ns. Table 5 compares the characteristic parameters and resource overhead of the circuit designed in this paper with those of other circuits.
As can be seen from Table 5, the SM4 circuit designed in this paper has the highest throughput per gate (24.97 Mb·s⁻¹·kgate⁻¹, compared with 12.5 in [1] and 5.56 in [10]). In addition, the circuit in [1] cannot resist glitch attacks and supports only SM4 encryption, so it is inferior to the circuit designed in this paper in terms of security and practicability.
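The gate-equivalent conversion and the throughput-per-gate comparison can be reproduced as follows. This is a small worked check using only the numbers quoted in the text and in Table 5. Truncating the gate count (rather than rounding) and reading the 48% reduction as a comparison of gate counts with [1] are assumptions made here to match the reported values.

```python
NAND2_AREA_UM2 = 9.9792          # standard gate area quoted in the text
area_um2 = 82_734                # synthesized area in SMIC 0.18 um

# Truncated rather than rounded, which matches the reported 8290 gates.
equivalent_gates = int(area_um2 / NAND2_AREA_UM2)

# (equivalent gates, throughput in Mb/s) as listed in Table 5
designs = {
    "this work": (equivalent_gates, 207),
    "[1]": (16_000, 200),
    "[10]": (36_000, 200),
}

for name, (gates, mbps) in designs.items():
    per_kgate = mbps / (gates / 1000)
    print(f"{name:>9}: {gates:6d} gates, {per_kgate:.2f} Mb/s per kgate")

# Assumption: the 48% figure compares gate counts with reference [1].
reduction_vs_ref1 = 1 - equivalent_gates / designs["[1]"][0]
print(f"gate-count reduction vs [1]: {reduction_vs_ref1:.0%}")
```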

6. Conclusions

This paper focuses on the compact implementation of SM4 encryption and decryption circuits that are resistant to bypass (side-channel) attacks. To address the vulnerability of SM4 circuits to differential power analysis, an SM4 encryption and decryption circuit based on masking and randomization is proposed. A mask S-box is designed using a composite field masking technique, so that the composite field inverse operation in the mask S-box is truly masked. The random delay method is used to control the delay of each bit of the input signal to resist glitch attacks. The random linear transformation module is implemented by randomly inserting pseudo operations, which makes DPA against this module more difficult. The security of the SM4 S-box against glitch attacks is then analyzed, and two bypass attack verifications of the designed circuit are performed using power analysis platforms based on FPGA and ASIC. Neither attack succeeds with 100,000 power consumption curves. Finally, Synopsys DC is used to synthesize the designed circuit in the SMIC 0.18 μm process. The area is 82,734 μm², which is 48% smaller than the results reported in other papers. The compact SM4 encryption and decryption circuit based on the inverse operation comparison mechanism implemented in this paper therefore has lower circuit resource overhead and higher security than the compared designs.

Author Contributions

Conceptualization and Data curation, F.Z.; Resources, B.Z.; Supervision, N.W.; Writing-original draft, F.Z.; Writing-review and editing, X.B. All authors have read and agreed to the published version of the manuscript.

Funding

This work was supported by the Fundamental Research Funds for Central Universities (No. NS2017023 and No. NP2019102), the Aeronautical Science Foundation of China (No. 201943052001), and the Project of Science and Technology on Electronic Information Control Laboratory.

Acknowledgments

The authors would like to thank Jinbao Zhang for his beneficial suggestions and comments.

Conflicts of Interest

The authors declare no conflict of interest.

References

  1. Chao, P.E.I. A method of masking SM4 and analysis against DPA attacks. J. Cryptol. Res. 2016, 3, 79–90.
  2. Di, W.; Wu, L.; Zhang, X. Key-leakage hardware Trojan with super concealment based on the fault injection for block cipher of SM4. Electron. Lett. 2018, 54, 810–812.
  3. Niu, Y.; Jiang, A. The low power design of SM4 cipher with resistance to differential power analysis. In Proceedings of the International Symposium on Quality Electronic Design, Santa Clara, CA, USA, 2–4 March 2015.
  4. Yuanman, T. Research on Key Techniques of Design and Implementation of Power Analysis. Ph.D. Thesis, National University of Defense Technology, Changsha, China, 2008.
  5. Feng, H.; Lin, W.; Shang, W.; Cao, J.; Huang, W. MLP and CNN-based Classification of Points of Interest in Side-channel Attacks. Int. J. Networked Distrib. Comput. 2020, 8, 108–117.
  6. Wang, Z.; Zhang, W.; Ma, P.; Wang, X.A. Power consumption attack based on improved principal component analysis. In Proceedings of the 2019 International Conference on Broadband and Wireless Computing, Communication and Applications, Antwerp, Belgium, 7–9 November 2019.
  7. Bae, D.; Nam, S.; Ha, J. Side channel attack on block cipher SM4 and analysis of masking-based countermeasure. J. Korea Inst. Inf. Secur. Cryptol. 2020, 30, 39–49.
  8. Liling, D. The Optimization and Research for AES Cipher Chips with Power Attack Resistance. Master’s Thesis, Nanjing University of Aeronautics and Astronautics, Nanjing, China, 2016.
  9. Akkar, M.-L.; Giraud, C. An implementation of DES and AES, secure against some attacks. In Proceedings of the Cryptographic Hardware and Embedded Systems (CHES 2001), Nara, Japan, 28 September–1 October 2001.
  10. Ruineng, T.; Yuanyuan, L.; Jiaoling, T. SM4 multi-path multiplicative masking method against side-channel attack. Comput. Eng. 2014, 40, 103–108.
  11. Courtois, N.T.; Goubin, L. An algebraic masking method to protect AES against power attacks. In Proceedings of the ICISC, Seoul, Korea, 1–2 December 2005.
  12. Herbst, C.; Oswald, E.; Mangard, S. An AES smart card implementation resistant to power analysis attacks. In Proceedings of the ACNS, Singapore, 6–9 June 2006.
  13. Mangard, S.; Pramstaller, N.; Oswald, E. Successfully attacking masked AES hardware implementations. In Proceedings of the CHES 2005, Edinburgh, UK, 29 August–1 September 2005.
  14. Liang, H.; Wu, L.; Zhang, X.; Wang, J. Design of a masked S-box for SM4 based on composite field. In Proceedings of the 2014 Tenth International Conference on Computational Intelligence and Security, Kunming, China, 15–16 November 2014.
  15. Jing, L. Side-Channel Analysis and Implementation of FPGA based Cryptographic Algorithms. Master’s Thesis, Hunan University, Changsha, China, 2011.
  16. Kocher, P.; Jaffe, J. Using Unpredictable Information to Minimize Leakage from Smartcards and Other Cryptosystems. U.S. Patent 6,327,661, 4 December 2001.
  17. Zhang, Y.; Wu, N.; Zhou, F.; Zhang, J.; Yahya, M.R. A Countermeasure against DPA on SIMON with an Area-Efficient Structure. Electronics 2019, 8, 240.
Figure 1. Schematic diagram of SM4 module resistant to Power Analysis.
Figure 2. Mask T-box circuit structure diagram.
Figure 3. Random delay module diagram.
Figure 4. Composite Field mask S-box circuit structure diagram.
Figure 5. Random linear transformation operation module circuit structure diagram. (a) shows the random linear transformation L1 of the key extension, and (b) shows the random linear transformation L2 of the round transformation.
Figure 6. Circuit diagram of random mask generation module.
Figure 7. Differential Power Analysis (DPA) curve of the high 8-bit output of the first round of transformation byte replacement operation.
Figure 8. DPA curve of the low 8-bit output of the second round of transformation byte replacement operation.
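Table 1. Comparison of area and delay before and after optimization of mask GF((2⁴)²) inversion.

| Operation | Stage | Metric | AND | XOR | XNOR | Value |
|---|---|---|---|---|---|---|
| Mask addition ×2 | Before optimization | Area | - | 8 | - | 212.88 μm² |
| Mask addition ×2 | Before optimization | Delay | - | 1 | - | 0.21 ns |
| Mask addition ×2 | After optimization | Area | - | 8 | - | 212.88 μm² |
| Mask addition ×2 | After optimization | Delay | - | 1 | - | 0.21 ns |
| Mask multiplication ×3 | Before optimization | Area | 64 | 68 | - | 2661.32 μm² |
| Mask multiplication ×3 | Before optimization | Delay | 1 | 5 | - | 1.21 ns |
| Mask multiplication ×3 | After optimization | Area | 32 | 46 | - | 1649.98 μm² |
| Mask multiplication ×3 | After optimization | Delay | 1 | 5 | - | 1.21 ns |
| Mask squared constant | Before optimization | Area | 50 | 42 | - | 1783.12 μm² |
| Mask squared constant | Before optimization | Delay | 1 | 3 | - | 0.79 ns |
| Mask squared constant | After optimization | Area | - | - | - | 0 |
| Mask squared constant | After optimization | Delay | - | - | - | 0 |
| Mask inversion | Before optimization | Area | 80 | 152 | - | 5109.52 μm² |
| Mask inversion | Before optimization | Delay | 2 | 9 | - | 2.21 ns |
| Mask inversion | After optimization | Area | 63 | 77 | 1 | 2914.11 μm² |
| Mask inversion | After optimization | Delay | 2 | 7 | 0 | 1.79 ns |
| Total | Before optimization | Area | 322 | 414 | - | 15,302.36 μm² |
| Total | Before optimization | Delay | 4 | 21 | - | 5.05 ns |
| Total | After optimization | Area | 159 | 231 | 1 | 8289.81 μm² |
| Total | After optimization | Delay | 4 | 19 | 0 | 4.63 ns |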
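The entries of Table 1 are consistent with a simple additive cost model, which the sketch below uses to recompute the totals. The per-gate areas and delays are not stated in the text. They are inferred from the table itself (for example, 212.88 μm² for 8 XOR gates gives 26.61 μm² per XOR), and the multipliers next to "Mask addition" and "Mask multiplication" are read as instance counts. All names are hypothetical.

```python
# Per-gate costs inferred from Table 1 (assumptions, not quoted by the authors).
AND_AREA, XOR_AREA = 13.31, 26.61      # um^2; XNOR is assumed to cost the same as XOR
AND_DELAY, XOR_DELAY = 0.16, 0.21      # ns per gate on the critical path


def area_um2(n_and, n_xor, n_xnor=0):
    return n_and * AND_AREA + (n_xor + n_xnor) * XOR_AREA


def delay_ns(and_levels, xor_levels):
    return and_levels * AND_DELAY + xor_levels * XOR_DELAY


# (AND, XOR, XNOR) gate counts per instance, and number of instances
before = {
    "addition": ((0, 8, 0), 2),
    "multiplication": ((64, 68, 0), 3),
    "squared constant": ((50, 42, 0), 1),
    "inversion": ((80, 152, 0), 1),
}
after = {
    "addition": ((0, 8, 0), 2),
    "multiplication": ((32, 46, 0), 3),
    "inversion": ((63, 77, 1), 1),     # the squared-constant block is removed entirely
}


def total_area(parts):
    return sum(mult * area_um2(*gates) for gates, mult in parts.values())


print(f"before: {total_area(before):.2f} um^2, {delay_ns(4, 21):.2f} ns")
print(f"after:  {total_area(after):.2f} um^2, {delay_ns(4, 19):.2f} ns")
```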
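Table 2. Comparison of the area and delay of each module before and after the mask S-box optimization.

| Module | Stage | Metric | AND | XOR | XNOR | Total |
|---|---|---|---|---|---|---|
| Mask affine transformation | Before optimization | Area | - | 69 | - | 1836.09 μm² |
| Mask affine transformation | Before optimization | Delay | - | 3 | - | 0.63 ns |
| Mask affine transformation | After optimization | Area | - | 39 | 4 | 1144.23 μm² |
| Mask affine transformation | After optimization | Delay | - | 3 | 0 | 0.63 ns |
| Mask field mapping | Before optimization | Area | - | 48 | - | 1277.28 μm² |
| Mask field mapping | Before optimization | Delay | - | 3 | - | 0.63 ns |
| Mask field mapping | After optimization | Area | - | 28 | - | 745.08 μm² |
| Mask field mapping | After optimization | Delay | - | 2 | - | 0.42 ns |
| Mask composite field inversion | Before optimization | Area | 322 | 414 | - | 15,302.36 μm² |
| Mask composite field inversion | Before optimization | Delay | 4 | 21 | - | 5.05 ns |
| Mask composite field inversion | After optimization | Area | 159 | 231 | 1 | 8289.81 μm² |
| Mask composite field inversion | After optimization | Delay | 4 | 19 | 0 | 4.63 ns |
| Mask composite affine transformation | Before optimization | Area | - | 49 | - | 1303.89 μm² |
| Mask composite affine transformation | Before optimization | Delay | - | 3 | - | 0.63 ns |
| Mask composite affine transformation | After optimization | Area | - | 24 | 2 | 691.86 μm² |
| Mask composite affine transformation | After optimization | Delay | - | 3 | 0 | 0.63 ns |
| Total | Before optimization | Area | 322 | 580 | - | 19,719.62 μm² |
| Total | Before optimization | Delay | 4 | 30 | - | 6.94 ns |
| Total | After optimization | Area | 159 | 322 | 7 | 10,870.98 μm² |
| Total | After optimization | Delay | 4 | 27 | 0 | 6.10 ns |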
Table 3. Number of glitches generated by the random delay mask S-box for each combination of S-box input (SboxIn) and MaskIn.

| MaskIn | SboxIn = 0xff | SboxIn = 0xaa | SboxIn = 0x44 | SboxIn = 0x56 |
|---|---|---|---|---|
| 0xedcba9 | 35 | 33 | 30 | 26 |
| 0xff24dc | 33 | 39 | 31 | 32 |
| 0x156894 | 29 | 32 | 21 | 31 |
| 0xffffff | 22 | 24 | 26 | 29 |
| 0x002456 | 34 | 29 | 23 | 26 |
| 0x380cd9 | 42 | 32 | 28 | 48 |
| 0x6ad0c3 | 31 | 24 | 28 | 29 |
Table 4. Resource overhead and performance evaluation on FPGA.

| Type | Name | Value |
|---|---|---|
| Resource overhead | LUT (look-up table) | 968/53,200 (1.82%) |
| Resource overhead | FF (register) | 536/106,400 (0.38%) |
| Performance evaluation | Maximum clock frequency | 60.67 MHz |
| Performance evaluation | Critical path delay | 16.482 ns |
| Performance evaluation | Throughput | 99.56 Mbps |
Table 5. Comparison of performance parameters and resource overhead of ASIC implementation.

| Parameter | This Article | [1] | [10] |
|---|---|---|---|
| Process | SMIC 0.18 μm | SMIC 0.13 μm | SMIC 0.18 μm |
| Clock frequency (MHz) | 110.1 | 50 | 50 |
| Area (μm²) | 82,734 | - | - |
| Equivalent gates | 8290 | 16,000 | 36,000 |
| Throughput (Mb/s) | 207 | 200 | 200 |
| Throughput/gate (Mb·s⁻¹·kgate⁻¹) | 24.97 | 12.5 | 5.56 |
| Attack cost (number of curves) | 100,000+ | 500,000+ | 500,000+ |
| Circuit characteristics | Resistant to DPA (including glitch attack), supports encryption and decryption | Resistant to DPA, supports only encryption | Resistant to DPA, supports only encryption |

Share and Cite

MDPI and ACS Style

Zhou, F.; Zhang, B.; Wu, N.; Bu, X. The Design of Compact SM4 Encryption and Decryption Circuits That Are Resistant to Bypass Attack. Electronics 2020, 9, 1102. https://doi.org/10.3390/electronics9071102
