Author Contributions
Conceptualization, R.C. and B.L.; methodology, R.C. and B.L.; software, R.C.; validation, R.C.; formal analysis, R.C.; investigation, R.C.; resources, R.C.; data curation, R.C.; writing—original draft preparation, R.C.; writing—review and editing, R.C.; visualization, R.C.; supervision, B.L.; project administration, R.C.; funding acquisition, R.C. All authors have read and agreed to the published version of the manuscript.
Figure 1.
The operation flow of SM4 encryption algorithm.
Figure 2.
The operation flow of SM4 key expansion algorithm.
Figure 3.
The operation flow of CCM algorithm.
Figure 4.
The proposed design-space-exploration method.
Figure 5.
Candidate hardware architectures for NLT (NLT1 (a); NLT2 (b); NLT4 (c)).
Figure 6.
Candidate hardware architectures for the RF (shared RF for encryption and key expansion (a); RF for encryption only (b); RF for key expansion only (c)).
Figure 7.
Candidate architectures for the SM4 layer (OffKS (a); OnKP (b); OnKS (c)).
Figure 8.
Candidate architectures for SM4-CCM (S1SM4 (a); P2SM4 (b); GB2SM4 (c)).
Figure 9.
The naming of design schemes (e.g., c0k0n0s0).
Figure 10.
Demonstration of our experimental flow.
Figure 11.
Throughput comparison of 63 schemes from four perspectives: (a) comparison of design schemes with the same CCM candidate architecture, (b) comparison of design schemes with the same SM4 candidate architecture, (c) comparison of design schemes with the same NLT candidate architecture, and (d) comparison of design schemes with the same SBox candidate architecture (the higher the value, the better).
Figure 12.
Area (in terms of gate count) comparison of 63 schemes from four perspectives: (a) comparison of design schemes with the same CCM candidate architecture, (b) comparison of design schemes with the same SM4 candidate architecture, (c) comparison of design schemes with the same NLT candidate architecture, and (d) comparison of design schemes with the same SBox candidate architecture (the lower the value, the better).
Figure 13.
Power comparison of 63 schemes from four perspectives: (a) comparison of design schemes with the same CCM candidate architecture, (b) comparison of design schemes with the same SM4 candidate architecture, (c) comparison of design schemes with the same NLT candidate architecture, and (d) comparison of design schemes with the same SBox candidate architecture (the lower the value, the better).
Figure 14.
Area efficiency of 63 schemes (the higher the value, the better).
Figure 15.
Power efficiency of 63 schemes (the higher the value, the better).
Figure 16.
APDP of 63 schemes (the lower the value, the better).
Figure 17.
Simplified hardware architecture of the optimal design scheme: (a) SM4-CCM with only one SM4 core (S1SM4); (b) SM4 with online key expansion (OnKP); (c) Round function for encryption only (ERF); (d) Round function for key expansion only (KRF); (e) Non-linear transform with 4 DSE Sboxes (NLT4).
Figure 18.
Proof of concept of the SM4-CCM on a DE10-Standard FPGA Board: (a) Our physical environment: PC+DE10-Standard FPGA Board; (b) Test vectors from RFC 8998; (c) Block diagram of FPGA proof-of-concept; (d) Waveform captured by Logic Analyzer.
Table 1.
Notations of the SM4 algorithm and CCM algorithm.
Notation | Meaning |
---|---|
Sbox | A 256-byte substitution table; each input byte is replaced by the corresponding byte from the table. |
Non-Linear Transform (NLT) | Each of the 4 bytes of the input word is substituted with a byte from the Sbox. |
Round Function (RF) | The SM4 encryption and key expansion algorithms each consist of 32 rounds of iteration; the body of each iteration is called the round function. |
Key Expansion | A routine that generates 32 32-bit round keys from a 128-bit master key. |
Inverse Transform | Reverses the order of the 4 input words. |
Additional Authenticated Data (AAD) | Non-secret data passed to encryption/decryption so that the integrity and authenticity check covers it in addition to the encrypted data. |
CBC-MAC Mode | A block cipher operation mode used to generate a message authentication code. |
Counter Mode (CTR) | A counter-based block cipher operation mode used to encrypt data. |
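To make the NLT and Inverse Transform entries concrete, the following minimal Python sketch applies a 256-entry substitution table to each byte of a 32-bit word and reverses a group of four words. The function names `nlt` and `reverse_words` and the table `PLACEHOLDER_SBOX` are illustrative assumptions; in particular, the placeholder table is not the real SM4 Sbox, and reading "change the order" as a full reversal follows the SM4 standard rather than anything stated in the table.

```python
from typing import Sequence, Tuple


def nlt(word: int, sbox: Sequence[int]) -> int:
    """Substitute each of the 4 bytes of a 32-bit word through the Sbox."""
    if len(sbox) != 256:
        raise ValueError("the Sbox must have exactly 256 entries")
    out = 0
    for shift in (24, 16, 8, 0):          # most significant byte first
        byte = (word >> shift) & 0xFF     # extract one input byte
        out |= sbox[byte] << shift        # put the substituted byte back
    return out


def reverse_words(words: Tuple[int, int, int, int]) -> Tuple[int, ...]:
    """Reverse the order of the four input words."""
    return tuple(reversed(words))


# Placeholder table for demonstration only; NOT the SM4 Sbox.
PLACEHOLDER_SBOX = [b ^ 0xFF for b in range(256)]

print(hex(nlt(0x00010203, PLACEHOLDER_SBOX)))
print(reverse_words((1, 2, 3, 4)))
```

The loop form corresponds most closely to a serial datapath that reuses one Sbox for all four bytes; a hardware design with four Sboxes would perform the four substitutions in parallel.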
Table 2.
Names and descriptions of the candidate hardware architectures for each layer.
Layer | NCA | Description of Candidate Architectures |
---|---|---|
Sbox | LUT | Sbox based on look-up table (LUT) |
Sbox | CFA | Sbox based on composite field arithmetic (CFA) |
Sbox | DSE | Sbox based on the decoder–switch–encoder (DSE) architecture |
NLT | NLT1 | Using 1 Sbox to implement NLT |
NLT | NLT2 | Using 2 Sboxes to implement NLT |
NLT | NLT4 | Using 4 Sboxes to implement NLT |
RF | KRF | RF for key expansion only |
RF | ERF | RF for encryption only |
RF | KERF | RF for both key expansion and encryption |
SM4 | OffKS | Offline key expansion, using only one shared RF (KERF) |
SM4 | OnKP | Online key expansion, using two separate RFs (KRF and ERF) |
SM4 | OnKS | Online key expansion, using only one shared RF (KERF) |
SM4-CCM | S1SM4 | Only one SM4 core, alternating between CBC-MAC mode and CTR mode |
SM4-CCM | P2SM4 | Two SM4 cores: one works in CBC-MAC mode and the other in CTR mode |
SM4-CCM | GB2SM4 | One SM4 core and one RF: the SM4 core works in CBC-MAC mode with offline key expansion (KEYEXP) and encryption, while the RF acts as an SM4 module working in CTR mode |
Table 3.
The best design schemes under various metrics and the corresponding candidate architectures of each layer.
Metrics | The Best Scheme | Metric Value | CCM | SM4 | NLT | SBOX |
---|---|---|---|---|---|---|
Throughput (Mbps) | c1k1n2s{0,1,2} | 281.31 | P2SM4 | OnKP | NLT4 | - |
Area (Gate Count) | c0k2n0s1 | 10754 | S1SM4 | OnKS | NLT1 | CFA |
Power (mW) | c0k2n0s2 | 1.344 | S1SM4 | OnKS | NLT1 | DSE |
Area Efficiency | c0k1n2s1 | 16447 | S1SM4 | OnKP | NLT4 | CFA |
Power Efficiency | c0k1n2s2 | 123.07 | S1SM4 | OnKP | NLT4 | DSE |
APDP | c0k1n2s2 | | S1SM4 | OnKP | NLT4 | DSE |
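The scheme names in Table 3 can be decoded mechanically. The sketch below is a hypothetical helper (`decode_scheme` is not from the paper) that maps a name such as c0k1n2s2 to one candidate architecture per layer. It assumes that each digit indexes the candidates in the order they are listed in Table 2. The rows of Table 3 confirm c0, c1, k1, k2, n0, n2, s1, and s2; the remaining indices (c2, k0, n1, s0) are inferred from that ordering and should be treated as assumptions.

```python
import re

# Candidate architectures per layer, in the order listed in Table 2.
# Indices not confirmed by Table 3 (c2, k0, n1, s0) are assumed.
CCM = ("S1SM4", "P2SM4", "GB2SM4")
SM4 = ("OffKS", "OnKP", "OnKS")
NLT = ("NLT1", "NLT2", "NLT4")
SBOX = ("LUT", "CFA", "DSE")

_SCHEME = re.compile(r"c([0-2])k([0-2])n([0-2])s([0-2])")


def decode_scheme(name: str) -> dict:
    """Map a scheme name like 'c0k1n2s2' to its per-layer architectures."""
    match = _SCHEME.fullmatch(name)
    if match is None:
        raise ValueError(f"not a valid scheme name: {name!r}")
    c, k, n, s = (int(digit) for digit in match.groups())
    return {"CCM": CCM[c], "SM4": SM4[k], "NLT": NLT[n], "SBOX": SBOX[s]}


print(decode_scheme("c0k1n2s2"))
```

The helper only decodes names; it does not check whether a given combination is one of the 63 schemes actually evaluated.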
Table 4.
Breakdown analysis of the optimal scheme c0k1n2s2.
Power Group | Internal Power | Switching Power | Leakage Power | Total Power |
---|---|---|---|---|
Clock Network | 1.067 mW | 0 | 0 | 1.067 mW |
Register | 102.7 μW | 20.89 μW | 13.84 μW | 137.4 μW |
Combinational | 199.5 μW | 183.0 μW | 37.98 μW | 420.5 μW |
In total | 1.369 mW | 203.9 μW | 51.82 μW | 1.625 mW |
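As a consistency check on Table 4, the short Python sketch below recomputes the row and column totals from the individual entries. It assumes that the sub-milliwatt entries are in microwatts, which is the only reading under which the printed totals add up; the names `ROWS_UW`, `row_totals`, and `column_totals` are illustrative.

```python
# (internal, switching, leakage) per power group, all in microwatts.
# Assumption: entries not marked mW in Table 4 are microwatts.
ROWS_UW = {
    "Clock Network": (1067.0, 0.0, 0.0),
    "Register": (102.7, 20.89, 13.84),
    "Combinational": (199.5, 183.0, 37.98),
}


def row_totals(rows):
    """Total power of each group: internal + switching + leakage."""
    return {group: sum(values) for group, values in rows.items()}


def column_totals(rows):
    """Totals of the internal, switching, and leakage columns."""
    return tuple(sum(values[i] for values in rows.values()) for i in range(3))


internal, switching, leakage = column_totals(ROWS_UW)
grand_total_mw = sum(row_totals(ROWS_UW).values()) / 1000.0

print(f"internal  = {internal / 1000.0:.3f} mW")
print(f"switching = {switching:.1f} uW")
print(f"leakage   = {leakage:.2f} uW")
print(f"total     = {grand_total_mw:.3f} mW")
```

Under this reading, the clock network accounts for roughly two thirds of the total power of scheme c0k1n2s2 (1.067 mW out of 1.625 mW).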