Efficient Implementation of SPEEDY Block Cipher on Cortex-M3 and RISC-V Microcontrollers †
Abstract
1. Introduction
1.1. Extended Version of ICISC’21
1.2. Contributions
2. SPEEDY Algorithm
2.1. The Round Function of SPEEDY
2.1.1. ShiftColumns
2.1.2. MixColumns
2.1.3. AddRoundKey
2.1.4. AddRoundConstant
3. Proposed Technique
3.1. SPEEDY on ARM Cortex–M3
3.2. SPEEDY on RISC-V (RV32I)
3.3. Bit-Slicing SPEEDY
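A minimal C sketch of the bit-sliced layout that the following algorithms rely on: the 192-bit state is viewed as 32 rows of 6 bits, and slice j collects bit j of every row into one 32-bit word, so six registers (X0-X5) hold the whole state. The function names, the `rows` input format, and the bit ordering (row i maps to bit i of a slice, column j to bit j of the row value) are assumptions made for illustration and may differ from the authors' code.

```c
#include <stdint.h>

#define SPEEDY_ROWS 32u /* rows of the SPEEDY-192 state */
#define SPEEDY_COLS 6u  /* bits per row, one slice (register) per column */

/* Pack 32 six-bit rows into 6 slices of 32 bits each.
 * ASSUMPTION: bit j of rows[i] becomes bit i of slices[j]. */
void bitslice_pack(const uint8_t rows[SPEEDY_ROWS], uint32_t slices[SPEEDY_COLS])
{
    for (unsigned j = 0; j < SPEEDY_COLS; j++) {
        uint32_t w = 0;
        for (unsigned i = 0; i < SPEEDY_ROWS; i++)
            w |= (uint32_t)((rows[i] >> j) & 1u) << i;
        slices[j] = w;
    }
}

/* Inverse of bitslice_pack: rebuild the 32 six-bit rows from the slices. */
void bitslice_unpack(const uint32_t slices[SPEEDY_COLS], uint8_t rows[SPEEDY_ROWS])
{
    for (unsigned i = 0; i < SPEEDY_ROWS; i++) {
        uint8_t r = 0;
        for (unsigned j = 0; j < SPEEDY_COLS; j++)
            r |= (uint8_t)(((slices[j] >> i) & 1u) << j);
        rows[i] = r;
    }
}
```

With this layout, one bitwise instruction on a slice acts on the same bit position of all 32 rows at once, which is what allows the S-box to be evaluated as a sequence of logic instructions on X0-X5 rather than as a table lookup.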
3.4. SubBox
Algorithm 1 Bit-slicing implementations of S-box in ARMv6 assembly.
Input: X0-X5 (r4-r9), temporary register T (r14)
Output: Y0-Y5 (r1-r3, r10-r12)
28: ORN Y2, Y3, Y4
29: AND Y3, X0, X4
30: ORR Y4, X0, X3
3.5. ShiftColumns
Algorithm 2 Bit-slicing implementations of ShiftColumns.
Input: state[0-5]
Output: state[0-5]
state[1] = (state[1] << 1) | (state[1] >> 31)
state[2] = (state[2] << 2) | (state[2] >> 30)
state[3] = (state[3] << 3) | (state[3] >> 29)
state[4] = (state[4] << 4) | (state[4] >> 28)
state[5] = (state[5] << 5) | (state[5] >> 27)
Algorithm 3 Bit-slicing implementations of ShiftColumns in ARMv6 assembly.
Input: X0-X5 (r1-r3, r10-r12)
Output: Y0-Y5 (r4-r9)
MOV Y0, X0
MOV Y1, X1, ROR
MOV Y2, X2, ROR
MOV Y3, X3, ROR
MOV Y4, X4, ROR
MOV Y5, X5, ROR
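The ARM shifter operand offers a rotate right (ROR) but no rotate left, so the left rotations of Algorithm 2 map onto ROR through the identity rotl(x, i) = rotr(x, 32 − i). The following C sketch checks that relationship and applies the per-slice rotation; the helper names `rotl32`, `rotr32`, and `shift_columns` are hypothetical and not taken from the authors' code.

```c
#include <stdint.h>

/* Rotate a 32-bit word left by n bits; n is reduced modulo 32. */
uint32_t rotl32(uint32_t x, unsigned n)
{
    n &= 31u;
    return (x << n) | (x >> ((32u - n) & 31u));
}

/* Rotate a 32-bit word right by n bits; n is reduced modulo 32. */
uint32_t rotr32(uint32_t x, unsigned n)
{
    n &= 31u;
    return (x >> n) | (x << ((32u - n) & 31u));
}

/* Bit-sliced ShiftColumns as in Algorithm 2: slice i is rotated left by i,
 * and slice 0 is left unchanged. */
void shift_columns(uint32_t state[6])
{
    for (unsigned i = 1; i < 6; i++)
        state[i] = rotl32(state[i], i);
}
```

On ARM the rotation can be folded into the shifter operand of another instruction, as Algorithm 3 does with MOV. The RV32I base instruction set has no rotate instruction, so the same step would typically cost two shifts and an OR per slice there, matching the form written in Algorithm 2.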
3.6. MixColumns
Algorithm 4 Bit-slicing implementations of MixColumns in ARMv6 assembly.
Input: X0-X5 (r1-r3, r10-r12), Y0-Y5 (r4-r9)
Output: Y0-Y5 (r4-r9)
EOR Y0, Y0, X0, ROR
EOR Y0, Y0, X0, ROR
EOR Y0, Y0, X0, ROR
EOR Y0, Y0, X0, ROR
EOR Y0, Y0, X0, ROR
EOR Y0, Y0, X0, ROR
EOR Y1, Y1, X1, ROR
EOR Y1, Y1, X1, ROR
EOR Y1, Y1, X1, ROR
EOR Y1, Y1, X1, ROR
EOR Y1, Y1, X1, ROR
EOR Y1, Y1, X1, ROR
EOR Y2, Y2, X2, ROR
EOR Y2, Y2, X2, ROR
EOR Y2, Y2, X2, ROR
EOR Y2, Y2, X2, ROR
EOR Y2, Y2, X2, ROR
EOR Y2, Y2, X2, ROR
EOR Y3, Y3, X3, ROR
EOR Y3, Y3, X3, ROR
EOR Y3, Y3, X3, ROR
EOR Y3, Y3, X3, ROR
EOR Y3, Y3, X3, ROR
EOR Y3, Y3, X3, ROR
EOR Y4, Y4, X4, ROR
EOR Y4, Y4, X4, ROR
EOR Y4, Y4, X4, ROR
EOR Y4, Y4, X4, ROR
EOR Y4, Y4, X4, ROR
EOR Y4, Y4, X4, ROR
EOR Y5, Y5, X5, ROR
EOR Y5, Y5, X5, ROR
EOR Y5, Y5, X5, ROR
EOR Y5, Y5, X5, ROR
EOR Y5, Y5, X5, ROR
EOR Y5, Y5, X5, ROR
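The structure of Algorithm 4, six rotated copies of each input slice XORed into the output slice, can be sketched in C as follows. This is an illustration under stated assumptions, not the authors' code: the rotation offsets in `SPEEDY192_ALPHA` are the values (1, 5, 9, 15, 21, 26) as recalled from the SPEEDY specification [1] and should be checked against it, the rotation direction depends on the bit ordering chosen for the slices, and the output slice is assumed to start as a copy of the input slice. All identifiers are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/* ASSUMPTION: MixColumns rotation offsets for the 192-bit variant,
 * recalled from the SPEEDY specification; verify before use. */
static const unsigned SPEEDY192_ALPHA[6] = {1, 5, 9, 15, 21, 26};

/* Rotate one 32-bit slice left by n bits (direction is an assumption). */
uint32_t rot_slice(uint32_t x, unsigned n)
{
    n &= 31u;
    return (x << n) | (x >> ((32u - n) & 31u));
}

/* Mix one slice: y = x ^ rot(x, a1) ^ ... ^ rot(x, a6).
 * Each loop iteration corresponds to one "EOR Y, Y, X, ROR" instruction. */
uint32_t mix_slice(uint32_t x)
{
    uint32_t y = x;
    for (size_t k = 0; k < 6; k++)
        y ^= rot_slice(x, SPEEDY192_ALPHA[k]);
    return y;
}

/* Apply the same mixing to all six slices of the bit-sliced state. */
void mix_columns(uint32_t state[6])
{
    for (size_t j = 0; j < 6; j++)
        state[j] = mix_slice(state[j]);
}
```

Because every slice is processed independently with the same offsets, the whole layer costs 36 XOR-with-rotate operations on a 32-bit core, which matches the instruction count of Algorithm 4.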
3.7. AddRoundKey and AddRoundConstant
4. Evaluation
5. Conclusions
Author Contributions
Funding
Conflicts of Interest
References
1. Leander, G.; Moos, T.; Moradi, A.; Rasoolzadeh, S. The SPEEDY Family of Block Ciphers: Engineering an Ultra Low-Latency Cipher from Gate Level for Secure Processor Architectures. IACR Trans. Cryptogr. Hardw. Embed. Syst. 2021, 2021, 510–545.
2. Borghoff, J.; Canteaut, A.; Güneysu, T.; Kavun, E.; Knežević, M.; Knudsen, L.; Leander, G.; Nikov, V.; Paar, C.; Rechberger, C.; et al. PRINCE—A Low-Latency Block Cipher for Pervasive Computing Applications. In Proceedings of the ASIACRYPT, Beijing, China, 2–6 December 2012; pp. 208–225.
3. Bozilov, D.; Eichlseder, M.; Knežević, M.; Lambin, B.; Leander, G.; Moos, T.; Nikov, V.; Rasoolzadeh, S.; Todo, Y.; Wiemer, F. PRINCEv2: More Security for (Almost) No Overhead. IACR Cryptol. ePrint Arch. 2021, 483–511.
4. Beierle, C.; Jean, J.; Kölbl, S.; Leander, G.; Moradi, A.; Peyrin, T.; Sasaki, Y.; Sasdrich, P.; Sim, S.M. The SKINNY Family of Block Ciphers and Its Low-Latency Variant MANTIS. In Proceedings of the Advances in Cryptology—CRYPTO, Santa Barbara, CA, USA, 14–18 August 2016; Robshaw, M., Katz, J., Eds.; Springer: Berlin/Heidelberg, Germany, 2016; pp. 123–153.
5. Avanzi, R. The QARMA Block Cipher Family. Almost MDS Matrices Over Rings With Zero Divisors, Nearly Symmetric Even-Mansour Constructions With Non-Involutory Central Rounds, and Search Heuristics for Low-Latency S-Boxes. IACR Trans. Symmetric Cryptol. 2017, 2017, 4–44.
6. Papapagiannopoulos, K. High throughput in slices: The case of PRESENT, PRINCE and KATAN64 ciphers. In Proceedings of the International Workshop on Radio Frequency Identification: Security and Privacy Issues, Oxford, UK, 21–23 July 2014; Springer: Berlin/Heidelberg, Germany, 2015; pp. 137–155.
7. Bao, Z.; Luo, P.; Lin, D. Bitsliced implementations of the PRINCE, LED and RECTANGLE block ciphers on AVR 8-bit microcontrollers. In Proceedings of the International Conference on Information and Communications Security, Beijing, China, 9–11 December 2015; Springer: Berlin/Heidelberg, Germany, 2015; pp. 18–36.
8. Reis, T.; Aranha, D.; López, J. PRESENT Runs Fast. In Proceedings of the 19th International Conference, Taipei, Taiwan, 25–28 September 2017; pp. 644–664.
9. Adomnicai, A.; Najm, Z.; Peyrin, T. Fixslicing: A New GIFT Representation: Fast Constant-Time Implementations of GIFT and GIFT-COFB on ARM Cortex-M. IACR Trans. Cryptogr. Hardw. Embed. Syst. 2020, 2020, 402–427.
10. Schwabe, P.; Stoffelen, K. All the AES You Need on Cortex-M3 and M4. In Proceedings of the 23rd International Conference, St. John’s, NL, Canada, 10–12 August 2017; pp. 180–194.
11. Kim, H.; Jang, K.; Song, G.; Sim, M.; Eum, S.; Kim, H.; Kwon, H.; Lee, W.K.; Seo, H. SPEEDY on Cortex–M3: Efficient Software Implementation of SPEEDY on ARM Cortex–M3. In Proceedings of the International Conference on Information Security and Cryptology, Seoul, Korea, 1–3 December 2021; Springer: Berlin/Heidelberg, Germany, 2021.
12. Bernstein, D.J. Cache-Timing Attacks on AES. 2005. Available online: http://cr.yp.to/antiforgery/cachetiming-20050414.pdf (accessed on 10 November 2022).
13. Ge, Q.; Yarom, Y.; Cock, D.; Heiser, G. A survey of microarchitectural timing attacks and countermeasures on contemporary hardware. J. Cryptogr. Eng. 2018, 8, 1–27.
14. Bogdanov, A.; Eisenbarth, T.; Paar, C.; Wienecke, M. Differential Cache-Collision Timing Attacks on AES with Applications to Embedded CPUs. In Proceedings of the 10th Cryptographers’ Track at the RSA Conference 2010, San Francisco, CA, USA, 1–5 March 2010; Volume 5985, pp. 235–251.
15. Waterman, A.; Lee, Y.; Avizienis, R.; Cook, H.; Patterson, D.; Asanovic, K. The RISC-V instruction set. In Proceedings of the 2013 IEEE Hot Chips 25 Symposium (HCS), Stanford, CA, USA, 25–27 August 2013.
16. Asanovic, K.; Waterman, A. The RISC-V Instruction Set Manual. In Privileged Architecture, Document Version 20190608-Priv-MSU-Ratified (Vol. 2); RISC-V Foundation: Berkeley, CA, USA, 2019; Available online: https://riscv.org/wp-content/uploads/2017/05/riscv-spec-v2.2.pdf (accessed on 10 November 2022).
17. Biham, E. A Fast New DES Implementation in Software. In Proceedings of the Fast Software Encryption, 4th International Workshop, FSE ’97, Haifa, Israel, 20–22 January 1997; Lecture Notes in Computer Science. Springer: Berlin/Heidelberg, Germany, 1997; Volume 1267, pp. 260–272.
18. May, L.; Penna, L.; Clark, A. An Implementation of Bitsliced DES on the Pentium MMX. In Proceedings of the Australasian Conference on Information Security and Privacy, Brisbane, Australia, 10–12 July 2000; pp. 112–122.
19. Adomnicai, A.; Peyrin, T. Fixslicing AES-like Ciphers: New bitsliced AES speed records on ARM-Cortex M and RISC-V. IACR Trans. Cryptogr. Hardw. Embed. Syst. 2020, 2021, 402–425.
Intel Core i7-8850H (8th generation) | Speed (cpb) |
---|---|
SPEEDY-7-192 encryption reference [9] | 2983 |
6 × 32 reference [9] | 1278 |
Bitslice (ours) | 852 |
ARM Cortex-M3 | Speed (cpb) | Block Size | Parallel Blocks |
---|---|---|---|
GIFT-128 encryption [9] | 104.1 | 128 | 1 |
AES-128 encryption [19] | 120.4 | 128 | 2 |
SPEEDY-7-192 encryption (reference) | 15,407 | 192 | 1 |
SPEEDY-5-192 encryption (ours) | 65.7 | 192 | 1 |
SPEEDY-6-192 encryption (ours) | 75.2 | 192 | 1 |
SPEEDY-7-192 encryption (ours) | 85.1 | 192 | 1 |

RISC-V | Speed (cpb) | Block Size | Parallel Blocks |
---|---|---|---|
AES-128 encryption [19] | 78.9 | 128 | 8 |
SPEEDY-7-192 encryption (reference) | 18,096 | 192 | 1 |
SPEEDY-5-192 encryption (ours) | 81.9 | 192 | 1 |
SPEEDY-6-192 encryption (ours) | 95.5 | 192 | 1 |
SPEEDY-7-192 encryption (ours) | 109.2 | 192 | 1 |
© 2022 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Kim, H.; Eum, S.; Sim, M.; Seo, H. Efficient Implementation of SPEEDY Block Cipher on Cortex-M3 and RISC-V Microcontrollers. Mathematics 2022, 10, 4236. https://doi.org/10.3390/math10224236