HAETAE on ARMv8
Abstract
1. Introduction
1.1. Contribution
1.1.1. Applying State-of-the-Art NTT Implementation Techniques
1.1.2. First Implementation of the HAETAE on 64-bit ARMv8 Processors Using NEON Instructions
2. Background
2.1. KpqC Contest
2.2. HAETAE
2.3. HAETAE on Cortex-M4
2.3.1. Polynomial Arithmetic Optimization on Cortex-M4
2.3.2. Gaussian Sampler Optimization on Cortex-M4
2.4. Target Processor: 64-bit ARMv8 Architecture
2.5. Previous Implementations of Post Quantum Cryptography on 64-bit ARMv8 Processors
3. Proposed Method
3.1. Optimized Implementation of NTT Utilizing NEON Instructions
- is the vector of NTT coefficients.
- is the NTT matrix, where each element is derived from the powers of .
- is the vector of polynomial coefficients.
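The relation described by these three items can be sketched as a matrix-vector product. The notation below is an assumption chosen for illustration ($\hat{a}$ for the NTT coefficients, $W$ for the NTT matrix, $a$ for the polynomial coefficients, $\omega$ for a primitive $n$-th root of unity modulo $q$), and it shows the plain cyclic form only; a negacyclic transform would use a different exponent pattern.

```latex
\hat{a} = W \cdot a \pmod{q},
\qquad
W_{i,j} = \omega^{ij} \bmod q,
\qquad
\hat{a}_i = \sum_{j=0}^{n-1} a_j \, \omega^{ij} \bmod q,
\quad 0 \le i, j < n .
```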
Algorithm 1 Element-wise modular polynomial multiplication utilizing NEON instructions (x0: result of the multiplication operation; x1, x2: inputs for the multiplication operation)

    // mk_Q: v3 = q = 0x0000fc01 = 64513 in every 32-bit lane
    MOVI.4s v3, #0xfc
    REV16 v3.16b, v3.16b
    MOVI.4s v5, #0x01
    ORR.16b v3, v3, v5
    // mk_Q_Inv: v4 = q^-1 mod 2^32 = 0x380f0401 = 940508161 in every lane
    MOVI.4s v4, #0x38
    MOVI.4s v5, #0x0f
    REV32 v4.16b, v4.16b
    SHL.4s v5, v5, #16
    ORR.16b v4, v4, v5
    MOVI.4s v5, #0x04
    REV16 v5.16b, v5.16b
    ORR.16b v4, v4, v5
    MOVI.4s v5, #0x01
    ORR.16b v4, v4, v5
    // 64 iterations x 4 lanes = 256 coefficients
    MOV x25, #64
    loop_j:
    LD1 {v1.4s}, [x1], #16
    LD1 {v2.4s}, [x2], #16
    // mont_reduce: v6 = v1 * v2 * 2^-32 mod q
    SQDMULH v6.4s, v1.4s, v2.4s
    MUL.4s v27, v2, v4
    MUL.4s v7, v1, v27
    SQDMULH v16.4s, v7.4s, v3.4s
    SHSUB.4s v6, v6, v16
    ST1 {v6.4s}, [x0], #16
    ADD x25, x25, #-1
    CBNZ x25, loop_j
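The constant construction and the four-instruction reduction in Algorithm 1 can be checked with a scalar model. The following is a minimal sketch in C, not the paper's code: each helper imitates one NEON instruction on a single 32-bit lane, and all function names (`rev16_lane`, `sqdmulh32`, `mont_mul`, `pointwise_mont`, and so on) are hypothetical. It assumes that `>>` on a negative signed integer is an arithmetic shift, as on common compilers.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* REV16 on one 32-bit lane: swap the bytes inside each 16-bit half. */
static uint32_t rev16_lane(uint32_t x) {
    return ((x & 0x00ff00ffu) << 8) | ((x & 0xff00ff00u) >> 8);
}

/* REV32 on one 32-bit lane: reverse all four bytes. */
static uint32_t rev32_lane(uint32_t x) {
    return (x << 24) | ((x & 0xff00u) << 8) | ((x >> 8) & 0xff00u) | (x >> 24);
}

/* mk_Q: MOVI #0xfc, REV16, ORR with 1. */
static int32_t make_q(void) {
    return (int32_t)(rev16_lane(0xfcu) | 0x01u);
}

/* mk_Q_Inv: the four byte-sized pieces ORed together. */
static int32_t make_qinv(void) {
    return (int32_t)(rev32_lane(0x38u) | (0x0fu << 16) | rev16_lane(0x04u) | 0x01u);
}

/* SQDMULH: high 32 bits of the doubled product, saturating. */
static int32_t sqdmulh32(int32_t a, int32_t b) {
    if (a == INT32_MIN && b == INT32_MIN) return INT32_MAX;
    /* Assumption: arithmetic right shift for negative values. */
    return (int32_t)(((int64_t)a * b) >> 31);
}

/* MUL: low 32 bits of the product. */
static int32_t mul_lo32(int32_t a, int32_t b) {
    return (int32_t)((uint32_t)a * (uint32_t)b);
}

/* SHSUB: halved difference, computed without intermediate overflow. */
static int32_t shsub32(int32_t a, int32_t b) {
    return (int32_t)(((int64_t)a - (int64_t)b) >> 1);
}

/* mont_reduce sequence: result is congruent to a * b * 2^-32 modulo q. */
static int32_t mont_mul(int32_t a, int32_t b) {
    const int32_t q = make_q();
    const int32_t qinv = make_qinv();
    int32_t hi = sqdmulh32(a, b);      /* high half of 2ab          */
    int32_t bq = mul_lo32(b, qinv);    /* b * q^-1 mod 2^32         */
    int32_t t = mul_lo32(a, bq);       /* a * b * q^-1 mod 2^32     */
    int32_t corr = sqdmulh32(t, q);    /* high half of 2tq          */
    return shsub32(hi, corr);          /* (ab - tq) / 2^32          */
}

/* Pointwise product of two n-coefficient polynomials, lane by lane. */
static void pointwise_mont(int32_t *r, const int32_t *a, const int32_t *b, size_t n) {
    assert(r != NULL && a != NULL && b != NULL);
    for (size_t i = 0; i < n; i++) {
        r[i] = mont_mul(a[i], b[i]);
    }
}
```

The halving step is exact because `a*b` and `t*q` agree in their low 32 bits, so the two doubled high halves differ by an even number.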
3.1.1. Optimized Implementation of NTT
Algorithm 2 Part of the NTT utilizing NEON instructions (x0: result of the multiplication operation; x0, x1: inputs for the multiplication operation)

    .macro len128
    // 32 iterations x 4 lanes = 128 butterflies
    MOV x15, #32
    // broadcast one twiddle factor to all four lanes of v2
    LD1R {v2.4s}, [x1], #4
    loop_i128:
    LD1 {v1.4s}, [x0]
    // 512 bytes = 128 coefficients ahead
    ADD x0, x0, #512
    LD1 {v0.4s}, [x0]
    // mont_reduce: v6 = v0 * v2 * 2^-32 mod q (v3 = q, v4 = q^-1 mod 2^32)
    SQDMULH v6.4s, v0.4s, v2.4s
    MUL.4s v27, v2, v4
    MUL.4s v7, v0, v27
    SQDMULH v16.4s, v7.4s, v3.4s
    SHSUB.4s v6, v6, v16
    // butterfly: upper half = v1 - v6, lower half = v1 + v6
    SUB.4s v0, v1, v6
    ST1 {v0.4s}, [x0]
    ADD x0, x0, #-512
    ADD.4s v1, v1, v6
    ST1 {v1.4s}, [x0], #16
    ADD x15, x15, #-1
    CBNZ x15, loop_i128
    .endm
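The data flow of the `len128` macro can be followed more easily in scalar form. The sketch below is an illustration in C under stated assumptions, not the authors' implementation: `HAETAE_Q`, `HAETAE_QINV`, `mont_reduce`, and `ntt_layer_len128` are hypothetical names, the Montgomery reduction is written in the usual 64-bit reference style rather than with NEON lanes, and the twiddle factor is passed in by the caller.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define HAETAE_Q 64513
#define HAETAE_QINV 940508161u /* q^-1 mod 2^32 */

/* Returns a value congruent to x * 2^-32 modulo q. */
static int32_t mont_reduce(int64_t x) {
    int32_t t = (int32_t)((uint32_t)x * HAETAE_QINV);
    /* x - t*q is a multiple of 2^32, so the division is exact. */
    return (int32_t)((x - (int64_t)t * HAETAE_Q) / 4294967296LL);
}

/* One Cooley-Tukey layer with distance 128 over 256 coefficients.
 * The macro handles four of these butterflies per loop iteration. */
static void ntt_layer_len128(int32_t *a, int32_t zeta) {
    assert(a != NULL);
    for (size_t j = 0; j < 128; j++) {
        int32_t t = mont_reduce((int64_t)zeta * a[j + 128]);
        a[j + 128] = a[j] - t; /* SUB, stored 512 bytes ahead */
        a[j] = a[j] + t;       /* ADD, stored in place        */
    }
}
```

Because one twiddle factor is shared by the whole layer, the macro can load it once with `LD1R` before entering the loop.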
3.1.2. Data Reordering as a Permutation
3.2. Optimized Implementation of Inverse NTT
Pre-Computation
Algorithm 3 Part of the inverse NTT utilizing NEON instructions (x0: result of the multiplication operation; x0, x1: inputs of the multiplication operation)

    .macro len1
    // 32 iterations x 8 coefficients = 256 coefficients
    MOV x13, #32
    loop:
    // four twiddle factors, one per butterfly
    LD1 {v2.4s}, [x1], #16
    MUL.4s v27, v2, v4
    LD1 {v1.2s}, [x0], #8
    LD1 {v0.2s}, [x0], #8
    // data_reordering: v8 = even-indexed, v9 = odd-indexed coefficients
    TRN1.4s v17, v1, v0
    TRN2.4s v18, v1, v0
    LD1 {v1.2s}, [x0], #8
    LD1 {v0.2s}, [x0]
    TRN1.2s v12, v1, v0
    TRN2.2s v19, v1, v0
    ZIP1.2d v8, v17, v12
    ZIP1.2d v9, v18, v19
    // butterfly: v8 = even + odd, v9 = even - odd
    MOV.4s v6, v8
    ADD.4s v8, v9, v6
    SUB.4s v9, v6, v9
    // mont_reduce: v6 = v9 * v2 * 2^-32 mod q
    SQDMULH v6.4s, v9.4s, v2.4s
    MUL.4s v7, v9, v27
    SQDMULH v16.4s, v7.4s, v3.4s
    SHSUB.4s v6, v6, v16
    // re-interleave sums and reduced differences
    ZIP1.4s v0, v8, v6
    ZIP2.4s v1, v8, v6
    ADD x0, x0, #-24
    ST1 {v0.4s}, [x0], #16
    ST1 {v1.4s}, [x0], #16
    ADD x13, x13, #-1
    CBNZ x13, loop
    .endm
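A scalar view of the `len1` macro makes the role of the TRN/ZIP reordering explicit: adjacent coefficients are split into even and odd groups, combined, and interleaved again. The following C sketch is an illustration under assumptions, not the paper's code; `HAETAE_Q`, `HAETAE_QINV`, `mont_reduce`, and `invntt_layer_len1` are hypothetical names, and the table of 128 twiddle factors is assumed to be supplied by the caller in the order the macro reads it.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define HAETAE_Q 64513
#define HAETAE_QINV 940508161u /* q^-1 mod 2^32 */

/* Returns a value congruent to x * 2^-32 modulo q. */
static int32_t mont_reduce(int64_t x) {
    int32_t t = (int32_t)((uint32_t)x * HAETAE_QINV);
    /* x - t*q is a multiple of 2^32, so the division is exact. */
    return (int32_t)((x - (int64_t)t * HAETAE_Q) / 4294967296LL);
}

/* One Gentleman-Sande layer with distance 1 over 256 coefficients. */
static void invntt_layer_len1(int32_t *a, const int32_t *zetas) {
    assert(a != NULL && zetas != NULL);
    for (size_t blk = 0; blk < 32; blk++) {
        int32_t *p = a + 8 * blk;           /* eight coefficients per pass */
        const int32_t *z = zetas + 4 * blk; /* four twiddles per pass      */
        int32_t even[4], odd[4];

        /* De-interleave, as TRN1/TRN2 followed by ZIP1.2d do. */
        for (size_t k = 0; k < 4; k++) {
            even[k] = p[2 * k];
            odd[k] = p[2 * k + 1];
        }

        /* Butterfly, then re-interleave as ZIP1.4s/ZIP2.4s do. */
        for (size_t k = 0; k < 4; k++) {
            int32_t sum = even[k] + odd[k];
            int32_t diff = even[k] - odd[k];
            p[2 * k] = sum;
            p[2 * k + 1] = mont_reduce((int64_t)z[k] * diff);
        }
    }
}
```

Unlike the distance-128 layer, every butterfly here has its own twiddle factor, which is why the macro loads four of them per iteration instead of broadcasting one.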
4. Evaluation
5. Conclusions
Author Contributions
Funding
Data Availability Statement
Conflicts of Interest
References
1. Feynman, R.P. Simulating physics with computers. In Feynman and Computation; CRC Press: Boca Raton, FL, USA, 2018; pp. 133–153.
2. Shor, P.W. Polynomial-time algorithms for prime factorization and discrete logarithms on a quantum computer. SIAM Rev. 1999, 41, 303–332.
3. Grover, L.K. A fast quantum mechanical algorithm for database search. In Proceedings of the Twenty-Eighth Annual ACM Symposium on Theory of Computing, Philadelphia, PA, USA, 22–24 May 1996; pp. 212–219.
4. Choi, C.Q. IBM’s Quantum Leap: The Company Will Take Quantum Tech Past the 1,000-Qubit Mark in 2023. IEEE Spectr. 2023, 60, 46–47.
5. Yan, B.; Tan, Z.; Wei, S.; Jiang, H.; Wang, W.; Wang, H.; Luo, L.; Duan, Q.; Liu, Y.; Shi, W.; et al. Factoring integers with sublinear resources on a superconducting quantum processor. arXiv 2022, arXiv:2212.12372.
6. Hossain, M.; Kayas, G.; Hasan, R.; Skjellum, A.; Noor, S.; Islam, S.R. A Holistic Analysis of Internet of Things (IoT) Security: Principles, Practices, and New Perspectives. Future Internet 2024, 16, 40.
7. Kumar, A.; Ottaviani, C.; Gill, S.S.; Buyya, R. Securing the future internet of things with post-quantum cryptography. Secur. Priv. 2022, 5, e200.
8. Balogh, S.; Gallo, O.; Ploszek, R.; Špaček, P.; Zajac, P. IoT security challenges: Cloud and blockchain, postquantum cryptography, and evolutionary techniques. Electronics 2021, 10, 2647.
9. Kumari, S.; Singh, M.; Singh, R.; Tewari, H. Post-quantum cryptography techniques for secure communication in resource-constrained Internet of Things devices: A comprehensive survey. Softw. Pract. Exp. 2022, 52, 2047–2076.
10. Shamshad, S.; Riaz, F.; Riaz, R.; Rizvi, S.S.; Abdulla, S. An enhanced architecture to resolve public-key cryptographic issues in the internet of things (IoT), employing quantum computing supremacy. Sensors 2022, 22, 8151.
11. Malina, L.; Popelova, L.; Dzurenda, P.; Hajny, J.; Martinasek, Z. On feasibility of post-quantum cryptography on small devices. IFAC-PapersOnLine 2018, 51, 462–467.
12. NIST PQC Project. Available online: https://csrc.nist.gov/Projects/post-quantum-cryptography (accessed on 21 July 2024).
13. KpqC Competition. Available online: https://kpqc.or.kr/competition.html (accessed on 21 July 2024).
14. Oder, T.; Speith, J.; Höltgen, K.; Güneysu, T. Towards practical microcontroller implementation of the signature scheme Falcon. In Proceedings of the Post-Quantum Cryptography: 10th International Conference, PQCrypto 2019, Chongqing, China, 8–10 May 2019; Revised Selected Papers 10; Springer: Berlin/Heidelberg, Germany, 2019; pp. 65–80.
15. Chen, M.S.; Chou, T. Classic McEliece on the ARM cortex-M4. In IACR Transactions on Cryptographic Hardware and Embedded Systems; IACR: Lyon, France, 2021; pp. 125–148.
16. Sim, M.; Eum, S.; Kwon, H.; Kim, H.; Seo, H. Optimized implementation of encapsulation and decapsulation of Classic McEliece on ARMv8. Cryptol. ePrint Arch. 2022, 2022/1706. Available online: https://eprint.iacr.org/2022/1706 (accessed on 23 September 2024).
17. Nguyen, D.T.; Gaj, K. Fast falcon signature generation and verification using armv8 neon instructions. In Proceedings of the International Conference on Cryptology in Africa; Springer: Berlin/Heidelberg, Germany, 2023; pp. 417–441.
18. Huang, J.; Adomnicăi, A.; Zhang, J.; Dai, W.; Liu, Y.; Cheung, R.C.; Koç, Ç.K.; Chen, D. Revisiting Keccak and Dilithium Implementations on ARMv7-M. IACR Trans. Cryptogr. Hardw. Embed. Syst. 2024, 2024, 1–24.
19. Kim, Y.; Song, J.; Youn, T.Y.; Seo, S.C. Crystals-Dilithium on ARMv8. Secur. Commun. Netw. 2022, 2022, 5226390.
20. Seo, S.C.; An, S. Parallel implementation of CRYSTALS-Dilithium for effective signing and verification in autonomous driving environment. ICT Express 2023, 9, 100–105.
21. Becker, H.; Hwang, V.; Kannwischer, M.J.; Yang, B.Y.; Yang, S.Y. Neon ntt: Faster dilithium, kyber, and saber on cortex-a72 and apple m1. Cryptol. ePrint Arch. 2021, 2021/986. Available online: https://eprint.iacr.org/2021/986 (accessed on 23 September 2024).
22. Seo, H.; Sanal, P.; Jalali, A.; Azarderakhsh, R. Optimized implementation of SIKE round 2 on 64-bit ARM Cortex-A processors. IEEE Trans. Circuits Syst. I Regul. Pap. 2020, 67, 2659–2671.
23. Cheon, J.H.; Choe, H.; Devevey, J.; Güneysu, T.; Hong, D.; Krausz, M.; Land, G.; Möller, M.; Stehlé, D.; Yi, M. Haetae: Shorter lattice-based fiat-shamir signatures. IACR Trans. Cryptogr. Hardw. Embed. Syst. 2024, 2024, 25–75.
24. Kwon, H.; Sim, M.; Song, G.; Lee, M.; Seo, H. Evaluating kpqc algorithm submissions: Balanced and clean benchmarking approach. In Proceedings of the International Conference on Information Security Applications; Springer: Berlin/Heidelberg, Germany, 2023; pp. 338–348.
25. Cottaar, J.; Hövelmanns, K.; Hülsing, A.; Lange, T.; Mahzoun, M.; Pellegrini, A.; Ravagnani, A.; Schäge, S.; Trimoska, M.; de Weger, B. Report on evaluation of KpqC candidates. Cryptol. ePrint Arch. 2023, 2023/1853. Available online: https://eprint.iacr.org/2023/1853 (accessed on 23 September 2024).
26. Choi, Y.; Kim, M.; Kim, Y.; Song, J.; Jin, J.; Kim, H.; Seo, S.C. KpqBench: Performance and Implementation Security Analysis of KpqC Competition Round 1 Candidates. IEEE Access 2024.
27. Lee, J.; Lee, E.m.; Kim, J. Security Analysis on TiGER KEM in KpqC Round 1 Competition Using Meet-LWE Attack. J. Korea Inst. Inf. Secur. Cryptol. 2023, 33, 709–719.
28. Ikematsu, Y.; Jo, H.; Yasuda, T. A security analysis on MQ-Sign. In Proceedings of the International Conference on Information Security Applications; Springer: Berlin/Heidelberg, Germany, 2023; pp. 40–51.
29. Kim, S.; Lee, E.M.; Lee, J.; Lee, M.J.; Noh, H. Security Evaluation on KpqC Round 1 Lattice-Based Algorithms Using Lattice Estimator. In Proceedings of the International Conference on Information Security and Cryptology; Springer: Berlin/Heidelberg, Germany, 2023; pp. 261–281.
30. NIST PQC Project: Digital Signature Schemes. Available online: https://csrc.nist.gov/Projects/pqc-dig-sig/round-1-additional-signatures (accessed on 21 July 2024).
31. Lyubashevsky, V. Fiat-Shamir with aborts: Applications to lattice and factoring-based signatures. In Proceedings of the International Conference on the Theory and Application of Cryptology and Information Security; Springer: Berlin/Heidelberg, Germany, 2009; pp. 598–616.
32. Lyubashevsky, V. Lattice signatures without trapdoors. In Proceedings of the Annual International Conference on the Theory and Applications of Cryptographic Techniques; Springer: Berlin/Heidelberg, Germany, 2012; pp. 738–755.
33. Devevey, J.; Fawzi, O.; Passelègue, A.; Stehlé, D. On rejection sampling in lyubashevsky’s signature scheme. In Proceedings of the International Conference on the Theory and Application of Cryptology and Information Security; Springer: Berlin/Heidelberg, Germany, 2022; pp. 34–64.
34. Abdulrahman, A.; Hwang, V.; Kannwischer, M.J.; Sprenkels, A. Faster kyber and dilithium on the cortex-m4. In Proceedings of the International Conference on Applied Cryptography and Network Security; Springer: Berlin/Heidelberg, Germany, 2022; pp. 853–871.
35. Armv8-A Instruction Set Architecture. Available online: https://developer.arm.com/documentation/den0024/a/An-Introduction-to-the-ARMv8-Instruction-Sets (accessed on 21 July 2024).
36. Kwon, H.; Kim, H.; Sim, M.; Eum, S.; Lee, M.; Lee, W.K.; Seo, H. ARMing-Sword: Scabbard on ARM. In Proceedings of the International Conference on Information Security Applications; Springer: Berlin/Heidelberg, Germany, 2022; pp. 237–250.
37. Kwon, H.; Kim, H.; Sim, M.; Lee, W.K.; Seo, H. Look-up the Rainbow: Table-based Implementation of Rainbow Signature on 64-bit ARMv8 Processors. ACM Trans. Embed. Comput. Syst. 2023, 22, 80.
38. Sim, M.; Kwon, H.; Eum, S.; Song, G.; Lee, M.; Seo, H. Efficient Implementation of the Classic McEliece on ARMv8 Processors. In Proceedings of the International Conference on Information Security Applications; Springer: Berlin/Heidelberg, Germany, 2023; pp. 324–337.
39. Ducas, L.; Kiltz, E.; Lepoint, T.; Lyubashevsky, V.; Schwabe, P.; Seiler, G.; Stehlé, D. Crystals-Dilithium: A lattice-based digital signature scheme. In IACR Transactions on Cryptographic Hardware and Embedded Systems; IACR: Lyon, France, 2018; pp. 238–268.
40. Montgomery, P.L. Modular multiplication without trial division. Math. Comput. 1985, 44, 519–521.
41. Chung, C.M.M.; Hwang, V.; Kannwischer, M.J.; Seiler, G.; Shih, C.J.; Yang, B.Y. NTT multiplication for NTT-unfriendly rings: New speed records for Saber and NTRU on Cortex-M4 and AVX2. In IACR Transactions on Cryptographic Hardware and Embedded Systems; IACR: Lyon, France, 2021; pp. 159–188.
42. Zhang, N.; Yang, B.; Chen, C.; Yin, S.; Wei, S.; Liu, L. Highly efficient architecture of NewHope-NIST on FPGA using low-complexity NTT/INTT. In IACR Transactions on Cryptographic Hardware and Embedded Systems; IACR: Lyon, France, 2020; pp. 49–72.
Scheme | n | q | η | τ | Security Level | Verify Key (bytes) | Secret Key (bytes) | Signature (bytes) |
---|---|---|---|---|---|---|---|---|
HAETAE120 | 256 | 64,513 | 1 | 58 | 2 | 992 | 1376 | 1474 |
HAETAE180 | 256 | 64,513 | 1 | 80 | 3 | 1472 | 2080 | 2349 |
HAETAE260 | 256 | 64,513 | 1 | 128 | 5 | 2080 | 2720 | 2948 |
<Q> | - | 2 | - | 2 | - | 2 |
---|---|---|---|---|---|---|
Ta | 8h | 8h | 4s | 4s | 2d | 2d |
Tb | 8b | 16b | 4h | 8h | 2s | 4s |
asm | Operands | Description | Operation |
---|---|---|---|
ADD | Vd.T, Vn.T, Vm.T | Add | Vd ← Vn + Vm |
LD1 | Vt.T, [Xn] | Load multiple single-element structures | Vt ← [Xn] |
LD1R | Vt.T, [Xn] | Load single 1-element structure and replicate to all lanes (of one register). | Vt.T ← [Xn] |
MOV | Xd, #imm | Move (immediate) | Xd ← #imm |
MOV | Vd.T, Vn.T | Move (vector) | Vd ← Vn |
MOVI | Vt.T, #imm | Move immediate (vector) | Vt ← #imm |
MUL | Vd, Vn, Vm | Multiply | Vd ← Vn × Vm |
SMULL | Vd.Ta, Vn.Tb, Vm.Tb | Signed Multiply Long (lower half) | Vd ← Vn × Vm |
SMULL2 | Vd.Ta, Vn.Tb, Vm.Tb | Signed Multiply Long (upper half) | Vd ← Vn × Vm |
SMLSL | Vd.Ta, Vn.Tb, Vm.Tb | Signed Multiply-Subtract Long (lower half) | Vd ← Vd − Vn × Vm |
SMLSL2 | Vd.Ta, Vn.Tb, Vm.Tb | Signed Multiply-Subtract Long (upper half) | Vd ← Vd − Vn × Vm |
RET | {Xn} | Return from subroutine | Return |
SHL | Vd.T, Vn.T, #shift | Shift Left immediate (vector) | Vd ← Vn << #shift |
SSHR | Vd.T, Vn.T, #shift | Signed Shift Right immediate (vector) | Vd ← Vn >> #shift |
ST1 | Vt.T, [Xn] | Store multiple single-element structures from one, two, three, or four registers | [Xn] ← Vt |
SUB | Xd, Xn, #imm | Subtract immediate | Xd ← Xn − #imm |
SUB | Vd, Vn, Vm | Subtract | Vd ← Vn − Vm |
REV32 | Vd.T, Vn.T | Reverse elements in 32-bit words | Vd ← reverse(Vn) |
REV16 | Vd.T, Vn.T | Reverse elements in 16-bit words | Vd ← reverse(Vn) |
CBNZ | Wt, Label | Compare and Branch on Nonzero | Branch to Label if Wt ≠ 0 |
ZIP1 | Vd.T, Vn.T, Vm.T | Zip vectors primary | Vd ← interleave(lower half of Vn, lower half of Vm) |
ZIP2 | Vd.T, Vn.T, Vm.T | Zip vectors secondary | Vd ← interleave(upper half of Vn, upper half of Vm) |
XTN, XTN2 | Vd.Tb, Vn.Ta | Extract Narrow | Vd ← Vn |
SQDMULH | Vd.T, Vn.T, Vm.T | Signed saturating Doubling Multiply returning High half | Vd ← high half of (2 × Vn × Vm) |
SHSUB | Vd.T, Vn.T, Vm.T | Signed Halving Subtract | Vd ← (Vn − Vm) >> 1 |
TRN1 | Vd.T, Vn.T, Vm.T | Transpose vectors primary | Vd ← interleave(Vn[even], Vm[even]); TRN2 uses the odd elements |
Scheme | Cheon et al. [23]: Keygen | Cheon et al. [23]: Sign | Cheon et al. [23]: Verify | This Work: Keygen | This Work: Sign | This Work: Verify | This Work (B): Keygen | This Work (B): Sign | This Work (B): Verify |
---|---|---|---|---|---|---|---|---|---|
HAETAE120 | 1288 | 7616 | 400 | 1126 | 6869 | 322 | 1114 | 6818 | 316 |
HAETAE180 | 1502 | 4198 | 694 | 1303 | 3749 | 554 | 1295 | 3721 | 545 |
HAETAE260 | 2407 | 76,081 | 862 | 2173 | 67,758 | 710 | 2177 | 66,864 | 692 |
Operation | Cheon et al. [23] | This Work | This Work (B) |
---|---|---|---|
NTT | 900 | 319 | 293 |
Inverse NTT | 1157 | 373 | 319 |
Poly pointwise Montgomery | 247 | 58 | 27 |
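The relative gain for each primitive follows directly from the table by dividing the baseline figure by the optimized one. The short C sketch below does that arithmetic; it is an illustration only, the names `struct row`, `ROWS`, `speedup`, and `print_speedups` are hypothetical, and the ratios are unit-free, so no assumption about the measurement unit is needed.

```c
#include <assert.h>
#include <stdio.h>

/* One table row: baseline [23], this work, this work (B). */
struct row {
    const char *name;
    double base;
    double ours;
    double ours_b;
};

/* Values copied from the table above. */
static const struct row ROWS[3] = {
    {"NTT", 900.0, 319.0, 293.0},
    {"Inverse NTT", 1157.0, 373.0, 319.0},
    {"Poly pointwise Montgomery", 247.0, 58.0, 27.0},
};

/* Speed-up factor of an optimized figure relative to a baseline. */
static double speedup(double baseline, double optimized) {
    assert(optimized > 0.0);
    return baseline / optimized;
}

/* Prints one line per primitive with both speed-up factors. */
static void print_speedups(void) {
    for (int i = 0; i < 3; i++) {
        printf("%-26s %.2fx %.2fx\n", ROWS[i].name,
               speedup(ROWS[i].base, ROWS[i].ours),
               speedup(ROWS[i].base, ROWS[i].ours_b));
    }
}
```

Calling `print_speedups()` from a `main` function lists the factors for all three primitives.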
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
© 2024 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Sim, M.; Lee, M.; Seo, H. HAETAE on ARMv8. Electronics 2024, 13, 3863. https://doi.org/10.3390/electronics13193863