Efficient Large-Width Montgomery Modular Multiplier Design Based on Toom–Cook-5
Abstract
1. Introduction
1.1. Background
1.2. Related Work
1.3. Motivation and Contribution
- This paper analyzes the complete workflow of Toom–Cook-n multiplication and explains how increasing the degree n affects each step.
- This paper proposes a pre-simplification of the Toom–Cook-5 interpolation matrix that reduces the computational cost of the interpolation step without adding clock cycles.
- This paper optimizes the traditional carry–save adder (CSA) architecture by decoupling the compressors from the full adders, which removes redundant full adders and improves area efficiency.
- This paper designs a two-stage pipelined, 3-level Karatsuba multiplication architecture to improve the timing of the multiplications that follow splitting.
- This paper reduces the cost of addition in Montgomery modular multiplication by nearly halving the bit width of the addition operations.
2. Notations and Preliminaries
2.1. Toom–Cook-n Multiplication
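The five steps of Toom–Cook-n (splitting, evaluation, pointwise multiplication, interpolation, recomposition) can be illustrated with a minimal software sketch. It is not the architecture proposed in this paper: the function names, the choice of evaluation points 0, ±1, ±2, ... and the exact-rational Gaussian elimination used for interpolation are all assumptions made for illustration, and no pre-simplification of the interpolation matrix is attempted.

```python
from fractions import Fraction


def solve_vandermonde(points, values):
    """Return coefficients c such that sum(c[j] * x**j) equals each value at its point."""
    m = len(points)
    # Augmented Vandermonde system, kept in exact rational arithmetic.
    rows = [[Fraction(x) ** j for j in range(m)] + [Fraction(v)]
            for x, v in zip(points, values)]
    for col in range(m):
        pivot = next(r for r in range(col, m) if rows[r][col] != 0)
        rows[col], rows[pivot] = rows[pivot], rows[col]
        inv = 1 / rows[col][col]
        rows[col] = [e * inv for e in rows[col]]
        for r in range(m):
            if r != col and rows[r][col] != 0:
                f = rows[r][col]
                rows[r] = [e - f * p for e, p in zip(rows[r], rows[col])]
    return [rows[i][m] for i in range(m)]


def toom_cook_multiply(a, b, n, limb_bits):
    """Multiply non-negative integers a and b, each split into n limbs of limb_bits bits."""
    if a < 0 or b < 0 or (max(a, b) >> (n * limb_bits)) != 0:
        raise ValueError("operands must be non-negative and fit in n * limb_bits bits")
    mask = (1 << limb_bits) - 1
    # 1. Splitting: each operand becomes a degree-(n-1) polynomial in 2**limb_bits.
    a_limbs = [(a >> (i * limb_bits)) & mask for i in range(n)]
    b_limbs = [(b >> (i * limb_bits)) & mask for i in range(n)]
    # 2. Evaluation at 2n-1 distinct points (assumed set: 0, +1, -1, +2, -2, ...).
    points = [0] + [sign * k for k in range(1, n) for sign in (1, -1)]

    def evaluate(limbs, x):
        return sum(c * x ** i for i, c in enumerate(limbs))

    # 3. Pointwise multiplication: 2n-1 products of roughly limb-sized values.
    products = [evaluate(a_limbs, x) * evaluate(b_limbs, x) for x in points]
    # 4. Interpolation: recover the 2n-1 coefficients of the product polynomial.
    coeffs = solve_vandermonde(points, products)
    assert all(c.denominator == 1 for c in coeffs)
    # 5. Recomposition: evaluate the product polynomial at 2**limb_bits.
    return sum(int(c) << (i * limb_bits) for i, c in enumerate(coeffs))
```

With n = 5 the sketch performs 9 pointwise multiplications. Raising n lowers the width of each product but enlarges the evaluation points and the interpolation system, which is the trade-off the interpolation step has to absorb.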
2.2. Karatsuba Multiplication
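As a point of reference for the recursion depth discussed later, the following is a minimal sketch of Karatsuba multiplication limited to a fixed number of levels. The function name, the `levels` and `stats` parameters, and the split position are illustrative assumptions; the sketch models only the arithmetic and says nothing about pipelining or hardware timing.

```python
def karatsuba(x, y, bits, levels=3, stats=None):
    """Multiply non-negative integers x and y with at most `levels` Karatsuba splits.

    `bits` is only a hint for where to split; `stats`, if given, counts base multiplications.
    """
    if levels == 0 or bits <= 1:
        if stats is not None:
            stats["base_mults"] = stats.get("base_mults", 0) + 1
        return x * y
    half = (bits + 1) // 2
    mask = (1 << half) - 1
    x0, x1 = x & mask, x >> half  # low and high halves of x
    y0, y1 = y & mask, y >> half  # low and high halves of y
    low = karatsuba(x0, y0, half, levels - 1, stats)
    high = karatsuba(x1, y1, bits - half, levels - 1, stats)
    # One multiplication of the half sums replaces the two cross products.
    cross = karatsuba(x0 + x1, y0 + y1, half + 1, levels - 1, stats) - low - high
    return (high << (2 * half)) + (cross << half) + low
```

Each level replaces one multiplication with three narrower ones, so three levels on sufficiently wide operands end in 3^3 = 27 base multiplications instead of the 4^3 = 64 that a schoolbook split of the same depth would need.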
2.3. Montgomery Modular Multiplication
Algorithm 1 Montgomery Modular Multiplication
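The listing of Algorithm 1 is not reproduced here. For orientation, a minimal sketch of the textbook Montgomery multiplication (reduction with R = 2^k) is given below; it is not claimed to match the paper's listing line by line. The function names and the parameter `k` are assumptions, and `pow(n, -1, r)` requires Python 3.8 or later.

```python
def montgomery_setup(n, k):
    """Return (R, n_prime) with R = 2**k and n * n_prime == -1 (mod R). Requires odd n < R."""
    if n % 2 == 0 or n >= (1 << k):
        raise ValueError("modulus must be odd and smaller than 2**k")
    r = 1 << k
    n_prime = (-pow(n, -1, r)) % r
    return r, n_prime


def montgomery_multiply(a, b, n, k, n_prime):
    """Return a * b * R^-1 mod n for 0 <= a, b < n, using shifts and masks instead of division by n."""
    mask = (1 << k) - 1
    t = a * b                              # full-width product
    m = ((t & mask) * n_prime) & mask      # m = (t mod R) * n_prime mod R
    u = (t + m * n) >> k                   # t + m*n is divisible by R, so the shift is exact
    return u - n if u >= n else u          # u < 2n, so one conditional subtraction suffices
```

Operands are normally converted to the Montgomery domain first (a·R mod n); multiplying a Montgomery-domain result by 1 with the same routine converts it back.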
2.4. Carry–Save Adder
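The principle of carry–save addition can be shown with a short bit-level sketch: a 3:2 compressor reduces three operands to a sum word and a carry word without propagating carries, and a single carry-propagate addition is deferred to the end. The function names below are assumptions, and the sketch models the conventional CSA only, not the decoupled architecture proposed in this paper.

```python
def carry_save_add(x, y, z):
    """3:2 compression of non-negative integers: returns (sum_word, carry_word) with sum + carry == x + y + z."""
    sum_word = x ^ y ^ z                               # per-bit sum without carries
    carry_word = ((x & y) | (x & z) | (y & z)) << 1    # per-bit majority, shifted to the next position
    return sum_word, carry_word


def csa_tree_sum(operands):
    """Add many non-negative integers with 3:2 compressors and one final carry-propagate addition."""
    ops = list(operands)
    while len(ops) > 2:
        full = len(ops) - len(ops) % 3     # operands that form complete groups of three
        reduced = []
        for i in range(0, full, 3):
            reduced.extend(carry_save_add(ops[i], ops[i + 1], ops[i + 2]))
        reduced.extend(ops[full:])         # leftover operands pass through to the next layer
        ops = reduced
    return sum(ops)                        # the only step where carries ripple
```

Each compressor layer has a delay independent of the operand width, which is why CSA trees are attractive for the multi-operand additions that arise in evaluation and interpolation.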
3. Proposed Algorithms and Methods
3.1. Proposed Toom–Cook-5 Multiplication
Algorithm 2 Interpolation Matrix Pre-simplified Toom–Cook-5 Multiplication
3.2. Montgomery Modular Multiplication Based on Toom–Cook-5
Algorithm 3 Montgomery Modular Multiplication based on Toom–Cook-n
Algorithm 4 Montgomery Modular Multiplication based on Toom–Cook-5
4. Hardware Architecture of Toom–Cook-5
4.1. Overall Architecture
4.2. Decoupled Carry–Save Adder Architecture
4.3. 3-Level Karatsuba Multiplication Architecture
4.4. Bit Width Derivation and Division Optimization for Interpolation
5. Implementation Results and Comparison
6. Conclusions
Author Contributions
Funding
Data Availability Statement
Conflicts of Interest
References
1. Rivest, R.L.; Shamir, A.; Adleman, L. A method for obtaining digital signatures and public-key cryptosystems. Commun. ACM 1978, 21, 120–126.
2. Miller, V.S. Use of elliptic curves in cryptography. In Proceedings of the Conference on the Theory and Application of Cryptographic Techniques, Linz, Austria, 9–11 April 1985; Springer: Berlin/Heidelberg, Germany, 1985; pp. 417–426.
3. Hankerson, D.; Menezes, A.; Vanstone, S. Guide to Elliptic Curve Cryptography; Springer: Berlin/Heidelberg, Germany, 2004.
4. Eberle, H.; Shantz, S.; Gupta, V.; Gura, N.; Rarick, L.; Spracklen, L. Accelerating next-generation public-key cryptosystems on general-purpose CPUs. IEEE Micro 2005, 25, 52–59.
5. Choe, J.Y.; Shin, K.W. A High Performance Modular Multiplier for ECC. J. IKEEE 2020, 24, 961–968.
6. Karatsuba, A. Multiplication of multidigit numbers on automata. Sov. Phys. Dokl. 1963, 7, 595–596.
7. Heidarpur, M.; Mirhassani, M. An Efficient and High-Speed Overlap-Free Karatsuba-Based Finite-Field Multiplier for FPGA Implementation. IEEE Trans. Very Large Scale Integr. (VLSI) Syst. 2021, 29, 667–676.
8. Granlund, T.; The GMP Development Team. The GNU Multiple Precision Arithmetic Library Manual. 2014. Available online: https://gmplib.org/ (accessed on 25 March 2025).
9. Toom, A.L. The complexity of a scheme of functional elements realizing the multiplication of integers. Soviet Math 1963, 3, 498.
10. Yap, C.; Li, C. QuickMul: Practical FFT-Based Integer Multiplication; Department of Computer Science Courant Institute: New York, NY, USA, 2001.
11. Cook, S.A.; Aanderaa, S.O. On the minimum computation time of functions. Trans. Am. Math. Soc. 1969, 142, 291–314.
12. Elia, M. Loss of Precision in Implementations of the Toom-Cook Algorithm; The University of Vermont and State Agricultural College: Burlington, VT, USA, 2021.
13. Wang, J.; Yang, C.; Zhang, F.; Meng, Y.; Su, Y. TCPM: A Reconfigurable and Efficient Toom-Cook-Based Polynomial Multiplier Over Rings Using a Novel Compressed Postprocessing Algorithm. IEEE Trans. Very Large Scale Integr. (VLSI) Syst. 2023, 31, 1153–1166.
14. Das, M.; Jajodia, B. Area and Delay Trade-Offs in Three-Way Toom-Cook Large Integer Multipliers Implemented on FPGAs. IEEE Trans. Circuits Syst. I Regul. Pap. 2024, 72, 600–609.
15. Bodrato, M. Towards optimal Toom-Cook multiplication for univariate and multivariate polynomials in characteristic 2 and 0. In Proceedings of the Arithmetic of Finite Fields: First International Workshop, WAIFI 2007, Madrid, Spain, 21–22 June 2007; Springer: Berlin/Heidelberg, Germany, 2007; pp. 116–133.
16. Dutta, S.; Bhattacharjee, D.; Chattopadhyay, A. Quantum circuits for Toom-Cook multiplication. Phys. Rev. A 2018, 98, 012311.
17. Umer, U.; Rashid, M.; Alharbi, A.R.; Alhomoud, A.; Kumar, H.; Jafri, A.R. An Efficient Crypto Processor Architecture for Side-Channel Resistant Binary Huff Curves on FPGA. Electronics 2022, 11, 1131.
18. Wang, J.; Yang, C.; Zhang, F.; Meng, Y.; Xiang, S.; Su, Y. A High-Throughput Toom-Cook-4 Polynomial Multiplier for Lattice-Based Cryptography Using a Novel Winograd-Schoolbook Algorithm. IEEE Trans. Circuits Syst. I Regul. Pap. 2023, 71, 359–372.
19. Putranto, D.S.C.; Wardhani, R.W.; Larasati, H.T.; Kim, H. Space and Time-Efficient Quantum Multiplier in Post Quantum Cryptography Era. IEEE Access 2023, 11, 21848–21862.
20. Ding, J.; Li, S.; Gu, Z. High-Speed ECC Processor Over NIST Prime Fields Applied With Toom–Cook Multiplication. IEEE Trans. Circuits Syst. I Regul. Pap. 2018, 66, 1003–1016.
21. Gu, Z.; Li, S. A Division-Free Toom–Cook Multiplication-Based Montgomery Modular Multiplication. IEEE Trans. Circuits Syst. II Express Briefs 2019, 66, 1401–1405.
22. Hao, Y.; Wang, W.; Dang, H.; Wang, G. Efficient Barrett Modular Multiplication Based on Toom–Cook Multiplication. IEEE Trans. Circuits Syst. II Express Briefs 2023, 71, 862–866.
23. Barker, E.B.; Barker, W.C.; Burr, W.E.; Polk, W.T.; Smid, M.E. Recommendation for Key Management Part 1: General (Revision 3); National Institute of Standards and Technology: Gaithersburg, MD, USA, 2012.
24. Barker, E.B.; Dang, Q. Recommendation for Key Management Part 3: Application-Specific Key Management Guidance; National Institute of Standards and Technology: Gaithersburg, MD, USA, 2009.
25. Lochter, M.; Merkle, J. Elliptic Curve Cryptography (ECC) Brainpool Standard Curves and Curve Generation; RFC 5639; Internet Engineering Task Force, 2010.
26. Awano, H.; Ikeda, M. Fourℚ on ASIC: Breaking Speed Records for Elliptic Curve Scalar Multiplication. In Proceedings of the 2019 Design, Automation & Test in Europe Conference & Exhibition (DATE), Florence, Italy, 25–29 March 2019.
27. St. Denis, T.J.; Hamilton, N.F. Karatsuba Based Multiplier and Method. WO2007012179A3, 21 July 2006.
28. Maeder, R.E. Storage allocation for the Karatsuba integer multiplication algorithm. In Proceedings of the Design & Implementation of Symbolic Computation Systems, International Symposium, DISCO '93, Gmunden, Austria, 15–17 September 1993.
29. Montgomery, P.L. Modular multiplication without trial division. Math. Comput. 1985, 44, 519–521.
30. Bedrij, O.J. Carry-Select Adder. IRE Trans. Electron. Comput. 1962, EC-11, 340–346.
31. Ramkumar, B.; Kittur, H.M.; Kannan, P.M. ASIC implementation of modified faster carry save adder. Eur. J. Sci. Res. 2010, 42, 53–58.
32. Kong, Y. Optimizing the Improved Barrett Modular Multipliers for Public-Key Cryptography. In Proceedings of the 2010 International Conference on Computational Intelligence and Software Engineering, Wuhan, China, 10–12 December 2010.
33. Kuang, S.R.; Wang, J.P.; Chang, K.C.; Hsu, H.W. Energy-efficient high-throughput Montgomery modular multipliers for RSA cryptosystems. IEEE Trans. Very Large Scale Integr. (VLSI) Syst. 2012, 21, 1999–2009.
34. Kuang, S.R.; Wu, K.Y.; Lu, R.Y. Low-cost high-performance VLSI architecture for Montgomery modular multiplication. IEEE Trans. Very Large Scale Integr. (VLSI) Syst. 2015, 24, 434–443.
35. Zhang, Z.; Zhang, P. A scalable Montgomery modular multiplication architecture with low area-time product based on redundant binary representation. Electronics 2022, 11, 3712.
36. Miyamoto, A.; Homma, N.; Aoki, T.; Satoh, A. Systematic design of RSA processors based on high-radix Montgomery multipliers. IEEE Trans. Very Large Scale Integr. (VLSI) Syst. 2010, 19, 1136–1146.
37. Knezevic, M.; Vercauteren, F.; Verbauwhede, I. Faster Interleaved Modular Multiplication Based on Barrett and Montgomery Reduction Methods. IEEE Trans. Comput. 2010, 59, 1715–1721.
38. Coronado García, L.C. Can Schönhage Multiplication Speed Up the RSA Decryption or Encryption? (Extended Abstract). 2005. Available online: http://www.cdc.informatik.tu-darmstadt.de/mitarbeiter/coronado.html (accessed on 25 March 2025).
Design | Process | Area () | Frequency (MHz) | Time (ns) | Power (mW) | ATP |
---|---|---|---|---|---|---|
[33] | 90 nm | 749,076 | 179 | 4570.3 | 70.6 | 3423 |
[34] | 90 nm | 498,379 | 250 | 3520 | - | 1754 |
[35] | 90 nm | 992,500 | 392 | 291 | - | 288 |
[36] | 90 nm | 54,587 | 472 | 4680 | - | 257 |
[21] | 90 nm | 1,799,451 | 257 | 85.58 | 402.93 | 154 |
[22] | 90 nm | 1,012,576 | 277 | 126 | 392.47 | 127 |
ours | 90 nm | 515,148 | 413 | 274.8 | 379.35 | 142 |
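The ATP column appears to be the product of the Area and Time columns scaled by 10^-6. This relation is inferred from the tabulated values rather than stated in the table, and the area unit is not preserved in the header, so the short check below should be read as a consistency sketch under that assumption. The names `TABLE` and `area_time_product` are illustrative.

```python
# (area, time in ns, reported ATP) copied from the comparison table above.
TABLE = {
    "[33]": (749_076, 4570.3, 3423),
    "[34]": (498_379, 3520.0, 1754),
    "[35]": (992_500, 291.0, 288),
    "[36]": (54_587, 4680.0, 257),
    "[21]": (1_799_451, 85.58, 154),
    "[22]": (1_012_576, 126.0, 127),
    "ours": (515_148, 274.8, 142),
}


def area_time_product(area, time_ns):
    """Assumed definition: ATP = area * time, scaled by 1e-6 to match the table's magnitude."""
    return area * time_ns / 1e6


for design, (area, time_ns, reported) in TABLE.items():
    computed = area_time_product(area, time_ns)
    print(f"{design:>5}: computed {computed:8.1f}, reported {reported}")
```

Under this assumption every row agrees with the reported ATP to within about two units; the small differences are presumably due to rounding in the source figures.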
© 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Liu, K.; Wang, X.; Hao, Y.; Zhang, J.; Wang, W. Efficient Large-Width Montgomery Modular Multiplier Design Based on Toom–Cook-5. Electronics 2025, 14, 1402. https://doi.org/10.3390/electronics14071402