Efficient and Low-Cost Modular Polynomial Multiplier for WSN Security

Haroon, Fariha; Li, Hua

doi:10.3390/jsan14050086


by Fariha Haroon and Hua Li *

Department of Mathematics and Computer Science, University of Lethbridge, Lethbridge, AB T1K 3M4, Canada

* Author to whom correspondence should be addressed.

J. Sens. Actuator Netw. 2025, 14(5), 86; https://doi.org/10.3390/jsan14050086

Submission received: 18 June 2025 / Revised: 13 August 2025 / Accepted: 20 August 2025 / Published: 25 August 2025

(This article belongs to the Special Issue Applications of Wireless Sensor Networks: Innovations and Future Trends)


Abstract

Wireless Sensor Network (WSN) technology has constrained computing resources that require efficient and low-cost cryptographic hardware to provide security services, particularly when dealing with large modular polynomial multiplication in cryptography. In this paper, a cost-efficient reconfigurable Karatsuba modular polynomial multiplier is proposed for general modulus polynomials. The modulus polynomial can be changed easily depending on the application. The proposed modular polynomial multiplier is synthesized and simulated by the AMD Vivado Design Tool. The design’s performance on ADP (Area Delay Product) has been improved compared to previous designs. It can be applied in ECC encryption to speed up the security services in WSN.

Keywords: polynomial multiplier; Galois field; elliptic curve cryptography; WSN

1. Introduction

The growing dependence on WSNs has resulted in growing demand for effective cryptographic solutions. WSN devices are often resource-constrained, making it essential to implement security measures that balance computational efficiency with robust data protection. Large integer and polynomial multiplications serve as the foundational arithmetic operations in many cryptographic algorithms; however, they can lead to significant computational complexity. Key-establishment and digital signature techniques are necessary to ensure secure connections on these devices and protect data exchanged between channels. They can be divided into two categories: symmetric-key and asymmetric-key techniques [1]. Among the asymmetric techniques, commonly used public-key algorithms include Diffie-Hellman [2], RSA [3], ElGamal [4], and Elliptic Curve Cryptography (ECC) [5]. The security of the Diffie-Hellman algorithm depends on the Discrete Logarithm Problem (DLP) in finite fields. ECC provides comparable or even higher security with significantly shorter key lengths, making it more efficient in computational performance and the most suitable choice for resource-constrained applications [6]. The core arithmetic operations in ECC rely on finite-field $GF(2^{n})$ computations [7]. The computation of elliptic curve points is fundamentally dependent on finite-field arithmetic operations, with finite-field multiplication being the most crucial task in its hardware implementation. The size and latency of the multiplier significantly impact area utilization, route delay, and throughput of the overall system, making it an essential component in improving ECC hardware designs. The basis used to represent the elements of a finite field affects the performance of arithmetic operations. Different types of basis, such as the polynomial basis, normal basis, and dual basis, can be employed to execute multiplication in $GF(2^{n})$. Among them, the polynomial basis representation yields more efficient, consistent, and scalable multiplier designs [8]. One of the simplest algorithms for polynomial multiplication is known as the Schoolbook, or Conventional Algorithm (CA). Other algorithms, such as Karatsuba [9] and the Fast Fourier Transform (FFT) [10], have also been proposed.

Existing modular polynomial multipliers consume more hardware and processing resources than computing-restricted WSN applications can afford. In addition, the modulus polynomial in previous designs is fixed, so the hardware must be redesigned whenever the modulus polynomial changes. In this paper, a low-cost and efficient reconfigurable Karatsuba modular polynomial multiplier is proposed for security services in different applications. The contributions of the paper are as follows:

Cost-Efficient and Symmetric Design: The proposed modular polynomial multiplier has been designed with symmetric PEs and submodules for efficient implementation. It includes $3 n$ registers, $4 n$ XOR gates, $3 n$ AND gates, and $1.5 n$ 2-to-1 multiplexers for the modular polynomial multiplication in $GF (2^{n})$ .
Serial Input and Parallel Output: One of the multiplicands is serially input to minimize I/O pin usage. After $\frac{n}{2}$ clock cycles, the modular product is output in parallel.
Karatsuba Reconfigurable Polynomial Multiplier for General Modulus Polynomials: The proposed architecture is reconfigurable with respect to the modulus polynomial, which is supplied as an input parameter. It supports dynamic updates of the modulus polynomial without hardware modification and is further optimized using a Karatsuba approach.

2. Background

A finite field, or Galois field (GF), is a set with a finite number of elements. Finite fields are widely used in cryptography because addition and subtraction in $GF(2)$ are the same XOR operation, which is efficient to implement. The elements of $GF(2^{n})$ can be represented by polynomials of degree less than $n$.

2.1. Schoolbook Polynomial Multiplication

The schoolbook (or conventional) polynomial multiplication is the most straightforward method for multiplying two polynomials. In this approach, each coefficient of one polynomial is multiplied by every coefficient of the other polynomial, and the partial products are summed accordingly. Given two polynomials of degree $n - 1$ over the finite field $R_{q}$:

$$A(x) = a_{0} + a_{1} x + a_{2} x^{2} + \dots + a_{n-1} x^{n-1}, \tag{1}$$

$$B(x) = b_{0} + b_{1} x + b_{2} x^{2} + \dots + b_{n-1} x^{n-1}, \tag{2}$$

the product polynomial $P(x)$ is computed using (1) and (2) as:

$$P(x) = A(x) \cdot B(x) = \sum_{i=0}^{n-1} \sum_{j=0}^{n-1} a_{i} b_{j} x^{i+j}. \tag{3}$$
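As a minimal sketch of Equation (3), the double sum can be written directly in Python. The sketch assumes binary coefficients in $GF(2)$, stored as lists in which index $i$ holds the coefficient of $x^{i}$; the function name `schoolbook_mul` is illustrative and does not come from the paper.

```python
def schoolbook_mul(a, b):
    """Schoolbook product of two GF(2) polynomials.

    a, b: lists of n bits each, where index i is the coefficient of x^i.
    Returns the 2n-1 coefficients of A(x) * B(x) (no modular reduction).
    """
    n = len(a)
    assert len(b) == n, "both operands are assumed to have n coefficients"
    p = [0] * (2 * n - 1)
    for i in range(n):
        for j in range(n):
            # a_i * b_j is an AND; accumulation in GF(2) is an XOR.
            p[i + j] ^= a[i] & b[j]
    return p


# (1 + x) * (1 + x) = 1 + x^2 over GF(2), because the two x terms cancel.
print(schoolbook_mul([1, 1], [1, 1]))
```

The two nested loops make the $n^{2}$ coefficient multiplications explicit, which is the cost that the Karatsuba method is designed to reduce.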

2.2. Karatsuba Polynomial Multiplication

To enhance efficiency and lower the complexity of schoolbook polynomial multiplication, divide-and-conquer techniques have gained significant attention, particularly for their ability to increase parallelism. Let $A(x) = \sum_{i=0}^{n-1} a_{i} x^{i}$ and $B(x) = \sum_{i=0}^{n-1} b_{i} x^{i}$ be two polynomials and $P(x) = A(x) B(x) \bmod (x^{n} + 1) = \sum_{i=0}^{n-1} p_{i} x^{i}$. By Karatsuba’s divide-and-conquer algorithm [9], $A(x)$ and $B(x)$ can be represented by higher and lower degree polynomials:

$$A(x) = A_{1}(x) + A_{2}(x) x^{n/2} \tag{4}$$

$$B(x) = B_{1}(x) + B_{2}(x) x^{n/2} \tag{5}$$

where

$$A_{1}(x) = \sum_{i=0}^{\frac{n}{2}-1} a_{i} x^{i}, \qquad A_{2}(x) = \sum_{i=0}^{\frac{n}{2}-1} a_{i+n/2} x^{i} \tag{6}$$

$$B_{1}(x) = \sum_{i=0}^{\frac{n}{2}-1} b_{i} x^{i}, \qquad B_{2}(x) = \sum_{i=0}^{\frac{n}{2}-1} b_{i+n/2} x^{i} \tag{7}$$

Then,

$$P(x) = (A_{1}(x) + A_{2}(x) x^{n/2}) (B_{1}(x) + B_{2}(x) x^{n/2}) \bmod (x^{n} + 1) \tag{8}$$

$$= (A_{1}(x) B_{1}(x) + (A_{1}(x) B_{2}(x) + A_{2}(x) B_{1}(x)) x^{n/2} + A_{2}(x) B_{2}(x) x^{n}) \bmod (x^{n} + 1) \tag{9}$$

There are four polynomial multiplications in Equation (9). To save one polynomial multiplication, we calculate:

$$A_{1}(x) B_{2}(x) + A_{2}(x) B_{1}(x) = (A_{1}(x) + A_{2}(x)) (B_{1}(x) + B_{2}(x)) - A_{1}(x) B_{1}(x) - A_{2}(x) B_{2}(x) \tag{10}$$

Let

$$P_{1} = A_{1}(x) B_{1}(x) \tag{11}$$

$$P_{2} = A_{2}(x) B_{2}(x) \tag{12}$$

$$P_{3} = (A_{1}(x) + A_{2}(x)) (B_{1}(x) + B_{2}(x)) \tag{13}$$

Then,

$$P(x) = (P_{1} + (P_{3} - P_{1} - P_{2}) x^{n/2} + P_{2} x^{n}) \bmod (x^{n} + 1) \tag{14}$$

The Karatsuba method reduces the total number of coefficient multiplications greatly by employing the divide-and-conquer strategy.
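The derivation in Equations (4) to (14) can be checked with a short sketch. It assumes coefficients in $GF(2)$ (so the subtractions in Equation (14) become XORs), an even $n$, and a single level of splitting; the helper names `poly_mul`, `poly_add`, and `karatsuba_mod_xn_plus_1` are illustrative only.

```python
def poly_mul(a, b):
    """Schoolbook product of GF(2) coefficient lists (index = degree)."""
    p = [0] * (len(a) + len(b) - 1)
    for i, ai in enumerate(a):
        for j, bj in enumerate(b):
            p[i + j] ^= ai & bj
    return p


def poly_add(a, b):
    """Coefficient-wise addition in GF(2), i.e. XOR."""
    return [x ^ y for x, y in zip(a, b)]


def karatsuba_mod_xn_plus_1(a, b):
    """One Karatsuba level for A(x)B(x) mod (x^n + 1), following Eq. (14)."""
    n = len(a)                      # n is assumed to be even
    h = n // 2
    a1, a2 = a[:h], a[h:]           # Eq. (6): low and high halves of A
    b1, b2 = b[:h], b[h:]           # Eq. (7): low and high halves of B
    p1 = poly_mul(a1, b1)                               # Eq. (11)
    p2 = poly_mul(a2, b2)                               # Eq. (12)
    p3 = poly_mul(poly_add(a1, a2), poly_add(b1, b2))   # Eq. (13)
    mid = [x ^ y ^ z for x, y, z in zip(p3, p1, p2)]    # P3 - P1 - P2
    full = [0] * (2 * n)
    for i, c in enumerate(p1):
        full[i] ^= c                # P1
    for i, c in enumerate(mid):
        full[i + h] ^= c            # (P3 - P1 - P2) * x^(n/2)
    for i, c in enumerate(p2):
        full[i + n] ^= c            # P2 * x^n
    # Reduction modulo x^n + 1: x^(n+i) is congruent to x^i.
    return [full[i] ^ full[i + n] for i in range(n)]


# (1 + x^3)(x + x^2) mod (x^4 + 1) = 1 + x^2
print(karatsuba_mod_xn_plus_1([1, 0, 0, 1], [0, 1, 1, 0]))
```

Only three calls to `poly_mul` on half-size operands are made, compared with the four half-size products that appear in Equation (9).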

Heidarpur et al. proposed a fast finite-field multiplier and implemented it in FPGA devices. Compared to the CA and Karatsuba, its overall Area-Delay Product (ADP) was nearly 25% and 30% more efficient, respectively [11]. Thirumoorthi et al. presented a binary polynomial multiplier with M-term overlap-free Karatsuba multiplication to reduce the delay and ADP [12]. Meher designed a pipelined systolic multiplier at the digit level based on trinomial field polynomials [13]. Zeghid et al. proposed a digit-serial $GF(2^{n})$ multiplier based on Bivariate Polynomial Basis representation and a modified Radix-n Interleaved Multiplication method [14]. Kumari et al. presented a hybrid Karatsuba multiplier that combines the conventional polynomial multiplier with further reduced Karatsuba decompositions [15]. Das and Jajodia designed a hybrid recursive Karatsuba multiplier with an optimized Schoolbook multiplier at the lowest levels [16].

Previous multipliers required hardware modification to change the parameter of the modulus polynomial in cryptography, which was troublesome. Our design allows for changing the parameter of the modulus polynomial without hardware modification.

If an irreducible trinomial $x^{m} + x^{k} + 1$ is used for Elliptic Curve Cryptography, it is advised that $k$, the degree of the middle term $x^{k}$, be as small as feasible. NIST has recommended the trinomials $x^{233} + x^{74} + 1$ and $x^{409} + x^{87} + 1$ for the finite fields $GF(2^{233})$ and $GF(2^{409})$, respectively, where the value of $k$ is less than half the value of $m$ [17]. An irreducible trinomial cannot be found for certain values of $m$. In such cases, an irreducible pentanomial, or polynomial with five terms, can be the next-best option and still allows reasonably efficient modular reduction. Table 1 summarizes the essential mathematical symbols utilized in this paper for clarity.
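To illustrate why a trinomial modulus keeps reduction cheap, the following sketch reduces a binary polynomial modulo $x^{m} + x^{k} + 1$. It is a software illustration under our own assumptions, not the paper's hardware: polynomials are stored as Python integers (bit $i$ is the coefficient of $x^{i}$), and the function name `reduce_trinomial` is hypothetical.

```python
def reduce_trinomial(c, m, k):
    """Reduce GF(2) polynomial c modulo x^m + x^k + 1, with 0 < k < m.

    c is an int whose bit i is the coefficient of x^i.
    """
    while c.bit_length() > m:       # degree of c is still >= m
        d = c.bit_length() - 1      # current degree
        shift = d - m
        # x^d = x^(d-m) * x^m, and x^m is congruent to x^k + 1, so the
        # top term is replaced by x^(d-m+k) + x^(d-m): two XORs per bit.
        c ^= (1 << d) | (1 << (shift + k)) | (1 << shift)
    return c


# x^409 reduced modulo x^409 + x^87 + 1 leaves x^87 + 1.
r = reduce_trinomial(1 << 409, 409, 87)
print(r == (1 << 87) | 1)
```

Each excess bit costs only two extra XOR positions, which is why a pentanomial (four XOR positions per excess bit) is the usual fallback when no irreducible trinomial exists.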

3. Proposed Reconfigurable Karatsuba Modular Polynomial Multiplier

We have proposed a cost-efficient reconfigurable general modular polynomial multiplier in which the modulus polynomial is one of the input parameters, and it can be easily changed without hardware modification [18].

Figure 1 illustrates the reconfigurable polynomial multiplier for $n = 4$, which is used as a base module to implement the reconfigurable Karatsuba multiplier. The partial product is stored in the registers $p[0]$ to $p[3]$, with the coefficients of $b(x)$ entering serially, starting at the least significant position $p[0]$. The coefficients of $a(x)$ and $f(x)$ are input in parallel. The multiplexers select one of their two inputs depending on a clock counter. In the first clock, the coefficients of $a(x)$ are loaded into the shift registers SR. In subsequent clock cycles, the output of the AND gate is selected and loaded into SR. That is, in the first clock, $SR = a(x)$; in the next clock, $SR \leftarrow a(x) \cdot x \bmod f(x)$, and each later clock multiplies SR by $x$ modulo $f(x)$ again. At clock cycle $j$, the PEs accumulate $P(x) \leftarrow P(x) + b[j] \cdot x^{j} \cdot a(x) \bmod f(x)$. The final product is obtained in $n$ clock cycles. Algorithm 1 describes the detailed operations of the reconfigurable modular polynomial multiplier.

The further Karatsuba optimization reduces latency by decomposing the $n$-bit modular multiplication into concurrent $\frac{n}{2}$-bit operations. In an $n$-bit multiplier, three $\frac{n}{2}$-bit multiplications are performed in parallel, reducing the critical path latency from $n$ clock cycles to $\frac{n}{2}$ clock cycles. This structure enables low-latency computation and is suitable for scalable hardware implementations.

The entire design promotes modularity and hardware efficiency for polynomial arithmetic operations in cryptographic systems, especially where the area and power are restricted.

Algorithm 1 GF-based Reconfigurable Modular Polynomial Multiplier

Input:

Polynomial $A (x)$ : Coefficients $a [0 : n - 1]$
Polynomial $B (x)$ : Coefficients $b [0 : n - 1]$
Modulus $f (x)$ : Coefficients $f [0 : n - 1]$

Output:

$P (x) = A (x) B (x) mod f (x)$

Load $a[0 : n - 1]$ into shift registers $SR[0 : n - 1]$

for ($i = 0$; $i \leq n - 1$; $i$++) {
    $p[i] \leftarrow b[0] \land a[i]$
}

for ($j = 1$; $j \leq n - 1$; $j$++) {
    $SR[0] \leftarrow SR[n - 1] \land f[0]$
    for ($i = 1$; $i \leq n - 1$; $i$++) {
        $SR[i] \leftarrow SR[i - 1] \oplus (SR[n - 1] \land f[i])$
    }
    Right Shift $SR[0 : n - 1]$
    for ($i = 0$; $i \leq n - 1$; $i$++) {
        $p[i] \leftarrow p[i] \oplus (b[j] \land SR[i])$
    }
}
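Algorithm 1 can be modelled in software as follows. This is a behavioural sketch under two assumptions of ours: all $SR$ registers update simultaneously on a clock edge (so every new value is computed from the old register contents), and the shift named in the algorithm is the one already expressed by $SR[i] \leftarrow SR[i-1]$. The list `f` holds $f[0 : n-1]$ with the leading $x^{n}$ term implicit, and the function name `modmul_serial` is illustrative.

```python
def modmul_serial(a, b, f):
    """Bit-serial A(x)B(x) mod f(x) over GF(2), modelled on Algorithm 1.

    a, b, f: lists of n bits, index = degree. f lists the low n
    coefficients of the modulus; its x^n term is implicit (an assumption).
    """
    n = len(a)
    sr = list(a)                                  # first clock: SR = a(x)
    p = [b[0] & sr[i] for i in range(n)]          # p = b[0] * a(x)
    for j in range(1, n):
        msb = sr[n - 1]
        # Simultaneous register update: SR <- SR * x mod f(x).
        sr = [msb & f[0]] + [sr[i - 1] ^ (msb & f[i]) for i in range(1, n)]
        # PE stage: accumulate b[j] * (a(x) * x^j mod f(x)).
        p = [p[i] ^ (b[j] & sr[i]) for i in range(n)]
    return p


# x * x^3 = x^4, which is x + 1 modulo x^4 + x + 1.
print(modmul_serial([0, 1, 0, 0], [0, 0, 0, 1], [1, 1, 0, 0]))
```

Changing the modulus only means passing a different `f`, which mirrors the reconfigurability claimed for the hardware, where $f(x)$ is an input rather than hard-wired logic.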

In this design, the polynomial $A(x)$ is partitioned into two subsets, its even-indexed and odd-indexed coefficients, represented as $A_{even}(x)$ and $A_{odd}(x)$. These subsets are multiplied independently with the corresponding subsets of the polynomial $B(x)$, denoted similarly as $B_{even}(x)$ and $B_{odd}(x)$. The product is computed by decomposing it into three sub-multiplications. Figure 2 illustrates the proposed Karatsuba polynomial multiplier for $n = 8$ with $f(x) = x^{8} + 1$. It accepts an 8-bit input polynomial $A(x)$ and 1-bit serial inputs corresponding to the bits of the other polynomial $B(x)$. The input polynomial $A(x)$ is first partitioned into two subsets: evenA, comprising the even-indexed coefficients $\{A[6], A[4], A[2], A[0]\}$, and oddA, comprising the odd-indexed coefficients $\{A[7], A[5], A[3], A[1]\}$. An intermediate sum, Asum, is calculated as the bitwise XOR of evenA and oddA. These components serve as the basis for the three sub-multiplications in the Karatsuba scheme: $evenA \times evenB$, $oddA \times oddB$, and $Asum \times Bsum$. The coefficients of $B(x)$ are passed serially into the system. Three 4-bit modular multiplier modules are instantiated to compute the partial products, where $f(x) = x^{4} + 1$ is the modulus polynomial used for modular reduction. The modulus polynomial should be verified as legitimate before use to prevent incorrect input.

Figure 3 demonstrates how to break down the 8-bit polynomial multiplication into three 4-bit polynomial multiplications.
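The even/odd decomposition can be checked in software. Writing $y = x^{2}$, $A(x) = A_{even}(y) + x A_{odd}(y)$, so that $A B = A_{even} B_{even} + y A_{odd} B_{odd} + x (A_{even} B_{odd} + A_{odd} B_{even})$, and $x^{8} + 1$ becomes $y^{4} + 1$ for the three 4-bit sub-products. The recombination step below is our reconstruction from this identity, not a description of the wiring in Figure 2, and the function names are illustrative.

```python
def cyclic_mul(a, b):
    """A(x)B(x) mod (x^n + 1) over GF(2); lists with index = degree."""
    n = len(a)
    p = [0] * n
    for i in range(n):
        for j in range(n):
            p[(i + j) % n] ^= a[i] & b[j]
    return p


def karatsuba_even_odd(a, b):
    """n-bit product mod (x^n + 1) from three n/2-bit cyclic products."""
    n = len(a)                       # n is assumed to be even
    h = n // 2
    even_a, odd_a = a[0::2], a[1::2]
    even_b, odd_b = b[0::2], b[1::2]
    a_sum = [x ^ y for x, y in zip(even_a, odd_a)]
    b_sum = [x ^ y for x, y in zip(even_b, odd_b)]
    pe = cyclic_mul(even_a, even_b)          # evenA x evenB
    po = cyclic_mul(odd_a, odd_b)            # oddA x oddB
    ps = cyclic_mul(a_sum, b_sum)            # Asum x Bsum
    cross = [s ^ e ^ o for s, e, o in zip(ps, pe, po)]
    p = [0] * n
    for i in range(h):
        p[2 * i] ^= pe[i]                    # even product at x^(2i)
        p[(2 * i + 2) % n] ^= po[i]          # odd product times x^2
        p[2 * i + 1] ^= cross[i]             # cross terms times x
    return p


a = [1, 0, 1, 1, 0, 0, 1, 0]
b = [0, 1, 1, 0, 1, 0, 0, 1]
print(karatsuba_even_odd(a, b) == cyclic_mul(a, b))
```

Because each sub-product works modulo $y^{4} + 1$, the three 4-bit modules can be instances of the same base multiplier with $f(x) = x^{4} + 1$ supplied as the modulus input.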

4. Performance Comparison

All hardware designs and experiments in this paper were conducted using the AMD Xilinx Vivado Design Suite, version 2022.2, installed on Windows 10 (64-bit). The Vivado toolchain was used for both the simulation and synthesis of the VHDL/Verilog designs. Simulation and functional verification were performed using Vivado’s built-in simulator. The FPGA platforms utilized for prototyping and validation included the Digilent Nexys A7 (Artix-7 XC7A100T-1CSG324C), Digilent Arty S7 (Spartan-7 XC7S50-1CSGA324C), and ZedBoard (Zynq-7000 XC7Z020-1CLG484C). Resource utilization metrics such as Look-Up Tables (LUTs), Flip-Flops (FFs), and Critical Path Delay were collected from the synthesis reports generated by the Vivado Design Suite.

This paper employs realistic standardized delay values of logic gates from common integrated circuits (ICs) produced by STMicroelectronics N.V., Geneva, Switzerland (e.g., M74HC08, M74HC86), to estimate real-world performance. Using a uniform baseline for every kind of logic gate (AND, XOR, OR, etc.) keeps the comparison balanced. The theoretical critical-path delay at the gate level was calculated using discrete component delays for the 2-input AND gates ($T_{AND} = 6$ ns), the XOR gates ($T_{XOR} = 12$ ns), and the 2:1 MUXes ($T_{MUX} = 11$ ns). The proposed design is split into two stages: (1) the reduction stage (modular reduction logic), which involves an AND gate, an XOR gate, and a MUX, resulting in a total delay of $T_{AND} + T_{XOR} + T_{MUX} = 6 + 12 + 11 = 29$ ns; and (2) the PE stage (product accumulation logic), comprising an AND gate followed by an XOR gate, resulting in $T_{AND} + T_{XOR} = 6 + 12 = 18$ ns. The reduction stage dominates the critical path due to the additional MUX delay. To estimate the number of CMOS transistors, conventional counts have been employed: six transistors for an AND gate, six for an XOR gate, six for an OR gate, six for a 2:1 multiplexer, eight for a latch, and four for a NAND gate. The metrics are compared with similar designs in Table 2.
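The delay and area estimates above amount to simple arithmetic, sketched below. The gate delays and per-gate transistor counts are the ones quoted in the text; costing each flip-flop at the eight-transistor latch figure is our assumption, chosen because it reproduces the transistor totals listed for designs [19] and [22] in Table 3. The function names are illustrative.

```python
# Gate delays in ns, as quoted in the text.
T_AND, T_XOR, T_MUX = 6, 12, 11

# Transistors per cell, as quoted in the text. Flip-flops are costed at
# the 8-transistor latch figure, which is an assumption on our part.
TRANSISTORS = {"and": 6, "xor": 6, "mux": 6, "ff": 8}


def critical_path_ns():
    """Return (critical path, reduction-stage delay, PE-stage delay)."""
    reduction = T_AND + T_XOR + T_MUX    # AND -> XOR -> MUX
    pe = T_AND + T_XOR                   # AND -> XOR
    return max(reduction, pe), reduction, pe


def transistor_count(n_and, n_xor, n_ff, n_mux):
    """Estimated CMOS transistor count for a given gate inventory."""
    return (n_and * TRANSISTORS["and"] + n_xor * TRANSISTORS["xor"]
            + n_ff * TRANSISTORS["ff"] + n_mux * TRANSISTORS["mux"])


print(critical_path_ns())                   # (29, 29, 18)
print(transistor_count(465, 467, 697, 0))   # gate inventory of [19], n = 233
```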

This section examines the performance of the proposed multiplier and compares it with other studies in this field. Theoretical metrics such as area (number of gates, FFs, and MUXes), delay, and ADP (Area × Delay) are used to assess efficiency. To verify the methodology, FPGA synthesis was carried out for multipliers ranging from 4 bits to 233 bits, built using irreducible trinomials. The work in [19] introduced a most-significant-bit (MSB)-first fully bit-level serial-in parallel-out polynomial basis (PB) multiplier based on a definition of the field elements. Pillutla et al. designed a systolic polynomial basis finite field multiplier based on a class of trinomials [20]. A parallel-in and parallel-out sequential polynomial basis multiplier was proposed in [21]. Our proposed architecture accepts any modulus polynomial $f(x)$ as input. This eliminates hardware redesign costs for different standards, making it well suited to WSN devices that require cryptographic agility and adaptability. Design [19] employs $2n - 1$ AND gates and a larger number of XOR gates, $2n + N_{α}^{2} - 1$, along with $3n - 2$ flip-flops. It uses no multiplexers, so its datapath needs no input-selection logic. However, its delay, given as $\max\{T_{α}^{2} + T_{X},\ T_{A} + 2 T_{X}\}$, may become substantial depending on the complexity of the $α^{2}$ module. In contrast, the proposed design offers a better balance: it halves the latency to $\frac{n}{2}$ cycles with a comparable flip-flop count and a predictable delay of $T_{A} + T_{X} + T_{MUX}$. Pillutla et al. utilized $n^{2}$ AND gates and $n^{2} - 1$ XOR gates, with a high storage requirement of $1.5 n^{2} + n$ flip-flops [20]. Although this design has no MUX overhead and maintains a moderate latency of $\frac{n}{2} + 2$, its quadratic hardware complexity limits scalability. The proposed design improves upon this by reducing all components to linear growth with respect to $n$, making it more hardware-efficient. Similarly, ref. [21] features a hardware cost of $\frac{n^{2} + n}{2}$ for both AND and XOR gates and uses $5n - 1$ flip-flops and $4n$ multiplexers. Its latency, $2 k_{t} + 1$, is high due to shift and addition cycles, and its delay is relatively complex: $T_{A} + \lfloor \log_{2} n \rfloor T_{X} + 2 T_{MUX}$. While its delay grows only logarithmically with $n$, its significant resource demands and high latency make it less suitable for efficient implementations than the proposed design.

The work in [22] is relatively efficient, with $3n$ AND and $2n$ XOR gates, and uses $4n + 2$ flip-flops and $2n$ multiplexers. It achieves a low latency of $0.664n$ and a straightforward delay of $2 T_{X} + T_{MUX}$. The proposed design is comparable in terms of gate usage and delay but uses fewer flip-flops and multiplexers, improving area efficiency. In conclusion, the proposed design achieves a well-balanced trade-off between hardware complexity and performance. With $3n$ AND gates, $4n$ XOR gates, $3n$ flip-flops, and $1.5n$ multiplexers, as listed in Table 2, it delivers reduced latency ($\frac{n}{2}$) and consistent delay ($T_{A} + T_{X} + T_{MUX}$). Compared to the existing designs [19,20,21,22] in Table 3, it demonstrates substantial improvements in the combined area, delay, and latency metric. In particular, the proposed design reduces the total ADP by 24.04% compared to [19], 97.87% compared to [20], 99.06% compared to [21], and 36.66% compared to [22].
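The ADP figures follow from the definition given with Table 3, ADP = transistor count × delay × latency. The sketch below recomputes them from the Table 3 entries for [19], [22], and the proposed design; the function names are illustrative, and the recomputed percentages agree with the quoted ones only to within rounding.

```python
def adp(transistors, delay_ns, latency_cycles):
    """Total Area-Delay Product: area x (delay x latency)."""
    return transistors * delay_ns * latency_cycles


def reduction_percent(adp_ref, adp_prop):
    """Percentage by which adp_prop is lower than adp_ref."""
    return 100.0 * (adp_ref - adp_prop) / adp_ref


# (transistor count, delay in ns, latency in cycles) taken from Table 3.
rows = {
    "[19]": (11168, 30, 233),
    "[22]": (17258, 35, 155),
    "Prop.": (17475, 29, 117),
}

prop = adp(*rows["Prop."])
for name, row in rows.items():
    value = adp(*row)
    print(name, round(value / 1e6), round(reduction_percent(value, prop), 2))
```

The comparison shows where the gain comes from: the proposed design has a transistor count close to that of [22] and above that of [19], and its lower ADP is driven mainly by the shorter latency of 117 cycles.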

The proposed Karatsuba 8-bit modular polynomial multiplier was synthesized on the AMD/Xilinx Spartan-7 SP701 Evaluation Platform (AMD/Xilinx, San Diego, CA, USA). Figure 4 illustrates the module-level visualization of the design. The use of the Spartan-7 platform validates the practicality and scalability of the proposed design for real-world cryptographic applications.

5. Discussion

The results demonstrate that the proposed multiplier achieves an effective balance between hardware efficiency and performance. Compared to previous work, it significantly reduces area and latency while maintaining competitive delay. Its linear scalability with operand size makes it suitable for large-field arithmetic, such as $GF(2^{233})$. The architecture also supports arbitrary modulus polynomials, enabling cryptographic agility without hardware redesign, which is ideal for resource-constrained environments like WSNs. Table 4 reports the simulation performance on different FPGA boards using the AMD Vivado Design Tool and compares it with other work. As shown in Table 4, the proposed design reduces both logic usage and the Area-Delay Product (ADP) compared to previous multipliers. For the 8-bit implementation, the ADP is reduced to 353.232, compared to 770 in [11] and 1367 in [23]. In the 24-bit case, the ADP is 2191.992, which is lower than the 6805 in [11] and 25,870 in [24]. For larger operand sizes such as 113-bit, the ADP is 40,827.96, an improvement over the 51,295 in [11] and the 161,498 in [25].
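The FPGA figures can be cross-checked numerically. The formula used below, ADP = (LUTs + Slices) × total delay, is not stated in the text; it is our inference from the Table 4 entries, which it reproduces for the three proposed implementations. The function names are illustrative.

```python
def fpga_adp(luts, slices, delay):
    """Inferred FPGA Area-Delay Product: (LUTs + Slices) x total delay."""
    return (luts + slices) * delay


def reduction_percent(adp_ref, adp_prop):
    """Percentage by which adp_prop is lower than adp_ref."""
    return 100.0 * (adp_ref - adp_prop) / adp_ref


# (LUTs, Slices, total delay) of the proposed design, taken from Table 4.
proposed = {8: (27, 45, 4.906), 24: (79, 130, 10.488), 113: (355, 585, 43.434)}

for n, row in proposed.items():
    print(n, round(fpga_adp(*row), 3))

# Reduction of the 8-bit design relative to the ADP of 1367 reported for [23].
print(round(reduction_percent(1367, fpga_adp(*proposed[8])), 2))
```

For the 113-bit case the proposed design has the longest total delay in its group, so its ADP advantage there comes from the much smaller LUT and slice counts.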

The proposed reconfigurable modular polynomial multiplier can be easily integrated into the ECC crypto chip and improve security performance in WSN.

6. Conclusions

A cost-efficient reconfigurable Karatsuba modular polynomial multiplier is proposed for WSN security services in which the modulus polynomial can be changed easily depending on the user’s specific applications and requirements. The features of modularity and reconfigurability are highly suitable for applications that require adaptability and optimized performance. The proposed modular polynomial multiplier has been synthesized and simulated by AMD Vivado Design Suite.

The proposed design demonstrates advantages in terms of the Area-Delay Product (ADP) compared to previous modular polynomial multipliers. In terms of area, the proposed design uses fewer logic resources than most compared works, maintaining a compact structure suitable for WSN applications. The delay is also competitive, remaining lower than or close to the best reported values in the literature, further supporting its suitability for resource-constrained cryptographic applications. Thus, it can be applied in ECC encryption to speed up security services in resource-constrained environments such as embedded systems, WSN devices, and IoT.

In the future, we will try to reduce the three sub-multiplier modules in the proposed Karatsuba multiplier to only one sub-multiplier with serial processing and resource sharing in order to save the area cost.

Author Contributions

F.H. proposed the design for the efficient and low-cost modular polynomial multiplier and synthesized it with AMD Vivado Design Suite. H.L. proposed the efficient and low-cost reconfigurable architecture of the modular polynomial multiplier. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Conflicts of Interest

The authors declare no conflicts of interest.

References

1. Schneier, B. Applied Cryptography, 2nd ed.; John Wiley and Sons: New York, NY, USA, 1996.
2. Diffie, W.; Hellman, M.E. New directions in cryptography. IEEE Trans. Inf. Theory 1976, 22, 644–654.
3. Rivest, R.L.; Shamir, A.; Adleman, L. A method for obtaining digital signatures and public-key cryptosystems. Commun. ACM 1978, 21, 120–126.
4. ElGamal, T. A public key cryptosystem and a signature scheme based on discrete logarithms. IEEE Trans. Inf. Theory 1985, 31, 469–472.
5. Koblitz, N. Elliptic curve cryptosystems. Math. Comput. 1987, 48, 203–209.
6. Miller, V.S. Use of elliptic curves in cryptography. In Advances in Cryptology - EUROCRYPT 1985; Lecture Notes in Computer Science; Springer: Berlin/Heidelberg, Germany, 1985; pp. 417–426.
7. Odlyzko, A.M. Discrete logarithms in finite fields and their cryptographic significance. In Advances in Cryptology - EUROCRYPT 1984; Lecture Notes in Computer Science; Springer: Berlin/Heidelberg, Germany, 1984; pp. 224–314.
8. Hsu, I.-S.; Truong, T.-K.; Deutsch, L.J.; Reed, I.S. A comparison of VLSI architectures of finite field multipliers using dual, normal, or standard bases. IEEE Trans. Comput. 1988, 37, 735–739.
9. Karatsuba, A.; Ofman, Y. Multiplication of multidigit numbers on automata. Sov. Phys. Dokl. 1963, 7, 595–596.
10. Nussbaumer, H.J. The Fast Fourier Transform and Convolution Algorithms; Springer: Berlin/Heidelberg, Germany, 1982.
11. Heidarpur, M.; Mirhassani, M. An efficient and high-speed overlap-free Karatsuba-based finite-field multiplier for FPGA implementation. IEEE Trans. Very Large Scale Integr. (VLSI) Syst. 2021, 29, 667–676.
12. Thirumoorthi, M.; Dinesh, B.; Elakkina, R.; Sundararajan, S.; Sekar, A.; Lakshmanan, G. Formulations of M-term overlap-free Karatsuba binary polynomial multipliers and their hardware implementations. IEEE Trans. Very Large Scale Integr. (VLSI) Syst. 2023, 31, 1509–1522.
13. Meher, P.K. Systolic and super-systolic multipliers for finite field GF(2^m) based on irreducible trinomials. IEEE Trans. Circuits Syst. I Reg. Pap. 2008, 55, 1031–1040.
14. Zeghid, M.; Sghaier, A.; Ahmed, H.Y.; Abdalla, O.A. Power/area-efficient ECC processor implementation for resource-constrained devices. Electronics 2023, 12, 4110.
15. Kumari, R.; Rout, T.; Saini, B.; Pandey, J.G.; Karmakar, A. An efficient hardware implementation of elliptic curve point multiplication over GF(2^m) on FPGA. In International Symposium on VLSI Design and Test; Springer Nature: Singapore, 2024; pp. 257–271.
16. Das, M.; Jajodia, B. Hybrid recursive Karatsuba multiplications on FPGAs. IEEE Embed. Syst. Lett. 2025, 17, 240–243.
17. National Institute of Standards and Technology (NIST). Recommendations for Discrete Logarithm-Based Cryptography: Elliptic Curve Domain Parameters; NIST Special Publication 800-186; NIST: Gaithersburg, MD, USA, 2023. Available online: https://csrc.nist.gov/pubs/sp/800/186/final (accessed on 2 June 2025).
18. Haroon, F.; Li, H. Reconfigurable and Compact Modular Polynomial Multiplier in Galois Field for the Security of IoT. In Proceedings of the IEEE Cloud-Summit 2025, Washington, DC, USA, 26–27 June 2025.
19. El-Razouk, H.; Reyhani-Masoleh, A. New bit-level serial GF(2^m) multiplication using polynomial basis. In Proceedings of the 2015 IEEE 22nd Symposium on Computer Arithmetic (ARITH), Vail, CO, USA, 6–8 June 2015; pp. 129–136.
20. Pillutla, S.R.; Boppana, L. Area-efficient low-latency polynomial basis finite field GF(2^m) systolic multiplier for a class of trinomials. Microelectron. J. 2020, 97, 104709.
21. Imaña, J. Low latency GF(2^m) polynomial basis multiplier. IEEE Trans. Circuits Syst. I Regul. Pap. 2011, 58, 935–946.
22. Selimis, G.N.; Fournaris, A.P.; Michail, H.E.; Koufopavlou, O. Improved throughput bit-serial multiplier for GF(2^m) fields. Integration 2009, 42, 217–226.
23. Samanta, J.; Sultana, R.; Bhaumik, J. FPGA-based modified Karatsuba multiplier. In Proceedings of the 2014 International Conference on VLSI and Signal Processing (ICVSP), Kharagpur, India, 10–12 January 2014; Volume 10.
24. Arish, S.; Sharma, R.K. An efficient floating point multiplier design for high speed applications using Karatsuba algorithm and Urdhva-Tiryagbhyam algorithm. In Proceedings of the 2015 International Conference on Signal Processing and Communication (ICSC), Noida, India, 16–18 March 2015; pp. 303–308.
25. Imaña, J.L. Fast bit-parallel binary multipliers based on type-I pentanomials. IEEE Trans. Comput. 2017, 67, 898–904.

Figure 1. 4-bit reconfigurable modular polynomial multiplier for $a(x) \cdot b(x) \bmod f(x)$.

Figure 2. Proposed Karatsuba polynomial multiplier for n = 8.

Figure 3. The input polynomials $A(x)$ and $B(x)$ are split into even and odd 4-bit halves. Intermediate sums are computed, followed by three parallel 4-bit multiplications.

Figure 4. Module-level visualization of the Karatsuba 8-bit multiplier on a Spartan-7 SP701 FPGA.

Table 1. Symbol table for polynomial multiplication in $GF(2^{n})$.

Symbol	Description
$A (x), B (x)$	Input polynomials to be multiplied
$f (x)$	Modulus polynomial (e.g., $x^{n} + 1$ )
$P (x)$	Output modular product: $A (x) \cdot B (x) mod f (x)$
$a_{i}, b_{i}$	Coefficients of $A (x)$ and $B (x)$ , elements of $GF (2)$
n	Degree or bit-width of the polynomials
⊕	XOR operation (addition/subtraction in $GF (2)$ )
∧	AND operation (multiplication in $GF (2)$ )

Table 2. Comparison of hardware cost (gates, FF, Mux), latency, and delays.

Designs	#AND	#XOR	#FF	#Mux	Latency	Delay
[19]	$2 n - 1$	$2 n + N_{α}^{2} - 1$	$3 n - 2$	0	n	$max {T_{α}^{2} + T_{X}, T_{A} + 2 T_{X}}$
[20]	$n^{2}$	$n^{2} - 1$	$1.5 n^{2} + n$	0	$\frac{n}{2} + 2$	$T_{A} + T_{X}$
[21]	$\frac{n^{2} + n}{2}$	$\frac{n^{2} + n}{2}$	$5 n - 1$	$4 n$	$2 k_{t} + 1$	$T_{A} + \lfloor \log_{2} n \rfloor T_{X} + 2 T_{MUX}$
[22]	$3 n$	$2 n$	$4 n + 2$	$2 n$	$0.664 n$	$2 T_{X} + T_{MUX}$
Prop.	$3 n$	$4 n$	$3 n$	$1.5 n$	$\frac{n}{2}$	$T_{A} + T_{X} + T_{MUX}$

$k$: digit size; $k_{t}$: cycles for right-shifts/additions; latency: clock cycles; and $N_{α}^{2}$: XOR count for the $α^{2}$ module.

Table 3. Transistor count and delay of the proposed multiplier compared with related studies for NIST $GF(2^{233})$. ADP (in $10^{6}$) stands for Total Area Delay Product, where Delay Product = Delay × Latency and Area = transistor count.

Designs	#AND	#XOR	#FF	#Mux	Latency	Delay	#Trans.	ADP	% Reduction
[19]	465	467	697	0	233	30	11,168	78	24.04
[20]	54,289	54,288	81,666	0	119	18	1,304,794	2794	97.87
[21]	27,261	27,261	1164	932	149	124	342,036	6319	99.06
[22]	699	466	934	466	155	35	17,258	93	36.66
Prop.	699	932	699	350	117	29	17,475	59	—

Table 4. Comparison of Different multipliers by FPGA implementation.

Design	$n$ in $GF(2^{n})$	LUTs	Slices	Total Delay	ADP	% Reduction	FPGA Device Board
[23]	8	62	36	13.95	1367	74.16	
[11]	8	46	24	11.01	770	54.12	Spartan-7
Prop.	8	27	45	4.906	353.232	–	
[24]	24	1018	972	13.00	25,870	91.52	Virtex-4
[11]	24	360	184	12.51	6805	67.78	Virtex-4
[12]	24	522	N/A	10.15	5298	58.63	Virtex-4
Prop.	24	79	130	10.488	2191.992	–	Zynq UltraScale+
[25]	113	5501	2354	20.56	161,498	74.71	
[11]	113	3792	1084	10.52	51,295	20.40	Artix-7
Prop.	113	355	585	43.434	40,827.96	–	

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

© 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
