1. Introduction
With the rapid development of the Internet of Things (IoT), an increasing number of vehicles are becoming intelligent, giving rise to the Internet of Vehicles (IoV). Within the same regional network, different IoV components exchange information through their self-organizing networks. However, IoV faces numerous challenges, such as communication delays, resource limitations, and complex communication environments. Efficiently verifying a large number of signatures using edge computing resources, while ensuring the accuracy and security of the information, is therefore a critical challenge [1,2]. To address this problem, current methods for accelerating signature verification primarily fall into two categories: (1) compression techniques, which reduce the length of the information to accelerate computation, for example through aggregate signature technology; and (2) algorithm optimization, which focuses on hardware and software optimization of the algorithms involved in signature verification, such as optimizing elliptic curve algorithms and leveraging Field-Programmable Gate Array (FPGA) hardware acceleration. However, aggregate signatures still face several issues, such as relatively long verification times and the potential for batch signature invalidation. Techniques such as parallel point multiplication acceleration and fault-tolerant aggregate signatures can effectively address these issues.
To address the challenge of efficiently verifying a large number of signatures, various scholars have proposed schemes based on different computational hardness problems and from distinct perspectives. Xie [3] improved a certificateless aggregate signature (CLAS) scheme based on the Elliptic Curve Discrete Logarithm Problem (ECDLP). While this reduced the time complexity of various stages of the aggregate signature process, it did not adequately address the issues of numerous point multiplication operations and fault tolerance. An [4] modified the SM9 algorithm to support aggregate signatures, improving its performance; however, the number of point multiplication operations remains high, and fault tolerance issues were not resolved. Mei [5] designed an aggregate signature scheme aimed at addressing location privacy concerns in vehicular networks, effectively ensuring the integrity and authenticity of information, but further improvements in fault tolerance and elliptic curve acceleration remain possible. Ran [6] further enhanced the elliptic curve-based aggregate signature scheme, making it more suitable for IoT applications, yet fault tolerance was not considered. Yang [7] applied aggregate signature modifications to the Schnorr signature to adapt it to wireless communication networks in the medical IoT environment, but these modifications introduced a significant number of elliptic curve operations, leaving room for efficiency optimization. Fu [8] accelerated the numerous point multiplication operations in aggregate signatures using FPGA, but the fault tolerance issue remained unresolved. To tackle the fault tolerance problem in aggregate signatures, Hartung [9] proposed the concept of fault-tolerant aggregate signatures and provided theoretical derivations, but this concept has not been applied to practical schemes. Therefore, for the efficient optimization of aggregate signatures, accelerating point multiplication operations and introducing fault tolerance are crucial.
In current aggregate signature schemes, the point multiplication operation lies on the critical path and is relatively time consuming. There are various methods for point multiplication, such as the binary expansion method, the sliding window method [10], and the window non-adjacent form (w-NAF) method [11]. However, there is still room for optimization in both software and hardware implementations. To improve the efficiency of point multiplication algorithms, many scholars have conducted research from different perspectives. In software, Hai et al. [12] proposed replacing odd numbers with prime numbers during the pre-computation phase and constructing micro multi-base chains to compensate for the differences between primes and odd numbers, reducing the computational complexity of scalar multiplication by up to 77.38%. Zhao et al. [13] further reduced the computational complexity by replacing the odd numbers in the base chains of the w-NAF chain with numbers of a special composite form. In hardware, Bellemou et al. [14] used the Montgomery powering ladder binary method and an improved high-radix Montgomery modular multiplication to optimize elliptic curve point multiplication, achieving a computation time of 14.32 ms and balancing security and efficiency. Yue Hao et al. [15], based on the Zynq platform, utilized the Montgomery ladder algorithm and adaptively modified point addition and point double, optimizing the computation time to 1.80 ms. Islam et al. [16], also using the Montgomery ladder algorithm on the Virtex-7 platform, proposed a Radix-2 modular multiplication architecture on the Edwards curve, optimizing the computation time to 1.63 ms.
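As a concrete illustration of the scalar-recoding methods surveyed above, the following Python sketch computes a standard width-w NAF for a scalar. The function name and parameters are our own, and the digit set shown is the textbook one rather than the modified prime/composite chains of [12,13]:

```python
def wnaf(k, w=4):
    """Recode scalar k into width-w non-adjacent form (w-NAF).

    Each nonzero digit is odd and lies in (-2^(w-1), 2^(w-1)); nonzero
    digits are separated by at least w-1 zeros, which reduces the number
    of point additions needed during scalar multiplication.
    """
    digits = []
    while k > 0:
        if k & 1:                     # k is odd: emit an odd signed digit
            d = k % (1 << w)          # k mod 2^w
            if d >= (1 << (w - 1)):   # map into the signed digit range
                d -= (1 << w)
            k -= d
        else:
            d = 0
        digits.append(d)
        k >>= 1
    return digits                     # least-significant digit first

# Reconstructing k from its w-NAF digits recovers the original scalar.
k = 1234567
assert sum(d << i for i, d in enumerate(wnaf(k))) == k
```

Summing the signed digits weighted by powers of two recovers the scalar, which is the property scalar-multiplication loops rely on.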
The SM9 [17] identity-based cryptographic algorithm was proposed in 2016 and officially became an international standard in 2021. Numerous scholars have researched and applied this algorithm. For instance, as mentioned earlier, the SM9 aggregate signature proposed by An Tao [4] added aggregation functionality to the SM9 signature. Additionally, Liu et al. [18] combined threshold signatures, ring signatures, and SM9 to achieve higher security but did not optimize for time consumption. Liu et al. [19] proposed a two-party cooperative signature based on SM9 and applied it to smart homes, although there remains significant room for optimization in verification efficiency. Jing et al. [20] introduced a four-stage pipelined improved modular multiplication algorithm and enhanced point addition and point double algorithms. Tested on the Zynq FPGA platform with SM9, their point multiplication unit achieved a performance of 1179 operations per second, though further improvements could be made in utilizing FPGA board resources. Wang et al. [21] designed highly parallel computational unit structures and optimized point addition, point double, and line functions over the base field and its extension field. Based on the SM9 algorithm, they designed optimal ate pairing computation units on the Virtex-7 FPGA platform, reducing computation time to 3.43 ms, which is 20% of the time consumed by similar designs. Cheng [22] reduced the hardware cost of the SM9 algorithm by simplifying the Frobenius map and implemented its parallel structure. However, FPGA optimization for bilinear pairings is quite resource intensive, and optimizing point multiplication operations is crucial for improving verification efficiency in aggregate signature verification. Therefore, how to fully utilize edge resources to accelerate point multiplication and apply it to specific schemes is a critical issue.
In our work, (1) we design a comprehensive scheme for SM9 aggregate signatures based on FPGA. (2) We employ a parallel hardware structure to accelerate aggregate signature verification. (3) We utilize an improved Montgomery ladder algorithm to compute the point multiplications in SM9 verification and optimize the Barrett modular reduction algorithm to better suit the characteristics of SM9. (4) We implement a fault-tolerant mechanism within the aggregate signature scheme and analyze and prove the scheme’s security based on the K-ary Computational Additive Diffie–Hellman (K-CAA) hard problem.
The rest of the paper is organized as follows: In
Section 2, we introduce the preliminary knowledge and related work pertinent to our work, mainly including the SM9 signature algorithm, aggregate signatures, K-CAA problem, and fault tolerance mechanisms. In
Section 3, we describe the overall architecture of our proposed scheme. In
Section 4, we present the hardware design, optimization strategies employed, and the fault-tolerant mechanism design. In
Section 5, we analyze the experimental results. In
Section 6, we provide a conclusion and outlook for future work.
2. Related Work
Our work primarily involves the SM9 signature and verification algorithm, aggregate signatures, the K-CAA problem in hardness issues, and fault tolerance mechanisms. The background and related work are introduced in four parts below.
2.1. SM9 Signature Algorithm
As shown in
Figure 1, the SM9 algorithm can be divided into four levels in terms of functionality and complexity. The first level is the SM9 protocol layer, which includes the signature algorithm, key exchange protocol, and others. The second level consists of bilinear pairing and point multiplication, which are more complex functional components based on elliptic curve operations. The bilinear pairing
operation rules are quite special and can be referenced in [
23]. The first level achieves its functionality by invoking these components. The third level includes point addition and point double, which are the basic elliptic curve operations. The fourth level comprises modular addition, modular inversion, modular multiplication, and modular subtraction over finite fields [
24].
2.2. Aggregate Signature
The current focus of research on encryption mechanisms is on computational efficiency and memory optimization [25]. To address these issues, many aggregate signature schemes have been developed [2,3,4,5,6,7]. Most aggregate signature algorithms involve the following steps: sign, single signature verification, aggregate signature generation, and aggregate signature verification. Below are the descriptions and explanations of these algorithms:
- (a)
Sign: The user, holding the signature private key sk issued by the Key Generation Center (KGC), generates a signature σ for the message M they intend to send.
- (b)
Single Signature Verification: The verifier, holding the signature public key pk, receives the message and uses the public key pk, the message M, and the signature σ to perform the verification.
- (c)
Aggregate Signature: The aggregator, holding the signature public keys {pk_i} for multiple messages, the signatures {σ_i} of these messages, and other accompanying information sent, aggregates the signatures to obtain an aggregate signature σ_agg along with the packaged messages {M_i} and the accompanying information.
- (d)
Aggregate Signature Verification: The verifier, upon receiving the aggregate signature σ_agg, the packaged messages {M_i}, and the accompanying information sent by the aggregator, uses the corresponding public keys {pk_i} to perform the verification.
These steps are the most time-consuming parts of the aggregate signature mechanism. This is particularly true when an error occurs in the aggregate signature, as it requires individual verification of each signature [
26], leading to a significant amount of time spent on single signature verifications.
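The four steps above, and the costly fallback to per-signature verification when an aggregate check fails, can be sketched in Python. This is a toy mock for illustrating control flow only: the `sign` function is a keyed hash, not a secure signature, and additive aggregation here merely imitates the structure of real schemes:

```python
import hashlib

def sign(key, m):
    # Toy stand-in for a real signature: NOT secure, flow illustration only.
    return int.from_bytes(hashlib.sha256(key + m).digest(), "big")

def verify(key, m, sig):
    return sig == sign(key, m)

def aggregate(sigs):
    # Mimics additive aggregation (e.g. summing the S components).
    return sum(sigs)

def verify_aggregate(keys, msgs, agg):
    return agg == sum(sign(k, m) for k, m in zip(keys, msgs))

keys = [b"k1", b"k2", b"k3"]
msgs = [b"m1", b"m2", b"m3"]
sigs = [sign(k, m) for k, m in zip(keys, msgs)]
assert verify_aggregate(keys, msgs, aggregate(sigs))

# If even one signature is corrupted, the aggregate check fails and every
# signature must be re-verified individually -- the cost that fault
# tolerance is designed to avoid.
sigs[1] += 1
assert not verify_aggregate(keys, msgs, aggregate(sigs))
bad = [i for i, (k, m, s) in enumerate(zip(keys, msgs, sigs))
       if not verify(k, m, s)]
assert bad == [1]
```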
2.3. K-CAA Problem
At present, the main computational hardness assumptions for elliptic curves include the Computational Diffie–Hellman (CDH) problem [25] and the ECDLP [3]. In elliptic curve aggregate signatures, the commonly used hardness assumption is the K-CAA problem, defined as follows.
Let G be a cyclic group of order q, where q is a large prime number. For an integer k and an unknown s ∈ Z_q*, given P ∈ G, sP, and the k pairs (h_1, (h_1 + s)^(−1)P), …, (h_k, (h_k + s)^(−1)P), compute (h + s)^(−1)P for some h ∉ {h_1, …, h_k}.
2.4. Fault Tolerance Mechanism
To address the significant time consumption caused by verifying individual signatures when an aggregate contains erroneous signatures, Hartung et al. [9] proposed the concept of fault-tolerant aggregate signatures and conducted theoretical derivations. References [27,28] further extended these schemes based on non-overlapping sets using compression and nesting methods, respectively. However, these schemes are very challenging to implement in practice. Wang et al. [26] proposed a new fault-tolerant scheme based on combinatorial theory and validated it theoretically. We adopt the latter approach. Below is an introduction to the theoretical framework. Given a set S and its subsets S_1, S_2, …, S_n, these subsets satisfy S_1 ∪ S_2 ∪ … ∪ S_n = S, where n is the number of subsets. Additionally, these subsets meet the following conditions: (1) The sizes of these subsets are all equal. (2) For any k of the subsets, their union is S. (3) For any k − 1 distinct subsets, there is at least one element of S that is not included in their union.
Cao et al. [29] proposed a method suitable for constructing such a structure. First, the set S is evenly divided into groups, each including several subsets S_i. The format of the subsets is shown in Equation (1).
Below is an example. Suppose we have a set S of appropriate size; it is straightforward to choose parameters that satisfy the required condition. Then, dividing these 5 subsets into 10 groups can be achieved by arranging them into 10 different sets of numbers.
Based on the requirements mentioned above, we can derive the specific subsets, as shown in Equation (2). The specific construction method can be found in references [26,29].
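The covering conditions of Section 2.4 can be checked mechanically for small examples. The following Python sketch brute-force verifies conditions (1) and (2) on a hand-made family of subsets; the function and the example sets are ours for illustration, not the construction of [26,29]:

```python
from itertools import combinations

def is_uniform_cover(S, subsets, k):
    """Check two conditions on a candidate subset family:
    (1) all subsets have equal size;
    (2) the union of any k subsets equals S.
    Illustrative brute-force checker only; the constructions in [26,29]
    build such families directly rather than by search."""
    if len({len(sub) for sub in subsets}) != 1:
        return False
    return all(set().union(*combo) == S
               for combo in combinations(subsets, k))

# A tiny hand-made example: any 2 of the 3 subsets cover S, but no single
# subset does, so one erroneous aggregate can be tolerated.
S = {1, 2, 3, 4, 5, 6}
subsets = [{1, 2, 3, 4}, {3, 4, 5, 6}, {1, 2, 5, 6}]
assert is_uniform_cover(S, subsets, k=2)
assert not is_uniform_cover(S, subsets, k=1)
```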
3. Entire Structure of the Scheme
In our work, there are two available modes: the classic aggregate signature mode and the fault-tolerant aggregate signature mode. The classic aggregate signature mode is suitable for scenarios with minimal network errors, while the fault-tolerant aggregate signature mode can handle situations with a certain error rate. The choice between these two methods only requires the Road-Side Unit (RSU) to decide based on its own situation with little impact on the perception of the authentication center and vehicles in the network.
As shown in
Figure 2, while vehicles are driving on the road, they collect information, which is then aggregated by a randomly selected vehicle responsible for gathering the aggregated information and subsequently sent to the RSU. The RSU initially processes the information using its CPU before passing it to the attached FPGA device for further processing. After the FPGA processes the information, it is sent back to the CPU for final processing. Once the RSU has processed the information, it is fed back to the vehicle cluster and the server. This scheme is an improvement on the systems described in references [4,30]. The processes of system establishment, master key generation, vehicle registration, and key distribution are the same as those in the aforementioned references. The following sections describe the main parts and the improved processes, including the sign, single signature verification, aggregate signature generation, and aggregate signature verification.
First, the main parameters required for the efficient SM9 aggregate signature scheme need to be explained, as shown in
Table 1. The public key, private key, and pseudonym of the vehicle are assigned by the authentication center.
The following is the overall process of the efficient SM9 aggregate signature scheme. This section is divided into four parts: sign, single signature verification, fault-tolerant aggregate signature, and fault-tolerant aggregate verification.
- (1)
Sign(M), where M is the message to be sent:
- (a)
Compute g = e(P1, Ppub);
- (b)
Randomly select r ∈ [1, N − 1], and compute w = g^r;
- (c)
Compute h = H2(M || w, N) and l = (r − h) mod N. If l = 0, reselect the random number; otherwise, proceed;
- (d)
Compute S = [l]ds, where ds is the vehicle’s signature private key;
- (e)
Package and send (h, S, M).
- (2)
Single signature verification: To verify the received message packet (h, S, M), the verification process is as follows:
- (a)
Compute g = e(P1, Ppub), t = g^h, h1 = H1(PID || hid, N), P = [h1]P2 + Ppub, u = e(S, P), and w′ = u · t;
- (b)
Compute h2 = H2(M || w′, N) and verify whether h2 is equal to h. If they are not equal, the verification fails.
- (3)
Fault-tolerant aggregate signature: The RSU can negotiate with the selected vehicle OBU based on its own situation to adopt the following non-fault-tolerant or fault-tolerant aggregate signature modes.
- (a)
Non-fault-tolerant aggregate signature mode;
The OBU of the randomly selected vehicle acts as the signature aggregator. The set of pseudonyms of the received vehicles is {PID_1, …, PID_n}, the corresponding set of messages is {M_1, …, M_n}, and the received digital signatures are {(h_1, S_1), …, (h_n, S_n)}. The signature aggregator computes S_agg = S_1 + S_2 + … + S_n. The aggregate signature is (h_1, h_2, …, h_n, S_agg).
- (b)
Fault-tolerant aggregate signature mode.
The OBU of the randomly selected vehicle acts as the signature aggregator, performing fault-tolerant aggregation processing on the received signature set. The set of pseudonyms of the received vehicles is {PID_1, …, PID_n}, the corresponding set of messages is {M_1, …, M_n}, and the received digital signatures are {(h_1, S_1), …, (h_n, S_n)}. Due to the limited resources of the RSU, the received digital signatures and messages are divided into subsets. According to the lemma mentioned in [26], there are a total of nine subsets, each containing eight elements; these are defined as T_1, T_2, …, T_9. The fault-tolerant aggregate signature set composition is represented as {σ_1, σ_2, …, σ_9}, where T_1 is not specifically the first eight received digital signatures but rather a generic representation. Suppose a set of digital signatures forms the combination elements of T_j; then, the aggregation method of σ_j is shown in Equations (3)–(5).
- (4)
Fault-tolerant aggregate verification: The RSU selects the verification mode based on the previously negotiated mode.
- (a)
Non-fault-tolerant verification mode;
The RSU receives the message set {M_1, …, M_n} and the aggregate signature (h_1, …, h_n, S_agg) sent by the vehicle OBU, performs the required computation, and verifies Equation (6).
- (b)
Fault-tolerant verification mode.
After receiving the message set and the fault-tolerant aggregate signatures sent by the signature aggregator, the RSU performs the corresponding computation and verifies each fault-tolerant aggregate signature using Equation (7), where k is the number of signatures that constitute a single fault-tolerant aggregate signature set.
It can be noted that there are a large number of point multiplication operations in Equations (6) and (7). Regardless of whether it is in the non-fault-tolerant aggregation or fault-tolerant aggregation process, the number of point multiplication operations will increase with the number of signatures, and point multiplication is a quite time-consuming operation [
31]. Therefore, the time consumption will surge. Consequently, the point multiplication operations in the SM9 efficient aggregate signature scheme can be offloaded to the FPGA board for computation.
4. Hardware Improved Acceleration Structure
The overall structure of the hardware acceleration for the efficient SM9 aggregate signature scheme is shown in
Figure 3. The architecture consists of a CPU connected to multiple FPGA boards with information exchanged via the PCIe bus. The CPU is responsible for bilinear pairing operations involved in the SM9 algorithm, the configuration of parallel point multiplication parameters in SM9, and the final verification of the results. The FPGA boards are responsible for parallel acceleration of the point multiplication operations.
As illustrated in
Figure 3, a master state machine is set up to control the data flow within each point multiplication unit and handle the subsequent processing of the computation results. The state transition diagram of the master state machine is shown in
Figure 4.
In the Idle state, parameters are read, and the state machine is initialized. When the start signal is valid, the state machine updates multiple point multiplication parameter registers and transitions to the POINT_MUL_DSB state; otherwise, it remains in Idle. Upon transitioning to the POINT_MUL_DSB state, the state machine reads the multiple point multiplication tasks and assigns each individual point multiplication task to the corresponding point multiplication unit. Once the task distribution is completed, the state transitions to the POINT_MUL_WAIT state. In the POINT_MUL_WAIT state, after all point multiplication tasks are completed, the state machine reads all the computed results and transitions to the COORDINATE_CONVERT state. In the COORDINATE_CONVERT state, the state machine transfers all point multiplication results to the coordinate conversion unit for merging and coordinate transformation. Then, it transitions to the CONVERT_WAIT state. While in the CONVERT_WAIT state, the state machine waits for the coordinate conversion unit to complete its computation. After the coordinate conversion is completed, the state machine reads the results and transitions to the Output state to output the computed results. Finally, the state transitions back to the Idle state.
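The state flow described above can be summarized as a small transition function. The following Python sketch mirrors the six states; the signal names (`start`, `tasks_done`, `convert_done`) are illustrative, not the RTL port names:

```python
from enum import Enum, auto

class State(Enum):
    IDLE = auto()
    POINT_MUL_DSB = auto()       # distribute point-multiplication tasks
    POINT_MUL_WAIT = auto()      # wait until all units finish
    COORDINATE_CONVERT = auto()  # hand results to the conversion unit
    CONVERT_WAIT = auto()        # wait for coordinate conversion
    OUTPUT = auto()              # emit results, then return to IDLE

def step(state, start=False, tasks_done=False, convert_done=False):
    """One clock tick of the master state machine (condensed transition table)."""
    if state == State.IDLE:
        return State.POINT_MUL_DSB if start else State.IDLE
    if state == State.POINT_MUL_DSB:
        return State.POINT_MUL_WAIT       # after task distribution
    if state == State.POINT_MUL_WAIT:
        return State.COORDINATE_CONVERT if tasks_done else state
    if state == State.COORDINATE_CONVERT:
        return State.CONVERT_WAIT
    if state == State.CONVERT_WAIT:
        return State.OUTPUT if convert_done else state
    return State.IDLE                     # OUTPUT -> IDLE

# Walk one full cycle through the machine.
s = State.IDLE
s = step(s, start=True)
s = step(s)                    # tasks distributed
s = step(s, tasks_done=True)
s = step(s)                    # results handed to converter
s = step(s, convert_done=True)
assert s == State.OUTPUT
assert step(s) == State.IDLE
```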
For each point multiplication unit, the state machine is responsible for controlling the data flow between the point addition and point doubling units, achieving unit reuse. Additionally, by configuring multiple point multiplication modules, the on-board resources can be fully utilized to accelerate large-scale point multiplication operations.
4.1. Montgomery Point Multiplication
Montgomery point multiplication is among the most efficient point multiplication algorithms, with a time complexity superior to the binary expansion method, the non-adjacent form (NAF) method, and NAF-related improved algorithms. Montgomery point multiplication starts from a base point P, initializes R0 = P and R1 = 2P, and then, based on R0, R1, and the multiplication scalar k, computes the point multiplication result kP. The computation process is shown in Algorithm 1. Additionally, because a point addition and a point double are performed in every iteration regardless of the key bit, it can resist simple power analysis attacks.
Algorithm 1: Montgomery Point Multiplication
Input: base point P, scalar k = (k_{t−1} k_{t−2} … k_1 k_0)_2 with k_{t−1} = 1
Output: Q = kP
- 1. R0 = P; R1 = 2P; i = t − 2;
- 2. while (i ≥ 0) do
- 3. if (k_i = 1) then R0 = R0 + R1; R1 = 2R1;
- 4. else R1 = R0 + R1; R0 = 2R0;
- 5. end if
- 6. i = i − 1;
- 7. end while
- 8. return Q = R0
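For reference, the ladder of Algorithm 1 can be prototyped in a few lines of Python on a toy curve. The curve parameters below are illustrative (not the SM9 curve), affine coordinates are used for simplicity, and the ladder is checked against naive repeated addition:

```python
# Toy short-Weierstrass curve y^2 = x^3 + a*x + b over GF(p); the
# parameters are illustrative only. Points are affine tuples or None
# (the point at infinity).
p, a, b = 97, 2, 3

def add(P, Q):
    """Affine point addition/doubling with the usual special cases."""
    if P is None: return Q
    if Q is None: return P
    (x1, y1), (x2, y2) = P, Q
    if x1 == x2 and (y1 + y2) % p == 0:
        return None                                # P + (-P) = infinity
    if P == Q:
        lam = (3 * x1 * x1 + a) * pow(2 * y1, -1, p) % p
    else:
        lam = (y2 - y1) * pow(x2 - x1, -1, p) % p
    x3 = (lam * lam - x1 - x2) % p
    return (x3, (lam * (x1 - x3) - y1) % p)

def ladder(k, P):
    """Montgomery ladder (Algorithm 1): one add and one double per bit,
    independent of the bit value."""
    R0, R1 = P, add(P, P)
    for bit in bin(k)[3:]:             # skip the leading 1 of k
        if bit == '1':
            R0, R1 = add(R0, R1), add(R1, R1)
        else:
            R0, R1 = add(R0, R0), add(R0, R1)
    return R0

P = (3, 6)
assert (P[1] ** 2 - (P[0] ** 3 + a * P[0] + b)) % p == 0  # P is on the curve
naive = None
for _ in range(13):
    naive = add(naive, P)
assert ladder(13, P) == naive          # ladder agrees with repeated addition
```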
4.2. Optimization of Finite Field Operational Units
Finite field arithmetic units are the fundamental components of elliptic curve operations, and they include modular addition, modular subtraction, modular multiplication, and modular inversion modules. The performance of these finite field arithmetic units is crucial for the computation speed of point multiplication algorithms on the FPGA.
4.2.1. Design of Modular Addition and Subtraction Units
To optimize the use of on-board resources and reduce area, modular addition and subtraction are implemented in a combined manner using pure combinational logic. This module performs calculations based on the initial input modulus p and the mode selection signal sel. Since modular addition can overflow and modular subtraction can produce a negative result, these situations need to be handled accordingly. The optimized computation method is shown in Algorithm 2.
Algorithm 2: Modular Addition and Subtraction Algorithm
Input: operands a, b ∈ [0, p − 1], modulus p, mode selection sel (0: addition, 1: subtraction)
Output: c = (a + b) mod p or c = (a − b) mod p
1. if (sel = 0) begin
2. s = a + b;
3. end else begin
4. s = a − b;
5. end
6. s1 = s − p;
7. s2 = s + p;
8. borrow = (a < b);
9. if (sel = 0) begin
10. if (s ≥ p) begin
11. c = s1;
12. end else begin
13. c = s;
14. end
15. end else begin
16. if (borrow) begin
17. c = s2;
18. end else begin
19. c = s;
20. end
21. end
As shown in
Figure 5, the hardware structure of the modular addition and subtraction is composed of pure combinational logic, forming a parallel circuit structure. The computation is completed within a single clock cycle.
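The overflow/negative handling of Algorithm 2 can be mimicked in software with a single conditional correction instead of a general modulo, as in the following Python sketch (the 32-bit modulus is illustrative):

```python
def mod_addsub(a, b, p, sel):
    """Modular add/sub in the style of Algorithm 2: compute the raw
    sum or difference, then apply one conditional +/- p correction,
    mirroring the single-cycle combinational hardware design."""
    if sel == 0:
        s = a + b
        return s - p if s >= p else s    # overflow case
    else:
        s = a - b
        return s + p if s < 0 else s     # negative case

p = 0xFFFFFFFB  # illustrative 32-bit modulus, not the SM9 prime
for a, b in [(5, 7), (p - 1, p - 2), (3, 10), (0, 0)]:
    assert mod_addsub(a, b, p, 0) == (a + b) % p
    assert mod_addsub(a, b, p, 1) == (a - b) % p
```

Because both operands are already reduced, one conditional correction always suffices, which is why the hardware can finish in a single clock cycle.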
4.2.2. Design of the Modular Inversion Unit
Currently, the algorithms for computing modular inverses include Fermat’s Little Theorem, the Montgomery algorithm, the Extended Euclidean algorithm, and the binary modular inversion algorithm. Fermat’s Little Theorem requires modular exponentiation, while the Montgomery algorithm needs two domain transformations to obtain the final modular inverse result. Additionally, the original Extended Euclidean algorithm involves a costly division in each iteration [32]. Therefore, a more hardware-suitable binary modular inversion algorithm is adopted. The computation process of the binary modular inversion algorithm is shown in Algorithm 3.
Algorithm 3: Modular Inverse Algorithm
Input: a ∈ [1, p − 1], odd modulus p with gcd(a, p) = 1
Output: r = a^(−1) mod p
1. u = a; v = p; x1 = 1; x2 = 0;
2. while (u ≠ 1 and v ≠ 1) do
3. if (u is even) then
4. u = u/2; x1 = (x1 is even) ? x1/2 : (x1 + p)/2;
5. end if
6. if (v is even) then
7. v = v/2; x2 = (x2 is even) ? x2/2 : (x2 + p)/2;
8. end if
9. if (u and v are both odd) then
10. if (u ≥ v) then u = u − v; x1 = x1 − x2;
11. else v = v − u; x2 = x2 − x1;
12. end if
13. end if
14. end while
15. if (u = 1) then r = x1; else r = x2; end if
16. while (r < 0) do
17. r = r + p;
18. end while
19. return r;
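A software prototype of the binary modular inversion in Algorithm 3 is sketched below in Python; it uses only shifts, additions, and subtractions (no division), and the result is checked against the true field inverse:

```python
def bin_mod_inv(x, p):
    """Binary modular inversion: maintains the invariants
    x1*x ≡ u (mod p) and x2*x ≡ v (mod p), halving even values and
    subtracting when both are odd, until u or v reaches 1."""
    u, v = x, p
    x1, x2 = 1, 0
    while u != 1 and v != 1:
        while u % 2 == 0:
            u //= 2
            x1 = x1 // 2 if x1 % 2 == 0 else (x1 + p) // 2
        while v % 2 == 0:
            v //= 2
            x2 = x2 // 2 if x2 % 2 == 0 else (x2 + p) // 2
        if u >= v:
            u -= v
            x1 -= x2
        else:
            v -= u
            x2 -= x1
    r = x1 if u == 1 else x2
    return r % p

# Check against Python's built-in modular inverse on an odd prime.
p = 18446744069414584321  # illustrative 64-bit prime, not the SM9 prime
for a in [1, 2, 3, 12345, p - 1]:
    assert bin_mod_inv(a, p) == pow(a, -1, p)
```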
4.2.3. Design of the Fast Modular Multiplication Unit
In the computation over elliptic curve finite fields, modular multiplication is the most critical operation affecting performance. Common modular multiplication algorithms include the Montgomery multiplication algorithm, the interleaved modular multiplication algorithm, and the Barrett modular multiplication algorithm. Although Montgomery multiplication offers good generality and flexibility, it requires a domain transformation of the data. The interleaved modular multiplication involves many iterations, leading to longer computation times. The Barrett modular reduction algorithm, on the other hand, can achieve low-cost computation for any modulus using precomputed parameters. Therefore, this section combines KOA fast multiplication with the Barrett modular reduction algorithm to implement the fast modular multiplication unit.
- (1)
Karatsuba–Ofman Algorithm (KOA) Fast Multiplication
The main idea of KOA multiplication is to use a recursive approach to reduce one higher-bit-width multiplication into multiple lower-bit-width multiplications, resulting in a faster and more efficient multiplication algorithm. In the computation of KOA, some parameters need to be defined first. An n-bit number a can be represented as a = a_H · 2^(n/2) + a_L, where a_H is the higher-order part and a_L is the lower-order part. Using the above definitions, the product of the two numbers a and b to be multiplied using KOA multiplication can be represented as
a · b = a_H b_H · 2^n + (a_H b_L + a_L b_H) · 2^(n/2) + a_L b_L,
while the middle term can be represented as
a_H b_L + a_L b_H = (a_H + a_L)(b_H + b_L) − a_H b_H − a_L b_L.
From the above, it is clear that KOA multiplication only requires the computation of a_H b_H, a_L b_L, and (a_H + a_L)(b_H + b_L), reducing the number of half-width multiplications from four to three.
In SM9, all operations are based on 256-bit numbers. To better utilize the FPGA’s on-board resources, a double recursion method can be used to divide a 256-bit multiplication into multiple 64-bit multiplications. Additionally, the multiplications by 2^n and 2^(n/2) can be accomplished through bit shifting. The KOA multiplication process is shown in Algorithm 4.
Algorithm 4: KOA Multiplication
Input: n-bit operands a = a_H · 2^(n/2) + a_L, b = b_H · 2^(n/2) + b_L
Output: c = a · b
1. if (n ≤ 64) then return a · b; (single DSP multiplication)
2. end if
3. z2 = KOA(a_H, b_H);
4. z0 = KOA(a_L, b_L);
5. z1 = KOA(a_H + a_L, b_H + b_L);
6. m = z1 − z2 − z0;
7. c = (z2 << n) + (m << (n/2)) + z0;
8. return c;
To further leverage the performance of the KOA algorithm on the FPGA, the pipeline latency of the DSP 64-bit multipliers used in the component design is set to 0. Wire-type variables are used to connect the components within the module, so that outputs follow inputs combinationally. A register is then placed at the end to divide the clock cycle. As shown in
Figure 6, the schematic diagram of the 256-bit KOA algorithm expansion is illustrated, where H and L represent the higher-order and lower-order bits of the further split data. Finally, a 512-bit multiplication result is output through an adder.
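A software model of the recursive splitting in Algorithm 4 is given below in Python; the 64-bit cutoff mirrors the DSP multiplier width, and the result is checked against the built-in product:

```python
def koa(a, b, n=256):
    """Recursive KOA (Karatsuba) multiplication: an n-bit product costs
    three n/2-bit products; recursion stops at the 64-bit width handled
    by a single DSP multiplier in the hardware design."""
    if n <= 64:
        return a * b                       # base case: one DSP multiply
    half = n // 2
    mask = (1 << half) - 1
    ah, al = a >> half, a & mask
    bh, bl = b >> half, b & mask
    z2 = koa(ah, bh, half)
    z0 = koa(al, bl, half)
    z1 = koa(ah + al, bh + bl, half)       # operands may be half+1 bits;
                                           # hardware carries the extra bit
    m = z1 - z2 - z0                       # = ah*bl + al*bh
    return (z2 << n) + (m << half) + z0

import random
for _ in range(20):
    a, b = random.getrandbits(256), random.getrandbits(256)
    assert koa(a, b) == a * b
```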
- (2)
Barrett Fast Modular Reduction
For any two positive integers a and p, there exist two numbers q and r such that the following equation holds: a = q · p + r, with 0 ≤ r < p. Thus, r = a mod p, where p is the modulus for the modular operation. However, finding such numbers q and r requires high-cost division operations. But when the modulus is a fixed value, the Barrett algorithm can be used to perform modular reduction using only multiplications and shifts [33]. Let q̂ be an easily computed approximation of q = ⌊a/p⌋; we can set r̂ = a − q̂ · p, and after a final correction we have r̂ = a mod p, which is the result of the reduction.
First, we calculate the approximate value q̂ of q. Let m = ⌊2^(2k)/p⌋, where k is the bit length of the modulus p in its binary representation. Then, q̂ = ⌊a · m / 2^(2k)⌋ = (a · m) >> 2k. The computation of q̂ thus transforms into a combination of shifts and multiplications. The value m is precomputed in software and then input into the FPGA. In practical applications, once the modulus p is selected, it is typically treated as a system constant. Therefore, m can also be treated as a system constant. The Barrett modular reduction process is shown in Algorithm 5.
Algorithm 5: Barrett Reduction
Input: a = x · y (0 ≤ a < p^2), modulus p, precomputed m = ⌊2^(2k)/p⌋
Output: c = a mod p
1. q = (a · m) >> 2k;
2. c = a − q · p;
3. if (c ≥ p) then
4. c = c − p;
5. end if
6. if (c ≥ p) then c = c − p; end if
7. return c;
For the multiplications involved, KOA multiplication is used. Since the quotient estimate q can be up to 257 bits, this case must be handled when forming the product q · p. The value of c computed in step 2 ranges over [0, 3p), so it needs to be checked and conditionally reduced by p at most twice; steps 3 to 6 handle this process.
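Algorithm 5 is easy to prototype in software. The following Python sketch uses an illustrative prime (not the SM9 modulus) and checks the reduction against the built-in modulo:

```python
def barrett_setup(p):
    """Precompute the Barrett constant m = floor(2^(2k) / p); done once
    in software and loaded into the FPGA as a system constant."""
    k = p.bit_length()
    m = (1 << (2 * k)) // p
    return k, m

def barrett_reduce(a, p, k, m):
    """Barrett reduction: the quotient estimate uses one multiply and one
    shift; it undershoots floor(a/p) by at most 2, so at most two
    conditional subtractions finish the reduction."""
    q = (a * m) >> (2 * k)
    c = a - q * p
    while c >= p:                   # executes at most twice
        c -= p
    return c

p = 2**255 - 19                     # illustrative prime, not the SM9 modulus
k, m = barrett_setup(p)

import random
for _ in range(20):
    x, y = random.randrange(p), random.randrange(p)
    assert barrett_reduce(x * y, p, k, m) == (x * y) % p
```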
4.3. Optimization of Point Addition and Point Double Modules
In elliptic curve point multiplication, the use of different coordinate systems results in varying computational complexities. For instance, operations in the affine coordinate system involve modular inversion, which is computationally expensive. In our work, we employ the standard projective coordinate system for the computations. In the standard projective coordinate system, modular inversion is required only when converting back to the affine coordinate system. Additionally, in the Montgomery point multiplication process within the standard projective coordinate system, computations are simplified as they only need to be performed on the X and Z coordinates, thus reducing the computational burden.
Consider points P1 = (X1, Y1, Z1) and P2 = (X2, Y2, Z2) in the standard projective coordinate system. The point addition formulas are given by Equations (13) and (14). The point double operation formulas are given by Equations (15) and (16). After completing the computations, the results are converted back to the affine coordinate system. The conversion methods are given by Equations (17) and (18). The resulting (x, y) are the converted results in the affine coordinate system.
Data Flow Optimization
To maximize the efficient use of on-board resources and complete point addition and point double in the shortest possible time, we optimize the data flow for state machine operations and Montgomery point multiplication. The computation processes and data flows for point addition and point double are shown in
Table 2 and
Table 3, respectively.
In the design presented in our work, the point addition and point double modules operate in parallel during the point multiplication process. Therefore, the total computation cycle count is determined by the module with the longest cycle duration. Modules with shorter cycles will idle, ensuring synchronization and smooth operation of the point multiplication state machine. Additionally, the multiplication operations utilize a fast modular multiplication unit with a computation cycle of seven cycles, while the modular addition and subtraction units each have a computation cycle of one cycle. Due to the presence of the state machine and synchronization stages, both the point addition and point double modules have an operation cycle of 102 cycles.
4.4. Optimization of Coordinate Transformation
After the point multiplication operation, the results need to be aggregated and coordinate conversion must be performed. In the design of this module, a modular inversion module and a point addition module are used, with a state machine controlling the data flow and the reuse of these modules. The coordinate conversion method is described in Algorithm 6, where P_1, …, P_8 are the results output by the eight parallel point multiplication modules. The function Point_add is a wrapper for the point addition module, and an analogous function wraps the modular inversion module. The output result is the affine point (x, y).
Algorithm 6: Coordinate Transformation
Input: projective points P_1, …, P_8 output by the point multiplication units
Output: affine result (x, y)
1. i = 1;
2. Q = P_1;
3. while (i < 8) do
4. Q = Point_add(Q, P_{i+1});
5. i = i + 1;
6. end while
7. z_inv = Q.Z^(−1) mod p; (modular inversion module)
8. x = Q.X · z_inv mod p;
9. y = Q.Y · z_inv mod p;
10. return (x, y);
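The conversion step at the end of Algorithm 6 (Equations (17) and (18)) can be sketched as follows. The toy modulus is illustrative, and the point of the design is that accumulating the eight results in projective coordinates first means the single expensive modular inversion is paid once per batch rather than once per point:

```python
p = 97  # toy modulus; the scheme uses the 256-bit SM9 prime

def to_affine(P):
    """Projective -> affine conversion in the spirit of Equations (17)-(18):
    (X, Y, Z) maps to (X/Z, Y/Z), costing one modular inversion and two
    modular multiplications."""
    X, Y, Z = P
    z_inv = pow(Z, -1, p)            # the single modular inversion
    return (X * z_inv % p, Y * z_inv % p)

# (30, 60, 10) represents the affine point (3, 6), since Z = 10.
P = (30, 60, 10)
assert to_affine(P) == (3, 6)
# Affine points (Z = 1) are unchanged by the conversion.
assert to_affine((3, 6, 1)) == (3, 6)
```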
6. Conclusions
Considering the complex network environment and high-performance requirements of vehicular networks, we design an efficient aggregate signature algorithm with two modes based on the SM9 aggregate signature algorithm, FPGA acceleration technology, and the uniform(k,n) theory, and we analyze the security of the proposed algorithm. For the numerous elliptic curve point multiplication operations involved, a parallel point multiplication architecture based on FPGA is designed. Single-point multiplication uses the Montgomery ladder algorithm, and the data flow of the point addition and point double modules is optimized. Key modules for modular addition/subtraction, multiplication, and inversion are designed using combined modular addition/subtraction, the KOA and Barrett algorithms, and the binary modular inversion algorithm, respectively. The parallel point multiplication architecture is applied to the SM9 efficient aggregate signature scheme, and its effectiveness is verified through simulation and on-board experiments. A comprehensive analysis of the proposed scheme’s performance is also conducted. The highly parallel elliptic curve point multiplication acceleration module designed on the FPGA platform is applied to both the non-fault-tolerant and fault-tolerant modes, demonstrating good performance in the low-latency, complex network environment of vehicular networks. Compared to similar schemes, the proposed scheme not only achieves higher operational efficiency but also incorporates fault-tolerant features.
The next step could explore the hardware security design of the SM9 parallel point multiplication architecture to prevent side-channel attacks. This involves ensuring high performance while preventing the leakage of computational information. Additionally, developing new fault-tolerance theories to further optimize the construction speed of fault-tolerant sets can be investigated.