Article

GSFedSec: Group Signature-Based Secure Aggregation for Privacy Preservation in Federated Learning

1 Department of Internet Engineering and Computer Science, Universiti Tunku Abdul Rahman, Kajang 43000, Selangor, Malaysia
2 Department of Computer Science, Soongsil University, Seoul 06978, Republic of Korea
* Author to whom correspondence should be addressed.
Appl. Sci. 2024, 14(17), 7993; https://doi.org/10.3390/app14177993
Submission received: 27 July 2024 / Revised: 25 August 2024 / Accepted: 2 September 2024 / Published: 6 September 2024

Abstract

Privacy must be preserved when working with client data in machine learning. Federated learning (FL) provides a way to preserve user data privacy by aggregating locally trained models without sharing the user data. Still, the privacy of user identity is not preserved. Secure aggregation is a popular technique in FL for aggregating gradients without disclosing individual data. However, it is costly and inaccurate. Therefore, we propose a novel, scalable, cost-effective group signature-based secure aggregation algorithm in FL, called GSFedSec, where secure aggregation helps conceal the user’s update while the group signature helps conceal their identity. Our algorithm preserves the data and their source. Our simulation results show that the proposed algorithm does not suffer from a loss of accuracy, handles increases in network size competently, offers computational and communication efficiency, and is secure against various security attacks. We compare its efficiency and security with those of existing FL algorithms. Also, the security of the algorithm is verified using Burrows–Abadi–Needham (BAN) logic and simulated via the Automated Validation of Internet Security Protocols and Applications (AVISPA) tool.

1. Introduction

Federated learning (FL) is a new paradigm of artificial intelligence in which machines learn collectively without compromising the privacy of the data owners [1]. In FL, only model updates are shared with the central server for aggregation. In contrast, the traditional machine learning (ML) approach requires all training data to be in one place, which increases the risk of private and confidential data being leaked [2]. In the Internet era, where much of the data is online, protecting personal information has become more challenging [3,4]. Hence, FL was proposed in [5]. Decentralized clients train models on their own devices; only the updated gradients are sent to the server for aggregation, after which the updated model is redistributed to all clients. Thus, the server never sees the actual data. The goal of FL is to allow clients to perform ML on decentralized data without compromising their privacy [6].
Although FL promises data secrecy (because only updates are shared in the network), new security issues associated with this learning model may hinder its widespread acceptance [7,8]. Hence, it is crucial to identify the new risks and limitations involved. For example, attackers may know the previous model, allowing them to monitor traffic and extract personal information about the end-user or leak raw data during communication. These types of attacks on personal data constitute security breaches and compromise privacy. In Figure 1, we show the likelihood of privacy breaches at various stages of FL. Table 1 lists the symbols used throughout the paper.
FL uses a decentralized network that is prone to various attacks (e.g., user privacy, denial-of-service, inference, and data poisoning) [9]. Moreover, if the network is compromised, an adversary can generate a large amount of traffic, disturbing the communication between a particular client and the server, perform a man-in-the-middle attack to prevent data transmission from client to server, or impose other restrictions that prevent the model from learning. If an attacker controls the model, they can infer the training data and use them for personal advantage, attempt to bias the model to promote inferences in their own favor, or bias the model training, thereby damaging the model. Hence, security, especially privacy, is the most crucial issue in FL. If user or data privacy is not ensured, the server can collect sensitive information via an inference attack, and private data can be reconstructed by inverting the updated gradients (https://www.kdnuggets.com/2020/08/breaking-privacy-federated-learning.html, accessed on 5 June 2024). Hence, the security of the data and the privacy of their source must be ensured, even in FL.
Three popular techniques are available to provide privacy in FL: secure aggregation (SA) (using masking), differential privacy (DP), and homomorphic encryption (HE) [10,11,12]. SA is very costly (as explained later in this article); DP adds noise to the data, thereby reducing accuracy; HE has relatively high complexity, raising concerns regarding the feasibility of its real-time implementation. The properties, advantages, and disadvantages of these schemes are compared in Table 2.
Group signature was first introduced to FL in 2021 to solve the issue of privacy preservation for users in the network [13]. The main purpose of using a group signature (GS) is to preserve a sender’s privacy. The idea is to hide the individual signature so the receiver can verify the source’s identity without explicitly knowing it [14]. The GS must be changed if membership is revoked or a member willingly leaves the group. Otherwise, the former member can misuse the GS for their advantage [15]. The membership table is updated with a list of existing members and ex-members. Only authenticated members can sign a message using the GS; thus, the GS maintains its privacy with proper authentication. It achieves the main aim of FL: to preserve the user’s privacy. If the user’s identity is not revealed to the receiver, personal information linked to the user can be protected.
In this study, we aim to enhance the privacy of clients in the FL model using GS while also preserving updates using SA. In a conventional SA, clients share secret information with all other participating clients in the network, and the server aggregates them with client updates. All secret keys cancel each other out during the SA process. Each client must communicate with others selected for the FL process, significantly increasing the computation and communication costs. Signaling overheads also increase significantly, complicating the entire information collection process. As indicated in [12], this basic SA model is not scalable.
Therefore, we adopted a GS-based SA technique, in which a secret key is shared with the participating clients by the admin of the group while sharing the GS. Hence, the most trusted party, the admin, must compute the secret key only once. This reduces the computational cost of each client and the communication costs of sending messages to every other client on the network. After adding the secret keys to their updates, clients send their updates by signing them with a GS. When the server receives a signed message, it verifies the GS and aggregates the secured message.
The group signature scheme has its own limitations. The size of the signature depends significantly on the size of the group: as the group grows, the GS size grows, and the verification and key-management complexity increase as well. However, this increase in complexity is still minimal compared with the increase in complexity of the SA algorithm. Adding one member in SA adds (n + 1) communications and increases the computational overhead of every client, which must generate a secret key part for the new member. Several factors, explained later in this article, show that SA is not scalable. Moreover, the signature size grows only with the group size; it does not depend on how many clients are selected for the FL process, whereas an increase in the number of members selected for the FL process affects the performance of SA significantly. Hence, GS performs better than existing algorithms even when the group size increases, as explained in the performance analysis in Section 5.
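To make this scaling argument concrete, the short Python sketch below (our illustration, not part of the protocol itself) counts the key-exchange messages per FL round: the plain SA model needs pairwise exchanges among the n selected clients, whereas the group-based approach needs only one admin-to-client message per selected client.

def sa_key_exchange_messages(n):
    # Plain secure aggregation: each of the n selected clients
    # shares a key part with every other selected client.
    return n * (n - 1)

def gs_key_exchange_messages(n):
    # Group-signature-based setup: the admin sends one key/secret share
    # to each of the n selected clients; no client-to-client traffic.
    return n

for n in (100, 500, 1000):
    print(n, sa_key_exchange_messages(n), gs_key_exchange_messages(n))
# For n = 1000: 999,000 pairwise messages versus 1000 admin messages.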
To validate the secrecy, confidentiality, and other security aspects of the method, we verify it using a mathematical logic tool: Burrows–Abadi–Needham (BAN) logic. This operates on a set of rules that we define and helps determine the likelihood of attacks in our algorithm. We also simulate our algorithm using the Automated Validation of Internet Security Protocols and Applications (AVISPA) [16] and High-level Protocol Specification Language (HLPSL). AVISPA is a robust tool that uses HLPSL to specify algorithms and their security properties, and it integrates different backend tools to provide automated validation. We also compare our results with existing algorithms in terms of their cost, size, and security.
The contributions of this study are outlined below:
  • Protection of data and user privacy: We introduce a novel SA with GS to provide privacy to clients and their data. In contrast to algorithms that address only data privacy or only user privacy, this is the first work to address data security and user privacy simultaneously. The server cannot learn the identity of the user because of the GS, and it cannot infer the previous sequence of updates because of the SA. Our work combines the best features of both technologies to produce an algorithm that completely mitigates inference attacks.
  • Improved computation and communication efficiency: Our algorithm has much lower computation and communication costs than existing algorithms. This is achieved by eliminating the need for peer-to-peer communication during key exchanges in the existing SA model. We provide a detailed analysis of the computation, communication, and signaling costs via comparisons with the existing algorithms.
  • Security proof of FL algorithm using BAN logic and AVISPA: We provide a formal mathematical proof using BAN logic and simulation analyses using AVISPA.
This paper is based on the assumption that the group admin is a trusted entity, like a certificate authority. This assumption is commonly accepted in group signature-based research. Clients share their keys with the admin to obtain the necessary group keys. The server also communicates with the admin to verify the latest group signature. Hence, the admin is an important entity that is assumed to be trusted and available during the entire FL process.
In Section 1, we introduce our work, the motivation behind our work, and our contribution to preserving the privacy of users and data in FL. Table 1 defines all the symbols used throughout the paper. The remainder of this paper is organized as follows: Section 2 presents a literature survey regarding FL and the techniques proposed by various authors. The methodology and proposed algorithm are presented in Section 3. Section 4 presents an analysis of the proposed algorithm’s security; a security analysis and scope comparison are also presented in this section. Section 5 compares the computation and communication costs of our algorithm with those of the existing algorithms. Section 6 concludes the paper and provides the scope for future research in this area.

2. Related Work

Brendan et al. [5] proposed FedAvg as a method for aggregating the learned gradients collected from various systems. It primarily focuses on efficiency through the availability of clients and limited communications. However, it suffers from high computational and communication costs because of client-to-client communication. The major acknowledged work on privacy preservation in FL is the SA proposed in [12]. This approach uses a shared secret key to aggregate client updates and thereby prevent privacy violations. In the first two rounds, preparation is performed and shared secret keys are generated. In the next round, the devices send cryptographically masked updates, and the server accumulates their sum; all participating device updates are included. In the final round, the devices reveal sufficient secret information from the shared secret keys to allow the server to unmask the aggregate model. Only devices that have sent their data need to complete this round. This SA is slightly more costly than normal aggregation; however, its costs increase significantly with the number of users, which limits the maximum number of users. SA ensures that the server knows only that one of the users wrote the data, without confirming which user wrote them. The main aim of this SA is to prevent users from revealing their identities to the server. The major disadvantages of this method are as follows:
  • The need for client-to-client communication increases the overall cost.
  • Members participating in masking must create and send everyone’s share of the secret information. If the minimum number of secrets required is not shared, unmasking cannot be performed.
  • If a certain number of members leave the FL process, unmasking is difficult and costly to achieve.
The literature survey below offers an introduction to the existing FL algorithms. Most of these are based on client-to-client secret-sharing methods to protect privacy. In contrast, we propose a group-based privacy preservation algorithm that is highly cost-effective. Grouping in FL is not a new process: various methods have been proposed, including stratified/cluster-based sampling groups [17], fork-merge-consolidate [18], the FedEval model [1], and group knowledge transfer [19]. However, most of these methods have categorized and collected data rather than providing security. Hence, we adopt the existing group-based approach and integrate it with the abovementioned SA to offer privacy efficiently. Other methods that offer modified SA and other security algorithms for the FL process are detailed below:
FedEval: A benchmark system with a comprehensive evaluation model for FL [1] (2020): FedEval is an easy-to-evaluate and lightweight federated benchmark system. Its designers customized the traditional message upload methods, the server aggregation training method, and global-local model incorporation. The authors described FedSGD and FedAvg as the two most common methods in FL. FedSGD operates via large-batch synchronous stochastic gradient descent. FedAvg operates in small batches; it trains over various rounds and increases the number of clients in each round. Therefore, FedAvg is a more efficient method. However, it is less accurate than FedSGD and requires separate validation steps after uploading the actual updates, which increases the number of communications required. The algorithm protects the data from a gradient attack, using SA as its base; hence, it suffers from efficiency issues, as discussed at the beginning of this section.
Federated learning with differential privacy: Algorithms and performance analysis [11] (2020): This approach adds an optimal amount of noise to the data to design a privacy-preserving FL architecture. The more noise is added, the more privacy is achieved; however, convergence performance degrades as noise increases. Hence, there is always a trade-off between convergence performance and the degree of privacy. The training quality is also affected by the size and distribution of the data. These are the major limitations of the approach, which motivates us to consider an alternative technique. GS can be a better option because it does not add any noise to the data and therefore does not affect the convergence performance of the algorithm.
Communication/computation-efficient SA for FL [20] (2021): This algorithm adopts the same four steps as the conventional scheme of SA: (1) advertising keys, (2) sharing keys, (3) mask input collection, and (4) unmasking. However, the key is shared with a subset of the clients (instead of all clients). This is more efficient than the conventional protocol because the involvement of fewer clients reduces the number of communication messages, and the overall computational cost is reduced using a smaller number of masks. However, this algorithm still requires client-to-client communication, making it less efficient when the network size is increased.
Breaking the quadratic aggregation barrier in secure FL [21] (2021): This approach uses secure multi-party computation based on cryptographic primitives, including secret sharing. Although the algorithm ensures privacy, it also suffers from high communication burdens when reconstructing secret information distributed over multiple clients.
Learning rate optimization for FL exploiting over-the-air computations [22] (2021): A dynamic learning rate has been proposed for efficient data aggregation in over-the-air computation. The authors used parallel computing with multiple-input single-output and multiple-input multiple-output configurations, together with ML, to reduce the overall computation time. However, they only considered averaging and focused on minimizing the aggregate error. Overall, client-to-server communication and privacy were not discussed in detail.
Decentralized federated averaging [23] (2015): The authors implemented FedAvg in a distributed manner because they assumed that attacking the central server would break the entire system’s privacy. They implemented the model on clients connected by an undirected graph. To further reduce the communication costs, they considered quantized decentralized FedAvg with momentum. However, decentralized FL makes the algorithm less secure because the devices must trust one another.
FedV: Privacy-preserving FL over vertically partitioned data [24] (2021): To avoid existing client-to-client communications in horizontal FL, the authors proposed a vertical FL that used functional encryption to secure their data. This reduced the training time, and no issues arose when clients could not connect.
To the best of our knowledge, this study presents the first algorithm that protects both client identity and data privacy. The server does not know which update belongs to which client, and, at the same time, it cannot link the current update to the sequence of previous updates. Hence, the server can predict neither the data nor the owner of the data, which mitigates the chances of inference attacks. Our algorithm effectively reduces the computational cost and network bandwidth while providing the necessary security features. Section 5 presents a detailed comparison between the proposed algorithm and existing methods.

3. GSFedSec: GS-Based SA for Privacy Preservation in FL

Our GS-based protocol fundamentally solves the high computational cost problems of the traditional SA using the GS, as shown in Figure 2. Each client sends data to the server and signs it with the GS, without needing to exchange unmasking keys with the server and other clients. The server can check whether a given client is from a valid group, though it cannot identify individual members. Thus, the GS allows a client device to be validated at a lower cost without revealing its identity. Furthermore, if a group member exhibits suspicious behavior, it can be removed from the group, and the GS is then updated for the remaining group members. The admin maintains a membership table and a revocation table. Whenever a member joins the group, an entry is added to the membership table. Whenever a member leaves or is revoked from the group, an entry is added to the revocation table. If a member has been revoked by the admin because of malicious behavior, it is not allowed to join again. In this way, clients are encouraged to behave well in the group; otherwise, they may be revoked from it.
In addition, an adversary cannot target a particular model update because all are signed with the GS and secured with secret keys. Therefore, the GS-based technique offers several advantages over the traditional SA process, as follows:
  • GS is less costly than SA because the number of transmissions is reduced by removing the requirement for client-to-client communication.
  • In our algorithm, clients do not need to know each other because they do not need to communicate with each other. They only need to obtain an updated signature from the admin, and this cost is significantly small compared to that of sending messages to many nodes in the network.
The proposed protocol builds on the short group signature algorithm presented in [25]. Group members transmit data using their GS, hiding their identities. When the server receives the data, it sees the GS, verifies that it is from a valid member, and extracts the data. The server still does not know the exact identity of the sender; hence, it cannot associate specific data with a specific sender. Subsequently, it aggregates all collected data and computes a new system model. Therefore, we designed a new integrated protocol that simultaneously and efficiently ensures data and source privacy.

3.1. System Setup

Let $G_1$ and $G_2$ be two multiplicative cyclic groups of prime order $p$ with $g_1$ and $g_2$ as their respective generators. These groups are isomorphic. Suppose that $\hat{e}$ is a bilinear map, $\hat{e}: G_1 \times G_2 \rightarrow G_T$, satisfying bilinearity and non-degeneracy:
  • Bilinearity: $\hat{e}(u^a, v^b) = \hat{e}(u, v)^{ab}$ for all $u \in G_1$, $v \in G_2$, and $a, b \in \mathbb{Z}_p^*$.
  • Non-degeneracy: $\hat{e}(g_1, g_2) \neq 1_{G_T}$.
Furthermore, we assume that our protocol obeys the following assumptions:
  • Strong Diffie–Hellman Assumption: For unknown $x$ and $y$, the adversary has no advantage in learning $g^{xy}$ if $g$, $g^x$, and $g^y$ are given (where $g$ is the generator of the cyclic multiplicative group).
  • Decisional Diffie–Hellman (DDH) Problem: For unknown $x, y, z \in \mathbb{Z}_q$, and given $g^x$, $g^y$, and $g^z$, decide whether $xy = z \bmod q$. This problem is not hard and can be solved in polynomial time.
  • Bilinear Diffie–Hellman Assumption: For unknown $x, y, z \in \mathbb{Z}_q$, and given $g^x$, $g^y$, and $g^z$, the adversary has no advantage in learning $\hat{e}(g, g)^{xyz} \in G_T$.
  • Decision Linear (DLIN) Assumption: Given $u, v, h, u^x, v^y, h^z \in G$, where $x$, $y$, and $z$ are random exponents, an adversary cannot decide in polynomial time whether $h^z = h^{x+y}$ or $h^z$ is a random element. The DLIN assumption is useful where the DDH assumption does not hold, especially in pairing-based cryptography.
  • Linear Encryption: This is an application of the DLIN assumption in which the public key is a triple of generators $u, v, h \in G$, and the private key is the pair of exponents $x, y \in \mathbb{Z}_p$ such that $u^x = v^y = h$ [26]. For encryption, we choose $a$ and $b$ randomly from $\mathbb{Z}_p$. The ciphertext is $(u^a, v^b, m \cdot h^{a+b})$, denoted $(c_1, c_2, c_3)$. To recover the message, the user computes $c_3 / (c_1^x \cdot c_2^y)$.
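To illustrate the linear encryption step, the following toy Python sketch instantiates it in the multiplicative group modulo a small prime q (a real deployment would use pairing-friendly elliptic-curve groups); all parameter values here are illustrative assumptions, not the paper’s actual parameters.

import random

q = 1019                              # small prime; the group Z_q* has order q - 1
order = q - 1
x, y = 5, 3                           # private exponents (y chosen coprime to q - 1)
u = 2
h = pow(u, x, q)                      # h = u^x
v = pow(h, pow(y, -1, order), q)      # v = h^(1/y), so v^y = h
assert pow(u, x, q) == pow(v, y, q) == h

def lin_encrypt(m):
    a, b = random.randrange(1, order), random.randrange(1, order)
    return (pow(u, a, q), pow(v, b, q), (m * pow(h, a + b, q)) % q)

def lin_decrypt(c1, c2, c3):
    denom = (pow(c1, x, q) * pow(c2, y, q)) % q    # c1^x * c2^y = h^(a+b)
    return (c3 * pow(denom, -1, q)) % q            # c3 / (c1^x * c2^y)

m = 123
assert lin_decrypt(*lin_encrypt(m)) == m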
Our protocol is designed for dynamic client management. The FL server can choose any clients it wants to participate. Client selection is dynamic, which means that for each iteration, the server can choose different clients from a group. Threshold-based cryptography can be used to maintain the dynamic participation of the clients in a group: the group keys are generated based on the number of clients in a group and a threshold value $t$ [27]. The admin generates the group key and a separate private key ($A_i$ for each client $i$) for all group members. When any member joins or leaves the group, the group keys and the threshold must be modified accordingly. The admin distributes the keys to all members whenever the group keys change.
The messages in basic SA are padded with parts of secret keys, which cancel out after aggregation. The time graph of this scheme is shown in Figure 3. Our algorithm adopts this feature but adds a group signature to avoid large computation costs, as shown in Figure 4. The step-by-step algorithm is defined as follows:

3.1.1. Key Generation

We adopt the short GS described in [25] for key generation. The admin is the trusted entity, and it is responsible for two types of key generation: the signature key and the secret key. A signature key is generated for all members of the group. We have one isomorphic bilinear group pair $(G_1, G_2)$ with generators $g_1$ and $g_2$, respectively, and the property $g_1 = \psi(g_2)$. The admin selects $h \in G_1 \setminus \{1_{G_1}\}$, $\xi_1, \xi_2 \in \mathbb{Z}_p^*$, and $\gamma \in \mathbb{Z}_p^*$; it then computes $u, v$ such that $u^{\xi_1} = v^{\xi_2} = h$, together with $A_{pk} = g_2^{\gamma}$. Here, $A_{pk}$ is the public key of the admin, and $(\xi_1, \xi_2)$ is the private key tuple used to trace the signer in the case of a dispute. $\gamma$ is the private key of the admin. Using this $\gamma$, a tuple $(A_i, x_i)$ for the $i$-th device in the group is computed as $A_i \leftarrow g_1^{1/(\gamma + x_i)} \in G_1$, where $x_i$ is selected randomly from $\mathbb{Z}_p^*$. $(A_i, x_i)$ is the key tuple for the corresponding $i$-th device. Then, the admin publishes the group public key as $gpk = (g_1, g_2, h, u, v, A_{pk})$.
After the server sends the list of clients selected for FL to the admin, the admin generates a secret key and divides it into $n$ parts, where $n$ is the number of clients selected for FL. The division is such that the parts cancel out when combined. Then, the admin sends one part of the secret key, $S_i$, along with $A_i$, to each respective client $i$.
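The paper does not fix a concrete construction for these secret-key parts; a minimal sketch of one way the admin could derive $n$ additive shares that cancel on aggregation (shares modulo a large illustrative modulus) is given below.

import secrets

MOD = 2**127 - 1    # illustrative modulus for the additive shares

def make_cancelling_shares(n):
    # Generate shares S_1, ..., S_n with sum(S_i) = 0 (mod MOD), so the
    # masks vanish once the server sums all n masked updates.
    shares = [secrets.randbelow(MOD) for _ in range(n - 1)]
    shares.append((-sum(shares)) % MOD)
    return shares

shares = make_cancelling_shares(5)
assert sum(shares) % MOD == 0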

3.1.2. Local Optimization

All client devices that want to participate in FL send a request to the server. For efficiency, the server chooses only a few clients and asks them to send local updates. The selected clients train the global model on their local data and optimize it before sending updates to the server. Note that raw data are never sent to the server; training is always performed locally, and the transmitted updates differ considerably from the original data. The local model is trained using Equation (1):
$\omega_k \leftarrow \omega_k - \eta \nabla F_k(\omega_k),$
where $\eta$ is the fixed learning rate, $k$ denotes the $k$-th client, $\omega$ is the model parameter, and $F_k(\omega_k)$ is the local objective function of client $k$.
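As a small illustration of the local update in Equation (1), the sketch below performs one gradient step for a least-squares objective; the model, data, and learning rate are placeholder assumptions.

import numpy as np

def local_update(w, X, y, eta=0.01):
    # One step of w <- w - eta * grad F_k(w) for F_k(w) = ||Xw - y||^2 / (2m).
    m = len(y)
    grad = X.T @ (X @ w - y) / m
    return w - eta * grad

rng = np.random.default_rng(0)
X = rng.normal(size=(32, 4))      # one client's local features
y = rng.normal(size=32)           # one client's local targets
w = np.zeros(4)                   # current global model parameters
w_new = local_update(w, X, y)     # locally optimized parameters to report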
After conducting local training on the locally collected data and optimizing it, the client adds its part of the secret key to the update to keep it secure from the server.

3.1.3. Signature Generation

After the edge device receives $gpk$ and $x_i$ from the admin, it can send messages by computing its GS. To formulate its GS, it must first compute the tracing, binding, and signing variables. To compute the tracing variables, we set $(c_1, c_2, c_3)$ and two helper values $(\delta_1, \delta_2)$, as given in Equations (2a) and (2b):
$c_1 \leftarrow u^{\alpha}, \quad c_2 \leftarrow v^{\beta}, \quad c_3 \leftarrow A_i \cdot h^{\alpha + \beta},$
$\delta_1 \leftarrow x_i \alpha, \quad \delta_2 \leftarrow x_i \beta \in \mathbb{Z}_p.$
These tracing variables are used to detect the actual signer of the message. This tracing can be performed by the admin using Equation (3):
$A_i \leftarrow c_3 / (c_1^{\xi_1} \cdot c_2^{\xi_2}).$
This $A_i$ has a corresponding value $x_i$ (the identity of the client) in the membership table stored by the admin. Whenever required, the admin extracts $x_i$ from the table and identifies the actual signer. If a client does anything unwanted or shows malicious behavior, it can be removed by the group admin, and the group keys are updated accordingly [26]. Hence, the traceability feature keeps the clients in check, and it is important to add tracing variables to the signature so that the signer remains traceable by the admin.
Subsequently, the device chooses $b_\alpha$, $b_\beta$, $b_x$, $b_{\delta_1}$, and $b_{\delta_2}$ as binding values and computes the following:
  • Compute binding variables (Equations (4a)–(4c)):
    $B_1 \leftarrow u^{b_\alpha}, \quad B_2 \leftarrow v^{b_\beta},$
    $B_3 \leftarrow \hat{e}(c_3, g_2)^{b_x} \cdot \hat{e}(h, w)^{-b_\alpha - b_\beta} \cdot \hat{e}(h, g_2)^{-b_{\delta_1} - b_{\delta_2}},$
    $B_4 \leftarrow c_1^{b_x} \cdot u^{-b_{\delta_1}}, \quad B_5 \leftarrow c_2^{b_x} \cdot v^{-b_{\delta_2}}.$
  • Compute challenger (Equation (5)):
    $c \leftarrow H(\omega, c_1, c_2, c_3, B_1, B_2, B_3, B_4, B_5) \in \mathbb{Z}_p.$
  • Compute signing variables (Equations (6a) and (6b)):
    $s_\alpha \leftarrow b_\alpha + c\alpha, \quad s_\beta \leftarrow b_\beta + c\beta, \quad s_x \leftarrow b_x + c x_i,$
    $s_{\delta_1} \leftarrow b_{\delta_1} + c\delta_1, \quad s_{\delta_2} \leftarrow b_{\delta_2} + c\delta_2.$
Therefore, the GS is given by Equation (7):
$\sigma_{GS} \leftarrow (c_1, c_2, c_3, c, s_\alpha, s_\beta, s_x, s_{\delta_1}, s_{\delta_2}).$

3.1.4. Encryption

After signing the message, the sender device encrypts the message with the server’s public key so that no device in the network can read the message while it is in transit. Then, the sender merges it with the challenger into a single message and forwards it to the server. Let $(p, s_{pk})$ be the server’s public key pair. Message $M$ is encrypted as in Equation (8):
$Upd_e = M_i^{s_{pk}} \bmod p.$
The client then adds the secret key part $S_i$ received from the admin, as in Equation (9):
$Upd_{se} = Upd_e + S_i.$
Subsequently, the pair $(Upd_{se}, \sigma_{GS})$ is sent to the server.
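The client-side packaging of Equations (8) and (9) can be sketched as below; the modulus p, the server public exponent s_pk, and the numeric values are placeholder assumptions used only to mirror the notation.

def client_package(M_i, s_pk, p, S_i):
    # Equation (8): Upd_e = M_i^{s_pk} mod p  (encrypt with the server's public key)
    upd_e = pow(M_i, s_pk, p)
    # Equation (9): Upd_se = Upd_e + S_i      (add the admin-issued secret share)
    upd_se = upd_e + S_i
    return upd_se                             # sent together with the group signature

# Illustrative values only
upd_se = client_package(M_i=42, s_pk=65537, p=2**61 - 1, S_i=123456789)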

3.1.5. Verification

The server identifies a member’s message by its group signature. If a member is registered in the group and its membership is valid, it signs with the latest group signature. The server can verify the latest group signature keys with the admin, as shown in step 4 of Figure 4. If the client’s signature does not match the latest group signature, it is rejected by the server. In this way, the server verifies that the sender is a valid group member. However, the server only learns the client’s group from its signature, not the actual signer. The server computes the challenger from the group keys received from the admin and the ciphertext and signing variables received from the client. The client also sends the hashed challenger. If the two challengers, one sent by the client and one computed by the server, do not match, the signature is rejected.
When the server receives a message from an edge device, it first decrypts it to obtain the message $Upd_s$ and then verifies the signature. To verify the signature, the server first collects the latest GS keys from the admin. Then, it computes the verification variables as in Equations (10a)–(10c):
$\tilde{B}_1 \leftarrow u^{s_\alpha} \cdot c_1^{-c}, \quad \tilde{B}_2 \leftarrow v^{s_\beta} \cdot c_2^{-c},$
$\tilde{B}_3 \leftarrow \hat{e}(c_3, g_2)^{s_x} \cdot \hat{e}(h, w)^{-s_\alpha - s_\beta} \cdot \hat{e}(h, g_2)^{-s_{\delta_1} - s_{\delta_2}} \cdot \left( \hat{e}(c_3, w) / \hat{e}(g_1, g_2) \right)^{c},$
$\tilde{B}_4 \leftarrow c_1^{s_x} \cdot u^{-s_{\delta_1}}, \quad \tilde{B}_5 \leftarrow c_2^{s_x} \cdot v^{-s_{\delta_2}}.$
Then, the server recomputes the challenger $c$ as in Equation (11):
$\tilde{c} \leftarrow H(M, c_1, c_2, c_3, \tilde{B}_1, \tilde{B}_2, \tilde{B}_3, \tilde{B}_4, \tilde{B}_5).$
If $c = \tilde{c}$, the signature is verified; otherwise, the server rejects the received message. In this way, the server knows the message $M$ but does not know the signer of the message. The server receives the model updates from all participating members in this manner.

3.1.6. Secure Aggregation

After receiving updates from the participating devices, the server collects all updates. These updates are padded with $S_i$. To cancel the masks $S_i$ collected from all clients, the server needs an unmasking key, which the admin sends along with the latest GS. The server then aggregates all updates to obtain the new updated model. The formula for SA is given in Equation (12):
$\omega_{t+1} \leftarrow \sum_{i=1}^{K} \frac{n_i}{n} \omega_{t+1}^{i},$
where $\eta$ is the fixed learning rate, $K$ is the number of clients, $C$ is the fraction of clients participating in the process, $\omega$ is the model parameter, $\omega_t$ is the current model, and $\omega_{t+1}$ is the subsequent model. We define $n = \sum_i n_i$, where $n$ is the total number of samples in the dataset. Algorithm 1 outlines the procedures at the server, clients, and admin of the proposed algorithm.
Algorithm 1 GSFedSec.

A. Server executes:
 1: initialize $\omega_0$
 2: for each iteration round $t \in N$ do
 3:     $S_t$ = set of $n$ clients
 4:     send $S_t$ to the admin
 5:     for each client $i \in S_t$ do
 6:         $(\omega_{t+1}^i, \sigma Upd_i) \leftarrow$ ClientUpdate($i, w_t$)
 7:         decrypts $\sigma Upd_i$
 8:         verifies $\sigma_i$: Equations (10) and (11)
 9:     end for
10:     $w_{t+1} \leftarrow \sum_{i=1}^{K} \frac{n_i}{n} w_{t+1}^i$
11: end for

B. ClientUpdate($i, w_t$):
12: $\sigma \leftarrow$ AdminGenerates($x_i$)
13: $\mathcal{B} \leftarrow$ split $P_i$ into batches of size $B$
14: for each client $i \in S_t$ do
15:     for batch $b \in \mathcal{B}$ do
16:         $\omega_{t+1}^i \leftarrow \omega_t^i - \eta \nabla l(\omega; b)$
17:         $\sigma Upd_i \leftarrow$ ClientSign($\omega_{t+1}^i$)
18:     end for
19: end for
20: return encrypted $(\omega_{t+1}^i, \sigma Upd_i)$ to the server

C. ClientSign($\omega$):
21: compute tracing variables: Equations (2a) and (2b)
22: compute binding variables: Equations (4a)–(4c)
23: compute challenger: Equation (5)
24: compute signing variables: Equations (6a) and (6b)
25: generate group signature: Equation (7)
26: encrypt model update: Equation (8)
27: add secret key $S_i$: Equation (9)
28: return $Upd_{se}$ to ClientUpdate($i, w_t$)

D. AdminGenerates($x_i$):
29: generate $S_n = PRG(n)$
30: divide $S_n$ into $n$ parts
31: for each client $i$ from 1 to $m$ do    // m = total number of clients
32:     $A_i \leftarrow g_1^{1/(\gamma + x_i)} \in G_1$
33:     for each $i$ from 1 to $n$ do    // n = number of clients selected for FL
34:         $A_iS_i = A_i + S_i$
35:     end for
36: end for
37: return $A_iS_i$
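A toy end-to-end sketch of the masked aggregation in Equation (12) is given below, assuming for simplicity that all clients hold equally sized datasets (equal weights) and that updates are quantized to fixed-point integers so the additive masks cancel exactly; the modulus and scale are illustrative.

import secrets

MOD = 2**61 - 1      # toy modulus for the additive masks
SCALE = 10**6        # fixed-point scale for quantizing float updates

def cancelling_masks(n):
    # n masks summing to 0 (mod MOD)
    masks = [secrets.randbelow(MOD) for _ in range(n - 1)]
    masks.append((-sum(masks)) % MOD)
    return masks

def mask_update(update, mask):
    # What each client uploads: quantized update plus its mask
    return (int(round(update * SCALE)) + mask) % MOD

def aggregate(masked_updates, n):
    # Server side: summing all masked values cancels the masks;
    # dequantize and divide by n for the (equal-weight) average.
    total = sum(masked_updates) % MOD
    return total / SCALE / n

updates = [0.8, 1.2, 1.0, 0.6]                 # one scalar parameter per client
masks = cancelling_masks(len(updates))
masked = [mask_update(u, m) for u, m in zip(updates, masks)]
print(aggregate(masked, len(updates)))          # ~0.9, the plain average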

4. Security Analysis of the Proposed Algorithm

No networking algorithm can be deployed if it is not safe to use. The proposed technique can withstand noteworthy attacks, and the protocol has been tested against privacy- and authentication-related security criteria. We provide a mathematical analysis and simulation results for our algorithm, and we also present several popular attacks and discuss how our protocol overcomes them. The major security attacks used to evaluate the proposed algorithm include the following:
  • Privacy attack on the identity of the user
  • Inference attack on the data received in various iterations.
  • Impersonation attack by an unauthenticated intruder
  • Sybil Attack or fake identities attack
  • Honest-but-curious attack by the server
  • Collusion attack by clients on a targeted client

4.1. Mathematical Analysis through BAN Logic

BAN logic is a set of rules defined to analyze the logic behind the steps of an algorithm. It was proposed in 1989 by Burrows, Abadi, and Needham [28] and has since been used to mathematically analyze protocol security. The analysis is applied to the stated assumptions, the formulae derived from them, and the reasoning steps of the algorithm; we then determine whether the algorithm satisfies all the formulae and derived goals. The basic notations, assumptions, and rules are provided in Appendix A. Using these rules, we derive the following statements for our algorithm:

4.1.1. Goals and Assumptions

In our network, there are three main entities: Client $i$ ($C_i$), Admin ($A$), and Server ($S$).
  • $C_i$ and $A$ share the signature generation key $A_i$, which is known only to these two entities.
    • $C_i$ believes $A_i$ is shared between only $C_i$ and $A$:
      $C_i \mid\equiv C_i \xleftrightarrow{A_i} A$.
    • $A$ believes $A_i$ is shared between only $A$ and $C_i$:
      $A \mid\equiv A \xleftrightarrow{A_i} C_i$.
    • $C_i$ believes $A$ believes that $A_i$ is shared between only $C_i$ and $A$:
      $C_i \mid\equiv A \mid\equiv C_i \xleftrightarrow{A_i} A$.
    • $A$ believes $C_i$ believes that $A_i$ is shared between only $C_i$ and $A$:
      $A \mid\equiv C_i \mid\equiv A \xleftrightarrow{A_i} C_i$.
    • $C_i$ believes that the $A_i$ sent by $A$ is fresh:
      $C_i \lhd A_i, \; C_i \mid\equiv \#(A_i) \Rightarrow C_i \mid\equiv A \mid\sim A_i$.
  • $C_i$ and $S$ share $Upd_i$, which is known only to these two entities.
    • $C_i$ believes $Upd_i$ is shared between only $C_i$ and $S$:
      $C_i \mid\equiv C_i \xleftrightarrow{Upd_i} S$.
    • $S$ believes $Upd_i$ is shared between only $S$ and $C_i$:
      $S \mid\equiv S \xleftrightarrow{Upd_i} C_i$.
    • $C_i$ believes $S$ believes that $Upd_i$ is shared between only $C_i$ and $S$:
      $C_i \mid\equiv S \mid\equiv C_i \xleftrightarrow{Upd_i} S$.
    • $S$ believes $C_i$ believes that $Upd_i$ is shared between only $C_i$ and $S$:
      $S \mid\equiv C_i \mid\equiv S \xleftrightarrow{Upd_i} C_i$.
    • $S$ believes that the $Upd_i$ sent by $C_i$ is fresh:
      $S \lhd Upd_i, \; S \mid\equiv \#(Upd_i) \Rightarrow S \mid\equiv C_i \mid\sim Upd_i$.
We must convert the communication messages transmitted between the clients, the server, and the admin into an idealized form. All transmitted messages must be up to date and kept private between the communicating entities. Let $M_1, M_2, \ldots, M_6$ denote the messages.
  • $M_1$: $GSReq$ ($C_i \rightarrow A$): $N_a, \; Enc(N_a, i)_{A_{pk}}$.
  • $M_2$: $GSRep$ ($A \rightarrow C_i$): $N_b, \; Enc(Sign(N_b, A_i)_{A_{pk}^{-1}})_{i}$.
  • $M_3$: $Update$ ($C_i \rightarrow S$): $N_c, \; Enc(Sign(N_c, \omega_i)_{\sigma_{GS}})_{S_{pk}}$.
  • $M_4$: $GSverReq$ ($S \rightarrow A$): $N_d, \; Enc(N_d, \sigma_{GS})_{A_{pk}}$.
  • $M_5$: $GSverRep$ ($A \rightarrow S$): $N_e, \; Enc(N_e, \sigma_{GS})_{S_{pk}}$.
  • $M_6$: $Model$ ($S \rightarrow C$): $N_f, \; Sign(N_f, \omega_{t+1})_{S_{sk}}$.

4.1.2. Proof of Concept

When applied to the rules, all the messages mentioned above should ensure freshness, confidentiality, origin, and non-repudiation. Hence, we derive the following rules:
  • Origin Rule:
    • $S$ receives $GSverRep$ from the admin and believes that $\sigma_{GS} \mapsto C_i$. When $S$ sees that $Update$ is signed with $\sigma_{GS}$, it believes that this message was sent by $C_i$:
      $\dfrac{S \mid\equiv A \mid\equiv (\sigma_{GS} \mapsto C_i), \;\; S \lhd Enc(Sign(N_c, \omega_i)_{\sigma_{GS}})_{S_{pk}}}{S \mid\equiv C_i \mid\sim Enc(Sign(N_c, \omega_i)_{\sigma_{GS}})_{S_{pk}}}$.
    • $S$ believes that $A_{pk} \mapsto A$; hence, it also believes that $A$ sent $Enc(N_d, \sigma_{GS})_{A_{pk}}$, verifying the origin of the message from $A$:
      $\dfrac{S \mid\equiv (A_{pk} \mapsto A), \;\; S \lhd Enc(N_d, \sigma_{GS})_{A_{pk}}}{S \mid\equiv A \mid\sim Enc(N_d, \sigma_{GS})_{A_{pk}}}$.
  • Non-Repudiation Rule:
    • Because of the above origin rules, $C_i$ cannot deny sending $M_3$, and $A$ cannot deny sending $M_4$.
  • Message Freshness Rule:
    • $A$ believes that $N_a$ is fresh, and $C_i$ once said $Enc(N_a, x_i)_{A_{pk}}$. Therefore, $A$ believes that the $GSReq$ message is fresh:
      $\dfrac{A \mid\equiv \#(N_a), \;\; A \mid\equiv C_i \mid\sim Enc(N_a, x_i)_{A_{pk}}}{A \mid\equiv C_i \mid\equiv \#(GSReq)}$.
    • $C_i$ believes that $N_b$ is fresh, and it also believes that $A$ once said $Enc(N_b, A_i)_{x_i}$; hence, $C_i$ believes that $GSRep$ is fresh:
      $\dfrac{C_i \mid\equiv \#(N_b), \;\; C_i \mid\equiv A \mid\sim Enc(N_b, A_i)_{x_i}}{C_i \mid\equiv \#(GSRep)}$.
    • $S$ believes that $N_c$ is fresh, and it sees that $C_i$ has sent $Enc(Sign(N_c, \omega_i)_{\sigma_{GS}})_{S_{pk}}$. Therefore, it believes that the message is fresh:
      $\dfrac{S \mid\equiv \#(N_c), \;\; S \lhd Enc(Sign(N_c, \omega_i)_{\sigma_{GS}})_{S_{pk}}}{S \mid\equiv \#(Update)}$.
  • Confidentiality Rule:
    • $A_{pk}^{-1}$ is the private key corresponding to $A_{pk}$, and it is known only to admin $A$. Hence, only the admin can decrypt message $M_1$ and identify $i$. In this way, confidentiality about $i$ is maintained between the sender $C_i$ and the admin:
      $\dfrac{A_{pk} \mapsto A, \;\; (N_a, Enc(N_a, i)_{A_{pk}})}{\text{Only } A \lhd i}$.
    • Only the server knows the private key corresponding to $S_{pk}$. Hence, only the server can decrypt $M_3$ and see the update $\omega_i$ of client $i$; it is therefore confidential between client $C_i$ and the server:
      $\dfrac{S_{pk} \mapsto S, \;\; (N_c, Enc(Sign(N_c, \omega_i)_{\sigma_{GS}})_{S_{pk}})}{\text{Only } S \lhd \omega_i}$.

4.2. Security Proof through Simulation

This section presents a security analysis of the proposed protocol, as realized using the AVISPA simulation tool. This tool provides results based on the goal of ensuring security. The entire HLPSL code for the various algorithm functions is provided in Appendix B. It includes six basic entities: Server, Client 1, Client 2, Client 3, Client 4, and Admin. First, the admin sends a GS to all four clients, and the clients send an FL training request to the server. The server responds by selecting three clients from the original set of four. Then, all selected clients respond with updates based on their system knowledge. This simulation focuses on the secrecy of sensitive data and the sender’s privacy. The goals of the algorithm are as follows:
goal
secrecy_of sec_A2_C1,sec_A2_C2,sec_A2_C3,
    sec_A2_C4,sec_S_C1,sec_S_C2,sec_S_C4
 
authentication_on  rand1_auth,rand2_auth,
                   rand3_auth,rand4_auth
end goal
The abovementioned secret function has three parameters: (1) the data we want to keep secret, (2) the name of the protocol we refer to in the code, and (3) the set of members who know the secret data. Other than the members referred to in the third parameter, no other entity should have access to the confidential data. The GS is a sensitive entity, and only the valid group members should know the signature. This is represented using Gsign, and it must be kept secret between the clients ($C_1$, $C_2$, $C_3$, $C_4$) and the issuing authority ($A_2$). Client data can be shared only between the server and the client; to this end, all secrets should be protected. The AVISPA tool checks for each mentioned member and determines whether the algorithm fulfills all goals. Figure 5 shows the knowledge of the intruder at the end of the simulation. As shown in Figure 6, the simulation concludes that the algorithm is safe. This means that our algorithm fulfills all the goals mentioned in the goal section. Therefore, the keys are secret, the intruder cannot identify any secret information, and we have correctly authenticated each entity before communicating with them. Table 3 lists the possible attacks and the algorithms that can prevent these attacks. As can be seen in the table, our algorithm is safe from various possible attacks.

4.3. Robustness against Various Attacks

4.3.1. Privacy Attack

Active adversaries want to attack the sender or some other device in the network. An active adversary may be a server or an edge device. In real life, a server can be an attacker or an accomplice to an attacker. The server may join forces with other devices to leak the privacy information of a particular device. In our protocol, even group members cannot distinguish whether two GSs are signed by two different signers or by the same signer. Only the admin of the group can distinguish it from the original signer. We assume that the group admin is a trusted party, and it never reveals any information to others.

4.3.2. Inference Attack

Inference attacks can leak information gathered from different samples. This can be performed by either a curious entity or an active adversary. The attacker continuously monitors the traffic; if it can associate two or three messages, it can guess the sender of the message. In our algorithm, the client sends a random variable to the server to receive replies in an encrypted form. If this random variable were repeated, the server could tell that the two messages are from the same client, which would give the server an advantage; hence, the random key must not be repeated.

4.3.3. Impersonation Attack

Impersonation attacks are possible when some network entity assumes the identity of some other entity in the network. Because identities are not revealed between clients in the network, clients have minimal information about other clients, making impersonation attacks less likely.

4.3.4. Sybil Attack

Each client is registered with the admin in our model and has a valid GS. The admin first validates the client’s identity and then adds it to the group. It also continuously monitors suspicious participants; if it finds any, it immediately revokes their group membership and updates the GS. It also maintains a table (the “revocation table”) where it shows the reasons for all members’ revocations. If a client is blacklisted, they cannot be added to the group.

4.3.5. Honest-but-Curious Setting

The honest-but-curious setting occurs when a communication party wants to learn sensitive information about other parties from legitimate messages. For example, suppose that the server receives messages from end devices. Devices send messages (device updates combined with secret keys) so that the data are protected and the server cannot identify the sending devices. The server must not be able to discover whether a particular user sent an update.
In this study, the client first adds a secret component in the update, encrypts it, and then signs it from the GS as ( U p d s e , σ G S ) . The server cannot determine whether the received σ G S is signed by device A or B. In addition, the original update cannot be obtained because it is secured using a secret key. Thus, the server will be unable to obtain any sensitive information that is intended to be kept secret.

4.3.6. Possible Collusion by Clients

Corrupted clients can join forces to manipulate the input for the FL process. Hence, the list of clients selected for FL should be kept secret. However, clients must be notified if they are to be selected. To facilitate this, we used an encrypted reply from the server side instead of a list of clients. The decryption key is only available to the receiver, so other clients will not know which other clients are selected for FL training.
Collusion among clients can be detected only dynamically, when they actually collude and exhibit malicious behavior. Unfortunately, the basic FL setup provides no specific measure to handle this issue. However, by using the group signature, we ensure that no client holds data belonging to another client. Even if a few clients collude against a particular client, they have no additional information about that client, its keys, or its data, so the colluding clients cannot attack the targeted client. The attack is possible only in one case: when all the clients of the group collude with the external server against a single group member, which is highly unlikely. The server is an external entity, whereas the clients belong to a group, and it is very unlikely that all the clients of a group would side with an external entity against a member of their own group. Moreover, the list of clients selected for FL in a particular iteration is never revealed publicly; therefore, colluding clients would not even know which client to target. In this way, our scheme prevents collusion attacks by clients against a single client.

5. Performance Analysis

The efficiency and competence of the network are primarily determined by the speed of the technology and the network response time. If the technology is strongly secured but the algorithms require a significant processing time, the technology will fail in real-life scenarios, especially in networking. In today’s era of rapid Internet services, computation and communication must be fast enough to respond in real time. In this section, we consider three major factors used to verify the efficiency of our protocol: the computational cost, communication cost, and signaling overhead. Figure 2 shows an algorithm in which the GS is not used, but the client shares part of the secret key with other clients (as explained above). The number of messages exchanged in that algorithm is significantly greater than in ours.

5.1. Computation Overhead

This section focuses on the total cost incurred for the computation performed by the edge devices and the server before sending messages. These costs include encryption, signing, and message formulation costs. The costs of common mathematical operations are well defined and can be taken from [29]. Table 4 presents the computational costs of various operations, and Table 5 shows the cost of encryption and decryption as the number of clients or iterations increases. From these, we can deduce the computational cost of our protocol via the following steps:

5.1.1. Computation Cost for the Client Message ($C_{Client}$)

The computational cost for the client message consists of the signing cost ($C_{sign}$) and the encryption cost ($C_{ENC}$); these are combined in Equation (13):
$C_{Client} = C_{sign} + C_{ENC}.$
  • Signing Cost ($C_{sign}$): This includes the cost incurred during the computation of $A_i$. The various symbols used for the computation and their meanings are given in Table 1. The cost of the actual signature generation is $C_\sigma$, as given in Equations (14)–(16):
    $C_{sign} = C_{A_i} + C_{sec} + C_\sigma,$
    where
    $C_{A_i} = C_{EXP} + C_{DIV} + C_{MUL},$
    $C_\sigma = C_{BV} + C_{SV} + C_{TV} + C_{HASH}.$
    We can calculate the cost of the various components of signature generation ($C_\sigma$) as $C_{BV} = 6C_{EXP} + 3C_{BP} + 4C_M = 6 \times 1 + 3 \times 4.51 + 4 \times 0.612 = 21.978$ ms, $C_{SV} = 5C_{ADD} + 5C_M = 5 \times 0.001 + 5 \times 0.612 = 3.065$ ms, $C_{TV} = 3C_{EXP} + 3C_M + C_{ADD} = 1 + 3 \times 0.612 + 0.001 = 2.837$ ms, and $C_{HASH} = 0.067$ ms. From Equation (16), we obtain $C_\sigma = 21.978 + 3.065 + 2.837 + 0.067 = 27.947$ ms.
  • Encryption Cost ($C_{ENC}$): This is the cost of encrypting the message we want to send to the server.
Finally, the total computation cost on the client side, $C_{Client}$, is computed as in Equation (17):
$C_{Client} = C_\sigma + C_{ENC} = 27.947 \text{ ms} + C_{ENC}.$

5.1.2. Computation Cost on the Server Side ($C_{Server}$)

The computation cost on the server side consists of the decryption, verification, and aggregation costs; these are computed as in Equation (18):
$C_{Server} = C_{DEC} + C_{VER} + C_{AGG}.$
  • Decryption Cost ($C_{DEC}$): This is the cost of decrypting a single message sent by the client. The total cost can be obtained by multiplying this by the number of clients selected for learning.
  • Verification Cost ($C_{VER}$): This is the cost of verifying the signature at the receiver end. When the receiver receives the signed message, it first verifies its signature to confirm the authenticity of the signer. For this, it computes the binding variables using the parameters sent by the signer, as in Equation (19):
    $C_{VER} = C_{BV} + C_{HASH},$
    where $C_{BV} = 9C_M + 8C_{EXP} + 5C_{BP} = 9 \times 0.612 + 8 \times 1 + 5 \times 4.51 = 36.058$ ms and $C_{VER} = 36.058 + 0.067 = 36.125$ ms.
  • Aggregation Cost ($C_{AGG}$): The aggregation cost depends directly on the number of clients participating in the learning process. The FedAvg aggregation is computed as $\sum_{k=1}^{K} p_k \omega_{t+1}^k$. Hence, for a total of $K$ participating clients, the total cost is $K \times (C_{EXP} + C_M)$.
Suppose there are 1000 clients in the FL model. Then, the total server cost is $C_{Server} = C_{DEC} + 36.125 + 1000 \times (1 + 0.612) = C_{DEC} + 1612 + 36.125 = 1648.125 + C_{DEC}$ ms.
Hence, the total computational cost of the FL algorithm is calculated as in Equation (20):
$C_{Alg} = C_{Server} + C_{Client}.$
For our proposed algorithm, the total computational cost is computed as in Equation (21):
$C_{Proposed} = C_{Server} + C_{Client} = 1648.125 \text{ ms} + C_{DEC} + 27.947 \text{ ms} + C_{ENC} = 1676.072 \text{ ms} + C_{DEC} + C_{ENC}.$

5.1.3. Computational Cost for Multiple Iterations

To demonstrate the real-time efficiency of the algorithms, we show the computational cost over multiple iterations. This is computed as described above, with variable values that depend on the number of iterations. Table 6 presents the computational costs for various numbers of FL iterations. Let $n$ denote the number of iterations; we consider $n$ = 50, 100, 150, and 1000. We assume that all iterations use the ElGamal algorithm for cryptography [30]. The encryption and decryption costs for each $n$ were taken from [31].
For our proposed algorithm, the cost for $n$ iterations can be computed as in Equation (22):
$C_{Cost(Proposed, n)} = C_\sigma + C_{VER} + n(C_{ENC} + C_{DEC} + C_{AGG}).$
Hence,
$C_{Cost(Proposed, 50)} = 27.947 + 36.125 + 50 \times (103.78 + 55.582 + 1.612) = 8112.772 \text{ ms} = 8.11 \text{ s},$
$C_{Cost(Proposed, 100)} = 27.947 + 36.125 + 100 \times (217.85 + 102.37 + 1.612) = 32{,}247.272 \text{ ms} = 32.25 \text{ s},$
$C_{Cost(Proposed, 150)} = 27.947 + 36.125 + 150 \times (347.89 + 145.40 + 1.612) = 74{,}299.372 \text{ ms} = 74.30 \text{ s},$
$C_{Cost(Proposed, 1000)} = 27.947 + 36.125 + 1000 \times (5210 + 1060 + 1.612) = 6{,}271{,}676.072 \text{ ms} = 6271.67 \text{ s}.$
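The iteration totals above can be reproduced with a short script; the per-operation timings (in milliseconds) are the ones quoted in this section and Table 6, treated here as fixed constants.

C_SIGMA = 27.947    # signing cost (ms)
C_VER = 36.125      # verification cost (ms)
COSTS = {           # n: (C_ENC, C_DEC, C_AGG) per iteration, in ms
    50:   (103.78, 55.582, 1.612),
    100:  (217.85, 102.37, 1.612),
    150:  (347.89, 145.40, 1.612),
    1000: (5210.0, 1060.0, 1.612),
}

def proposed_cost_ms(n):
    # Equation (22): C_sigma + C_VER + n * (C_ENC + C_DEC + C_AGG)
    enc, dec, agg = COSTS[n]
    return C_SIGMA + C_VER + n * (enc + dec + agg)

for n in COSTS:
    print(n, round(proposed_cost_ms(n) / 1000, 2), "s")
# matches the values derived above: 8.11 s, 32.25 s, 74.30 s, ~6271.7 s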
For comparison, we can compute the computational costs of other popular algorithms similarly, as presented in the following subsections.

5.2. Computation Overhead for Bonawitz et al. [5]

5.2.1. Computation Cost on the Client Side

  • Secret Key Sharing Cost: In this algorithm, the authors used secure aggregation. Each client must send a part of its secret to every other client participating in FL so that the server can collect sufficient parts of the secret and aggregate the updates. This makes the computational complexity quadratic in the total number of clients participating in FL. Because we assume 1000 clients, the computational cost would be at least $1000 \times (1000 - 1)$ ms = 999,000 ms, which is relatively high.
  • Encryption Cost: The authors did not mention which algorithm they used for encryption. We assume that they used the same algorithm as ours; in this case, the cost of encryption is $C_{ENC}$.
  • Signing Cost: Since they do not use a GS, their signature would be computed with the client’s private key. Hence, the cost of the signature is the cost of one encryption, $C_{SIG} = C_{ENC}$.
Hence, the computational cost of the client is given in Equation (23):
$C_{Client} = 999{,}000 \text{ ms} + C_{ENC} + C_{SIG}.$

5.2.2. Computational Cost on the Server Side

  • Decryption Cost: Assuming that the authors use the same algorithm for decryption, the cost of decryption is $C_{DEC}$.
  • Verification Cost: To verify the signature, it must be decrypted using the sender’s public key. Although the costs of normal decryption and signature verification are not identical, we can assume they are similar for a large calculation. Hence, for our calculation, $C_{VER} \approx C_{DEC}$.
  • Aggregation Cost: FedAvg is calculated as $\sum_{k=1}^{K} p_k \omega_{t+1}^k$, which gives $C_{AGG} = 1.612$ ms.
Hence, the computational cost of the server is computed as in Equation (24):
$C_{Server} = C_{DEC} + C_{VER} + C_{AGG}.$

5.2.3. Computational Cost for Multiple Iterations

Finally, the total computational cost over $n$ iterations of the algorithm proposed by Bonawitz et al. [5] is computed as in Equation (25):
$C_{Bonawitz} = C_{Server} + C_{Client} = C_{VER} + C_{DEC} + C_{AGG} + C_{ENC} + C_{SIG} + 999{,}000 \text{ ms}.$
However, client-to-client communication, signature generation, and verification are performed only once for $n$ iterations; hence, the computational cost is given in Equation (26):
$C_{Cost(Bonawitz, n)} = C_{SIG} + C_{VER} + 999{,}000 \text{ ms} + n \times (C_{ENC} + C_{DEC} + C_{AGG}).$
Therefore, for different values of $n$, the cost can be computed as:
$C_{Cost(Bonawitz, 50)} = 103.78 + 55.582 + 999{,}000 + 50 \times (103.78 + 55.582 + 1.612) = 1{,}007{,}208.062 \text{ ms} = 1007.2 \text{ s},$
$C_{Cost(Bonawitz, 100)} = 217.85 + 102.37 + 999{,}000 + 100 \times (217.85 + 102.37 + 1.612) = 1{,}031{,}503.42 \text{ ms} = 1031.5 \text{ s},$
$C_{Cost(Bonawitz, 150)} = 347.89 + 145.40 + 999{,}000 + 150 \times (347.89 + 145.40 + 1.612) = 1{,}073{,}728.59 \text{ ms} = 1073.72 \text{ s},$
$C_{Cost(Bonawitz, 1000)} = 5210 + 1060 + 999{,}000 + 1000 \times (5210 + 1060 + 1.612) = 7{,}276{,}882 \text{ ms} = 7276.88 \text{ s}.$

5.3. Computation Overhead for Sun et al. [23]

5.3.1. Computation Cost of Sender

This algorithm uses a decentralized architecture instead of a client-server model. To reduce the communication cost, the authors used a quantized model in which federated averaging depends on the local training of the node data, and they used a graph to connect all nodes in the network.
  • Secret Key Sharing Cost: In this algorithm, clients communicate with their neighboring clients, but only after multiple FL sessions. This reduces the cost compared with communicating after every FL session. However, the communication between clients is still proportional to the number of clients in the network. Taking 1000 clients in the network, the total number of computations for client-to-client communication would be 1000 × 999. Suppose these clients communicate after at most 100 FL sessions; the need for communication is then 100-fold lower. In this case, the client-to-client interaction results in a computation time of at least 999,000/100 = 9990 ms, which is still very high.
  • Encryption and Digital Signature Cost: The authors do not name the cryptographic algorithm used for encryption and decryption, nor the signature algorithm. We therefore assume that messages are signed with the sender's private key, so the signing cost in this algorithm is the same as in [5].
The total cost of the sender in this algorithm is given in Equation (27):
C_Sender = 9990 + C_ENC + C_SIG,
and similarly the total cost of the receiver is given in Equation (28):
C_Receiver = C_VER + C_DEC + C_AGG.

5.3.2. Computational Cost for Multiple Iterations

Hence, the total computational cost of the algorithm proposed by Sun et al. [23] is computed as:
C_Cost(Sun, 50) = 103.78 + 55.582 + 9990 + 50 × (103.78 + 55.582 + 1.612) = 18,198.062 ms = 18.2 s,
C_Cost(Sun, 100) = 217.85 + 102.37 + 9990 + 100 × (217.85 + 102.37 + 1.612) = 42,493.42 ms = 42.5 s,
C_Cost(Sun, 150) = 347.89 + 145.40 + 9990 + 150 × (347.89 + 145.40 + 1.612) = 84,718.59 ms = 84.7 s,
C_Cost(Sun, 1000) = 5210 + 1060 + 9990 + 1000 × (5210 + 1060 + 1.612) = 6,287,872 ms = 6287.9 s.
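The same cost model applies to Sun et al. [23] with the client-to-client term amortized over 100 sessions (9990 ms instead of 999,000 ms). The following self-contained sketch (ours) reproduces both sets of values side by side:

ENC = {50: 103.78, 100: 217.85, 150: 347.89, 1000: 5210}
DEC = {50: 55.582, 100: 102.37, 150: 145.40, 1000: 1060}
AGG = 1.612

def iterative_cost(n, c2c_ms):
    # generic n-iteration cost; c2c_ms is the one-time client-to-client overhead
    return ENC[n] + DEC[n] + c2c_ms + n * (ENC[n] + DEC[n] + AGG)

for n in (50, 100, 150, 1000):
    bonawitz_s = iterative_cost(n, 999_000) / 1000   # Bonawitz et al. [5]
    sun_s = iterative_cost(n, 9_990) / 1000          # Sun et al. [23]
    print(n, round(bonawitz_s, 1), round(sun_s, 1))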
Similarly, we computed the costs of the other related algorithms; the comparison is presented in Table 6, and the corresponding graph is shown in Figure 7.

5.4. Communication Overhead

The nodes in the network have a fixed bandwidth for data transmission. Hence, any extra parameters needed to ensure the identity and security of the data are considered overhead and should be minimized. In the GS method, the main tasks are to distribute group keys to the clients and for clients to transfer messages to the server. Distributing group keys has linear complexity and occurs whenever a member joins or leaves the group. When the group membership changes, the group keys are updated, which adds m (the number of clients in a group) communications, because the updated group keys must be transferred to each group member. However, this is still linear and far less complex than the pairwise (quadratic in the number of participants) client-to-client communication of the plain SA algorithm.
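As a rough illustration of this difference (our own sketch, not part of the protocol), the number of re-keying messages after a single membership change grows linearly with the group size under the GS approach, whereas pairwise mask agreement in plain SA grows quadratically with the number of participants:

def rekey_group_signature(m):
    return m                # admin pushes the updated group key to each of m members

def rekey_plain_sa(n):
    return n * (n - 1)      # every pair of n participants re-exchanges masks

print(rekey_group_signature(1000), rekey_plain_sa(1000))   # 1000 vs. 999000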
Our algorithm uses four types of communication: administrator-to-client communication, client-to-server communication, administrator-to-server communication, and server-to-client communication. Each communication adopts a different message format, as discussed below.

5.4.1. Cost for Admin-to-Client Communication

Admin sends A_i and S_i to a client. In addition, it publishes gpk as (g_1, g_2, h, u, v, w) in the network. Suppose each group key element has a size of 171 bits; with six parameters in gpk, the size of gpk is 171 × 6 = 1026 bits. The secret keys used to secure the message should be long enough to resist brute-forcing by the server; suppose their size is 1024 bits. Therefore, A_i and S_i are 1024 bits each, and the total message size sent by the admin to a client is 3074 bits. However, this is shared only once during the entire session. Therefore, the cost of communication for t iterations is ((m × 2050) + (n × 3074))/t, where m is the total number of clients and n is the number of clients participating in the communication.
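The sizes above, and the amortized cost formula, can be reproduced with the short sketch below (ours, for illustration only; the constant names are hypothetical):

GROUP_ELEMENT_BITS = 171
SECRET_BITS = 1024

gpk_bits = 6 * GROUP_ELEMENT_BITS               # (g1, g2, h, u, v, w) = 1026 bits
per_client_bits = gpk_bits + 2 * SECRET_BITS    # gpk + A_i + S_i = 3074 bits

def amortized_admin_cost(m, n, t):
    # per-iteration admin-to-client cost in bits, amortized over t iterations
    return (m * 2050 + n * 3074) / t

print(gpk_bits, per_client_bits)                # 1026 3074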

5.4.2. Cost for Client-Server Communication

Each client sends a fixed number of parameters over the network, so the message format is fixed, and this overhead depends only on the size and number of required parameters. For our algorithm, the packet is a combination of the update, the GS, and the client's portion of the secret key, i.e., (Upd, σ_GS, S_i). Let us assume that the size of the update message is 1024 bits; this is common to all compared algorithms. However, most algorithms use a digital signature instead of a GS, so their packet sizes differ from ours.
σ_GS has nine components: (c_1, c_2, c_3, c, s_α, s_β, s_x, s_δ1, s_δ2), each of 171 bits. Hence, σ_GS = 9 × 171 = 1539 bits. The message format is given in Table 7, and the total message size is given in Equation (29):
L_M = L_GID + L_MID + L_RAND + L_Payload + L_GS + L_TTL + L_TS + L_Si = 16 + 16 + 128 + 1024 + 1539 + 8 + 32 + 1024 = 3787 bits ≈ 473 bytes.
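The field-by-field total of Equation (29) can be checked with the following sketch (ours; the field names mirror Table 7):

fields_bits = {
    "GID": 16, "MID": 16, "RAND": 128, "Payload": 1024,
    "GS": 9 * 171,      # nine 171-bit group-signature components = 1539 bits
    "TTL": 8, "TS": 32, "S_i": 1024,
}
total_bits = sum(fields_bits.values())
print(total_bits, total_bits // 8)   # 3787 bits, 473 bytes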
In algorithms that use a personal signature (not a GS), the client sends two messages to the server: one carrying the encrypted update and the other carrying the signature. Thus, there is no separate signature field in the final packet, which reduces the packet size; however, clients may need to send separate messages to preserve their privacy. Hence, the message sizes of the existing algorithms can be obtained as follows:

5.4.3. Message Size for Bonawitz et al. [5]

The client sends two messages to the server and n − 1 messages to the other clients participating in the FL. The size of each message is given in Equation (30):
L_(M,Bonawitz) = L_MID + L_Payload + L_TTL + L_TS = 16 + 1024 + 8 + 32 = 1080 bits = 135 bytes.
In addition to the server, each participating client must communicate with n − 1 clients. Therefore, if 1000 clients participate in FL, the total size of all messages in one iteration is n × L_(M,Bonawitz) = 1000 × 135 = 135,000 bytes.

5.4.4. Message Size for Sun et al. [23]

For the analysis of this algorithm, we assumed that clients communicate with each other after 100 FL iterations. Hence, the total bandwidth at each 100-th round matched that in [5]. However, no communication occurred during the subsequent 99 rounds of training, and the bandwidth was saved for 99 rounds. Therefore, the average bandwidth for each round was 135,000/100 = 1350 bytes.
Similarly, the communication costs for the other related algorithms were computed; they are listed in Table 8 and plotted in Figure 8. The graph shows a large difference between the average packet size of our algorithm and those of the existing algorithms, indicating that our algorithm is considerably more efficient.
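Under the assumptions stated in this section (1000 participating clients, and the client-to-client exchange in [23] amortized over 100 rounds), the average per-iteration figures for the three schemes discussed above can be reproduced as follows (our sketch):

proposed = 3787 // 8                            # one 473-byte packet per client
bonawitz = 1000 * (16 + 1024 + 8 + 32) // 8     # 1000 clients x 135-byte packets
sun = bonawitz // 100                           # amortized over 100 rounds
print(proposed, bonawitz, sun)                  # 473 135000 1350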

5.5. Signaling Overhead

Signaling overhead refers to the number of transmissions required to complete the communication. Suppose a client wants to send updates to the server. For this, it must communicate with the admin and the server, collect the necessary keys, and wait for its turn. Figure 4 shows the number of signals required for one iteration. Assume that m is the total number of devices in the network and n is the number of devices chosen for FL. In our algorithm, clients first send their FL requests to the server, which receives m signals. The server then selects n clients and replies to them, which takes n signals. The server also sends the list of participants to the admin, which takes one signal. Similarly, sending GSReq to the admin requires m signals, and sending GSRep to the clients requires another m signals. The n clients then send their updates to the server using n signals, and the server and admin exchange two further signals. The total number of signals transmitted in one iteration of FL is therefore m + n + 1 + m + m + n + 1 + 1.
For example, with 2000 devices (of which 1000 are chosen for FL), the signaling overhead of our proposed algorithm is 2000 + 1000 + 1 + 2000 + 2000 + 1000 + 1 + 1 = 8003. The signaling overhead in [5] is the highest of all compared algorithms: the other steps are the same, but instead of the admin messaging the clients, there are n × (n − 1) messages for client-to-client communication. Thus, for 2000 clients with 1000 participants, the signaling overhead is 2000 + 2000 + 1000 + 1000 × (1000 − 1) + 1000 + 1 + 2000 = 1,007,001 signals. Similarly, we can compute the signaling overhead of the other algorithms; the output is given in Table 9, and the associated graph is shown in Figure 9. The comparison shows that our algorithm outperforms the existing algorithms even as the number of clients in the network increases. Although some algorithms perform better for fewer clients or iterations, our algorithm is highly scalable and yields good results even for larger networks.
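These counts follow directly from the formulas above; the sketch below (ours) evaluates them for m = 2000 and n = 1000:

def signals_proposed(m, n):
    # FLReq(m) + FLRep(n) + participant list(1) + GSReq(m) + GSRep(m)
    # + updates(n) + two server-admin signals
    return m + n + 1 + m + m + n + 1 + 1

def signals_bonawitz(m, n):
    # pairwise client-to-client key sharing adds n*(n-1) transmissions
    return m + m + n + n * (n - 1) + n + 1 + m

print(signals_proposed(2000, 1000), signals_bonawitz(2000, 1000))   # 8003 1007001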
Figure 7, Figure 8 and Figure 9 show that our algorithm performs more efficiently than the others, demonstrating the high efficiency of our approach without affecting the accuracy of the updates.

6. Conclusions and Future Work

We propose a novel technique that combines GSs with SA to address privacy-preservation problems in FL. Instead of the conventional approach of sharing secret keys between clients, we share the secret key between the admin and the client. The admin is already trusted to generate the GS, and generating secure keys on its side eases the burden on the clients. A detailed numerical analysis verifies that the proposed algorithm achieves significantly lower computational and communication costs than existing algorithms for the iterative FL process, and the number of transmissions required to complete the communication is also much lower. Simulation results show that our algorithm preserves the privacy of both the data and the users more effectively than existing algorithms. We also provided an AVISPA-based security analysis to show that the proposed algorithm is resistant to numerous security attacks.
Our future work will focus on reducing the number of computations required, thereby lowering the overall complexity, while further exploiting the iterative nature of FL. Another important direction is to reduce the dependency on the admin. The basic assumption of this paper is that the admin is a fully trusted entity that must be available throughout the FL process; both clients and the server depend on it for authentication. In real-life scenarios, this can be a limitation of the GS approach. We will therefore also consider cases in which the admin is unavailable or compromised, examine the resulting problems, and seek an optimal solution.

Author Contributions

Conceptualization, S.K. and B.J.C.; methodology, validation, and formal analysis, S.K.; investigation, J.W.J. and J.Y.Y.; resources, J.W.J. and J.Y.Y.; writing, S.K. and J.W.J.; review and editing, B.J.C.; supervision, B.J.C.; project administration, B.J.C.; funding acquisition, B.J.C. All authors have read and agreed to the published version of the manuscript.

Funding

This research work was supported by the MSIT Korea under the NRF Korea (NRF 2022R1A2C4001270) and the Information Technology Research Center (ITRC) support program (IITP-2024-2020-0-01602) supervised by the IITP.

Data Availability Statement

The original contributions presented in this study are included in the article; further inquiries can be directed to the corresponding author.

Conflicts of Interest

The authors declare no conflicts of interest.

Appendix A. BAN Logic Notations and Logic

Appendix A.1. Notations

  • P |≡ X: P believes that X holds. This is only P's belief; X may not actually be true.
  • P ◁ X: P sees (encounters) the formula X.
  • P |⇒ X: P has jurisdiction over (completely controls) X.
  • P |∼ X: P once said X in the past.
  • #(X): X is fresh, i.e., a nonce that has never been used before.
  • K→ P: P's public key is K, and its private key is K⁻¹.
  • key(K, P ↔ Q): K is a secret key shared between P and Q.
  • P ⇌(X) Q: The secret X is shared between P and Q only.
  • {X}_K: X is encrypted under the key K.

Appendix A.2. Using Notations and Logic

To use the rules, we need to have Introduction and Elimination rules:
  • If P creates any random number X, it believes in its freshness, because X can never have been used before:
    P creates X ⟹ P |≡ #(X)
  • If P sends some message X to Q, then Q sees it:
    P → Q : X ⟹ Q ◁ X
  • Shared Key Rule: P believes K is a secret key shared only between P and Q. If it sees a message encrypted with K, it believes Q said that message:
    P |≡ key(K, P ↔ Q), P ◁ {X}_K ⟹ P |≡ (Q |∼ X)
  • Public Key Rule: P believes K is the public key of Q. If it sees a message encrypted with K⁻¹, it believes Q said that message:
    P |≡ (K→ Q), P ◁ {X}_{K⁻¹} ⟹ P |≡ (Q |∼ X)
  • |∼ Elimination (Nonce-Verification) Rule: P believes that Q once said X, and P believes X is fresh; hence P believes that Q believes X:
    P |≡ (Q |∼ X), P |≡ #(X) ⟹ P |≡ (Q |≡ X)
  • Jurisdiction Rule: P believes Q can be trusted over X, and P believes that Q believes X, so P believes X as well:
    P |≡ (Q |⇒ X), P |≡ (Q |≡ X) ⟹ P |≡ X
  • Freshness of Multipart Message Rule: P believes X is fresh and sees X used in a compound message (X, Y); hence P believes (X, Y) is fresh as well:
    P |≡ #(X), P ◁ (X, Y) ⟹ P |≡ #(X, Y)

Appendix B. Simulation Code of the Security Proof

This appendix gives the simulation code of the various roles in our algorithm. This is carried out using HLPSL language, and execution is carried out in the AVISPA tool. Details of the coding-related information can be found in Section 4.
Listing A1. Role Analyst.
role analyst (A,S,C1,C2,C3,C4: agent,
                         Snd, Rcv: channel(dy))
played_by A
def=
  local  State: nat,
         SecA,SecS: symmetric_key,
         PubC,PubS: public_key,
         LatestModel, Data: text
 
  init State := 0
 
  transition
   1. State  = 0 /\ Rcv(start) =|>
      State':= 1 /\ LatestModel' := new()
                 /\ PubC' := new()
                 /\ Snd(LatestModel')
 
   2. State  = 1 /\ Rcv(Data')_PubS =|>
      State':= 2 /\ Snd (LatestModel)
 
end role
Listing A2. Role Server.
role server (S,A, C1,C2,C3,C4: agent,
              Snd,Rcv: channel(dy))
   
played_by S
def=
  local State: nat,
        PubS,PubC: public_key,
        Req1, Req2, Req3, Req4,
        Cdata1,Cdata2,Cdata4,Data,
        Acc1,Acc2,Acc4: text,
        SecA,SecS,RAND1,RAND2,RAND3,
        RAND4,Gsign: symmetric_key
 
 const  sec_S_C1, sec_S_C2,
        sec_S_C4  : protocol_id
 
 init  State := 1
 
  transition
   1. State  = 1 /\ Rcv(Req1_Gsign.RAND1_PubS)
          /\ Rcv(Req2_Gsign.RAND2_PubS)
          /\ Rcv(Req3_Gsign.RAND3_PubS)
          /\ Rcv(Req4_Gsign.RAND4_PubS)=|>
      State':= 2 /\ Snd(Acc1_RAND1)
          /\ Snd(Acc2_RAND2)
          /\ Snd(Acc4_RAND4)
   2. State  = 2 /\ Rcv(Cdata1_Gsign)
          /\ Rcv(Cdata2_Gsign)
          /\ Rcv(Cdata4_Gsign) =|>
 State':= 3 /\ Data := new()
    /\ Snd(Data')_PubA
    /\ secret(Cdata1,sec_S_C1,{S,C1})
    /\ secret(Cdata2,sec_S_C2,{S,C2})
    /\ secret(Cdata4,sec_S_C4,{S,C4})
    /\ witness(D1,S,rand1_auth,RAND1)
    /\ witness(D2,S,rand2_auth,RAND2)
    /\ witness(D3,S,rand3_auth,RAND3)
    /\ witness(D4,S,rand4_auth,RAND4)
end role
Listing A3. Role Client1.
role client1 (A,S,C1: agent,
                 Snd,Rcv: channel(dy))
played_by C1
def=
  local State: nat,
    Cdata1,Acc1,LatestModel : text,
    RAND1,Gsign: symmetric_key,
    PubC1: public_key
    
const  sec_S_C1 : protocol_id
 
  init  State := 0
 
  transition
   1. State  = 0 /\ Rcv(Gsign.PubC1)
                 /\Rcv(LatestModel) =|>
      State':= 1 /\ Req1' := new()
                 /\RAND1' := new()
                 /\ Snd({Req1'}_Gsign.RAND1_PubS)
 
    
   2. State = 1 /\ Rcv(Acc1_RAND1)=|>
      State':= 2 /\ Cdata1 := new()
      /\Snd(Cdata1'_PubS)
      /\ secret(Cdata1,sec_S_C1,{S,C1})
      /\ wrequest(D1,S,rand1_auth,RAND1)
end role
Listing A4. Role Client2.
role client2 (A,S,C2: agent,
		     Snd,Rcv: channel(dy))
played_by C2
def=
  local State: nat,
      
        Cdata2,Acc2,LatestModel : text,
        RAND2, Gsign: symmetric_key,
        PubC2: public_key
	 
const  sec_S_C2: protocol_id
 
  init  State := 0
 
  transition
   1. State  = 0 /\ Rcv(Gsign.PubC2)
                 /\Rcv(LatestModel) =|>
      State':= 1 /\ Req2' := new()
                 /\RAND2' := new()
                 /\ Snd({Req2'}_Gsign.RAND2_PubS)
   2. State = 1 /\ Rcv(Acc2_RAND2)=|>
      State':= 2 /\ Cdata2 := new()
      /\Snd(Cdata2'_PubS)
            /\ secret(Cdata2,sec_S_C2,{S,C2})
            /\ wrequest(D2,S, rand2_auth,RAND2)
end role
Listing A5. Role Client3.
role client3 (A,S,C3: agent,
                 Snd,Rcv: channel(dy))
played_by C3
def=
  local State: nat,
        Cdata3,Acc3,LatestModel : text,
        RAND3, Gsign: symmetric_key,
        PubC3: public_key
	 
const  sec_S_c3 : protocol_id
 
  init  State := 0
 
  transition
   1. State  = 0 /\ Rcv(Gsign.PubC3)
   /\Rcv(LatestModel) =|>
      State':= 1 /\ Req3' := new()
      /\RAND3' := new()
      /\ Snd({Req3'}_Gsign.RAND3_PubS)
      /\ wrequest(D3,S,rand3_auth,RAND3)
end role
Listing A6. Role Client4.
role client4 (A,S,C4: agent,
 
Snd,Rcv: channel(dy))
 
played_by C4
def=
  local State: nat,
      
        Cdata4,Acc4,LatestModel : text,
        RAND4, Gsign: symmetric_key,
        PubC4: public_key
         
const  sec_S_C4 : protocol_id
 
  init  State := 0
 
  transition
 
   1. State  = 0 /\ Rcv(Gsign.PubC4)
   /\Rcv(LatestModel) =|>
      State':= 1 /\ Req4' := new()
      /\RAND4' := new()
       /\ Snd({Req4'}_Gsign.RAND4_PubS)
 
    
   2. State = 1 /\ Rcv(Acc4_RAND4)=|>
      State':= 2 /\ Cdata4 := new()
      /\Snd(Cdata4'_PubS)
 
          /\ secret(Cdata4,sec_S_C4,{S,C4})
          /\ wrequest(D4,S,rand4_auth,RAND4)
end role
Listing A7. Role Admin.
role admin (A2,C1,C2,C3,C4: agent,
 Snd,Rcv: channel(dy))
 
played_by A2
def=
  local State: nat,
        Gsign: public_key
	 
const  sec_A2_C4,sec_A2_C2,
       sec_A2_C3,sec_A2_C1 : protocol_id
 
  init  State := 0
 
  transition
 
   1. State  = 0 /\ Rcv(PubC1)
   /\ Rcv(PubC2)
   /\ Rcv(PubC3) /\ Rcv(PubC4)  =|>
      State':= 1 /\ Gsign' := new()
      /\Snd(Gsign.PubC1)
      /\ Snd(Gsign.PubC2)
      /\ Snd(Gsign.PubC3)
      /\ Snd(Gsign.PubC4)
      /\ secret(Gsign,sec_A2_C1,{A2,C1})
      /\ secret(Gsign,sec_A2_C2,{A2,C2})
      /\ secret(Gsign,sec_A2_C3,{A2,C3})
      /\ secret(Gsign,sec_A2_C4,{A2,C4})
            
		  
end role
Listing A8. Role Environment.
role environment()
def=
 
const a,a1,s,c1,c2,c3,c4: agent,
    RAND1,RAND2,RAND3,RAND4,Gsign : symmetric_key,
    PubC1,PubC2,PubC3,PubC4,PubS,PubA : public_key,
 
  intruder_knowledge = {PubC1,PubC2,PubC3,
  PubC4,PubS,PubA}
 
  composition
	session(a,a1,s,c1,c2,c3,c4)
end role

References

  1. Chai, D.; Wang, L.; Yang, L.; Zhang, J.; Chen, K.; Yang, Q. Fedeval: A holistic evaluation framework for federated learning. arXiv 2020, arXiv:2011.09655. [Google Scholar]
  2. Qiang, Y.; Yang, L.; Yong, C.; Yan, K.; Tianjian, C.; Han, Y. Federated learning. In Synthesis Lectures on Artificial Intelligence and Machine Learning; Brachman, R., Rossi, F., Stone, P., Eds.; Springer: Cham, Switzerland, 2019; Volume 13, pp. 1–207. [Google Scholar]
  3. Van Rijmenam, M. Privacy in the Age of AI: Risks, Challenges and Solutions; The Digital Speaker: Darlinghurst, Australia, 2023. [Google Scholar]
  4. Lin, Y.; Xie, Z.; Chen, T.; Cheng, X.; Wen, H. Image Privacy Protection Scheme Based on High-Quality Reconstruction DCT Compression and Nonlinear Dynamics. Expert Syst. Appl. 2024, 257, 124891. [Google Scholar] [CrossRef]
  5. McMahan, B.; Moore, E.; Ramage, D.; Hampson, S.; Arcas, B.A.y. Communication-efficient learning of deep networks from decentralized data. In Artificial Intelligence and Statistics; Lawrence, N., Ed.; PMLR: London, UK, 2017; pp. 1273–1282. [Google Scholar]
  6. Aledhari, M.; Razzak, R.; Parizi, R.M.; Saeed, F. Federated learning: A survey on enabling technologies, protocols, and applications. IEEE Access 2020, 8, 140699–140725. [Google Scholar] [CrossRef] [PubMed]
  7. Bouacida, N.; Mohapatra, P. Vulnerabilities in Federated Learning. IEEE Access 2021, 9, 63229–63249. [Google Scholar] [CrossRef]
  8. Li, O.; Sun, J.; Yang, X.; Gao, W.; Zhang, H.; Xie, J.; Smith, V.; Wang, C. Label leakage and protection in two-party split learning. arXiv 2021, arXiv:2102.08504. [Google Scholar]
  9. Tolpegin, V.; Truex, S.; Gursoy, M.E.; Liu, L. Data poisoning attacks against federated learning systems. In Proceedings of the ESORICs 2020: 25th European Symposium on Research in Computer Security; Chen, L., Li, N., Liang, K., Schneider, S., Eds.; Springer International Publishing: New York, NY, USA, 2020; pp. 480–501. [Google Scholar]
  10. Ma, J.; Naas, S.A.; Sigg, S.; Lyu, X. Privacy-preserving federated learning based on multi-key homomorphic encryption. Int. J. Intell. Syst. 2022, 37, 5880–5901. [Google Scholar] [CrossRef]
  11. Wei, K.; Li, J.; Ding, M.; Ma, C.; Yang, H.H.; Farokhi, F.; Jin, S.; Quek, T.Q.S.; Poor, H.V. Federated learning with differential privacy: Algorithms and performance analysis. IEEE Trans. Inf. Forensics Secur. 2020, 15, 3454–3469. [Google Scholar] [CrossRef]
  12. Bonawitz, K.; Ivanov, V.; Kreuter, B.; Marcedone, A.; McMahan, H.B.; Patel, S.; Ramage, D.; Segal, A.; Seth, K. Practical secure aggregation for privacy-preserving machine learning. In Proceedings of the 2017 ACM SIGSAC Conference on Computer and Communications Security, Dallas, TX, USA, 30 October–3 November 2017. [Google Scholar]
  13. Sneha, K.; Choi, B.J. Group signature based federated learning approach for privacy preservation. In Proceedings of the 2021 International Conference on Electrical, Computer and Energy Technologies (ICECET) IEEE, Cape Town, South Africa, 9–10 December 2021. [Google Scholar]
  14. Camenisch, J.; Stadler, M. Efficient group signature schemes for large groups. In Proceedings of the Annual International Cryptology Conference, Santa Barbara, CA, USA, 17–21 August 1997; Springer: Berlin/Heidelberg, Germany, 1997. [Google Scholar]
  15. Wang, G.; Bao, F.; Zhou, J.; Deng, R.H. Security remarks on a group signature scheme with member deletion. In Proceedings of the International Conference on Information and Communications Security, ICICS 2003, Huhehaote, China, 10–13 October 2003. [Google Scholar]
  16. Armando, A.; Basin, D.; Boichut, Y.; Chevalier, Y.; Compagna, L.; Cuellar, J.; Drielsma, P.H.; Heám, P.C.; Kouchnarenko, O.; Mantovani, J. The AVISPA tool for the automated validation of internet security protocols and applications. In Proceedings of the 17th International Conference, Computer Aided Verification: CAV 2005, Edinburgh, UK, 6–10 July 2005. [Google Scholar]
  17. Hartmann, F. Federated Learning. Master’s Thesis, Freie Universität, Berlin, Germany, 20 August 2018. [Google Scholar]
  18. Kopparapu, K.; Lin, E. Fedfmc: Sequential efficient federated learning on non-iid data. arXiv 2020, arXiv:2006.10937. [Google Scholar]
  19. He, C.; Annavaram, M.; Avestimehr, S. Group knowledge transfer: Federated learning of large cnns at the edge. arXiv 2020, arXiv:2007.14513. [Google Scholar]
  20. Choi, B.; Sohn, J.; Han, D.; Moon, J. Communication-Computation Efficient Secure Aggregation for Federated Learning. arXiv 2020, arXiv:2012.05433. [Google Scholar]
  21. So, J.; Güler, B.; Avestimehr, A.S. Turbo-aggregate: Breaking the quadratic aggregation barrier in secure federated learning. IEEE J. Sel. Areas Inf. Theory 2021, 2, 479–489. [Google Scholar] [CrossRef]
  22. Xu, C.; Liu, S.; Yang, Z.; Huang, Y.; Wong, K.K. Learning Rate Optimization for Federated Learning Exploiting Over-the-air Computation. arXiv 2021, arXiv:2102.02946. [Google Scholar] [CrossRef]
  23. Sun, T.; Li, D.; Wang, B. Decentralized Federated Averaging. arXiv 2021, arXiv:2104.11375. [Google Scholar] [CrossRef] [PubMed]
  24. Xu, R.; Baracaldo, N.; Zhou, Y.; Anwar, A.; Joshi, J.; Ludwig, H. FedV: Privacy-Preserving Federated Learning over Vertically Partitioned Data. arXiv 2021, arXiv:2103.03918. [Google Scholar]
  25. Boneh, D.; Boyen, X.; Shacham, H. Short group signatures. In Proceedings of the Annual International Cryptology Conference, Santa Barbara, CA, USA, 15–19 August 2004; Springer: Berlin/Heidelberg, Germany, 2004. [Google Scholar]
  26. Lin, X.; Sun, X.; Ho, P.H.; Shen, X. GSIS: A secure and privacy-preserving protocol for vehicular communications. IEEE Trans. Veh. Technol. 2007, 56, 3442–3456. [Google Scholar]
  27. Mahalle, P.N.; Prasad, N.R.; Prasad, R. Threshold cryptography-based group authentication (TCGA) scheme for the Internet of Things (IoT). In Proceedings of the 4th International Conference on Wireless Communications, Vehicular Technology, Information Theory and Aerospace & Electronic Systems (VITAE), IEEE, Aalborg, Denmark, 11–14 May 2014; pp. 1–5. [Google Scholar]
  28. Burrows, M.; Abadi, M.; Needham, R. A logic of authentication. ACM Trans. Comput. Syst. 1990, 8, 18–36. [Google Scholar] [CrossRef]
  29. Parne, B.L.; Gupta, S.; Chaudhari, N.S. Segb: Security enhanced group based aka protocol for m2m communication in an iot enabled lte/lte-a network. IEEE Access 2018, 6, 3668–3684. [Google Scholar] [CrossRef]
  30. Tsiounis, Y.; Yung, M. On the security of ElGamal based encryption. In Proceedings of the International Workshop on Public Key Cryptography, Pacifico Yokohama, Japan, 5–6 February 1998; Springer: Berlin/Heidelberg, Germany, 1998. [Google Scholar]
  31. Chen, Y.; Shen, S.; Tzeng, W. Weave ElGamal Encryption for Secure Outsourcing Algebraic Computations Over Zp. IACR Cryptol. 2015, 33, 233–250. [Google Scholar]
Figure 1. Federated learning model and privacy breach.
Figure 2. (a) Traditional secure aggregation vs. (b) Group signature-based secure aggregation.
Figure 3. Original secure aggregation scheme [12].
Figure 4. GSFedSec: Group signature-based secure aggregation for privacy preservation in federated learning.
Figure 5. Intruder knowledge.
Figure 6. Simulation result: SAFE.
Figure 7. Comparison of computation cost of algorithms for a varying number of iterations in an FL session (n = 50, 100, 150, 1000).
Figure 8. Comparison of communication cost of algorithms.
Figure 9. Comparison of signaling cost of algorithms.
Table 1. List of symbols and abbreviations used.
A: Admin | A_i: Secret key of client i
B_1, B_2, B_3, B_4, B_5: Binding variables | c, ĉ: Challengers
C_1, C_2, C_3: Tracing variables | c_1, c_2, c_3: Ciphertext set
C_BV: Cost of binding variables | C_SV: Cost of signing variables
C_TV: Cost of tracing variables | C_ENC: Cost of encryption
C_DEC: Cost of decryption | C_VER: Cost of verification
C_AGG: Cost of aggregation | DP: Differential privacy
ê: Bilinear map | Enc: Encryption
FL: Federated learning | FLReq: Participation request for FL
FLRep: Confirmation reply of FLReq | G_1, G_2, G_T: Cyclic isomorphic groups
g_1, g_2, g_T: Generators of G_1, G_2, G_T | σ_GS or GS: Group signature
GSReq: Request for sending GS | GSRep: Reply with the latest GS
HE: Homomorphic encryption | m: Total number of clients
n: Number of clients selected for FL | η: Fixed learning rate of client
ω_k: Current model update | ω_{k+1}: Updated model update
s_1, s_2, s_3, s_4, s_5: Signing variables | SA: Secure aggregation
S_i: Secret key part in SA | spk: Public key of the server
Upd, ω: Updated gradient | Upd_s: Upd with S_i
Upd_se: Encrypted Upd with spk
Table 2. Comparison of various techniques used in FL to provide privacy.
Properties | SA | DP | HE | GS | GSFedSec (Proposed)
Privacy Protection | Data | Data | Data | User | Data and User
Client-Client Communication | Required | No | No | No | No
Participant List | Required | No | No | No | No
Computational Complexity | High | Low | High | Low | Low
Space Complexity | High | Low | Low | Low | Low
Efficiency | Low | Low | High | High | High
Accuracy | High | Low | High | High | High
Table 3. Prevention from various security attacks.
AttackAlgorithm
[24][1][5][23][22]Proposed
1. Privacy Attack×××××
2. Inference Attack××
3. Impersonation Attack×××××
4. Sybil Attack×××××
5. Honest-but-Curious Attack
6. Possible Collusion by Clients×××××
Table 4. Computation cost of operations.
Notation | Operation | Unit Cost (ms)
C_ADD | Addition/Subtraction | 0.001
C_XOR | XOR | 0.002
C_RAND | Random number generation | 0.045
C_HASH | Hash function | 0.067
C_MUL | Multiplication | 0.612
C_EXP | Exponentiation | 1.0
C_DIV | Division | 1.22
C_MOD | Modulus | 1.24
C_BP | Bilinear pairing | 4.51
Table 5. Cost of encryption and decryption for different numbers of clients.
Algorithm | Cost (ms): n = 50 | n = 100 | n = 150 | n = 1000
C_ENC | 103.78 | 217.85 | 347.89 | 5210
C_DEC | 55.58 | 102.37 | 145.40 | 1060
Table 6. Computational cost for multiple iterations.
AlgorithmNumber of CPU CyclesCost (s)
t = 50 t = 100 t = 150 t = 1000
Runhua Xu et al. [24] C S I G + C V E R + t × ( C E N C + C D E C + C A G G ) 38.893.642166.46889.3
Chai et al. [1] C S I G + C V E R + t × ( 3 × ( C E N C + C D E C ) + C A G G ) 24.196.5222.718,817.9
Bonawitz et al. [5] C S I G + C V E R + 999000 + t × ( C E N C + C D E C + C A G G ) 1007.21031.51073.727276.88
Sun et al. [23] C S I G + C V E R + 9990 + t × ( C E N C + C D E C + C A G G ) 18.242.584.76287.9
Xu et al. [22]-9613176670315,760
Proposed C σ + C V E R + t × ( C E N C + C D E C + C A G G ) 8.1132.2574.36271.7
Table 7. Message structure and sizes of fields (bits).
Msg. | GID | MID | RAND | Payload | GS | TTL | TS
Length | 16 | 16 | 128 | 1024 | 1539 | 8 | 32
Table 8. Communication cost and packet size.
Algorithm | Average Cost (Bytes)
Runhua Xu et al. [24] | 135,000
Chai et al. [1] | 40,500
Bonawitz et al. [5] | 135,000
Sun et al. [23] | 1350
Xu et al. [22] | 2700
Proposed | 473
Table 9. Comparison of signaling overhead with existing algorithms.
Algorithm | f(n, m) | Number of signals per iteration (n = 1000, m = 2000)
Runhua Xu et al. [24] | m + m + m + n + n + n + 1 + m | 11,001
Chai et al. [1] | m + m + n + n + 1 + n + n + m | 10,001
Bonawitz et al. [5] | m + m + n + n(n − 1) + n + 1 + m | 1,006,001
Sun et al. [23] | m + n(n − 1)/100 + n + m | 14,990
Xu et al. [22] | m + m + n + n(n − 1) + n + 1 + m | 1,006,001
Proposed | m + n + 1 + m + m + n + 1 + 1 | 8003
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
