1. Introduction
The accelerated development of big data has resulted in the increasing application of artificial intelligence (AI) technologies. The International Data Corporation has predicted that the amount of data generated through Internet of Things (IoT) devices will reach 79.4 ZB in 2025 [1], exceeding the capacities of IoT and mobile devices worldwide [2]. Most of the data generated by a device are processed locally or in a remote cloud server. However, this process involves three main problems pertaining to the constraints associated with real environments and remote cloud servers [3]:
Network Congestion: network congestion on a remote cloud server occurs when numerous user devices simultaneously send data to the server.
Privacy Leak: private user experience data may be leaked due to malicious network attacks during transmission.
Resource Constraints: the capacity of network resources (e.g., wireless channel subcarriers and bandwidth) and user devices (e.g., computing performances and battery life) is limited.
Mobile edge computing (MEC) has emerged as a solution to these problems because it can process and store the big data generated by devices. MEC shifts traffic and computational processes from a remote cloud server to an edge server, thereby reducing the distance between the server and clients. Specifically, instead of directly sending all the data to a remote cloud server for processing and storage, MEC analyzes, processes, and stores data at an edge server. MEC can reduce the latency and processing time of high-bandwidth, real-time applications and even eliminate certain resource constraints associated with client devices. In addition, as the big data generated by devices can be used by various AI applications (e.g., autonomous driving, medical tests, and recommendation systems), machine learning (ML) tasks constitute the major workload in MEC [4].
However, MEC cannot solve the privacy leak problem because the personal data obtained from user devices are stored or processed on an edge server. Furthermore, the network congestion issue is not resolved because large amounts of data are still transmitted for ML tasks. In this context, federated learning (FL) has attracted attention as a distributed learning method that performs training over large amounts of generated data and updates models on local nodes (e.g., mobile devices). In this manner, FL can alleviate network congestion, prevent privacy leaks, and reduce resource consumption for computation and communication [5]. Furthermore, the integration of FL with MEC will be a pivotal step towards achieving ubiquitous intelligence in 6G networks, as this combination enables more efficient utilization of the vast amounts of data generated by devices through MEC [6]. Notably, in FL, the size of the model parameters updated by training local devices, which may be billions in number, can reach tens of megabytes [7]. Consequently, a bottleneck may occur during the aggregation of model parameters at a parameter server (PS). This bottleneck is exacerbated in conventional FL frameworks because they are based on direct communication between clients and the PS. As a result, it is difficult to achieve model convergence because of errors in transmitting model parameters. This aspect adversely affects model scalability, and thus more communication rounds and local training are required to optimize the model [8].
Device-to-device (D2D) communication is a localized version of peer-to-peer communication that enables direct access among local devices without base stations or access points. This framework effectively reduces communication resource consumption and network delay through the use of short-distance wireless communication and increases system coverage [9]. D2D communication can mitigate the above-mentioned problems because it enables a hierarchical structure in the FL architecture and decreases the communication distance between mobile devices and the PS, thereby optimizing the consumption of communication resources.
The preliminary version of this study was presented as a conference paper [10]. We proposed an FL framework with a hierarchical structure, in which the model parameters of the local nodes in a cluster are aggregated at a leader client (LC), and the LCs send the aggregated model parameters to a PS. Considering the potential of D2D communication, we developed an FL mechanism that exploits the benefits of reduced resource consumption and short-distance communication delay. Clusters of nodes were generated via k-means clustering. In the clusters formed by k-means clustering, clients communicate with one another within a predefined threshold distance for D2D communication, and only a subset of these clients participate in FL. In this paper, we use the Pareto principle to show that the participation of a small number of clients selected according to a biased criterion can improve model convergence and alleviate the bottleneck in aggregating model parameters. Enhancing the preliminary version, this study newly employs Pareto optimality to ensure reasonable client selection by exploiting the clients' resource states and training losses, and it strengthens the evaluation through additional experiments. The main contributions of this study can be summarized as follows:
We propose an FL mechanism with a hierarchical D2D structure by clustering clients on the basis of the location and communication range of each client. This mechanism can effectively reduce the wireless communication traffic generated when the FL model is updated for each client.
We propose a biased client-selection method for a clustered structure by using Pareto optimality. This client-selection method employs high training loss values to accelerate model convergence and reduce resource consumption; an illustrative sketch of the clustering and selection steps is given after this list.
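To make the mechanism concrete, the following minimal Python sketch illustrates the two contributions under simplified assumptions: clients are clustered with k-means on their 2-D coordinates, the member closest to each cluster centroid acts as the leader client (LC), only members within an assumed D2D distance threshold of their LC remain candidates, and the participants are the Pareto-optimal candidates with respect to the latest training loss (higher preferred) and estimated resource consumption (lower preferred). The variable names, the LC criterion, and the threshold value are illustrative assumptions rather than the exact procedure of FedPO.

# Minimal sketch (not the authors' implementation): k-means clustering for D2D
# groups, leader-client election, and Pareto-optimal participant selection.
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)
N_CLIENTS, N_CLUSTERS, D2D_THRESHOLD = 30, 5, 50.0  # threshold in metres (assumed)

positions = rng.uniform(0, 200, size=(N_CLIENTS, 2))  # client coordinates
losses = rng.uniform(0.5, 2.5, size=N_CLIENTS)        # latest local training loss
resources = rng.uniform(0.1, 1.0, size=N_CLIENTS)     # estimated resource cost

labels = KMeans(n_clusters=N_CLUSTERS, n_init=10, random_state=0).fit_predict(positions)

def dominates(i, j):
    # Client i Pareto-dominates j: loss at least as high (more informative update)
    # and resource cost at most as high, with at least one inequality strict.
    return (losses[i] >= losses[j] and resources[i] <= resources[j]
            and (losses[i] > losses[j] or resources[i] < resources[j]))

for c in range(N_CLUSTERS):
    members = np.where(labels == c)[0]
    centroid = positions[members].mean(axis=0)
    # Leader client: the member closest to the cluster centroid (assumed criterion).
    lc = members[np.argmin(np.linalg.norm(positions[members] - centroid, axis=1))]
    # Candidates: members reachable from the LC within the D2D distance threshold.
    reachable = [m for m in members
                 if np.linalg.norm(positions[m] - positions[lc]) <= D2D_THRESHOLD]
    # Participants: Pareto-optimal (non-dominated) candidates in the cluster.
    selected = [i for i in reachable
                if not any(dominates(j, i) for j in reachable if j != i)]
    print(f"cluster {c}: LC = {lc}, participants = {selected}")

In the full framework, each selected client trains locally and sends its model parameters to its LC over D2D links, and only the LCs communicate with the PS.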
2. Related Work
FL is an alternative distributed ML method. It differs from conventional distributed ML in that it involves an extremely large number of clients with heterogeneous and unbalanced local data distributions. A key task of FL is the generation of learning models using the data collected from clients. These data remain stored on local devices, which helps prevent privacy leaks; at the same time, FL must avoid model divergence caused by insufficient data and by clients failing to participate owing to a lack of resources (e.g., wireless channel subcarriers, bandwidth, computing performance, and battery life). The convergence of a learning model and the resource consumption in FL exhibit a trade-off relationship. Thus, many researchers have attempted to improve the efficiency of FL by simultaneously optimizing these performance aspects.
Federated averaging (FedAvg) [11], the most conventional FL mechanism, adjusts the batch size and the number of local epochs of federated stochastic gradient descent and averages the locally computed model updates, thereby significantly reducing the overall number of communication rounds by performing more local updates on each client device.
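In its standard form, the server combines the locally updated models by a data-size-weighted average; the aggregation step in round t can be written as

w_{t+1} = \sum_{k \in S_t} \frac{n_k}{\sum_{j \in S_t} n_j} \, w_{t+1}^{k},

where S_t is the set of clients selected in round t, n_k is the number of local samples held by client k, and w_{t+1}^{k} is the model obtained by client k after its local epochs.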
FedAsync [12] is an asynchronous FL mechanism for updating the global model, in which the mixing weight is adaptively set as a function of staleness. Notably, in [11,12], experiments were conducted on non-iid data, i.e., data that are not independently and identically distributed; however, no theoretical guarantee was provided in a convex optimization setting. In [13], a convergence guarantee for FedAvg was established without the impractical assumptions that the data are iid and that all clients are available.
A large number of participating clients in FL may lead to server-side congestion and bottlenecks in aggregating client model parameters. Additionally, a large number of participating clients can affect model convergence for non-iid data [13]. By appropriately selecting the participating clients, these problems can be alleviated and model convergence can be improved. Unlike FedAvg, in which clients are selected randomly, the authors of [14] considered clients with high loss values and proved that biased client selection is directly related to model convergence. The FedCS FL protocol [15] selects clients within a deadline to manage the resources of heterogeneous clients. This method employs biased client selection; however, it does not ensure model convergence for non-iid and heterogeneous data. Furthermore, stragglers may be present in a mobile communication or IoT environment; such clients cannot participate in FL because the network connection is not persistent or the client device has shut down, which may hinder model convergence. Therefore, the FLANP FL framework was proposed [16] to alleviate the effect of stragglers by adaptively selecting clients in different communication rounds according to their computation speeds.
FL mechanisms with various structures have been proposed. In the hierarchical FL (HFL) mechanisms proposed in [17,18,19], clients and the server communicate through an intermediate medium rather than via direct communication. In [19], the hierarchical edge federated learning (HED-FL) model enhances traditional FL with a multi-layered edge-node architecture for energy-efficient learning; two heuristic methods were also introduced to assess the effects of static and dynamic round execution across these layers. Moreover, a hierarchical cluster-based structure was developed [17], which divided clients into several clusters based on resource constraints. A leader node (LN), acting similarly to an intermediate server, was elected, and only the LN communicated directly with the PS. Thus, the bottleneck that may occur at the PS was eliminated, and the consumption of communication resources was reduced. Similarly, an edge server was deployed between the PS and the clients [18]; the edge association problem was solved using an evolutionary game between the clients and the edge server, and the communication resource allocation problem between the edge server and the PS was solved using a Stackelberg differential game.
D2D and peer-to-peer (P2P) communication have been introduced to reduce the communication overhead for the efficient transmission of model parameters. The authors of [20] examined a social attribute used for k-means clustering in D2D communication; a software-defined networking controller performed the clustering by calculating the social attributes between devices, rather than clustering an unspecified majority of devices. Other researchers proposed algorithms for D2D communication with resource allocation [21], which could efficiently manage resources and interference. Zhang et al. [22] proposed a D2D-assisted hierarchical FL scheme to reduce the communication overhead in D2D environments. Semi-decentralized federated edge learning (SD-FEEL) [23] proposes a structure in which edge servers aggregate clients' model parameters, exchange the aggregated parameters with neighboring edge servers, and then broadcast the updated models. Two-timescale hybrid FL (TT-HF) [24] extends the FL architecture through aperiodic local and global model consensus procedures based on D2D communications, proposing a new model of gradient diversity and an adaptive control algorithm. In another framework [25], clients communicated with one another without a server for aggregating model parameters in FL. Moreover, topology construction was conducted through deep reinforcement learning for P2P FL [26].
The novelty of our framework lies in the following aspects. The studies in [22,24,25] propose FL utilizing D2D communication, and [23] forms clusters for aggregating clients' model parameters, similar to our work. However, we introduce the k-means clustering technique for the formation of D2D communication networks. This approach enables the selection of leader clients located at favorable positions without exceeding the communication distance threshold, and by collecting and transmitting model parameters within clusters, it alleviates server-side bottlenecks. In [27], min-max Pareto optimization was used to manage the trade-off between algorithmic fairness and performance inconsistency for each client. FedMGDA+ [28], which is similar to the framework proposed in [27], realizes the multi-objective optimization of robustness, fairness, and accuracy through a Pareto-stationary solution. In contrast, we consider that the model performance is proportional to the clients' resource consumption. Therefore, we solve the target problem by using Pareto optimality and considering the trade-off between model convergence and resource consumption. In this manner, the proposed method differs from those described in [27,28]. A comparative analysis of FL methods, including FedPO, is presented in Table 1, which delineates the distinct communication methods, hierarchical architectures, and client-selection strategies employed by each technique.
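For clarity, the bi-objective use of Pareto optimality adopted in this work can be sketched as follows; the symbols below are illustrative, and the exact objectives used by FedPO are defined in the later sections. Let \ell_i denote the latest local training loss of client i (a higher loss indicates a more informative update) and c_i its estimated resource consumption. Client j Pareto-dominates client i if

\ell_j \ge \ell_i \quad \text{and} \quad c_j \le c_i, \quad \text{with at least one inequality strict,}

and the clients considered for participation in a round are the non-dominated (Pareto-optimal) candidates.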
6. Conclusions and Future Work
This paper proposes a new FL scheme named FedPO that uses k-means clustering for D2D communication and utilizes Pareto optimality to select participating clients based on their resource state and loss. The effectiveness of the proposed scheme is evaluated through experiments comparing it with two methods: FedAvg, the conventional centralized method, and D2D-FedAvg, a modified version of FedAvg for D2D communication.
Thus, FedPO is a promising approach for addressing bottlenecks, reducing server-side traffic, and saving client resources. Additionally, this method achieves faster model convergence in the initial rounds compared with the other methods.
In future work, additional experiments should be performed to evaluate the effect of environmental factors, such as communication instability and disconnection, on the FL performance. Furthermore, although we use Pareto optimality to select clients based on their loss and resource state, a wider range of criteria, such as the battery life, connectivity, and computational capabilities of devices in real-world settings, could be incorporated into client selection. In addition, when selecting the threshold in k-means clustering, factors that may affect model convergence could be considered alongside the communication aspects.
Future research directions for FedPO implementation can be summarized as follows:
Considering the effect of environmental factors on the FL performance: future work can be aimed at examining the effects of factors such as communication instability, network disconnection, and device heterogeneity on the FL performance.
Optimizing the clustering approach: when selecting the threshold for k-means clustering, other factors affecting the model convergence, such as the data distribution and number of clusters, can be considered.
Evaluating the performance of the proposed approach in real-world scenarios: The experiments in this study are conducted in simulated environments. In future work, the performance of the proposed approach can be evaluated in real-world settings to assess its practicality and effectiveness.