1. Introduction
With the increasing diversity of network services, manual and semi-manual network operation and management methods are increasingly unable to make informed decisions. To cope with this situation, the industry is pushing to realize network intelligence with artificial intelligence (AI) [1]. Researchers in academia also emphasize the importance of network intelligence in the evolution of the mobile communication system [2]. With the development of AI and its successive adoption in various industries, applying AI widely in the existing network has become feasible. 6G is expected to meet both the communication and intelligence needs of users [3]. In recent years, many organizations have carried out standardization research with ubiquitous intelligence as the core goal. This research has proposed new concepts, e.g., the network data analytics function (NWDAF), the new AI core network (CN) function defined by 3GPP [4].
The distributed deployment of an AI function, such as NWDAF, is the current mainstream solution to achieve ubiquitous intelligence in the CN, which can be called a distributed intelligence solution. We believe that this solution has the following benefits:
- 1.
In the future, CN will contain multi-dimensional and massive network data and numerous task requirements for prediction and decision making everywhere. The distributed deployment of an AI function can handle these tasks closer to the demand side, reducing processing delay and transmission overhead.
- 2.
Compared with the centralized intelligence solution, the distributed intelligence solution has advantages in deploying differentiated AI functions. Functions distributed in a network can have different types of computing power and realize different levels of privacy protection. Tasks select an appropriate AI function according to their computing and privacy requirements, which helps to reduce system energy consumption and to protect privacy.
- 3.
With the help of distributed intelligence, such as federated learning and multi-agent reinforcement learning, the network can achieve collaborative intelligence between functions. Assembling multiple AI functions to complete network AI tasks will improve computing efficiency.
This paper considers the network architecture based on the definition of NWDAF [4], as shown in Figure 1. We place the AI functions on each edge cloud to give them computing power. This power comprises training and inference capability, corresponding respectively to the MTLF (model training logical function) and the AnLF (analytics logical function) into which NWDAF is divided.
Diversified network services will make the network load highly random, which is likely to prevent the network from providing stable and reliable services. Capturing the changing trend of user demand via traffic prediction can significantly improve this situation. It enables the network to deploy communication and computing resources in advance and meet the quality-of-service (QoS) requirements [5].
Although many studies have focused on cellular traffic prediction, particularly in the past two years, researchers rarely consider whether their complex and huge models are suitable for deployment in reality. An enormous gap exists between designing a model structure and deploying it [
6]. We try to narrow this gap by considering the cellular traffic prediction problem in a more realistic network architecture based on the 3GPP standard. According to the definition of 3GPP, NWDAFs store network traffic data. Thus, traffic prediction is a typical use case of network intelligence in the CN.
Training powerful AI models depends on massive data and ample computing power. Thus, the MTLF requires plentiful computing resources, while the AnLF requires a relatively small amount of computing resources. Considering the high deployment cost of NWDAF (MTLF) and the hyper-scale of the future network, we believe that ubiquitous intelligence under the distributed intelligence solution will initially be realized through the widespread deployment of AnLF. At the same time, a minority of edge clouds within subnets possess high-performance MTLFs, depending on whether their physical computing resources are sufficient. As a result, different subnets have different training and inference capability levels, as shown in Figure 1.
We aim to help the subnets with limited computing power (LCP-Nets) improve the efficiency of traffic prediction by transferring deep learning models pre-trained in the subnets with abundant computing power (ACP-Nets). The proposed framework not only embodies the collaborative intelligence advantage but also meets the privacy protection requirement. To the best of our knowledge, this paper is the first work to consider cellular traffic prediction using deep learning models from the perspective of existing standards and practical deployment. The contributions of this paper are as follows:
- 1.
We propose an intra-cluster federated learning model transfer framework. This framework can help the LCP-Nets to achieve higher efficiency in local cellular traffic prediction tasks, leveraging the feature-based models trained cooperatively by the ACP-Nets.
- 2.
We introduce two traffic features, i.e., statistical and regional features. These features support clustering ACP-Nets from different dimensions and training the model adapted to traffic prediction tasks of specific type areas.
- 3.
We design two kinds of model aggregation strategies that can customize traffic-prediction models for the LCP-Nets by combining the matched multiple models.
- 4.
We carry out an experiment based on an actual network dataset to verify the feasibility of the above methods. The experimental results show that the proposed framework can significantly enhance the traffic prediction efficiency of LCP-Nets by improving prediction accuracy and reducing training overhead.
The rest of the paper is organized as follows.
Section 2 presents the existing research on cellular traffic prediction, especially in distributed scenarios. We describe the LCP-Net traffic prediction problem scenario in detail in
Section 3.
Section 4 proposes the model transfer framework for solving the problem and a specific algorithm scheme based on it. Experimental results and discussions are presented in
Section 5, and
Section 6 concludes the paper.
2. Related Work
As one of the spatial-temporal series prediction tasks, cellular traffic prediction can be addressed by three types of methods: simple, statistical, and machine-learning-based. Simple methods predict with simple calculation rules, such as the simple moving average (SMA). These methods are easy to implement but cannot capture the changing pattern in data, so their prediction accuracy is poor. To realize more accurate series prediction, early researchers introduced statistical methods. The autoregressive integrated moving average (ARIMA) is a classic statistical series prediction method that is also suitable for the traffic-prediction problem. Ref. [
7] divided the cellular traffic into regular and random parts and illustrated that the regular part can be predicted by the ARIMA method.
The rise of machine learning motivates researchers to attempt deep learning, the most popular method in the current field of AI, to implement cellular traffic prediction by modeling the problem as a regression problem. Ref. [
8] used the recurrent neural network (RNN) and the convolutional neural network (CNN) to extract the cellular traffic changing pattern in the time and spatial domains, respectively. In recent years, researchers have kept pace with the development of deep learning and introduced the latest methods to the cellular traffic prediction problem. These attempts include the graph neural network (GNN) [
9] and transformer [
10], which provide new ideas for extracting the spatial–temporal pattern.
However, all of the above cellular traffic prediction methods are designed from the perspective of centralized intelligence and require considerable computing resources. They do not match the actual computing power of the existing system and do not exploit the advantages of distributed intelligence. Thus, this paper explores the applicable distributed intelligence framework for cellular traffic prediction.
Few studies have tried using distributed intelligence algorithms, including federated learning, multi-agent reinforcement learning (MARL), and swarm learning, to solve the cellular traffic-prediction problem. Ref. [
11] designed a dual federated learning framework for training the cellular traffic prediction model, and as far as we know, this is the only work in this research area. Specifically, ref. [
11] proposed a two-layer federated learning scheme with intra-cluster and inter-cluster model aggregation to realize more efficient model training. However, they did not fully consider the differences in traffic features and computing power levels between geographical regions, so the dual federated learning framework is not suitable for scenarios with LCP-Nets. In addition, because inter-cluster model aggregation makes the models involved in training similar, it cannot produce sufficiently differentiated cluster models. In this research, leveraging feature-based intra-cluster federated learning, we can train unique models for different traffic features in ACP-Nets. These models are valuable for the traffic prediction of LCP-Nets with similar features.
The concept of clustering is applied in many sub-fields of communication, and the network efficiency is improved by examining the value of each cluster head, e.g., in wireless sensor networks [
12]. In this research, we customize the model for each LCP-Net by examining each cluster’s feature and model.
Transfer learning aims to transfer models between different domains or tasks to reduce the generalization error of the target domain. Refs. [
13,
14] used transfer learning to train models for large-scale cellular traffic prediction faster. Their approaches do help with LCP-Nets model training, but they do not fully consider differentiated regional traffic features and the scenario with distributed intelligence. In this research, we achieve more refined and efficient inter-domain model transfer by aggregating cluster models suitable for different traffic features.
4. Proposed Framework
This section introduces the proposed intra-cluster federated learning-based model transfer framework for the problem scenario described in
Section 3.2. Then, some specific framework considerations are elaborated, including the concept of different types of subnet features and the multi-model aggregation strategy. Finally, we adopt the K-means algorithm and FedAvg algorithm to show the complete process of framework implementation.
4.1. Intra-Cluster Federated Learning-Based Model Transfer Framework
As shown in
Figure 4, the proposed framework consists of four steps:
- 1.
Clustering ACP-Nets according to their features.
- 2.
Intra-cluster federated learning to obtain each ACP-Net’s local model and each cluster’s global model.
- 3.
Matching LCP-Nets with corresponding clusters according to their features.
- 4.
LCP-Nets obtain their local traffic prediction models by aggregating global models of the matched clusters and fine-tuning training.
4.2. Multidimensional Traffic Feature Extraction
In the proposed framework, the subnet features are further divided into two parts: regional features (r) and statistical features (t). The regional features refer to external information related to regional functions and human activities, which affect traffic to some extent, such as the number of base stations, points of interest (POI), and social activities considered in [13]. The statistical features refer to traffic statistics, e.g., mean value and variance. We believe that taking both parts into account is necessary for accurate clustering.
4.2.1. Regional Feature
In most metropolitan areas, urban functional areas are divided according to the primary use of each area, including commercial areas, residential areas, and industrial areas. The traffic within each type of area often emerges with specific regional features.
In order to make the inference as lightweight as possible, in this paper, we do not introduce additional data. Instead, we endeavor to extract regional features from traffic data itself. We define two indirect regional features, i.e., intra-day and intra-week distribution.
Figure 5 illustrates the sampling process of the intra-week distribution. We sum the traffic volumes by the day of the week. The intra-day distribution is obtained similarly by summing the traffic volumes by the hour of the day. To align data, we standardize these two features to have zero mean and unit variance within each subnet.
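The extraction of the two regional features can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function name `regional_features`, the `start_weekday` argument, and the assumption that the series is hourly and starts at hour 0 of a day are all hypothetical choices.

```python
import numpy as np

def regional_features(hourly, start_weekday=0):
    """Return standardized intra-day (24,) and intra-week (7,) distributions.

    hourly: 1-D array of hourly traffic, assumed to start at hour 0 of a day.
    start_weekday: weekday index of the first day (0 = Monday), assumed known.
    """
    hourly = np.asarray(hourly, dtype=float)
    n_days = hourly.size // 24
    days = hourly[: n_days * 24].reshape(n_days, 24)

    # Intra-day distribution: total traffic for each hour of the day.
    intra_day = days.sum(axis=0)
    # Intra-week distribution: total traffic for each day of the week.
    weekday = (start_weekday + np.arange(n_days)) % 7
    intra_week = np.array([days[weekday == d].sum() for d in range(7)])

    def standardize(v):
        # Zero mean and unit variance; a constant vector maps to zeros.
        std = v.std()
        return (v - v.mean()) / std if std > 0 else v - v.mean()

    return standardize(intra_day), standardize(intra_week)
```

The series should cover a whole number of weeks so that every weekday contributes the same number of days to the intra-week sum.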
Inspired by [
15], we select three distinct regions of Duomo, Navigli district, and Bocconi in the Milan cellular traffic dataset [
16] as examples. Duomo is the city center, Bocconi is a university, and Navigli district is a nightlife place. Their geographical locations are shown in
Figure 6a. Without other prior knowledge, it is difficult for us to judge their regional function. Fortunately, the two distributions we defined reflect it. As shown in
Figure 6b, the intra-day distribution of traffic at Navigli is different from Duomo and Bocconi. More traffic at Navigli, the nightlife place, occurs at night. Further, the intra-week distribution of traffic shown in
Figure 6c distinguishes Bocconi, the university, where traffic decreased significantly as a result of the weekend break. These illustrate the necessity of extracting regional features from the two dimensions of intra-day and intra-week distribution of the traffic.
4.2.2. Statistical Feature
Since we train the traffic prediction model on samples obtained by sliding-window sampling, the periodicity of traffic is critical to model training. We regard it as a statistical property and define a statistical feature, the autocorrelation coefficient, to describe it. In its standard sample form, it is calculated as

$$\rho_k = \frac{\sum_{t=1}^{n-k} (x_t - \bar{x})(x_{t+k} - \bar{x})}{\sum_{t=1}^{n} (x_t - \bar{x})^2}, \quad k = 1, \dots, T,$$

where $x_t$ is the traffic volume in time slot $t$, $n$ is the number of time slots, $\bar{x}$ is the mean value of the traffic within the subnet, and $T$ determines the maximum period to be calculated.
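A minimal sketch of this statistical feature, assuming the standard sample autocorrelation (lagged covariance divided by the total variance) is the intended definition. The function name `autocorrelation` and the argument `max_lag`, which plays the role of the maximum period, are illustrative.

```python
import numpy as np

def autocorrelation(x, max_lag):
    """Sample autocorrelation coefficients for lags 1..max_lag."""
    x = np.asarray(x, dtype=float)
    centered = x - x.mean()
    denom = np.sum(centered ** 2)
    # For each lag k, correlate the series with itself shifted by k slots.
    return np.array([
        np.sum(centered[:-k] * centered[k:]) / denom
        for k in range(1, max_lag + 1)
    ])
```

For hourly traffic with a daily cycle, the coefficient at lag 24 is strongly positive and the coefficient at lag 12 is strongly negative, which is the kind of periodic structure the feature is meant to expose.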
We also take Duomo, Navigli, and Bocconi as examples. As shown in
Figure 7, we let
to observe the significance of linear correlation under different periods within a week. The traffic volumes in these three regions are periodic, which indicates that we can carry out the training of these data sampled by the sliding window effectively. However, they still show fine distinctions, e.g., when
, the autocorrelation coefficient of the traffic at Navigli is almost zero, while that at Duomo is about
. It should be noted that the autocorrelation coefficient only measures linear correlation. A zero autocorrelation coefficient does not mean that the two random variables are independent.
4.3. Multi-Model Aggregation Strategy
As shown in
Figure 4, a single LCP-Net may match multiple features, i.e., corresponding to several global models generated by intra-cluster federated learning. In order to make use of all these models to improve the prediction accuracy of the traffic in LCP-Nets, we design two multi-model aggregation strategies:
- 1.
Fixed aggregation: The outputs of multiple models are aggregated and weighted by fixed weights.
- 2.
Trainable aggregation: The outputs of multiple models are aggregated and weighted by trainable weights. Meanwhile, the weights of the last n layers are set to be trainable, where n can be adjusted according to the specific computing power condition of the subnet.
Take the aggregation of two simple multilayer perceptron (MLP) neural networks as an example in
Figure 8. The aggregation operator is denoted as
.
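The two strategies can be illustrated on the model outputs alone. The sketch below is a simplification under stated assumptions: the matched models are represented only by their stacked predictions, the fixed weights default to a uniform average, and the trainable variant fits only the aggregation weights by gradient descent on the mean squared error (unfreezing the last n layers is not shown). The function names are hypothetical.

```python
import numpy as np

def fixed_aggregation(outputs, weights=None):
    """Weighted sum of model outputs.

    outputs: (K, n) array, predictions of K matched models on n samples.
    weights: length-K weights; defaults to a uniform average (an assumption).
    """
    outputs = np.asarray(outputs, dtype=float)
    k = outputs.shape[0]
    w = np.full(k, 1.0 / k) if weights is None else np.asarray(weights, dtype=float)
    return w @ outputs

def train_aggregation_weights(outputs, targets, lr=0.05, epochs=500):
    """Fit the aggregation weights by gradient descent on the MSE."""
    outputs = np.asarray(outputs, dtype=float)
    targets = np.asarray(targets, dtype=float)
    k, n = outputs.shape
    w = np.full(k, 1.0 / k)  # start from the fixed (uniform) aggregation
    for _ in range(epochs):
        error = w @ outputs - targets
        grad = 2.0 * outputs @ error / n  # gradient of the MSE w.r.t. w
        w -= lr * grad
    return w
```

Starting the trainable weights from the uniform average means that zero training epochs reproduce the fixed strategy, so local training can only be spent where computing power allows.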
4.4. Clustering, Federated Learning and Proposed Algorithm
Clustering and federated learning are our proposed framework’s two most critical steps. We first gather the ACP-Nets with the same feature using the clustering algorithm and then carry out intra-cluster federated learning to obtain the global model of each cluster. We believe that these global models will include knowledge about analyzing traffic with their corresponding features. Thus, they adapt to the traffic prediction task of the LCP-Nets with the same feature.
The selection of specific algorithms of clustering and federated learning is open and flexible. We adopt the K-means method to achieve clustering and the classic FedAvg method to realize intra-cluster federated learning in the proposed algorithm.
4.4.1. Preliminaries of Clustering
Clustering has been successfully applied in several fields, including pattern recognition and data mining. For given samples, it will group them into clusters based on the similarity or distance of their features. There are many types of clustering algorithms available. In this paper, we adopt K-means clustering.
K-means clustering has two steps. Firstly, determine the centers of the k clusters, and divide the samples into clusters based on their distance to the centers. Secondly, update the centers of these clusters by averaging the samples inside each cluster. Repeat the above steps until convergence, and obtain the final clustering results.
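The two alternating steps can be sketched in a few lines. This is an illustrative NumPy version, not the implementation used in the experiment; the random initialization from the samples and the parameter names are assumptions.

```python
import numpy as np

def kmeans(samples, k, n_iter=100, seed=0):
    """Plain K-means; returns (labels, centers, inertia)."""
    samples = np.asarray(samples, dtype=float)
    rng = np.random.default_rng(seed)
    # Initialize the centers with k distinct samples chosen at random.
    centers = samples[rng.choice(len(samples), size=k, replace=False)]

    def assign(current_centers):
        dist = np.linalg.norm(samples[:, None, :] - current_centers[None, :, :], axis=2)
        return dist.argmin(axis=1)

    for _ in range(n_iter):
        # Step 1: assign each sample to its nearest center.
        labels = assign(centers)
        # Step 2: move each center to the mean of its samples
        # (an empty cluster keeps its previous center).
        new_centers = np.array([
            samples[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
            for j in range(k)
        ])
        if np.allclose(new_centers, centers):
            break
        centers = new_centers

    labels = assign(centers)
    # Inertia: sum of squared distances of samples to their closest center.
    inertia = float(np.sum((samples - centers[labels]) ** 2))
    return labels, centers, inertia
```

The returned centers are what a new sample would later be compared against when it is assigned to an existing cluster.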
The number of clusters in the K-means clustering needs to be set manually. We select the number of clusters based on the sum of squared distances of samples to their closest cluster center (inertia) and the silhouette coefficient [
17]. Smaller inertia and a larger silhouette coefficient indicate a better selection. The silhouette coefficient of sample $i$ is expressed as

$$s(i) = \frac{b(i) - a(i)}{\max\{a(i), b(i)\}},$$

where $a(i)$ indicates the average distance between sample $i$ and the other samples in its cluster, and $b(i)$ indicates the average distance between sample $i$ and each sample of the nearest cluster.
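A minimal sketch of the silhouette coefficient as described above, averaged over all samples. It assumes Euclidean distance and at least two clusters, and follows the common convention of scoring a singleton cluster as 0; the function name is illustrative.

```python
import numpy as np

def silhouette_coefficient(samples, labels):
    """Mean silhouette coefficient over all samples (needs >= 2 clusters)."""
    samples = np.asarray(samples, dtype=float)
    labels = np.asarray(labels)
    dist = np.linalg.norm(samples[:, None, :] - samples[None, :, :], axis=2)
    clusters = np.unique(labels)
    scores = np.zeros(len(samples))
    for i in range(len(samples)):
        same = labels == labels[i]
        if same.sum() == 1:
            continue  # singleton cluster: score stays 0 by convention
        # a: mean distance to the other samples of the own cluster
        # (the zero self-distance is excluded through the divisor).
        a = dist[i, same].sum() / (same.sum() - 1)
        # b: mean distance to the samples of the nearest other cluster.
        b = min(dist[i, labels == c].mean() for c in clusters if c != labels[i])
        scores[i] = (b - a) / max(a, b)
    return float(scores.mean())
```

In practice one would evaluate this score, together with the inertia, for several candidate cluster counts and keep the count with small inertia and a large score.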
4.4.2. Preliminaries of Federated Learning
Federated learning is a distributed machine learning paradigm with data security and privacy protection benefits. The steps of the basic federated paradigm are as follows:
- (1)
Clients obtain the weights of the global model.
- (2)
Randomly select a set of clients from all clients.
- (3)
Each selected client i fine-tunes the weights based on its local data and obtains fine-tuned weights.
- (4)
Obtain the global weights by aggregating the weights updated by the selected clients.
Most federated learning algorithms repeat the above steps until the training converges or the maximum number of iterations is reached. Many federated learning algorithms have emerged so far. In this paper, we adopt the classic FedAvg algorithm.
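The four steps can be sketched with FedAvg on a toy problem. This is an illustration under stated assumptions, not the experimental code: each client trains a linear model by gradient descent instead of a neural network, and the aggregation averages the local weights in proportion to the local data size. All function and parameter names are hypothetical.

```python
import numpy as np

def local_update(weights, x, y, lr=0.1, epochs=5):
    """Step (3): fine-tune linear-model weights on one client's local data."""
    w = weights.copy()
    for _ in range(epochs):
        grad = 2.0 * x.T @ (x @ w - y) / len(y)  # gradient of the local MSE
        w -= lr * grad
    return w

def fedavg(clients, dim, rounds=50, fraction=0.5, seed=0):
    """clients: list of (x, y) pairs, one per client; returns global weights."""
    rng = np.random.default_rng(seed)
    global_w = np.zeros(dim)  # step (1): the shared global weights
    n_selected = max(1, int(round(fraction * len(clients))))
    for _ in range(rounds):
        # Step (2): randomly select a fraction of the clients.
        selected = rng.choice(len(clients), size=n_selected, replace=False)
        local_ws, sizes = [], []
        for i in selected:
            x, y = clients[i]
            local_ws.append(local_update(global_w, x, y))
            sizes.append(len(y))
        # Step (4): average the local weights, weighted by local data size.
        global_w = np.average(local_ws, axis=0, weights=sizes)
    return global_w
```

Only model weights leave a client in this loop; the local data `(x, y)` is never shared, which is the source of the privacy benefit mentioned above.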
The whole procedure of the proposed framework is summarized in the form of the concrete algorithm, Algorithm 1.
Algorithm 1: Implementation of intra-cluster federated learning-based model transfer framework for traffic prediction of LCP-Nets
Input: Features of ACP-Nets and LCP-Nets; dimension of feature, N; training datasets of ACP-Nets and LCP-Nets; fraction of subnets; learning rate.
Output: The model weights.
for each dimension of feature do
    Determine cluster size C based on silhouette coefficient.
    Group the ACP-Net features into C clusters using K-Means and obtain the cluster centers.
    for each cluster do
        Randomize the initial model weights.
        for each round do
            Select a random set of ACP-Nets in the cluster.
            for each ACP-Net in the set in parallel do
                Update the local model weights on the local training dataset.
            Aggregate the local model weights.
        Obtain the global model of the cluster.
for each LCP-Net p do
    for each dimension of feature do
        Group the LCP-Net feature into a cluster using K-Means with the obtained cluster centers.
    Obtain the matched model set corresponding to the matched clusters.
    Aggregate the matched models and obtain the model for LCP-Net p.
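The matching step for an LCP-Net in Algorithm 1 amounts to a nearest-center lookup per feature dimension. The sketch below assumes Euclidean distance and represents features and cluster centers as dictionaries keyed by a feature name; the function name and this data layout are illustrative, not taken from the paper.

```python
import numpy as np

def match_clusters(lcp_features, cluster_centers):
    """Return, per feature dimension, the index of the nearest cluster center.

    lcp_features: dict name -> 1-D feature vector of one LCP-Net.
    cluster_centers: dict name -> (C, d) array of centers obtained by
        clustering the ACP-Nets on that feature.
    """
    matched = {}
    for name, feature in lcp_features.items():
        centers = np.asarray(cluster_centers[name], dtype=float)
        dist = np.linalg.norm(centers - np.asarray(feature, dtype=float), axis=1)
        matched[name] = int(dist.argmin())
    return matched
```

The returned indices select one global model per feature dimension; these models form the matched model set that the LCP-Net then aggregates.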
5. Experimental Results and Discussions
In this section, we introduce the details related to the experiment, including datasets and evaluation methods, then give the experiment results and corresponding discussions.
5.1. Dataset and Preprocessing
We use the cellular traffic datasets [
16,
18] launched by Telecom Italia for the experiment. These two datasets record the call details of Milan (MI) [
16] and Trentino (TN) [
18] in the last two months of 2013. They are the most commonly used datasets in cellular traffic prediction [
6]. The datasets contain five types of traffic, i.e., SMS in/out, voice call in/out, and internet services, and they are recorded in temporal–spatial granularity. We experiment exclusively on voice call in traffic and internet service traffic, the most common cellular traffic in the existing network. In the spatial domain, the datasets divide Milan and Trentino into 10,000 grids and 6575 grids according to their geographical location. In the time domain, the traffic records are generated every 10 min.
We perform a series of preprocessing operations on the datasets:
- 1.
A too-short statistical interval makes the unpredictable random components of the sequence significant, while a too-long statistical interval makes it impossible to extract enough training samples from time-limited datasets. Therefore, we aggregate traffic data at one-hour intervals.
- 2.
We extract the first seven weeks of data from the two months of traffic data to avoid the impact of unconventional traffic during Christmas week on the experiment. Our task is to predict the traffic in the last of the seven weeks based on the traffic in the first six weeks.
- 3.
We standardize the traffic data to have zero mean and unit variance within each grid.
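The three preprocessing operations can be sketched for a single grid as follows. This is a minimal illustration that assumes the input is a 1-D series of 10-minute records starting at the beginning of a week, and that standardization uses the statistics of the retained seven weeks; the function name and parameters are hypothetical.

```python
import numpy as np

HOURS_PER_WEEK = 24 * 7

def preprocess(ten_min_traffic, n_weeks=7, test_weeks=1):
    """Aggregate to hourly data, keep n_weeks, standardize, and split."""
    x = np.asarray(ten_min_traffic, dtype=float)
    # 1. Aggregate six 10-minute records into one hourly value.
    n_hours = x.size // 6
    hourly = x[: n_hours * 6].reshape(n_hours, 6).sum(axis=1)
    # 2. Keep only the first n_weeks weeks.
    hourly = hourly[: n_weeks * HOURS_PER_WEEK]
    # 3. Standardize to zero mean and unit variance within the grid.
    std = hourly.std()
    hourly = (hourly - hourly.mean()) / (std if std > 0 else 1.0)
    # The first weeks are used for training, the last test_weeks for testing.
    split = (n_weeks - test_weeks) * HOURS_PER_WEEK
    return hourly[:split], hourly[split:]
```

With the defaults, two months of 10-minute records yield 1008 hourly training values (six weeks) and 168 hourly test values (one week).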
These two datasets match the scenario considered in this paper well. Since the traffic of each grid is generated by the communication services provided by multiple BSs inside it, we consider a grid as a subnet.
5.2. Setting of Baseline Methods and Evaluation Metrics
5.2.1. Setting of Baseline Methods
In the experiment of this paper, our goal is to verify that the proposed intra-cluster federated learning model transfer framework can improve the prediction efficiency of LCP-Nets. Therefore, we compare five methods as follows:
- 1.
SMA (simple moving average): SMA is the primary traffic prediction method, which is the easiest to implement. When an appropriate sampling strategy, such as one based on the autocorrelation coefficient, is adopted, this method can achieve reasonably high prediction accuracy. We use it as the baseline to ensure that the trained deep learning models work.
- 2.
ARIMA (autoregressive integrated moving average): ARIMA is representative of the statistical method in time series prediction, and its complexity of model training is much lower than that of the deep learning method. We believe that most LCP-Nets have enough computing power to train ARIMA models.
- 3.
LTNN (local-trained neural network): LCP-Nets train their traffic prediction models locally. LTNN cannot be achieved in the considered scenario, where LCP-Nets are not equipped with sufficient computing power. After parameter adjustment, we regard it as the ideal, most accurate method.
- 4.
ICFed-F (intra-cluster federated learning-based transfer framework with fixed aggregation strategy): LCP-Nets obtain models from feature-matched clusters and aggregate them via the fixed strategy, according to the proposed framework.
- 5.
ICFed-T (intra-cluster federated learning-based transfer framework with trainable aggregation strategy): Assume that LCP-Nets have the computing capability to carry out a few model training tasks. ICFed-T allows fine tuning training in LCP-Nets after ICFed-F.
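As an illustration of the SMA baseline with a sampling strategy, the sketch below averages past values at chosen lags. Picking lags at strongly autocorrelated periods (for example one day, two days, and one week for hourly data) is an assumption about what an appropriate sampling strategy looks like, not the exact configuration used in the experiment.

```python
import numpy as np

def sma_predict(history, lags):
    """Predict the next value as the mean of the history at the given lags.

    history: 1-D array of past hourly traffic, most recent value last.
    lags: positive offsets in hours, counted back from the target slot.
    """
    history = np.asarray(history, dtype=float)
    # history[-lag] is the value observed `lag` hours before the target slot.
    return float(np.mean([history[-lag] for lag in lags]))
```

For a series that repeats every 24 hours, `sma_predict(history, (24, 48, 168))` reproduces the value of the same hour on earlier days, whereas `sma_predict(history, (1, 2, 3))` only averages the most recent hours.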
5.2.2. Evaluation Metrics
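We use three evaluation metrics to evaluate the effect of the above methods, i.e., MAE, RMSE, and $R^2$.
- 1.
MAE (mean absolute error) is the most common regression metric. Its calculation formula is
$$\mathrm{MAE} = \frac{1}{n} \sum_{i=1}^{n} \left| \hat{y}_i - y_i \right|,$$
where $\hat{y}_i$ is the predictive value, $y_i$ is the actual value, and $n$ is the number of predicted samples.
- 2.
RMSE (root mean square error) extends MAE by squaring the errors, which amplifies large errors. Its calculation formula is
$$\mathrm{RMSE} = \sqrt{\frac{1}{n} \sum_{i=1}^{n} \left( \hat{y}_i - y_i \right)^2}.$$
- 3.
$R^2$ (coefficient of determination) is another commonly used measure of goodness of fit, and its calculation formula is
$$R^2 = 1 - \frac{\sum_{i=1}^{n} \left( y_i - \hat{y}_i \right)^2}{\sum_{i=1}^{n} \left( y_i - \bar{y} \right)^2},$$
where $\bar{y}$ is the average of the actual value.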
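The three metrics can be computed as follows. This is a minimal sketch using their standard definitions; the function names are illustrative.

```python
import numpy as np

def mae(y_true, y_pred):
    """Mean absolute error."""
    y_true, y_pred = np.asarray(y_true, float), np.asarray(y_pred, float)
    return float(np.mean(np.abs(y_pred - y_true)))

def rmse(y_true, y_pred):
    """Root mean square error."""
    y_true, y_pred = np.asarray(y_true, float), np.asarray(y_pred, float)
    return float(np.sqrt(np.mean((y_pred - y_true) ** 2)))

def r2(y_true, y_pred):
    """Coefficient of determination."""
    y_true, y_pred = np.asarray(y_true, float), np.asarray(y_pred, float)
    residual = np.sum((y_true - y_pred) ** 2)
    total = np.sum((y_true - y_true.mean()) ** 2)
    return float(1.0 - residual / total)
```

Lower MAE and RMSE indicate smaller errors, while an $R^2$ closer to 1 indicates a better fit.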
5.3. The Result of Experiment
In this section, we introduce the experimental results in two parts, clustering and traffic prediction results, corresponding to Steps 1 and 2, and Steps 3 and 4 of the proposed framework in
Section 4.1, respectively.
We use the MI dataset to introduce the sampling and clustering process of the experiment in detail. For the MI dataset, we randomly sample 100 and 20 grids from 10,000 of them as ACP-Nets and LCP-Nets, respectively. As shown in
Figure 9, the two types of subnets are distributed all over Milan.
5.3.1. The Result of Clustering
The first step of the experiment is clustering ACP-Nets. The number of clusters corresponding to the autocorrelation coefficient, intra-week distribution, and intra-day distribution is set to 8, 7, and 4, respectively, based on the silhouette coefficient and inertia.
As shown in
Figure 10, subnets of different clusters show different characteristics, some of which can be explained and verified by common sense. In
Figure 10a, the traffic of subnets in most clusters shows significant and stable cycle dependency. A few subnets in Clusters 2 and 4 do not show this characteristic. We believe that the traffic-changing pattern of subnets in these clusters differs from that in other clusters.
As shown in
Figure 10b,c, the clustering results of the intra-week and intra-day traffic distributions largely reflect the classification of urban functional areas. For example, Cluster 1 in
Figure 10b represents the non-residential area, where traffic on the weekend is significantly lower than on weekdays, and Cluster 3 in
Figure 10c represents the places of entertainment, where traffic mainly occurs at night. Further, these two features can be combined to achieve a more specific functional area description.
Due to the artificiality of traffic, there are abnormal values in the traffic dataset, such as Cluster 4 in
Figure 10c. In most previous algorithms, these abnormal values may affect the performance of the trained model. However, LCP-Nets will automatically avoid this effect in the proposed algorithm by not selecting the corresponding clusters. In addition, if there is manual participation, this step will also help with data cleaning.
After clustering, we obtain the feature-based models by intra-cluster federated learning as the second step. The selection of specific model structures in ICFed is very flexible. Before training cluster models using federated learning, we test the influence of different network structures on traffic prediction within a single grid in the MI dataset to find the network structure suitable for different types of traffic. We compare the MLP and long short-term memory (LSTM), each in a smaller and a larger size. Specifically, the compared network structures are as follows:
- 1.
MLP-S: A smaller-size MLP. The 6-dimensional input is composed of three hourly samples, two daily samples, and one weekly sample. There are two hidden layers with a size of 16. The one-dimensional output is the value predicted by the network for the target time slot.
- 2.
MLP-L: A larger-size MLP. The 168-dimensional input is composed of one hundred sixty-eight samples. The structure is similar to that of MLP-S, except that the hidden layer size is 128.
- 3.
LSTM-S: A smaller-size single-layer LSTM with a hidden layer size of 8 and the same input as MLP-S. The one-dimensional predicted value is output through a fully connected (FC) layer at the end of the LSTM module.
- 4.
LSTM-L: A larger-size single-layer LSTM with a hidden layer size of 16 and the same input as MLP-L. The FC layer after the LSTM module outputs a one-dimensional predicted value.
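The construction of the model inputs by sliding-window sampling can be sketched as follows. The concrete lags are an assumption: the three hourly, two daily, and one weekly samples of the smaller models are taken here as the values 1, 2, 3, 24, 48, and 168 hours before the target slot, which the text does not specify.

```python
import numpy as np

# Assumed lags in hours: three hourly, two daily, and one weekly sample.
LAGS_SMALL = (1, 2, 3, 24, 48, 168)

def make_samples(series, lags=LAGS_SMALL):
    """Build (inputs, targets) pairs from an hourly series.

    Column j of each input row holds the value lags[j] hours before the target.
    """
    series = np.asarray(series, dtype=float)
    max_lag = max(lags)
    targets = series[max_lag:]
    inputs = np.stack(
        [series[max_lag - lag : series.size - lag] for lag in lags], axis=1
    )
    return inputs, targets
```

Passing `lags=tuple(range(1, 169))` instead yields a 168-dimensional input of the kind described for the larger models.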
As shown in
Table 1, we also record the corresponding amount of calculation and the number of parameters. The experimental results show that the assumption that a larger model size yields better prediction performance is incorrect. Due to the limited training samples of traffic within a single grid, a too-large model may overfit. To keep the models as lightweight as possible while obtaining better prediction results, we use MLP-S to predict the call-in traffic and LSTM-S to predict the internet service traffic. We trained four cluster models for each type of traffic, which correspond to four different features, i.e., traffic concentrated on weekdays, traffic concentrated on weekends, a traffic peak at night, and a traffic peak in the afternoon.
5.3.2. The Result of Traffic Prediction
In Step 3 and Step 4, LCP-Nets obtain feature-based models from the feature-matched clusters and aggregate them to customize their models. The method names, ICFed-F or ICFed-T, are distinguished by aggregation strategies. ICFed-T can be regarded as extending a fine-tuning training step based on ICFed-F.
The evaluation results of the prediction performance are shown in
Table 2.
The two settings varied in the table are the number of communication rounds of ACP-Nets’ federated learning and the number of epochs of LCP-Nets’ local fine-tuning training. As the number of communication rounds increases, the traffic prediction error of LCP-Nets decreases gradually. When the number of communication rounds is 50, the performance of ICFed-F is better than that of SMA.
However, since ICFed-F never uses the historical traffic data of LCP-Nets to train the models, even if the number of communication rounds is increased to 100, the performance is still far from the ideal LTNN. The advantage of the ICFed-F method is that LCP-Nets no longer need to obtain high-precision neural network models through local training.
Assuming that LCP-Nets have limited computing resources for lightweight model training, we can use the ICFed-T method, in which the last FC layer is trainable. As shown in Table 2, as the number of fine-tuning training epochs increases, the prediction error continues to decline from the ICFed-F level and comes very close to the performance of LTNN when the number of fine-tuning training epochs is 150. The slight remaining gap arises because limited computing power prevents training all the weights in the model.
Generally speaking, ICFed-F enables LCP-Nets to achieve better traffic prediction performance with deep learning at zero local training cost, and ICFed-T brings the performance of the traffic prediction models trained by the LCP-Nets close to that of adequately trained local models under limited computing power. Both of them achieve privacy protection by obtaining feature-based models through feature matching. At the same time, intra-cluster federated learning is also helpful for efficient training of ACP-Nets’ local models. The proposed framework benefits both the LCP-Nets and the ACP-Nets.
6. Conclusions
This paper proposes an intra-cluster federated learning model transfer framework to help LCP-Nets customize models. We define two types of traffic features to cluster ACP-Nets and obtain specialized traffic prediction models for subnets with specific characteristics through intra-cluster federated learning. We design two aggregation strategies to help LCP-Nets combine the obtained models. The experimental results show that the proposed framework can improve the traffic prediction efficiency of LCP-Nets by reducing prediction error and training overhead. Nevertheless, some shortcomings remain. On the one hand, predicting future traffic in the proposed framework depends entirely on historical traffic and does not use other multidimensional data, e.g., regional population density and emergency occurrence information, which are valuable for cellular traffic prediction. On the other hand, traffic features need to be defined by model users. Although these manually defined features are simple and effective, there may be other helpful traffic features that could be mined by artificial intelligence. We believe that addressing these two shortcomings will be the focus of future work.