Intra-Cluster Federated Learning-Based Model Transfer Framework for Traffic Prediction in Core Network

Li, Pengyu; Shi, Yingji; Xing, Yanxia; Liao, Chaorui; Yu, Menghan; Guo, Chengwei; Feng, Lei

doi:10.3390/electronics11223793

by Pengyu Li ¹, Yingji Shi ²,*, Yanxia Xing ¹, Chaorui Liao ², Menghan Yu ¹, Chengwei Guo ² and Lei Feng ²

¹ 6G Research Center, China Telecom Research Institute, Beijing 102209, China
² State Key Laboratory of Networking and Switching Technology, Beijing University of Posts and Telecommunications, Beijing 100876, China
* Author to whom correspondence should be addressed.

Electronics 2022, 11(22), 3793; https://doi.org/10.3390/electronics11223793

Submission received: 29 September 2022 / Revised: 14 November 2022 / Accepted: 15 November 2022 / Published: 18 November 2022

Abstract:

Accurate prediction of cellular traffic contributes to the efficient operation and management of mobile networks. With deep learning, many studies have achieved accurate cellular traffic prediction. In reality, however, quite a few subnets in the core network do not have sufficient computing power to train their own deep learning models; we call these subnets with limited computing power (LCP-Nets). To improve the traffic prediction efficiency of LCP-Nets with the help of deep learning and of subnets with abundant computing power (ACP-Nets), under the requirement of privacy protection, this paper proposes an intra-cluster federated learning-based model transfer framework. The framework customizes models for LCP-Nets by transferring models trained by ACP-Nets. Experimental results on a public dataset show that the framework can improve the efficiency of LCP-Net traffic prediction.

Keywords:

traffic prediction; core network; federated learning

1. Introduction

With the increasing diversity of network services, manual and semi-manual network operations and management methods are gradually becoming unable to make informed decisions. To cope with this situation, the industry is pushing for network intelligence based on artificial intelligence (AI) [1]. Researchers in academia also emphasize the importance of realizing network intelligence in the evolution of the mobile communication system [2]. With the development of AI and its successive adoption in various industries, applying AI widely in the existing network has become feasible. 6G is expected to meet both the communication and intelligence needs of users [3]. In recent years, many organizations have carried out relevant standard research with the core goal of ubiquitous intelligence. This research has proposed some new concepts, e.g., the network data analytics function (NWDAF), the new AI core network (CN) function defined by 3GPP [4].

The distributed deployment of an AI function, such as NWDAF, is the current mainstream solution to achieve ubiquitous intelligence in the CN, which can be called a distributed intelligence solution. We believe that this solution has the following benefits:

1.: In the future, CN will contain multi-dimensional and massive network data and numerous task requirements for prediction and decision making everywhere. The distributed deployment of an AI function can handle these tasks closer to the demand side, reducing processing delay and transmission overhead.
2.: Compared with a centralized intelligence solution, the distributed intelligence solution has advantages in deploying differentiated AI functions. Functions distributed in a network can have different types of computing power and realize different levels of privacy protection. Tasks select an appropriate AI function corresponding to their computing and privacy requirements, which helps to save system energy consumption and to protect privacy.
3.: With the help of distributed intelligence, such as federated learning and multi-agent reinforcement learning, the network can achieve collaborative intelligence between functions. Assembling multiple AI functions to complete network AI tasks will improve computing efficiency.

This paper considers the network architecture based on the definition of NWDAF [4], as shown in Figure 1. We place the AI functions on each edge cloud to give them computing power. This power covers the capabilities of training and inference, which correspond to the MTLF (model training logical function) and the AnLF (analytics logical function), respectively, the two logical functions into which NWDAF is divided.

Diversified network services make the network load strongly random, which can easily leave the network unable to provide stable and reliable services. Capturing the changing trend of user demand via traffic prediction can significantly improve this situation. It enables the network to deploy communication and computing resources in advance and to meet quality-of-service (QoS) requirements [5].

Although many studies have focused on cellular traffic prediction, particularly in the past two years, researchers rarely consider whether their complex and huge models are suitable for deployment in reality. An enormous gap exists between designing a model structure and deploying it [6]. We try to narrow this gap by considering the cellular traffic prediction problem in a more realistic network architecture based on the 3GPP standard. According to the definition of 3GPP, NWDAFs store network traffic data. Thus, traffic prediction is a typical use case of network intelligence in the CN.

Training powerful AI models depends on massive data and ample computing power. Thus, the MTLF requires plentiful computing resources, while the AnLF requires a relatively small amount of computing resources. Considering the high deployment cost of NWDAF (MTLF) and the hyper-scale of the future network, we believe that ubiquitous intelligence under the distributed intelligence solution will initially be realized through the widespread deployment of AnLF. At the same time, only a minority of edge clouds within subnets possess high-performance MTLFs, depending on whether their physical computing resources are sufficient. As a result, different subnets have different training and inference capability levels, as shown in Figure 1.

We aspire to help the subnets with limited computing power (LCP-Nets) improve the efficiency of traffic prediction by transferring the deep learning models pre-trained in the subnets with abundant computing power (ACP-Nets). The proposed framework not only embodies the advantage of collaborative intelligence but also meets the privacy protection requirement. To the best of our knowledge, this paper is the first work to consider cellular traffic prediction using deep learning models from the perspective of existing standards and practical deployment. The contributions of this paper are as follows:

1.: We propose an intra-cluster federated learning model transfer framework. This framework can help the LCP-Nets to achieve higher efficiency in local cellular traffic prediction tasks, leveraging the feature-based models trained cooperatively by the ACP-Nets.
2.: We introduce two traffic features, i.e., statistical and regional features. These features support clustering ACP-Nets from different dimensions and training the model adapted to traffic prediction tasks of specific type areas.
3.: We design two kinds of model aggregation strategies that can customize traffic-prediction models for the LCP-Nets by combining the matched multiple models.
4.: We carry out an experiment based on an actual network dataset to verify the feasibility of the above methods. The experimental results show that the proposed framework can significantly enhance the traffic prediction efficiency of LCP-Nets by improving prediction accuracy and reducing training overhead.

The rest of the paper is organized as follows. Section 2 presents the existing research on cellular traffic prediction, especially in distributed scenarios. We describe the LCP-Net traffic prediction problem scenario in detail in Section 3. Section 4 proposes the model transfer framework for solving the problem and a specific algorithm scheme based on it. Experimental results and discussions are presented in Section 5, and Section 6 concludes the paper.

2. Related Work

As one of the spatial-temporal series prediction tasks, cellular traffic prediction methods can be divided into three types: simple, statistical, and machine-learning-based. Simple methods predict with simple calculation rules, such as the simple moving average (SMA). These methods are easy to implement but cannot capture the changing pattern in data, so their prediction accuracy is poor. To realize more accurate series prediction, early researchers introduced statistical methods. The autoregressive integrated moving average (ARIMA) is a classic statistical series prediction method. It is also suitable for solving the traffic-prediction problem. Ref. [7] divided the cellular traffic into regular and random parts and illustrated that the regular part can be predicted by the ARIMA method.

The rise of machine learning motivates researchers to attempt deep learning, the most popular method in the current field of AI, to implement cellular traffic prediction by modeling the problem as a regression problem. Ref. [8] used the recurrent neural network (RNN) and the convolutional neural network (CNN) to extract the cellular traffic changing pattern in the time and spatial domains, respectively. In recent years, researchers have kept pace with the development of deep learning and introduced the latest methods to the cellular traffic prediction problem. These attempts include the graph neural network (GNN) [9] and transformer [10], which provide new ideas for extracting the spatial–temporal pattern.

However, all of the above cellular traffic prediction methods are designed from the perspective of centralized intelligence and require considerable computing resources. They do not match the actual computing power of the existing system and do not exploit the advantages of distributed intelligence. Thus, this paper explores an applicable distributed intelligence framework for cellular traffic prediction.

Few studies have tried using distributed intelligence algorithms, including federated learning, multi-agent reinforcement learning (MARL), and swarm learning, to solve the cellular traffic-prediction problem. Ref. [11] designed a dual federated learning framework for training the cellular traffic prediction model, and as far as we know, this is the only work in this research area. Specifically, ref. [11] proposed a two-layer federated learning through intra-cluster and inter-cluster model aggregation to realize more efficient model training. However, they did not fully consider the differences in traffic features and computing power levels between geographical regions. The dual federated learning framework is not suitable for scenarios with LCP-Nets. At the same time, due to the inter-cluster model aggregation making models involved in the training similar, it cannot train sufficiently differentiated cluster models. In this research, leveraging feature-based intra-cluster federated learning, we can train unique models for different traffic features in ACP-Nets. These models are valuable for the traffic prediction of LCP-Nets with similar features.

The concept of clustering is applied in many sub-fields of communication, and the network efficiency is improved by examining the value of each cluster head, e.g., in wireless sensor networks [12]. In this research, we customize the model for each LCP-Net by examining each cluster’s feature and model.

Transfer learning aims to transfer models between different domains or tasks to reduce the generalization error of the target domain. Refs. [13,14] used transfer learning to train models for large-scale cellular traffic prediction faster. Their approaches do help with LCP-Nets model training, but they do not fully consider differentiated regional traffic features and the scenario with distributed intelligence. In this research, we achieve more refined and efficient inter-domain model transfer by aggregating cluster models suitable for different traffic features.

3. Problem Formulation

This section formulates the traffic prediction problem for ACP-Nets and LCP-Nets.

3.1. Single Subnet Traffic Prediction

As shown in Figure 1, a subnet contains multiple BSs and an edge cloud. In each subnet, the edge cloud stores the traffic data of the local BSs with different metrics, including call traffic, SMS traffic, and internet traffic. The traffic of a subnet in consecutive T time intervals is presented as $d^{T} = [\begin{matrix} d_{1} & d_{2} & \dots & d_{T} \end{matrix}]$. The target traffic volume of prediction at the current time interval is denoted as $d_{t}$ for each subnet. Suppose that the traffic volumes of the previous $t - 1$ time intervals are known.

In order to improve the training and inference efficiency of the prediction model, we reduce the number of input features, i.e., only using part of the historical traffic for prediction. Based on our life experience, we believe that cellular traffic may show regularity on hourly, daily, and weekly scales. Thus, we sample $p, q, r$ time intervals on each scale, and the prediction process can be expressed as

$\hat{d}_{t} = f \big( \overbrace{d_{t - 1}, \dots, d_{t - p}}^{p\ \text{hourly samples}}, \overbrace{d_{t - s_{day} \times 1}, \dots, d_{t - s_{day} \times q}}^{q\ \text{daily samples}}, \overbrace{d_{t - s_{week} \times 1}, \dots, d_{t - s_{week} \times r}}^{r\ \text{weekly samples}}; w \big),$

(1)

where $f (\cdot)$ denotes the prediction model, w the corresponding weights, and $s_{day}, s_{week}$ the sliding window sizes for daily and weekly sampling; $s_{day} = 24$ and $s_{week} = 168$ ($24 \times 7$) if the unit of time interval is one hour. The process of sampling and prediction is shown in Figure 2.
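The sampling in Equation (1) can be illustrated with a minimal sketch. It assumes an hourly series stored in a Python list; the function names `sample_inputs` and `earliest_target` are hypothetical and only serve the illustration.

```python
def earliest_target(p, q, r, s_day=24, s_week=168):
    """Smallest target index t for which every lag of Eq. (1) lies inside the series."""
    return max(p, s_day * q, s_week * r)


def sample_inputs(d, t, p, q, r, s_day=24, s_week=168):
    """Collect the p hourly, q daily and r weekly samples that feed f(.) for target d[t]."""
    if t < earliest_target(p, q, r, s_day, s_week):
        raise ValueError("not enough history before index t")
    hourly = [d[t - i] for i in range(1, p + 1)]
    daily = [d[t - s_day * i] for i in range(1, q + 1)]
    weekly = [d[t - s_week * i] for i in range(1, r + 1)]
    return hourly + daily + weekly


# Toy series in which the value at hour i is simply i, so the lags are easy to read.
series = list(range(400))
x = sample_inputs(series, t=300, p=3, q=2, r=1)
print(x)  # [299, 298, 297, 276, 252, 132]
```

With $p = 3, q = 2, r = 1$ the input has six dimensions, and at least one full week of history (168 hours) is needed before the first usable target.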

3.2. Multiple Subnets Scenario

As shown in Figure 3, given K ACP-Nets and P LCP-Nets, all subnets are connected to a cloud node in the CN. Data, including model weights and multi-dimensional features, are allowed to be transmitted between each subnet and the cloud node, except for specific traffic volumes to protect privacy. We denote the features of ACP-Nets and LCP-Nets uploaded to the cloud node as $C = [\begin{matrix} c_{1} & c_{2} & \dots & c_{K} \end{matrix}]$ and $C^{*} = [\begin{matrix} c_{1}^{*} & c_{2}^{*} & \dots & c_{P}^{*} \end{matrix}]$, respectively.

With the powerful capability of model training, ACP-Nets can train their prediction models on previous traffic data and obtain the weights w by solving

$\arg \min_{w} L (f ({\hat{d}}_{t}; w), d_{t}),$

(2)

where $L$ is the loss function.

Denote the optimized model weights of K ACP-Nets as $W = [\begin{matrix} w_{1} & w_{2} & \dots & w_{K} \end{matrix}]$.

With the limited capability of model training, LCP-Nets have to rely on receiving model weights from the cloud node for their local inference. The received weights are denoted as $W^{*} = [\begin{matrix} w_{1}^{*} & w_{2}^{*} & \dots & w_{P}^{*} \end{matrix}]$.

The objective of this scenario is to minimize the error in the inference of LCP-Nets. Thus, the problem of subnet k is formulated as

$\min_{G (\cdot)} L (f ({\hat{d}}_{t}; w_{k}^{*}), d_{t}) \big|_{C, C^{*}, W, W^{*} = G (C, C^{*}, W)},$

(3)

where $G (\cdot)$ represents the function of the cloud node to construct $W^{*}$.

4. Proposed Framework

This section introduces the proposed intra-cluster federated learning-based model transfer framework for the problem scenario described in Section 3.2. Then, some specific framework considerations are elaborated, including the concept of different types of subnet features and the multi-model aggregation strategy. Finally, we adopt the K-means algorithm and FedAvg algorithm to show the complete process of framework implementation.

4.1. Intra-Cluster Federated Learning-Based Model Transfer Framework

As shown in Figure 4, the proposed framework consists of four steps:

1.: Clustering ACP-Nets according to their features.
2.: Intra-cluster federated learning to obtain each ACP-Net’s local model and each cluster’s global model.
3.: Matching LCP-Nets with corresponding clusters according to their features.
4.: LCP-Nets obtain their local traffic prediction models by aggregating global models of the matched clusters and fine-tuning training.

4.2. Multidimensional Traffic Feature Extraction

In the proposed framework, the subnet features $C, C^{*}$ are further divided into two parts: regional features (r) and statistical features (t), expressed as $C^{T} = [\begin{matrix} C_{r} & C_{t} \end{matrix}]$ and $C^{* T} = [\begin{matrix} C_{r}^{*} & C_{t}^{*} \end{matrix}]$. The regional features refer to external information related to regional functions and human activities, which affect traffic to some extent, such as the number of base stations, point of interest (POI), and social activities considered in [13]. The statistical features refer to traffic statistics, e.g., mean value and variance. We believe that taking into account these two parts of the features is necessary for accurate clustering.

4.2.1. Regional Feature

In most metropolitan areas, urban functional areas are divided according to the primary use of each area, including commercial areas, residential areas, and industrial areas. The traffic within each type of area often emerges with specific regional features.

In order to make the inference as lightweight as possible, in this paper, we do not introduce additional data. Instead, we endeavor to extract regional features from traffic data itself. We define two indirect regional features, i.e., intra-day and intra-week distribution. Figure 5 illustrates the sampling process of the intra-week distribution. We sum the traffic volumes by the day of the week. The intra-day distribution is obtained by summing similarly. To align data, we standardize these two features to have zero mean and unit variance within each subnet.
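A minimal sketch of the two indirect regional features follows. It assumes an hourly series that starts at midnight on the first day of a week, which the text does not state; the function names are hypothetical, and `standardize` would divide by zero for a constant profile.

```python
from statistics import mean, pstdev


def standardize(values):
    """Rescale a list to zero mean and unit variance."""
    mu, sigma = mean(values), pstdev(values)
    return [(v - mu) / sigma for v in values]


def intra_day_distribution(hourly):
    """Sum the traffic by hour of the day (24 bins), then standardize."""
    totals = [0.0] * 24
    for i, volume in enumerate(hourly):
        totals[i % 24] += volume
    return standardize(totals)


def intra_week_distribution(hourly):
    """Sum the traffic by day of the week (7 bins), then standardize."""
    totals = [0.0] * 7
    for i, volume in enumerate(hourly):
        totals[(i // 24) % 7] += volume
    return standardize(totals)


# Toy subnet: two weeks of traffic that is lower on days 5 and 6 and higher after 18:00.
toy = [(10.0 if (i // 24) % 7 < 5 else 2.0) + (5.0 if i % 24 >= 18 else 0.0)
       for i in range(24 * 14)]
week_profile = intra_week_distribution(toy)
day_profile = intra_day_distribution(toy)
```

In this toy series the weekend entries of `week_profile` are negative and the evening entries of `day_profile` are positive, which is the kind of contrast the two features are meant to expose.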

Inspired by [15], we select three distinct regions of Duomo, Navigli district, and Bocconi in the Milan cellular traffic dataset [16] as examples. Duomo is the city center, Bocconi is a university, and Navigli district is a nightlife place. Their geographical locations are shown in Figure 6a. Without other prior knowledge, it is difficult for us to judge their regional function. Fortunately, the two distributions we defined reflect it. As shown in Figure 6b, the intra-day distribution of traffic at Navigli is different from Duomo and Bocconi. More traffic at Navigli, the nightlife place, occurs at night. Further, the intra-week distribution of traffic shown in Figure 6c distinguishes Bocconi, the university, where traffic decreased significantly as a result of the weekend break. These illustrate the necessity of extracting regional features from the two dimensions of intra-day and intra-week distribution of the traffic.

4.2.2. Statistical Feature

Since we use the samples obtained by sliding window sampling to train the traffic prediction model, the periodicity of traffic is critical to the training of the model. We regard it as a statistical property. We define a statistical feature, the autocorrelation coefficient, to describe this property. Its calculation formula is

$r_{k} = \frac{\sum_{t = 1}^{T - k} (d_{t} - \bar{d}) (d_{t + k} - \bar{d})}{\sum_{t = 1}^{T} {(d_{t} - \bar{d})}^{2}}, \quad 0 \leq k \leq T,$

(4)

where $\bar{d}$ is the mean value of the traffic within the subnet, and T determines the maximum period to be calculated.
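Equation (4) translates almost directly into code. The sketch below assumes the series is a plain Python list (zero-based indices replace the one-based indices of the formula); the sinusoidal toy series is an assumption made for illustration, not data from the paper.

```python
import math


def autocorrelation(d, k):
    """Autocorrelation coefficient r_k of Eq. (4) for lag k."""
    T = len(d)
    d_bar = sum(d) / T
    numerator = sum((d[t] - d_bar) * (d[t + k] - d_bar) for t in range(T - k))
    denominator = sum((v - d_bar) ** 2 for v in d)
    return numerator / denominator


# Toy hourly series with a 24-hour period, four weeks long.
toy = [math.sin(2 * math.pi * t / 24) for t in range(24 * 28)]
r0 = autocorrelation(toy, 0)
r12 = autocorrelation(toy, 12)
r24 = autocorrelation(toy, 24)
```

For this toy series the coefficient is 1 at lag 0, strongly negative at lag 12 (half a period), and strongly positive at lag 24 (one full period).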

We also take Duomo, Navigli, and Bocconi as examples. As shown in Figure 7, we let $T = 168$ ($24 \times 7$) to observe the significance of linear correlation under different periods within a week. The traffic volumes in these three regions are periodic, which indicates that models can be trained effectively on data sampled by the sliding window. However, they still show fine distinctions, e.g., when $k = 12$, the autocorrelation coefficient of the traffic at Navigli is almost zero, while that at Duomo is about $-0.7$. It should be noted that a zero autocorrelation coefficient only indicates the absence of linear correlation. It does not mean that the two random variables are independent.

4.3. Multi-Model Aggregation Strategy

As shown in Figure 4, a single LCP-Net may match multiple features, i.e., corresponding to several global models generated by intra-cluster federated learning. In order to make use of all these models to improve the prediction accuracy of the traffic in LCP-Nets, we design two multi-model aggregation strategies:

1.: Fixed aggregation: The outputs of multiple models are aggregated and weighted by fixed weights.
2.: Trainable aggregation: The outputs of multiple models are aggregated and weighted by trainable weights. Meanwhile, the weights of the last n layers are set to be trainable, where n can be adjusted according to the specific computing power condition of the subnet.

Take the aggregation of two simple multilayer perceptron (MLP) neural networks as an example in Figure 8. The aggregation operator is denoted as $G^{'} (\cdot)$.
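The two strategies can be sketched on the outputs of already-trained models. This is only an illustration under stated assumptions: the aggregation is taken to be a weighted sum of scalar outputs, the trainable variant fits only the aggregation weights by gradient descent on the mean squared error, and the unfreezing of the last n layers is omitted. All names and the toy numbers are hypothetical.

```python
def aggregate(outputs, weights):
    """Weighted sum of the scalar outputs of several matched models."""
    return sum(a * y for a, y in zip(weights, outputs))


def train_aggregation_weights(model_outputs, targets, weights, lr=0.05, epochs=200):
    """Fit the aggregation weights by gradient descent; the base models stay frozen."""
    weights = list(weights)
    n = len(targets)
    for _ in range(epochs):
        grads = [0.0] * len(weights)
        for outs, y in zip(model_outputs, targets):
            err = aggregate(outs, weights) - y
            for j, o in enumerate(outs):
                grads[j] += 2.0 * err * o / n
        weights = [a - lr * g for a, g in zip(weights, grads)]
    return weights


# Fixed aggregation: equal weights for two matched models.
fixed = aggregate([1.0, 3.0], [0.5, 0.5])

# Trainable aggregation: toy outputs of two models and targets built as 0.8*o1 + 0.2*o2.
outputs = [(1.0, 0.0), (0.0, 1.0), (1.0, 1.0), (2.0, -1.0)]
targets = [0.8 * o1 + 0.2 * o2 for o1, o2 in outputs]
learned = train_aggregation_weights(outputs, targets, [0.5, 0.5])
```

Starting from equal weights, the trainable variant recovers weights close to 0.8 and 0.2 on this toy data, at the cost of a small amount of local training.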

4.4. Clustering, Federated Learning and Proposed Algorithm

Clustering and federated learning are our proposed framework’s two most critical steps. We first gather the ACP-Nets with the same feature using the clustering algorithm and then carry out intra-cluster federated learning to obtain the global model of each cluster. We believe that these global models will include knowledge about analyzing traffic with their corresponding features. Thus, they adapt to the traffic prediction task of the LCP-Nets with the same feature.

The selection of specific algorithms of clustering and federated learning is open and flexible. We adopt the K-means method to achieve clustering and the classic FedAvg method to realize intra-cluster federated learning in the proposed algorithm.

4.4.1. Preliminaries of Clustering

Clustering has been successfully applied in several fields, including pattern recognition and data mining. For given samples, it will group them into clusters based on the similarity or distance of their features. There are many types of clustering algorithms available. In this paper, we adopt K-means clustering.

K-means clustering has two steps. Firstly, determine the centers of the k clusters, and divide the samples into clusters based on their distance to the centers. Secondly, update the centers of these clusters by averaging the samples inside each cluster. Repeat the above steps until convergence, and obtain the final clustering results.
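The two alternating steps can be sketched in a few lines of standard-library Python. Random initialization from the samples and the squared Euclidean distance are assumptions of this sketch; a production setup would more likely rely on an existing library implementation.

```python
import random


def squared_distance(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def k_means(samples, k, iterations=100, seed=0):
    """Alternate between assigning samples to the nearest center and re-averaging centers."""
    rng = random.Random(seed)
    centers = rng.sample(samples, k)
    labels = []
    for _ in range(iterations):
        # Step 1: assign each sample to its nearest center.
        labels = [min(range(k), key=lambda c: squared_distance(s, centers[c]))
                  for s in samples]
        # Step 2: move each center to the mean of its members.
        new_centers = []
        for c in range(k):
            members = [s for s, label in zip(samples, labels) if label == c]
            if members:
                new_centers.append(tuple(sum(col) / len(members) for col in zip(*members)))
            else:
                new_centers.append(centers[c])  # keep an empty cluster's center unchanged
        if new_centers == centers:
            break  # converged: assignments can no longer change
        centers = new_centers
    return centers, labels


samples = [(0.0, 0.0), (0.1, 0.2), (0.2, 0.1), (5.0, 5.0), (5.1, 4.9), (4.9, 5.2)]
centers, labels = k_means(samples, k=2)
```

On this toy data the first three samples end up in one cluster and the last three in the other.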

The number of clusters in the K-means clustering needs to be set manually. We select the number of clusters based on the sum of squared distances of samples to their closest cluster center (inertia) and the silhouette coefficient [17]. Smaller inertia and a larger silhouette coefficient mean better selection. The formula of the silhouette coefficient is expressed as

$s = \frac{1}{n} \sum_{i = 1}^{n} \frac{b (i) - a (i)}{\max \{ a (i), b (i) \}},$

(5)

where $a (i)$ indicates the average distance between sample i and the other samples in its cluster, and $b (i)$ indicates the average distance between sample i and the samples of the nearest other cluster.
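A small sketch of Equation (5) is given below. It assumes the Euclidean distance, at least two clusters, and the common convention that a sample alone in its cluster contributes zero; none of these details is fixed by the text.

```python
import math


def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def silhouette_coefficient(samples, labels):
    """Mean of (b(i) - a(i)) / max{a(i), b(i)} over all samples, as in Eq. (5)."""
    clusters = {}
    for s, label in zip(samples, labels):
        clusters.setdefault(label, []).append(s)
    total = 0.0
    for s, label in zip(samples, labels):
        own = clusters[label]
        if len(own) == 1:
            continue  # assumed convention: a singleton contributes 0
        # The distance of s to itself is 0, so dividing by len(own) - 1 averages over the others.
        a = sum(euclidean(s, o) for o in own) / (len(own) - 1)
        b = min(sum(euclidean(s, o) for o in other) / len(other)
                for other_label, other in clusters.items() if other_label != label)
        total += (b - a) / max(a, b)
    return total / len(samples)


points = [(0.0,), (1.0,), (10.0,), (11.0,)]
good = silhouette_coefficient(points, [0, 0, 1, 1])
bad = silhouette_coefficient(points, [0, 1, 0, 1])
```

Here the natural grouping scores about 0.90, while the interleaved grouping scores -0.45, so the coefficient prefers the former.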

4.4.2. Preliminaries of Federated Learning

Federated learning is a distributed machine learning paradigm with data security and privacy protection benefits. The steps of the basic federated paradigm are as follows:

(1): Clients obtain the weights of global model $w^{t}$ .
(2): Randomly select clients set $S_{t}$ from all clients.
(3): Each selected client i fine-tunes the weights based on local data and obtains fine-tuned weights $w_{i}^{t + 1}$ .
(4): Obtain global weights $w^{t + 1}$ based on weights $w_{i}^{t + 1}$ updated from selected clients.

Most federated learning algorithms repeat the above steps until the training converges or the maximum number of iterations is reached. Many federated learning algorithms have emerged so far. In this paper, we adopt the classic FedAvg algorithm.
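The four steps can be sketched as one FedAvg-style round. The sketch makes several assumptions for brevity: each client trains a two-parameter linear model $y = w_0 x + w_1$ with a few gradient steps, and the new global weights are the average of the selected clients' weights, weighted by their sample counts. The function names and toy data are hypothetical.

```python
import random


def local_update(w, data, lr, steps=5):
    """Step (3): fine-tune the received weights on local (x, y) pairs."""
    w = list(w)
    for _ in range(steps):
        g_slope = sum(2 * (w[0] * x + w[1] - y) * x for x, y in data) / len(data)
        g_bias = sum(2 * (w[0] * x + w[1] - y) for x, y in data) / len(data)
        w = [w[0] - lr * g_slope, w[1] - lr * g_bias]
    return w


def fedavg_round(w_global, clients, fraction, lr, rng):
    """Steps (1)-(4): broadcast, select clients, train locally, average by sample count."""
    m = max(int(fraction * len(clients)), 1)
    selected = rng.sample(clients, m)
    updates = [(len(data), local_update(w_global, data, lr)) for data in selected]
    n = sum(n_k for n_k, _ in updates)
    return [sum(n_k * w_k[j] for n_k, w_k in updates) / n for j in range(len(w_global))]


# Four toy clients whose private data all follow y = 2x + 1.
clients = [
    [(x, 2.0 * x + 1.0) for x in (0.0, 0.5, 1.0)],
    [(x, 2.0 * x + 1.0) for x in (0.5, 1.0, 1.5)],
    [(x, 2.0 * x + 1.0) for x in (0.0, 1.0, 2.0)],
    [(x, 2.0 * x + 1.0) for x in (0.0, 2.0)],
]
rng = random.Random(0)
w = [0.0, 0.0]
for _ in range(300):
    w = fedavg_round(w, clients, fraction=0.5, lr=0.1, rng=rng)
```

Only weights travel between clients and the aggregator in this sketch; the raw `(x, y)` pairs never leave the client, which mirrors the privacy argument made for federated learning.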

The whole procedure of the proposed framework is summarized in the form of the concrete algorithm, Algorithm 1.

Algorithm 1: Implementation of intra-cluster federated learning-based model transfer framework for traffic prediction of LCP-Nets

1: Input: Feature of ACP-Nets and LCP-Nets, ${\{c_{k}\}}_{k = 1}^{K}$, ${\{c_{p}\}}_{p = 1}^{P}$; Dimension of feature, N; Training dataset of ACP-Nets and LCP-Nets, ${\{x_{k}, y_{k}\}}_{k = 1}^{K}$, ${\{x_{p}, y_{p}\}}_{p = 1}^{P}$; Fraction of subnets, $σ$; Learning rate, $η$.
2: Output: The model weights ${\{w_{p}\}}_{p = 1}^{P}$.
3: for each dimension of feature $n = 1, 2, \dots, N$ do
4:     Determine cluster size C based on silhouette coefficient.
5:     Group ${\{c_{k}^{n}\}}_{k = 1}^{K}$ into C clusters $L_{n} = (l_{1}, l_{2}, \dots, l_{C})$ using K-Means and obtain cluster center ${\{v_{c}^{n}\}}_{c = 1}^{C}$.
6:     for each cluster $cl = 1, 2, \dots, C$ do
7:         Randomize $w_{cl}^{0}$.
8:         for each round $t = 1, 2, \dots$ do
9:             $S_{t} \leftarrow$ a random set of $\max (K \cdot σ, 1)$ ACP-Nets in $l_{cl}$
10:             for each ACP-Net $k = 1, 2, \dots, K$ in parallel do
11:                 $w_{k}^{t + 1} \leftarrow w_{k}^{t} - η \nabla_{w_{cl}^{t}} L (f (x_{k}; w_{cl}^{t}), y_{k})$.
12:             $w_{cl}^{t + 1} \leftarrow \sum_{k = 1}^{K} \frac{n_{k}}{n} w_{k}^{t + 1}$.
13:         Obtain the global model of cluster $cl$: $w_{cl}^{n} \leftarrow w_{cl}^{t + 1}$.
14: for each LCP-Net $p = 1, 2, \dots, P$ do
15:     for each dimension of feature $n = 1, 2, \dots, N$ do
16:         Group ${\{c_{p}^{n}\}}_{p = 1}^{P}$ into cluster $l_{n}$ using K-Means with ${\{v_{c}^{n}\}}_{c = 1}^{C}$.
17:     Obtain matched model set $W^{'} = {\{w_{n}\}}_{n = 1}^{N}$ corresponding to ${\{l_{n}\}}_{n = 1}^{N}$.
18:     Aggregate matched models and obtain model for LCP-Net p: $w_{p} \leftarrow G^{'} (W^{'})$.
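The matching part of Algorithm 1 (the loop over LCP-Nets) can be sketched as a nearest-center lookup per feature dimension. The sketch assumes that "using K-Means with the cluster centers" means assigning the LCP-Net feature to the closest stored center; the dictionary layout, the feature names, and the string placeholders standing in for model weights are all hypothetical.

```python
def nearest_cluster(feature, centers):
    """Index of the stored cluster center closest to the feature vector."""
    return min(range(len(centers)),
               key=lambda c: sum((x - y) ** 2 for x, y in zip(feature, centers[c])))


def match_models(lcp_features, centers_by_dim, models_by_dim):
    """For every feature dimension, pick the global model of the matched cluster."""
    matched = []
    for name, feature in lcp_features.items():
        cluster = nearest_cluster(feature, centers_by_dim[name])
        matched.append(models_by_dim[name][cluster])
    return matched


# Toy cluster centers and placeholder model identifiers for two feature dimensions.
centers_by_dim = {
    "autocorrelation": [(0.9,), (0.1,)],
    "intra_week": [(1.0, -1.0), (-1.0, 1.0)],
}
models_by_dim = {
    "autocorrelation": ["w_ac_0", "w_ac_1"],
    "intra_week": ["w_iw_0", "w_iw_1"],
}
lcp_features = {"autocorrelation": (0.8,), "intra_week": (-0.7, 0.9)}
matched = match_models(lcp_features, centers_by_dim, models_by_dim)
print(matched)  # ['w_ac_0', 'w_iw_1']
```

The resulting list plays the role of the matched model set $W^{'}$, which is then passed to the aggregation operator $G^{'} (\cdot)$.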

5. Experimental Results and Discussions

In this section, we introduce the details related to the experiment, including datasets and evaluation methods, then give the experiment results and corresponding discussions.

5.1. Dataset and Preprocessing

We use the cellular traffic datasets [16,18] released by Telecom Italia for the experiment. These two datasets record the call details of Milan (MI) [16] and Trentino (TN) [18] in the last two months of 2013. They are the most commonly used datasets in cellular traffic prediction [6]. The datasets contain five types of traffic, i.e., SMS in/out, voice call in/out, and internet services, recorded at a fixed temporal and spatial granularity. We experiment exclusively on incoming voice call traffic and internet service traffic, the most common cellular traffic in the existing network. In the spatial domain, the datasets divide Milan and Trentino into 10,000 grids and 6575 grids, respectively, according to geographical location. In the time domain, the traffic records are generated every 10 min.

We perform a series of preprocessing operations on the datasets:

1.: A statistical interval that is too short makes the unpredictable random components of the sequence significant, whereas one that is too long makes it impossible to extract enough training samples from time-limited datasets. Therefore, we aggregate traffic data at one-hour intervals.
2.: We keep only the first seven weeks of the two months of traffic data to avoid the impact of unconventional traffic during Christmas week on the experiment. Our task is to predict the traffic in the last of the seven weeks based on the traffic in the first six weeks.
3.: We standardize the traffic data to have zero mean and unit variance within each grid.
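The three preprocessing operations can be sketched for a single grid as follows. The sketch assumes the 10-minute records form a gap-free list, and it computes the standardization statistics over the whole seven-week window, which the text does not specify; the function names and the toy series are hypothetical.

```python
from statistics import mean, pstdev

HOURS_PER_WEEK = 168


def to_hourly(ten_minute_volumes):
    """Sum every six consecutive 10-minute records; a trailing partial hour is dropped."""
    return [sum(ten_minute_volumes[i:i + 6])
            for i in range(0, len(ten_minute_volumes) - 5, 6)]


def standardize(series):
    """Rescale one grid's series to zero mean and unit variance."""
    mu, sigma = mean(series), pstdev(series)
    return [(v - mu) / sigma for v in series]


def split_train_test(hourly, train_weeks=6, test_weeks=1):
    """First six weeks for training, the seventh for testing; later data is discarded."""
    train_end = train_weeks * HOURS_PER_WEEK
    test_end = train_end + test_weeks * HOURS_PER_WEEK
    return hourly[:train_end], hourly[train_end:test_end]


# Toy grid: eight weeks of 10-minute records with a daily pattern.
raw = [float(i % 144) for i in range(8 * 7 * 24 * 6)]
hourly = to_hourly(raw)
scaled = standardize(hourly[:7 * HOURS_PER_WEEK])
train, test = split_train_test(scaled)
```

Six training weeks yield 1008 hourly values and the test week yields 168.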

These two datasets match the scenario considered in this paper well. Since the traffic of each grid is generated by the communication services provided by multiple BSs inside it, we consider a grid as a subnet.

5.2. Setting of Baseline Methods and Evaluation Metrics

5.2.1. Setting of Baseline Methods

In the experiment of this paper, our goal is to verify that the proposed intra-cluster federated learning model transfer framework can improve the prediction efficiency of LCP-Nets. Therefore, we compare five methods as follows:

1.: SMA (simple moving average): SMA is the most basic traffic prediction method and the easiest to implement. When an appropriate sampling strategy, such as one based on the autocorrelation coefficient, is adopted, this method can achieve reasonably high prediction accuracy. We use it as the baseline to ensure that the trained deep learning models work.
2.: ARIMA (autoregressive integrated moving average): ARIMA is representative of the statistical method in time series prediction, and its complexity of model training is much lower than that of the deep learning method. We believe that most LCP-Nets have enough computing power to train ARIMA models.
3.: LTNN (local-trained neural network): LCP-Nets train their traffic prediction models locally. LTNN cannot be achieved in the considered scenario, where LCP-Nets are not equipped with sufficient computing power. After parameter adjustment, we regard it as the ideal, most accurate method.
4.: ICFed-F (intra-cluster federated learning-based transfer framework with fixed aggregation strategy): LCP-Nets obtain models from feature-matched clusters and aggregate them via the fixed strategy, according to the proposed framework.
5.: ICFed-T (intra-cluster federated learning-based transfer framework with trainable aggregation strategy): Assume that LCP-Nets have the computing capability to carry out a few model training tasks. ICFed-T allows fine-tuning in LCP-Nets after ICFed-F.

5.2.2. Evaluation Metrics

We use three evaluation metrics to evaluate the effect of the above methods, i.e., MAE, RMSE and $R^{2}$.

1.: MAE (mean absolute error) is the most common regression metric. Its calculation formula is

$MAE = \frac{\sum_{i = 1}^{n} ∣ {\hat{y}}_{i} - y_{i} ∣}{n},$

(6)

where ${\hat{y}}_{i}$ is the predictive value and $y_{i}$ is the actual value.
2.: RMSE (root mean square error) extends MAE by squaring each error before averaging, which amplifies large errors. Its calculation formula is

$RMSE = \sqrt{\frac{\sum_{i = 1}^{n} {({\hat{y}}_{i} - y_{i})}^{2}}{n}},$

(7)
3.: $R^{2}$ (coefficient of determination) is another commonly used measure of goodness of fit, and its calculation formula is

$R^{2} = \frac{\sum_{i = 1}^{n} {({\hat{y}}_{i} - \bar{y})}^{2}}{\sum_{i = 1}^{n} {(y_{i} - \bar{y})}^{2}},$

(8)

where $\bar{y}$ is the average of the actual value.
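The three metrics can be computed directly from Equations (6)-(8). The sketch below follows Equation (8) as written, i.e., the ratio of the explained sum of squares to the total sum of squares; this coincides with the more common form $1 - \sum (y_i - \hat{y}_i)^2 / \sum (y_i - \bar{y})^2$ only for least-squares fits with an intercept. The function names and toy values are hypothetical.

```python
import math


def mae(y_hat, y):
    """Mean absolute error, Eq. (6)."""
    return sum(abs(p - a) for p, a in zip(y_hat, y)) / len(y)


def rmse(y_hat, y):
    """Root mean square error, Eq. (7)."""
    return math.sqrt(sum((p - a) ** 2 for p, a in zip(y_hat, y)) / len(y))


def r2_eq8(y_hat, y):
    """Coefficient of determination in the form of Eq. (8)."""
    y_bar = sum(y) / len(y)
    return sum((p - y_bar) ** 2 for p in y_hat) / sum((a - y_bar) ** 2 for a in y)


actual = [1.0, 2.0, 3.0, 4.0]
predicted = [1.5, 2.0, 2.5, 4.0]
print(mae(predicted, actual))     # 0.25
print(rmse(predicted, actual))    # about 0.3536
print(r2_eq8(predicted, actual))  # 0.7
```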

5.3. The Result of Experiment

In this section, we introduce the experimental results in two parts, clustering and traffic prediction results, corresponding to Steps 1 and 2, and Steps 3 and 4 of the proposed framework in Section 4.1, respectively.

We use the MI dataset to introduce the sampling and clustering process of the experiment in detail. For the MI dataset, we randomly sample 100 and 20 grids from 10,000 of them as ACP-Nets and LCP-Nets, respectively. As shown in Figure 9, the two types of subnets are distributed all over Milan.

5.3.1. The Result of Clustering

The first step of the experiment is clustering ACP-Nets. The number of clusters corresponding to the autocorrelation coefficient, intra-week distribution, and intra-day distribution is set to 8, 7, and 4, respectively, based on the silhouette coefficient and inertia.

As shown in Figure 10, subnets of different clusters show different characteristics, some of which can be explained and verified by common sense. In Figure 10a, the traffic of subnets in most clusters shows significant and stable cycle dependency. A few subnets in Clusters 2 and 4 do not show this characteristic. We believe that the traffic-changing pattern of subnets in these clusters differs from that in other clusters.

As shown in Figure 10b,c, the clustering results of the intra-week and intra-day traffic distributions largely reflect the classification of urban functional areas. For example, Cluster 1 in Figure 10b represents non-residential areas, where traffic on the weekend is significantly lower than on weekdays, and Cluster 3 in Figure 10c represents places of entertainment, where traffic mainly occurs at night. Further, these two features can be combined to achieve a more specific functional area description.

Because traffic is generated by human activity, the traffic dataset contains abnormal values, such as Cluster 4 in Figure 10c. In most previous algorithms, these abnormal values may affect the performance of the trained model. In the proposed algorithm, however, LCP-Nets automatically avoid this effect by not selecting the corresponding clusters. In addition, if there is manual participation, this step will also help with data cleaning.

After clustering, we obtain the feature-based models by intra-cluster federated learning as the second step. The selection of specific model structures in ICFed is very flexible. Before training cluster models using federated learning, we test the influence of different network structures on traffic prediction within a single grid in the MI dataset to find the network structure suitable for different types of traffic. We compare MLP and long short-term memory (LSTM) networks, each in a smaller and a larger size. Specifically, the compared network structures are as follows:

1.: MLP-S: A smaller size MLP. $p = 3, q = 2, r = 1$ , i.e., three hourly samples, two daily samples, and one weekly sample compose the 6-dimensional input. There are two hidden layers with a size of 16. The one-dimensional output is the network's predicted value for the target time slot.
2.: MLP-L: A larger size MLP. $p = 168, q = 0, r = 0$ , i.e., 168 hourly samples form the 168-dimensional input. Similar to MLP-S, except that the hidden layer size is 128.
3.: LSTM-S: A smaller size single-layer LSTM with a hidden layer size of 8. $p = 3,$ $q = 2, r = 1$ , i.e., the same input as MLP-S. The one-dimensional predicted value is output through a fully connected (FC) layer at the end of the LSTM module.
4.: LSTM-L: A larger size single-layer LSTM with a hidden layer size of 16. $p = 168$ , $q = 0$ , $r = 0$ , i.e., the same input as MLP-L. The FC layer after the LSTM module outputs a one-dimensional predicted value.
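As a sanity check on the model sizes, the parameter counts of the four structures can be worked out by hand from the layer shapes listed in Table 1. The sketch below does this in plain Python. It rests on two assumptions that are not stated in the text: the LSTM receives the whole input vector as one feature vector, and each LSTM gate carries two bias vectors (the convention used, for example, by PyTorch's `nn.LSTM`). The helper names are hypothetical.

```python
def mlp_params(layer_sizes):
    """Weights plus biases of a fully connected network, e.g. (6, 16, 16, 1)."""
    return sum(n_in * n_out + n_out
               for n_in, n_out in zip(layer_sizes, layer_sizes[1:]))

def lstm_params(input_size, hidden_size, output_size=1):
    """Single-layer LSTM followed by one FC output layer.

    Assumption: four gates, each with input weights, recurrent weights,
    and two bias vectors of length hidden_size.
    """
    gates = 4 * (hidden_size * (input_size + hidden_size) + 2 * hidden_size)
    fc = hidden_size * output_size + output_size
    return gates + fc

counts = {
    "MLP-S": mlp_params((6, 16, 16, 1)),
    "MLP-L": mlp_params((168, 128, 128, 1)),
    "LSTM-S": lstm_params(6, 8, 1),
    "LSTM-L": lstm_params(168, 16, 1),
}
for name, n in counts.items():
    print(name, n)
```

Under these assumptions the counts are 401, 38,273, 521, and 11,921, which agree with the Params column of Table 1 (401, 38.3k, 521, and 12.0k).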

As shown in Table 1, we also record the corresponding amount of calculation and the number of parameters. The experimental results show that a larger model size does not necessarily yield better prediction performance. Due to the limited training samples of traffic within a single grid, a too-large model size may lead to overfitting. To keep the models as lightweight as possible while obtaining better prediction results, we use MLP-S to predict the call-in traffic and LSTM-S to predict the internet service traffic. We trained four cluster models for each type of traffic, which correspond to four different features, i.e., traffic that is significant on weekdays or on weekends, and traffic that peaks at night or in the afternoon.

5.3.2. The Result of Traffic Prediction

In Step 3 and Step 4, LCP-Nets obtain feature-based models from the feature-matched clusters and aggregate them to customize their models. The method names, ICFed-F or ICFed-T, are distinguished by aggregation strategies. ICFed-T can be regarded as extending a fine-tuning training step based on ICFed-F.

The evaluation results of the prediction performance are shown in Table 2, where $α$ and $β$ represent the number of communication rounds of ACP-Nets’ federated learning and the number of epochs of LCP-Nets’ local fine-tuning training, respectively. As the number of communication rounds increases, the traffic prediction error of LCP-Nets decreases gradually. When the number of communication rounds is 50, the performance of ICFed-F is already better than that of SMA.

However, since ICFed-F never uses the historical traffic data of LCP-Nets to train the models, even if the number of communication rounds is increased to 100, the performance is still far from the ideal LTNN. The advantage of the ICFed-F method is that LCP-Nets no longer need to obtain high-precision neural network models through local training.

Assuming that LCP-Nets have some computing resources for lightweight model training, we can use the ICFed-T method, in which the last FC layer is trainable. As shown in Table 2, with the increase in the number of fine-tuning training epochs, the prediction error continues to decline on the basis of ICFed-F and reaches a level very close to the performance of LTNN when the number of fine-tuning training epochs is 150. The remaining slight gap arises because limited computing power prevents LCP-Nets from training all the weights in the model.
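The difference between keeping the aggregation fixed and fine-tuning it locally can be illustrated with a deliberately simplified toy. The sketch below is not the aggregation strategy of the framework, which combines weight vectors of the transferred models; as a stated simplification, it mixes the outputs of two frozen models A and B with two scalar weights, then fine-tunes only those two weights by gradient descent on local data. All names, data, and the learning rate are hypothetical.

```python
def mse(y_true, y_pred):
    """Mean squared error between targets and predictions."""
    return sum((p - t) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)

def aggregate(pred_a, pred_b, w_a, w_b):
    """Weighted combination of the outputs of two frozen models."""
    return [w_a * a + w_b * b for a, b in zip(pred_a, pred_b)]

def fine_tune(pred_a, pred_b, y_true, epochs, lr=0.01, w_a=0.5, w_b=0.5):
    """Train only the two mixing weights; models A and B stay frozen."""
    n = len(y_true)
    for _ in range(epochs):
        err = [w_a * a + w_b * b - y for a, b, y in zip(pred_a, pred_b, y_true)]
        grad_a = 2.0 / n * sum(e * a for e, a in zip(err, pred_a))
        grad_b = 2.0 / n * sum(e * b for e, b in zip(err, pred_b))
        w_a -= lr * grad_a
        w_b -= lr * grad_b
    return w_a, w_b

# Toy local data (hypothetical): the target is closer to model A than to model B.
pred_a = [1.0, 2.0, 3.0, 4.0]
pred_b = [2.0, 1.0, 4.0, 3.0]
y_true = [0.8 * a + 0.2 * b for a, b in zip(pred_a, pred_b)]

fixed_error = mse(y_true, aggregate(pred_a, pred_b, 0.5, 0.5))
w_a, w_b = fine_tune(pred_a, pred_b, y_true, epochs=150)
tuned_error = mse(y_true, aggregate(pred_a, pred_b, w_a, w_b))
print(fixed_error, tuned_error)
```

In this toy, the fixed equal-weight combination needs no local training, while a small number of local epochs on just two parameters lowers the error. This mirrors, in spirit only, the trade-off between zero local cost and lightweight fine-tuning described above.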

Generally speaking, ICFed-F enables LCP-Nets to achieve better traffic prediction performance with deep learning under the premise of zero local training cost, and ICFed-T makes the performance of the traffic prediction models trained by the LCP-Nets close to that of the local adequately trained models under limited computing power. Both of them achieve privacy protection by obtaining feature-based models through feature matching. At the same time, intra-cluster federated learning is also helpful for efficient training of ACP-Nets’ local models. The proposed framework benefits both the LCP-Nets and the ACP-Nets.

6. Conclusions

This paper proposes an intra-cluster federated learning model transfer framework to help LCP-Nets customize models. We define two types of traffic features to cluster ACP-Nets and, through intra-cluster federated learning, obtain specialized traffic prediction models for subnets with specific characteristics. We design two aggregation strategies to help LCP-Nets combine the obtained models. The experimental results show that the proposed framework can improve the traffic prediction efficiency of LCP-Nets by reducing prediction error and training overhead. Nevertheless, some shortcomings remain. On the one hand, predicting future traffic in the proposed framework depends entirely on historical traffic and does not use other multidimensional data, e.g., regional population density and emergency occurrence information. These data are valuable for cellular traffic prediction. On the other hand, traffic features need to be defined by model users. Although these manually defined features are simple and effective, there may still be other helpful traffic features that could be mined by artificial intelligence. We believe that addressing these two shortcomings will be the focus of future work.

Author Contributions

Conceptualization, P.L. and Y.X.; Funding acquisition, P.L., Y.X. and M.Y.; Investigation, Y.S. and M.Y.; Methodology, L.F. and Y.S.; Supervision, P.L., Y.X.; Validation, Y.S. and C.G.; Visualization, Y.S.; Writing—original draft, Y.S. and C.L.; Writing—review and editing, L.F. All authors have read and agreed to the published version of the manuscript.

Funding

This work was supported by National Key R&D Program of China: 2020YFB1806700.

Data Availability Statement

Not applicable.

Conflicts of Interest

The authors declare no conflict of interest.

Abbreviations

LCP-Nets	Subnets with limited computing power
ACP-Nets	Subnets with abundant computing power
AI	Artificial intelligence
CN	Core network
NWDAF	Network data analytics function
MTLF	Model training logical function
AnLF	Analytics logical function
MLP	Multilayer perceptron
LSTM	Long short term memory
SMA	Simple moving average
ARIMA	Autoregressive integrated moving average
LTNN	Local-trained neural network
ICFed-F	Intra-cluster federated learning-based transfer framework with fixed aggregation strategy
ICFed-T	Intra-cluster federated learning-based transfer framework with trainable aggregation strategy

References

Fujioka, M. Ericsson vision and technology development towards 6G. IEICE Tech. Rep. 2021, 121, 31–36.
Letaief, K.B.; Chen, W.; Shi, Y.; Zhang, J.; Zhang, Y.J.A. The Roadmap to 6G: AI Empowered Wireless Networks. IEEE Commun. Mag. 2019, 57, 84–90.
Zhang, P.; Xu, X.; Qin, X.; Liu, Y.; Ma, N.; Han, S. Evolution Toward Artificial Intelligence of Things Under 6G Ubiquitous-X. J. Harbin Inst. Technol. Ser. 2020, 27, 116–135.
3GPP. Architecture Enhancements for 5G SYSTEM (5GS) to Support Network Data Analytics Services; Technical Specification (TS) 23.288, 3rd Generation Partnership Project (3GPP), Version 17.4.0.; 2022. Available online: https://portal.3gpp.org/desktopmodules/Specifications/SpecificationDetails.aspx?specificationId=3579 (accessed on 5 November 2022).
Alawe, I.; Ksentini, A.; Hadjadj-Aoul, Y.; Bertin, P. Improving Traffic Forecasting for 5G Core Network Scalability: A Machine Learning Approach. IEEE Netw. 2018, 32, 42–49.
Jiang, W. Cellular Traffic Prediction with Machine Learning: A Survey. Expert Syst. Appl. 2022, 201, 117163.
Xu, F.; Lin, Y.; Huang, J.; Wu, D.; Shi, H.; Song, J.; Li, Y. Big Data Driven Mobile Traffic Understanding and Forecasting: A Time Series Approach. IEEE Trans. Serv. Comput. 2016, 9, 796–805.
Huang, C.W.; Chiang, C.T.; Li, Q. A Study of Deep Learning Networks on Mobile Traffic Forecasting. In Proceedings of the 2017 IEEE 28th Annual International Symposium on Personal, Indoor, and Mobile Radio Communications (PIMRC), Montreal, QC, Canada, 8–13 October 2017; pp. 1–6.
Zhao, N.; Wu, A.; Pei, Y.; Liang, Y.C.; Niyato, D. Spatial-Temporal Aggregation Graph Convolution Network for Efficient Mobile Cellular Traffic Prediction. IEEE Commun. Lett. 2022, 26, 587–591.
Liu, Q.; Li, J.; Lu, Z. ST-Tran: Spatial-Temporal Transformer for Cellular Traffic Prediction. IEEE Commun. Lett. 2021, 25, 3325–3329.
Zhang, C.; Dang, S.; Shihada, B.; Alouini, M.S. Dual Attention-Based Federated Learning for Wireless Traffic Prediction. In Proceedings of the IEEE INFOCOM 2021 - IEEE Conference on Computer Communications, Vancouver, BC, Canada, 10–13 May 2021; pp. 1–10.
Ismail, A.; Amin, R. Malicious Cluster Head Detection Mechanism in Wireless Sensor Networks. Wirel. Pers. Commun. 2019, 108, 2117–2135.
Zhang, C.; Zhang, H.; Qiao, J.; Yuan, D.; Zhang, M. Deep Transfer Learning for Intelligent Cellular Traffic Prediction Based on Cross-Domain Big Data. IEEE J. Sel. Areas Commun. 2019, 37, 1389–1401.
Zhou, X.; Zhang, Y.; Li, Z.; Wang, X.; Zhao, J.; Zhang, Z. Large-Scale Cellular Traffic Prediction Based on Graph Convolutional Networks with Transfer Learning. Neural Comput. Appl. 2022, 34, 5549–5559.
Barlacchi, G.; De Nadai, M.; Larcher, R.; Casella, A.; Chitic, C.; Torrisi, G.; Antonelli, F.; Vespignani, A.; Pentland, A.; Lepri, B. A Multi-Source Dataset of Urban Life in the City of Milan and the Province of Trentino. Sci. Data 2015, 2, 150055.
Harvard Dataverse. Italia, T. Telecommunications-SMS, Call, Internet-MI; 2015. Available online: https://doi.org/10.7910/DVN/EGZHFV (accessed on 5 November 2022).
Rousseeuw, P.J. Silhouettes: A Graphical Aid to the Interpretation and Validation of Cluster Analysis. J. Comput. Appl. Math. 1987, 20, 53–65.
Harvard Dataverse. Italia, T. Telecommunications-SMS, Call, Internet-TN; 2015. Available online: https://doi.org/10.7910/DVN/QLCABU (accessed on 5 November 2022).

Figure 1. Network architecture: Core network and its connection with the external network under a distributed intelligence solution.

Figure 2. The process of sampling and prediction when $p = 2, q = 2, r = 1$.

Figure 3. Scenario diagram: In the core network, the node (middle) receives models from the subnet (ACP-Nets) with abundant computing power (left) and distributes customized models to corresponding subnets (LCP-Nets) with limited computing power (right).

Figure 4. The proposed framework: local models (green) and global models (dark red) are obtained through intra-cluster federated learning.

Figure 5. The sampling process of the intra-week distribution of traffic data, one type of regional feature, in w weeks.

Figure 6. Geographical locations and traffic distributions reflect regional features of Duomo (blue), Navigli (orange), and Bocconi (green). (a) Geographical locations of Duomo, Navigli, and Bocconi. (b) The intra-day distribution of standardized traffic volume. (c) The intra-week distribution of traffic volume.

Figure 7. The autocorrelation coefficient of traffic at Duomo, Navigli, and Bocconi.

Figure 8. Aggregation strategies of model A and model B include fixed weights (black lines), trainable weights (blue lines), the weights and weight vectors of model A and model B (green and orange), and the new weight vectors generated by aggregation (blue). (a) Fixed aggregation. (b) Trainable aggregation.

Figure 9. Sampling result of the MI dataset, 100 ACP-Nets (blue) and 20 LCP-Nets (red).

Figure 10. Clustering results of 100 randomly selected ACP-Nets in the MI dataset based on autocorrelation coefficient, intra-week distribution, and intra-day distribution of the Internet service traffic. (a) Clustering result based on autocorrelation coefficient. (b) Clustering result based on intra-week distribution. (c) Clustering result based on intra-day distribution.

Table 1. Comparisons of prediction performance, amount of calculation, and the number of parameters of different network structures.

Model Structure	Call In ($RMSE$)	Internet ($RMSE$)	Flops	Params
MLP-S (6, 16, 16, 1)	1.79	15.97	368	401
MLP-L (168, 128, 128, 1)	2.12	17.01	38.0k	38.3k
LSTM-S (6, 8, 1)	2.37	15.87	584	521
LSTM-L (168, 16, 1)	3.26	18.53	12.0k	12.0k

Table 2. Prediction performance comparisons of the call-in traffic and the internet service traffic among different methods on the Milan and Trentino datasets.

Methods	Call-In—Milano			Internet—Milano			Call-In—Trentino			Internet—Trentino
	$MAE$	$RMSE$	$R^{2}$	$MAE$	$RMSE$	$R^{2}$	$MAE$	$RMSE$	$R^{2}$	$MAE$	$RMSE$	$R^{2}$
SMA	7.13	9.61	0.54	39.88	47.78	0.58	2.67	3.43	0.52	19.79	23.52	0.51
ARIMA	5.62	7.70	0.71	29.70	36.79	0.75	2.04	2.74	0.69	15.03	18.59	0.69
LTNN ( $α = 0, β = 150$ )	4.39	6.09	0.82	22.09	27.88	0.86	1.18	1.76	0.87	11.12	14.46	0.81
ICFed-F ( $α = 50, β = 0$ )	6.46	8.73	0.62	35.65	43.19	0.66	2.36	3.15	0.59	17.79	21.65	0.59
ICFed-F ( $α = 100, β = 0$ )	6.15	8.16	0.67	32.98	40.10	0.71	2.17	2.93	0.65	16.68	20.35	0.63
ICFed-T ( $α = 100, β = 50$ )	5.29	7.32	0.73	27.12	33.93	0.79	1.85	2.54	0.74	14.09	17.58	0.73
ICFed-T ( $α = 100, β = 150$ )	4.44	6.22	0.81	22.42	29.48	0.84	1.54	2.17	0.81	12.12	15.48	0.79

Publisher’s Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.

© 2022 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

