1. Introduction
With the rapid development of communication technology, the mobile internet has become closely linked to our daily lives. There were expected to be 5.5 billion mobile phone users in 2021 [1], with mobile data traffic increasing sevenfold. In China, by the end of March 2019, the number of 4G network users had reached 1.204 billion [2], with a per capita monthly traffic of 7.27 GB, and both figures are still growing strongly. Rapidly growing user numbers and cellular traffic have put considerable pressure on existing network architectures and devices. Because the mobile internet plays an important role in people's lives, its stable operation and security have become critically important.
Communication network accidents have many causes, such as weather, malicious attacks, and system failures. System failures accounted for more than 60% of communication network accidents in EU countries in 2017. One of the main causes of mobile internet system failure is overload of the main control card and baseband card in the base station due to a sudden surge in traffic demand. In mild cases, the network becomes congested, which slows mobile internet access and degrades the user experience. In serious cases, network equipment fails, which reduces the connection rate so that users cannot access the internet or make calls. When calls fail to connect, people often retry again and again, which imposes a huge secondary shock on the wireless and core networks. For example, when a concert is held in a stadium, network traffic surges, and repeated connection attempts after early failures overload the main control card and baseband card in the base station equipment, causing card congestion failure. As a result, internet access speed drops sharply, and users cannot access the internet or make calls.
The demand for mobile network traffic exhibits remarkable spatiotemporal characteristics. Sudden increases in traffic demand often occur in settings with a changing flow of people, such as commercial streets, stadiums, and railway stations. Predicting cellular traffic in such settings allows operators to anticipate network traffic demand and achieve on-demand distribution through resource scheduling. This can effectively alleviate network pressure, reduce the impact and damage caused by sudden increases in traffic demand, and improve the user experience. Moreover, radio systems are optimized for maximum load, which results in excessive energy waste under low traffic [3]. The information and communication technology (ICT) industry consumed 3–4% of the world's electricity in 2008, and this consumption has been doubling every decade [4]. Cellular traffic prediction can help allocate network resources on demand, selectively shut down or idle base stations, and reduce the power consumption of network devices. Therefore, mobile network traffic prediction is of great significance not only for network security but also for energy conservation.
There have been some relevant reports on the prediction of cellular traffic data [5,6,7,8,9]. Wu et al. [5] described a cellular traffic prediction method based on regularized orthogonal matching pursuit with threshold control (BT-ROMP). The theoretical basis of this method is compressed sensing; the principle is complex, and the calculation is time-consuming. In addition, setting the threshold rationally requires extensive experimentation. He and Li [6] reported a mobile communication base station traffic prediction method based on a vector autoregressive model. The autoregressive model is simple and easy to implement, but its learning ability is weak, and its prediction accuracy needs improvement. Loumiotis et al. [7] investigated the backhaul resource allocation problem on the base station side and established a traffic prediction model using an artificial neural network. Compared with the autoregressive model, the prediction performance was improved; however, artificial neural networks are difficult to train and easily fall into local optima. To achieve higher performance, Qiu et al. [8] employed multiple recurrent neural network (RNN) learning models to construct traffic prediction models by exploring spatiotemporal correlations between base stations. RNN neurons are structurally more complex than those of traditional artificial neural networks, and constructing multiple RNNs makes the prediction model more complicated. In addition, the model was trained and tested on only 15 days of data, so its traffic prediction performance needs further verification. Therefore, a novel and efficient traffic forecasting method is proposed in this paper.
Extreme learning machine (ELM) [10], a typical single-hidden-layer feedforward neural network (SLFN), learns extremely fast and, unlike backpropagation (BP) algorithms, does not fall into local optima. To further improve the performance of ELM, Huang et al. [11] proposed the kernel ELM (kELM) by borrowing the feature-mapping idea of the support vector machine (SVM); it retains the efficiency of ELM while inheriting the excellent learning ability of the SVM. The technique has been successfully used for classification [12,13,14,15,16] and regression tasks [17,18,19,20,21]. Therefore, in this study, we employed kELM to predict cellular traffic data. The classical SVM has four kernel functions. To achieve optimal traffic prediction with kELM, all four classical kernel functions were used for traffic prediction, a metaheuristic optimization algorithm [22] was used for parameter optimization, and the kernel function with the optimal result was selected. Wolpert and Macready [23], among others, have proven that no single optimization algorithm can find the optimal result for all optimization problems. Therefore, we chose the classic particle swarm optimization method (PSO) [24] as well as two heuristic optimization algorithms with strong search capability, namely the multiverse optimizer (MVO) [25] and moth–flame optimization (MFO) [26], for parameter optimization of kELM with the different kernel functions.
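To make the kELM formulation concrete, the following minimal sketch (in Python with NumPy; the class and variable names are our own illustrative choices, not from the cited works) implements kernel ELM regression with a Gaussian kernel using the closed-form output weights of Huang et al. [11], beta = (I/C + Omega)^-1 T, where Omega is the kernel matrix over the training samples and C is the regularization coefficient.

```python
import numpy as np

def gaussian_kernel(A, B, g):
    """K(a, b) = exp(-g * ||a - b||^2), computed for all row pairs of A and B."""
    d2 = np.sum(A**2, 1)[:, None] + np.sum(B**2, 1)[None, :] - 2.0 * A @ B.T
    return np.exp(-g * d2)

class KELM:
    """Minimal kernel ELM regressor: beta = (I/C + Omega)^-1 * targets."""
    def __init__(self, C=100.0, g=0.1):
        self.C, self.g = C, g

    def fit(self, X, y):
        self.X = X
        omega = gaussian_kernel(X, X, self.g)  # N x N kernel matrix
        self.beta = np.linalg.solve(np.eye(len(X)) / self.C + omega, y)
        return self

    def predict(self, Xq):
        # f(x) = [K(x, x_1), ..., K(x, x_N)] @ beta
        return gaussian_kernel(Xq, self.X, self.g) @ self.beta
```

Unlike basic ELM, no random hidden weights appear here, which is why kELM's results are deterministic for fixed (C, g).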
The rest of the paper is structured as follows. In Section 2, network accidents caused by traffic are described, and the proposed method is introduced. Section 3 outlines the experimental preparation for validation of the proposed method. In Section 4, the experimental results are presented and analyzed. Section 5 concludes the paper.
4. Experiment Results and Analysis
4.1. Performance Analysis of ELM with Different Kernel Functions
In this section, we investigate the predictive performance of kELM for cellular traffic under different kernel functions. As the prediction performance of kELM differs when the parameters of the different kernel functions take different values, we compared the optimal traffic predictions of the different kernel functions. Three metaheuristic optimization algorithms with excellent performance were used for parameter optimization of kELM. In the parameter optimization process, each experiment was independently repeated 10 times, and the optimal results were then determined. The MAPE curves of the training and test sets of kELM with the different kernel functions are shown in Figure 4 and Figure 5.
In Figure 4, it can be seen that when the kernel function of kELM was Gaussian or polynomial, the MAPE appeared to increase with the number of iterations of the optimization algorithm. However, as shown in Figure 5, the MAPE of all kernel functions decreased with the number of iterations. This is because we used the minimum MAPE of the test set as the fitness function of the optimization algorithm to prevent model overlearning.
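For reference, the MAPE used as the fitness function can be computed as follows (a minimal sketch; the function name is ours):

```python
import numpy as np

def mape(y_true, y_pred):
    """Mean absolute percentage error, in percent."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return 100.0 * np.mean(np.abs((y_true - y_pred) / y_true))

# During optimization, each candidate parameter vector is scored by the
# MAPE it yields on the test set, and the optimizer minimizes this value.
```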
From Figure 4 and Figure 5, it can be intuitively seen that, in the initial stage of parameter optimization, the MAPE was largest when the kernel function of kELM was sigmoid: for the parameters found by all three optimization algorithms, its MAPE exceeded 60%. Its MAPE then decreased rapidly as parameter optimization proceeded, which means that, within the given variable intervals, the sigmoid kernel's traffic prediction was the most sensitive to the parameters. In addition, the linear kernel function showed the smallest variation in optimal MAPE across the three optimization algorithms, but its optimal MAPE was the largest for both the training and test sets. The optimal results of MVO, MFO, and PSO for kELM with the four kernel functions are shown in Table 5.
Table 5 shows in detail the optimal results of kELM under optimization of the four kernel functions by the three metaheuristic algorithms. From Table 5, we can see that when the kernel function was Gaussian, the MAPE of the MFO search was the smallest at 11.150%. When the kernel function was polynomial, the MAPE of the MVO search was the smallest at 11.495%, while the result of the MFO search was relatively large at 11.611%. This indicates that the same metaheuristic optimization algorithm has a different ability to search the parameters of different kELM kernel functions, which is the main reason we chose multiple search algorithms for parameter optimization in this study.
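As an illustration of how such a search proceeds, the sketch below gives a minimal PSO loop for minimizing a fitness function (e.g., the test-set MAPE of kELM) over box-bounded parameters. The hyperparameter values (inertia w, acceleration coefficients c1 and c2) are illustrative assumptions, not the settings used in our experiments.

```python
import numpy as np

def pso(fitness, lb, ub, n_particles=20, iters=100, w=0.7, c1=1.5, c2=1.5, seed=0):
    """Minimal particle swarm optimization over a box [lb, ub]."""
    rng = np.random.default_rng(seed)
    lb, ub = np.asarray(lb, float), np.asarray(ub, float)
    d = len(lb)
    x = rng.uniform(lb, ub, (n_particles, d))       # positions
    v = np.zeros_like(x)                            # velocities
    pbest = x.copy()                                # personal bests
    pval = np.array([fitness(p) for p in x])
    gbest = pbest[pval.argmin()].copy()             # global best
    for _ in range(iters):
        r1, r2 = rng.random((2, n_particles, d))
        v = w * v + c1 * r1 * (pbest - x) + c2 * r2 * (gbest - x)
        x = np.clip(x + v, lb, ub)                  # keep particles in bounds
        fx = np.array([fitness(p) for p in x])
        better = fx < pval
        pbest[better], pval[better] = x[better], fx[better]
        gbest = pbest[pval.argmin()].copy()
    return gbest, float(pval.min())
```

MVO and MFO follow the same pattern of iteratively updating a population against the fitness function; only the update rules differ.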
For kELM prediction of cellular traffic in the public setting, the Gaussian kernel function had the minimum test-set MAPE at 11.150%, while the polynomial kernel function had the second smallest at 11.495%. The MAPE was largest when the kernel function was linear, for both the test and training sets. To prevent overlearning, the fitness function of each optimization algorithm was the minimum MAPE on the test set. Therefore, the kernel function and optimization algorithm yielding the minimum test-set MAPE were combined for cellular traffic prediction of the base station in the public setting. In other words, we used the Gaussian kernel results of MFO-optimized kELM for the following experimental comparisons.
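For completeness, the four classical kernels compared above can be written as follows (a sketch; the parameter names g, c0, and d follow the common LIBSVM-style convention and are our assumption about the paper's notation):

```python
import numpy as np

def linear_kernel(A, B):
    return A @ B.T

def polynomial_kernel(A, B, g=1.0, c0=1.0, d=3):
    return (g * (A @ B.T) + c0) ** d

def gaussian_kernel(A, B, g=1.0):
    # exp(-g * ||a - b||^2) for all row pairs
    d2 = np.sum(A**2, 1)[:, None] + np.sum(B**2, 1)[None, :] - 2.0 * A @ B.T
    return np.exp(-g * d2)

def sigmoid_kernel(A, B, g=1.0, c0=0.0):
    return np.tanh(g * (A @ B.T) + c0)
```

The number of free parameters per kernel (one for linear, two for Gaussian, three for polynomial and sigmoid) determines the dimensionality of the optimizer's search space.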
The “time” in Table 5 is the time consumed in the parameter optimization process. From this table, we can also see that, regardless of which optimization algorithm was used, the optimization time was lowest when the kernel function of kELM was linear, followed by the Gaussian kernel function, while the optimization times of the polynomial and sigmoid kernel functions were relatively high. As can be seen from Table 5, when the kernel function of kELM was linear, only one parameter needed to be optimized, whereas the Gaussian kernel required two parameters and the polynomial and sigmoid kernels required three. This indicates that the kELM parameter optimization time is closely related to the number of parameters being optimized. Furthermore, although the kELM parameter optimization process took up to 100 s, the time was mainly consumed by the optimization algorithm's search for the best parameters.
4.2. Study of the Prediction Using Other Regression Algorithms
In the previous section, we investigated the performance of kELM in predicting cellular traffic data under different kernel functions. As a second step, we used v-support vector regression (vSVR), a backpropagation (BP) neural network, and basic ELM to predict cellular traffic data in the public setting and compared the results. When studying the predictive performance of kELM, we optimized its parameters; to make the comparison fair, we likewise first searched for the optimal traffic predictions of vSVR, BP, and ELM.
To optimize the parameters of vSVR, we chose MFO as the search algorithm. All variable settings for the MFO optimization of vSVR were kept the same as those for the kELM parameter optimization, except for the variable boundary settings. The upper and lower boundary settings of the parameters c, g, and v to be optimized were [1500, 10,001] and [0.01, 0.01], respectively. To prevent overlearning, the MAPE of the test set was used as the MFO fitness function. The MAPE curve of the optimization process is shown in Figure 6.
In pattern recognition and machine learning, the performance of BP neural networks and ELM is closely related to the number of nodes in the hidden layer. Too few nodes lead to weak learning ability, while too many nodes increase the model training time and can even lead to overlearning, which degrades model performance. Therefore, an appropriate number of hidden layer nodes should be selected for the BP neural network and ELM. In our experiment, BP was implemented with the MATLAB neural network toolbox. The learning rate was set to 0.2, the maximum number of validation failures was set to 30, the hidden and output layer activation functions were {'logsig', 'tansig'}, and the other parameters were left at their defaults.
As the input weights of ELM and all the weights of the BP neural network are initialized randomly, this affects their performance, especially for the BP neural network: when the initial weights are not set properly, the model is difficult to train and may not even converge. Therefore, we independently repeated 100 trials for each number of hidden nodes (which means that the number of experiments was as high as 100 × 150 + 100 × 60). Excluding nonconvergent runs, the results were averaged. The mean absolute error (MAE) curves of the training and test sets for BP and ELM at different hidden layer node counts are shown in Figure 7 and Figure 8.
In Figure 6, although the fitness function of the MFO-optimized vSVR was the MAPE of the test set, the training-set MAPE also decreased as the number of iterations increased. At the end of optimization, the values of the parameters c, g, and v were 1371.092, 0.024, and 0.891, respectively.
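Assuming the vSVR here corresponds to the standard ν-SVR with an RBF kernel, the found optimum could be plugged into, e.g., scikit-learn's NuSVR as in the sketch below. The mapping of (c, g, v) to (C, gamma, nu) is our assumption, and the stand-in data are hypothetical; in the paper these would be the windowed cellular traffic features and targets.

```python
import numpy as np
from sklearn.svm import NuSVR

# Hypothetical stand-in data in place of the cellular traffic set.
rng = np.random.default_rng(0)
X = rng.uniform(-3.0, 3.0, (200, 1))
y = np.sin(X).ravel()

# Parameters found by the MFO search (assumed mapping: c -> C, g -> gamma, v -> nu).
model = NuSVR(C=1371.092, gamma=0.024, nu=0.891)
model.fit(X, y)
pred = model.predict(X)
```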
In Figure 7, it can be seen that the average error of ELM on the training set decreased as the number of hidden layer nodes increased. However, the error on the test set barely decreased once the number of hidden layer nodes exceeded 100. Therefore, the number of hidden layer nodes of ELM was set to 100. In Figure 8, the average error of BP on the training set decreased as the number of hidden layer nodes increased, but when the number of nodes exceeded 30, the test-set error increased with the number of nodes, showing that the model suffered from overlearning. In conclusion, when predicting cellular traffic data, the number of hidden layer nodes of BP should be set to 30.
From Figure 7 and Figure 8, we can also see that the training time of ELM had an approximately linear relationship with the number of hidden nodes, whereas for BP the relationship was exponential, and the training time of BP was much longer than that of ELM.
4.3. Comparison with Other Regression Algorithms
We studied the prediction of cellular traffic data by kELM and the other regression algorithms and compared their prediction results, as shown in Table 6.
In Table 6, the parameters of MFO-kELM (Gaussian) and MFO-vSVR are [C, g] and [c, g, v], respectively, while the parameters for ELM and BP are the numbers of hidden layer nodes. "Time" is the parameter optimization time.
From Table 6, it can be seen that MFO-vSVR had the smallest test-set MAPE at 11.082%, while MFO-kELM had the smallest training-set MAPE at 9.411%. However, kELM was far more efficient: its optimization time was 149.49 s, much less than the 11,405.70 s required for vSVR. The worst performer for cellular traffic prediction was ELM, which had the largest MAPE on both the test and training sets. The differences between the values predicted by each regression algorithm and the actual cellular traffic values are shown in Figure 9. The standard deviation of kELM and vSVR was 0, while that of BP and ELM was not, i.e., the training results of kELM and vSVR were more stable.
In addition, from Table 6, we can also see that introducing kernel functions into ELM to map features to a high-dimensional space not only significantly improved the prediction accuracy of ELM but also eliminated the uncertainty that random initial weights bring to the model's prediction performance.
5. Conclusions
In mobile network operations, network failures caused by sudden increases in cellular traffic demand occur frequently. Therefore, predicting mobile network traffic in settings with changing traffic flow is of great practical significance for stable network operation and resource scheduling. To achieve accurate prediction of cellular traffic data, this study analyzed the performance of kELM in predicting cellular traffic data with different kernel functions. In the experiments, three metaheuristic optimization algorithms (PSO, MVO, and MFO) were adopted to determine the optimal kernel function parameters.
The results showed that kELM optimized by MFO with the Gaussian kernel function had the smallest test-set MAPE (11.150%). Moreover, we used ELM, BP, and SVR for traffic prediction to verify the performance of kELM; their optimal prediction results for cellular traffic data were obtained through a large number of experiments. kELM had a significant advantage in prediction accuracy over ELM and BP. Although the prediction accuracy of SVR was as good as that of kELM, the optimization time of SVR was very long, and its prediction efficiency was low.
We studied an efficient mobile traffic prediction model based on the kELM algorithm. A commercial street in Huainan was selected to verify the effectiveness of the model through experiments. The proposed kELM-based traffic forecasting method will allow operators to prepare for upcoming congestion and improve service quality. Meanwhile, it can also guide network operators in rationally allocating network resources, effectively saving energy and reducing operating costs.