Article

WVETT-Net: A Novel Hybrid Prediction Model for Wireless Network Traffic Based on Variational Mode Decomposition

School of Microelectronics and Communication Engineering, Chongqing University, Chongqing 400044, China
* Author to whom correspondence should be addressed.
Electronics 2024, 13(16), 3109; https://doi.org/10.3390/electronics13163109
Submission received: 6 July 2024 / Revised: 31 July 2024 / Accepted: 1 August 2024 / Published: 6 August 2024
(This article belongs to the Section Microwave and Wireless Communications)

Abstract

Precise prediction of wireless communication network traffic is indispensable for the operational deployment of base station resources and for improving the user experience. Cellular wireless network traffic has both spatial and temporal characteristics. Existing modeling algorithms extract the spatial features well, but models for extracting time dependencies remain deficient. To resolve these problems, this paper proposes a novel hybrid neural network prediction model, called WVETT-Net. Firstly, variational mode decomposition (VMD) is used to preprocess network traffic, and the whale optimization algorithm (WOA) is used to select the optimal parameters for VMD. Secondly, the local and global features are extracted from each subsequence by a temporal convolutional network (TCN) and an improved Transformer network with a multi-head ProbSparse self-attention mechanism (Pe-Transformer), respectively. Finally, the extracted feature representation is enhanced by an efficient channel attention (ECA) mechanism to achieve accurate wireless network traffic predictions. Experimental results on two wireless network traffic datasets show that the proposed model (WVETT-Net) outperforms traditional single and combined models in wireless network traffic prediction.

1. Introduction

With the widespread adoption of wireless networks, network operators are experiencing a surge in traffic, which not only increases the operational difficulty and resource-scheduling demands of cellular networks but also poses challenges for the functional management of base stations. Therefore, reasonable optimization of wireless networks is a top operational priority. Wireless traffic prediction, as an indispensable part of communication network operations, enables effective bandwidth allocation and congestion control, thus reducing network congestion and packet loss [1]. Base station traffic prediction enables automatic switching between the sleep and wake modes to save power, while improving the user experience [2,3,4].
In wireless network communications, cellular traffic is generally viewed as predictable chronological data. For a traffic time series, future values depend on the accumulation of past values. Therefore, the time-dependent features of the data, such as trends and periodicities, need to be analyzed: once a model is fitted, past sequences can be used to predict future ones.
Contemporary scholarly approaches to wireless network traffic prediction can be divided into two main categories: linear statistical models and nonlinear artificial neural network models. The representative linear model is the traditional autoregressive model [5]. Moayedi and Masnadi-Shirazi [6] introduced the traditional autoregressive integrated moving average (ARIMA) model to achieve wireless network traffic prediction. Nevertheless, due to the nonstationarity and nonlinearity of wireless network traffic, linear models suffer a significant reduction in prediction performance on complex data. Therefore, deep learning models based on nonlinear theory have been increasingly explored for network traffic prediction. Ramakrishnan and Soni [7] applied recurrent neural network (RNN) variants, such as the long short-term memory (LSTM) network and the gated recurrent unit (GRU), to extract the internal features of network traffic. Gao et al. [8] proposed a convolutional neural network (CNN) based on a residual network for cellular traffic prediction. Both types of deep learning models can learn more features than the linear models.
However, with the complex changes brought about by the surge in wireless network traffic, it is difficult for a single prediction model to extract features comprehensively. Therefore, researchers have proposed fusion prediction models to overcome this difficulty. Shawel et al. [9] proposed a combined prediction model of dual-seasonal ARIMA and LSTM, where the dual-seasonal ARIMA extracts the seasonality of the network traffic and the LSTM models the nonlinear residual component, achieving a prediction effect that exceeds that of either model alone. Wang et al. [10] combined a data augmentation model based on a generative adversarial network with LSTM to achieve accurate multistep prediction of cellular traffic while also protecting network security. Bi et al. [11] introduced a temporal convolutional network (TCN) [12] module, which performs better than LSTM or the GRU in time series prediction in some cases, to extract local features; together with an LSTM module designed to capture long-term dependencies and a denoising module, the Savitzky–Golay filter, it achieves more precise network traffic prediction than a single traditional model such as LSTM or a TCN.
Although existing fusion models have achieved better prediction accuracy than single models, they are still insufficient for multilevel extractions of local and global wireless traffic information. In recent years, Transformer [13] and Informer [14] models have been successively used for long-term series prediction and achieved good results, where the ProbSparse self-attention mechanism of the Informer can reduce the computational complexity and avoid overfitting. Furthermore, the variational mode decomposition (VMD) algorithm improved with the whale optimization algorithm (WOA) can remove excessive noise interference in time series predictions to achieve high accuracy [15,16,17].
Consequently, based on the WOA-VMD algorithm, this paper proposes a novel hybrid neural network prediction framework, WVETT-Net, which combines efficient channel attention (ECA) [18], a TCN, and an improved Transformer network with a multi-head ProbSparse self-attention mechanism (Pe-Transformer). First, VMD, which is robust to noise, is used in data preprocessing to adaptively decompose the traffic sequences into frequency-domain components. In this process, the VMD parameters are optimized using the WOA to obtain the optimal intrinsic mode functions (IMFs), and a hybrid neural network is constructed for each mode. Second, the TCN is adopted for local feature extraction, while the Pe-Transformer network is adopted for global feature extraction, mitigating the high computational complexity of the traditional Transformer while retaining its advantages for long-sequence prediction. Next, ECA is added to learn the relationships between channels, adjusting the weights of the channel features to improve the feature representation. Finally, the prediction results of each module are summed to obtain the final prediction. The most important research contributions of this paper can be summarized as follows:
  • The WOA-VMD technique is introduced to obtain features in different modes with optimal adaptive decomposition and effectively reduce the high complexity of wireless traffic sequences.
  • The proposed hybrid prediction model, WVETT-Net, provides better extraction of features at different time scales. The ProbSparse attention reduces the time complexity, and the efficient channel attention mechanism enhances feature representation in wireless traffic sequences.
  • The performance of the WVETT-Net model is analyzed on two wireless network datasets, and extensive experimentation verifies that the proposed model in this paper outperforms the baseline model with excellent prediction results.
The remaining sections are organized as follows: Section 2 presents the current state of major scholarly research in wireless network traffic prediction. Section 3 elaborates on the WVETT-Net model as a whole. Section 4 evaluates the performance of the proposed model on two datasets. Section 5 summarizes this paper and offers an outlook on future research directions.

2. Related Works

In this section, the current research of other scholars on wireless network traffic prediction models is introduced. The single and combined models used are classified into three major types: those based on RNNs, on CNNs, and on attention mechanisms. Researchers have established learning models for the seasonality, trend, and periodicity of network traffic sequences.
The RNN and its variants have cyclic connections that carry previous information forward, using the output of the previous step as the input of the next step. As a variant, LSTM can mitigate the gradient vanishing and explosion problems of the standard RNN through gate control. Zhu and Wang [4] proposed using LSTM to predict cellular traffic, which served to manage the sleep state of base stations and reduce unnecessary power consumption. Hachemi et al. [19] introduced the fast Fourier transform for preprocessing, which extracted the most relevant features from traffic sequences while eliminating noise, and then predicted the traffic sequences using LSTM; experiments demonstrated reduced model complexity and improved prediction performance. In [20], a dynamic modification neural network model was proposed that post-processes the predicted values with a modification module on top of neural networks such as LSTM. This module generates linear discrete dynamic parameters with adjusted values based on the dynamic features of the traffic sequences, thus reducing the prediction error at inflection points. Experiments demonstrated that the precision of this method is generally higher than that of LSTM.
CNNs and related structures are relatively simple, faster to compute, and less complex than RNN-based models. Due to the internal properties of the convolutional kernel, CNN-based models can sense data changes over a past period and make predictions for the future. Dauphin et al. [21] proposed a gated convolutional network based on a CNN variant, which achieved good results in sequence prediction but still suffered from the vanishing gradient problem when many layers were stacked. Zhang and Patras [22] proposed a spatiotemporal neural network combining a 3D-CNN and a convolutional long short-term memory network, which addressed the gradient vanishing and high complexity problems of LSTM when learning long-term traffic sequences, while the 3D-CNN learned short-term traffic sequences; experiments showed that this model reduced the prediction error by 61% compared with common prediction models on their traffic dataset. In [23], a multivariate wireless traffic prediction model combined a TCN and a graph attention network, in which dilated convolution and other introduced structures extracted more features. However, these structures still cannot sufficiently extract global features.
The attention mechanism concentrates on important parts when processing wireless traffic sequences, which addresses the difficulty of high accumulative errors over time, thus enhancing the performance of the neural network. Gao et al. [24] proposed an improved LSTM network model based on the attention mechanism to predict wireless traffic, where the attention mechanism can represent the connection between the target value and trained value, and thus expands the time memory length. Shen et al. [25] proposed a time-wise attention mechanism to assist CNNs in cellular traffic prediction. The time-wise attention mechanism is located in the time embedding block, which is capable of connecting past and present information to capture long-term dependencies to achieve high prediction accuracy.
In summary, in terms of wireless traffic prediction, many of the existing single or combined models have several disadvantages. This paper proposes the WVETT-Net model, which incorporates several advantages of the existing models and introduces modules that can compensate for the drawbacks of the models to achieve accurate prediction results.

3. Proposed Method

3.1. Data Preprocessing Method

3.1.1. Variational Mode Decomposition

In the wireless network traffic prediction stage, the traffic is subjected to varying degrees of noise interference at different time intervals. Therefore, noise reduction preprocessing is required for nonstationary and complex traffic sequences. The VMD module is robust to noise, and can adaptively decompose the corresponding traffic sequences into a set of bandwidth-limited mode components to avoid generating mode aliasing problems [16]. The VMD steps are as described below:
(1) Each mode of the input traffic sequence $Z(t)$ is converted into an analytic signal by the Hilbert transform and modulated to its estimated center frequency, as shown in Formula (1):
$$\left[\left(\delta(t)+\frac{j}{\pi t}\right)*u_k(t)\right]e^{-j\omega_k t} \tag{1}$$
where $\delta(t)$ is the Dirac function, $\{\omega\}=\{\omega_1,\omega_2,\ldots,\omega_K\}$ represents the center frequencies of the decomposed components, $\{u\}=\{u_1,u_2,\ldots,u_K\}$ denotes the components of the traffic sequence, and $K$ is the number of decomposition modes.
The sum of the $K$ modes equals the input traffic $Z(t)$, and the sum of the estimated bandwidths of the modes is minimized. The constrained variational problem is constructed as follows:
$$\min_{\{u_k\},\{\omega_k\}}\left\{\sum_{k=1}^{K}\left\|\partial_t\left[\left(\delta(t)+\frac{j}{\pi t}\right)*u_k(t)\right]e^{-j\omega_k t}\right\|_2^2\right\}\quad\text{s.t.}\quad\sum_{k=1}^{K}u_k=Z(t) \tag{2}$$
(2) The constrained variational problem of Formula (2) is transformed into an unconstrained variational problem, as shown in Formula (3):
$$L(\{u_k\},\{\omega_k\},\lambda)=\alpha\sum_{k=1}^{K}\left\|\partial_t\left[\left(\delta(t)+\frac{j}{\pi t}\right)*u_k(t)\right]e^{-j\omega_k t}\right\|_2^2+\left\|Z(t)-\sum_{k=1}^{K}u_k(t)\right\|_2^2+\left\langle\lambda(t),\;Z(t)-\sum_{k=1}^{K}u_k(t)\right\rangle \tag{3}$$
where $\alpha$ is a quadratic penalty factor that ensures the reconstruction accuracy of the input traffic $Z(t)$ in the presence of mixed noise, and $\lambda(t)$ is a Lagrange multiplier that keeps the constraint strict.
(3) The alternating direction method of multipliers is used to update $\hat{u}_k^{n+1}$, $\omega_k^{n+1}$, and $\hat{\lambda}^{n+1}$ by alternating iteration toward the optimal solution of Formula (2). After optimization, the following formulas are obtained:
$$\hat{u}_k^{n+1}(\omega)=\frac{\hat{Z}(\omega)-\sum_{i\neq k}\hat{u}_i(\omega)+\hat{\lambda}(\omega)/2}{1+2\alpha(\omega-\omega_k)^2} \tag{4}$$
$$\omega_k^{n+1}=\frac{\int_0^{\infty}\omega\left|\hat{u}_k(\omega)\right|^2 d\omega}{\int_0^{\infty}\left|\hat{u}_k(\omega)\right|^2 d\omega} \tag{5}$$
$$\hat{\lambda}^{n+1}(\omega)=\hat{\lambda}^{n}(\omega)+h\left[\hat{Z}(\omega)-\sum_{k}\hat{u}_k^{n+1}(\omega)\right] \tag{6}$$
In the above formulas, $n$ is the iteration number, $h$ is the noise tolerance, and $\hat{u}_i(\omega)$, $\hat{u}_k^{n+1}(\omega)$, $\hat{Z}(\omega)$, and $\hat{\lambda}(\omega)$ are the Fourier transforms of $u_i(t)$, $u_k^{n+1}(t)$, $Z(t)$, and $\lambda(t)$, respectively.
(4) Formula (7) is used to determine whether the above loop iteration has converged. If convergence is achieved, the computation terminates; otherwise, iteration continues until convergence is reached.
$$\sum_{k}\frac{\left\|\hat{u}_k^{n+1}-\hat{u}_k^{n}\right\|_2^2}{\left\|\hat{u}_k^{n}\right\|_2^2}<\varepsilon \tag{7}$$
Before the VMD algorithm is executed, $K$ and $\alpha$ must be set appropriately, because their values have a crucial impact on predictive accuracy.
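For illustration, the following minimal sketch shows how such a decomposition might be carried out with the open-source vmdpy package; this assumes the vmdpy call signature VMD(f, alpha, tau, K, DC, init, tol), and the parameter values are placeholders rather than the paper's WOA-optimized settings.

```python
import numpy as np
from vmdpy import VMD  # pip install vmdpy

# Toy "traffic" series: two periodic components plus noise.
t = np.linspace(0, 1, 1000)
Z = np.sin(2 * np.pi * 5 * t) + 0.5 * np.sin(2 * np.pi * 40 * t) \
    + 0.1 * np.random.randn(t.size)

K = 3          # number of modes (WOA-optimized in the paper)
alpha = 2000   # quadratic penalty factor (also WOA-optimized)
tau = 0.0      # noise tolerance h in Formula (6)
DC = 0         # do not enforce a zero-frequency mode
init = 1       # initialize center frequencies uniformly
tol = 1e-7     # convergence threshold epsilon in Formula (7)

u, u_hat, omega = VMD(Z, alpha, tau, K, DC, init, tol)
print(u.shape)  # (K, len(Z)): one band-limited IMF per row
```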

3.1.2. Whale Optimization Algorithm

The WOA simulates the predation behavior of whales, which mainly comprises three steps: encircling prey, bubble-net predation, and searching for prey. The WOA has been applied to optimization problems in many fields and often achieves better results than traditional optimization algorithms [17]. The three steps are described below:
(1) When searching the global solution space, the position of the target prey, i.e., the optimal solution, must be estimated and then encircled. The position update process is as follows:
$$D(t)=\left|C(t)\cdot X^*(t)-X(t)\right| \tag{8}$$
$$X(t+1)=X^*(t)-A(t)\cdot D(t) \tag{9}$$
$$A(t)=2a(t)\cdot r(t)-a(t) \tag{10}$$
$$C(t)=2r(t) \tag{11}$$
In the above formulas, $t$ is the iteration number; $D(t)$ is the encircling step size; $A(t)$ and $C(t)$ are coefficient vectors; $X^*(t)$ is the position vector of the best solution obtained so far; $X(t)$ is the position vector of a searching whale; the initial vector sizes of $X^*(t)$ and $X(t)$ are both 10; $a(t)$ is the convergence factor; and $r(t)$ is a random vector in the interval $[0,1]$. During iteration, $a(t)$ decreases linearly from 2 to 0. To obtain $X^*(t)$, a fitness value is computed each time a whale moves (the fitness principle is described in Section 3.1.3); $X^*(t)$ is updated by comparing the fitness values of individual whales at each iteration, and the search stops when the fitness value reaches its minimum.
(2) The position update between the whale and the prey can be described by the logarithmic spiral Formula (12):
$$X(t+1)=D'\,e^{bl}\cos(2\pi l)+X^*(t),\qquad D'=\left|X^*(t)-X(t)\right| \tag{12}$$
where $D'$ is the distance between the current searching individual and the current optimal solution, $b$ is the spiral shape parameter, and $l$ is a random number uniformly distributed in $[-1,1]$.
Next, whether to perform bubble-net predation or shrinking encirclement is decided based on the probability $p$:
$$X(t+1)=\begin{cases}X^*(t)-A(t)\cdot D(t), & p<0.5\\ D'\,e^{bl}\cos(2\pi l)+X^*(t), & p\geq 0.5\end{cases} \tag{13}$$
(3) The algorithm randomizes the search by updating positions based on the distance to a randomly chosen whale, thus enhancing its ability to explore the whole search range:
$$D''=\left|C(t)\cdot X_{\mathrm{rand}}(t)-X(t)\right|,\qquad X(t+1)=X_{\mathrm{rand}}(t)-A(t)\cdot D'' \tag{14}$$
where $D''$ is the distance between the current searching whale and the randomly chosen whale, and $X_{\mathrm{rand}}(t)$ denotes the current position of the randomly chosen whale.
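The following compact NumPy sketch illustrates the search loop described by Formulas (8)-(14); the population size, iteration budget, and sphere test function are illustrative stand-ins for the envelope-entropy fitness of Section 3.1.3, not the paper's implementation.

```python
import numpy as np

def woa(fitness, dim=2, n_whales=5, max_iter=15, lb=-10.0, ub=10.0, b=1.0):
    X = np.random.uniform(lb, ub, (n_whales, dim))       # whale positions
    fit = np.array([fitness(x) for x in X])
    best, best_fit = X[fit.argmin()].copy(), fit.min()   # X*(t)
    for t in range(max_iter):
        a = 2.0 - 2.0 * t / max_iter                     # a(t) decreases 2 -> 0
        for i in range(n_whales):
            A = 2 * a * np.random.rand(dim) - a          # Formula (10)
            C = 2 * np.random.rand(dim)                  # Formula (11)
            p, l = np.random.rand(), np.random.uniform(-1, 1)
            if p < 0.5:
                if np.abs(A).max() < 1:                  # encircle best, (8)-(9)
                    X[i] = best - A * np.abs(C * best - X[i])
                else:                                    # random search, (14)
                    X_rand = X[np.random.randint(n_whales)]
                    X[i] = X_rand - A * np.abs(C * X_rand - X[i])
            else:                                        # spiral update, (12)
                X[i] = np.abs(best - X[i]) * np.exp(b * l) \
                       * np.cos(2 * np.pi * l) + best
            X[i] = np.clip(X[i], lb, ub)
            f = fitness(X[i])
            if f < best_fit:                             # keep the best position
                best, best_fit = X[i].copy(), f
    return best, best_fit

best, val = woa(lambda x: float(np.sum(x ** 2)))         # sphere test function
print(best, val)
```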

3.1.3. WOA-VMD Algorithm

To avoid mode aliasing or frequency crossover after VMD of the traffic sequences, the envelope entropy is introduced as the target fitness function. The smaller the local envelope entropy, the lower the randomness and complexity of the traffic. Therefore, the minimum envelope entropy is used to determine the mode number $K$ and penalty factor $\alpha$. The envelope entropy is calculated as in Formula (15):
$$E_i=-\sum_{j=1}^{N_s}p_{i,j}\,\lg p_{i,j},\qquad p_{i,j}=\frac{a_i(j)}{\sum_{j=1}^{N_s}a_i(j)} \tag{15}$$
where $i$ indexes the decomposition layers, $p_{i,j}$ is the probability distribution sequence, $a_i(j)$ is the envelope sequence of the $i$th IMF after the Hilbert transform, $E_i$ is the envelope entropy computed from $p_{i,j}$, and $N_s$ is the number of samples. The specific steps of WOA-VMD are shown in Figure 1, where the corresponding variables and formulas are described in Section 3.1.1 and Section 3.1.2.
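As an illustration, the envelope entropy of Formula (15) can be computed via a Hilbert transform as sketched below; taking the minimum local envelope entropy over the $K$ IMFs as the WOA fitness is a common convention and is an assumption here, as the paper does not spell out the aggregation.

```python
import numpy as np
from scipy.signal import hilbert

def envelope_entropy(imf):
    """Formula (15): entropy of the normalized Hilbert envelope of one IMF."""
    a = np.abs(hilbert(imf))                  # envelope sequence a_i(j)
    p = a / a.sum()                           # probability distribution p_{i,j}
    return -np.sum(p * np.log10(p + 1e-12))   # "lg" read as base-10 logarithm

def woa_vmd_fitness(modes):
    # Assumed aggregation: the WOA minimizes the smallest local envelope
    # entropy among the decomposed IMFs.
    return min(envelope_entropy(u) for u in modes)
```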

3.2. Model Prediction Method

3.2.1. Temporal Convolutional Network

The temporal convolutional network consists of one-dimensional causal convolutions and dilated convolutional layers, and uses residual connections to solve the vanishing gradient problem [12]. In a causal convolution, the state of each layer at time $t$ depends only on the inputs up to time $t$. Because the convolutional kernel size is limited, capturing long-range local features would require stacking many layers, increasing network complexity. Therefore, dilated causal convolution is introduced, as shown in Figure 2.
The novelty of dilated convolutions is that the input can be sampled at intervals set by the dilation factor $d$, which grows exponentially with network depth, enlarging the receptive field while using as few convolutional layers as possible. For an input time series $x$, the dilated causal convolution is defined as follows:
$$F(t)=\sum_{i=0}^{k_{tcn}-1}f(i)\,x_{t-d\cdot i} \tag{16}$$
where $F(t)$ is the output after one dilated convolution, $d$ is the dilation factor, $k_{tcn}$ is the convolutional kernel size, and $x_{t-d\cdot i}$ represents the past data.
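A direct, unoptimized NumPy rendering of Formula (16) may clarify the sampling pattern; this is a sketch, not the TCN implementation used in the paper.

```python
import numpy as np

def dilated_causal_conv(x, f, d):
    """Formula (16): F(t) = sum_i f(i) * x[t - d*i], looking only backward."""
    k = len(f)
    F = np.zeros(len(x))
    for t in range(len(x)):
        for i in range(k):
            if t - d * i >= 0:      # causal: only indices at or before t
                F[t] += f[i] * x[t - d * i]
    return F

x = np.arange(10, dtype=float)
print(dilated_causal_conv(x, f=np.ones(3), d=2))  # receptive field = 1 + (k-1)*d
```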
The TCN consists of multiple residual blocks, whose overall structure is shown in Figure 3. Each residual block contains two one-dimensional dilated causal convolutional layers. The weight normalization layer mitigates the gradient explosion problem, the ReLU activation function strengthens the nonlinearity, and the dropout regularization avoids overfitting. The output layer of the TCN introduces a residual connection, which adds the input $x$ to the output $F(x)$ of the convolutional branch as follows:
$$o=\mathrm{Activation}\left(x+F(x)\right) \tag{17}$$
where $o$ is the final output, and $F(x)$ is the result of the one-dimensional convolutions.

3.2.2. Efficient Channel Attention

The application of attention mechanisms has grown steadily in recent years. The squeeze-and-excitation network (SENet) enhances convolutional networks by adding channel attention that aggregates and recalibrates features [26]. However, the dimensionality reduction that SENet uses to control model complexity also degrades its predictive performance.
Therefore, the ECA module, which captures local cross-channel interactions to improve performance without increasing complexity, is introduced to overcome the drawbacks of SENet [18]. The ECA mechanism improves the efficiency with which the CNN uses channel information by performing a one-dimensional convolution over the channel dimension, dynamically learning a weight for each channel, and then rescaling the feature maps according to these weights. The corresponding workflow is shown in Figure 4. First, the original features are subjected to global average pooling (GAP) without reducing their dimensionality, where $H$ and $W$ denote the height and width of the feature map. Next, the local cross-channel interaction coverage $k_{eca}$ is chosen adaptively according to the channel count $C$ to capture local cross-channel interaction information. Then, the attention weight of each channel is learned with a shared kernel and the sigmoid function, where the weight $w_i$ of the $i$th channel $y_i$ is given by
$$w_i=\sigma\left(\sum_{j=1}^{k_{eca}}\alpha^j y_i^j\right),\qquad y_i^j\in\Omega_i^{k_{eca}} \tag{18}$$
where $\Omega_i^{k_{eca}}$ is the set of $k_{eca}$ channels adjacent to $y_i$, $\alpha^j$ is a weight parameter shared by all channels, and $\sigma$ denotes the sigmoid activation function. Finally, the channel attention weights are applied to the original feature maps: the feature map of each channel is scaled by its attention weight, so that important channels are strengthened and unimportant channels are suppressed in the final result.
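The following NumPy sketch illustrates this forward pass under simplified assumptions: a fixed three-tap kernel stands in for the adaptively chosen $k_{eca}$, and the weights are illustrative rather than learned.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def eca(feat, alpha):
    """feat: (C, H, W) feature map; alpha: shared 1-D kernel of odd size k_eca."""
    C = feat.shape[0]
    y = feat.mean(axis=(1, 2))                     # GAP: one value per channel
    k = len(alpha)
    pad = np.pad(y, k // 2, mode="edge")           # local cross-channel window
    w = sigmoid(np.array([alpha @ pad[i:i + k]     # Formula (18) per channel
                          for i in range(C)]))
    return feat * w[:, None, None]                 # rescale each channel by w_i

feat = np.random.rand(16, 8, 8)                    # C = 16 channels
print(eca(feat, alpha=np.array([0.25, 0.5, 0.25])).shape)  # (16, 8, 8)
```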

3.2.3. Pe-Transformer Network

Traditional Transformers [13] excel at long time series, capturing the global dependencies of wireless network traffic with an attention mechanism while allowing parallel computation. A Transformer consists of encoders and decoders; the encoder is a stack of network layers, each comprising a multi-head attention mechanism and a feed-forward network, with residual connections and normalization between the two components.
Li et al. [27] proposed an improved Transformer based on sparse attention, which reduces the computational complexity of the original model. Inspired by this sparse attention mechanism, this paper introduces multi-head ProbSparse attention into the Transformer for wireless network traffic prediction. Traditional dot-product self-attention is calculated as follows:
$$\mathrm{Attention}(Q,K,V)=\mathrm{Softmax}\left(\frac{QK^{T}}{\sqrt{d_k}}\right)V \tag{19}$$
where $Q\in\mathbb{R}^{L_Q\times d_k}$ denotes the query matrix, $K\in\mathbb{R}^{L_K\times d_k}$ the key matrix, and $V\in\mathbb{R}^{L_V\times d_v}$ the value matrix; $L_Q$, $L_K$, and $L_V$ denote the lengths of $Q$, $K$, and $V$; and $d_k$ and $d_v$ denote the dimensions of the keys and values, respectively. The weighted probability form of the $i$th query is shown in Formula (20):
$$\mathrm{Attention}(q_i,K,V)=\sum_j\frac{k(q_i,k_j)}{\sum_l k(q_i,k_l)}\,v_j=\mathbb{E}_{p(k_j|q_i)}\left[v_j\right] \tag{20}$$
where $p(k_j|q_i)=k(q_i,k_j)/\sum_l k(q_i,k_l)$ is the attention probability distribution with the kernel $k(q_i,k_j)=\exp\!\left(q_i k_j^{T}/\sqrt{d_k}\right)$, and $q_i$, $k_i$, and $v_i$ denote the $i$th rows of $Q$, $K$, and $V$, respectively.
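For reference, the standard scaled dot-product attention of Formula (19) can be written in a few lines of NumPy; this dense baseline, with its $O(L^2)$ score matrix, is exactly what the ProbSparse variant below sparsifies.

```python
import numpy as np

def softmax(z, axis=-1):
    e = np.exp(z - z.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def attention(Q, K, V):
    """Formula (19): every query attends to every key."""
    scores = Q @ K.T / np.sqrt(Q.shape[-1])   # (L_Q, L_K) dot-product pairs
    return softmax(scores) @ V                # weighted sum over the values

Q, K, V = (np.random.randn(8, 16) for _ in range(3))
print(attention(Q, K, V).shape)  # (8, 16)
```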
This computation yields the probability distribution of the correlation between each query and the other positions in the sequence. In [28], researchers found that the self-attention probability distribution is potentially sparse and long-tailed, with a small number of dot-product pairs contributing the majority of the attention weight. Therefore, the Kullback–Leibler (KL) divergence between the attention probability distribution $p(k_j|q_i)$ and the uniform distribution $q(k_j|q_i)=1/L_K$ is used to obtain a sparsity measurement for the $i$th query, so that only the top-$N$ queries with the highest sparsity scores are computed; the remaining queries are not computed, and their outputs are taken as the mean of the values. The approximate measurement is as follows:
$$\bar{M}(q_i,K)=\max_j\left\{\frac{q_i k_j^{T}}{\sqrt{d_k}}\right\}-\frac{1}{L_K}\sum_{j=1}^{L_K}\frac{q_i k_j^{T}}{\sqrt{d_k}} \tag{21}$$
where $\bar{M}(q_i,K)$ is the simplified sparsity measurement of the $i$th query.
The calculation formula of the ProbSparse self-attention mechanism can be expressed as follows:
$$\mathrm{Attention}(Q,K,V)=\mathrm{Softmax}\left(\frac{\bar{Q}K^{T}}{\sqrt{d_k}}\right)V \tag{22}$$
where $\bar{Q}$ is a sparse matrix containing only the top-$N$ dominant queries.
The structure of the ProbSparse self-attention mechanism is shown in Figure 5. After this optimization, the computational complexity falls from $O(L^2)$ to $O(L\log L)$ [14], where $L$ is the input sequence length.
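A simplified NumPy sketch of this mechanism follows; for clarity it computes all scores before selecting the top-$N$ queries (the Informer samples keys to avoid the full product), and the budget $N=\lceil\ln L_K\rceil$ as well as the lazy queries outputting the mean of $V$ follow the Informer convention and are assumptions here.

```python
import numpy as np

def prob_sparse_attention(Q, K, V):
    L_Q, d_k = Q.shape
    L_K = K.shape[0]
    S = Q @ K.T / np.sqrt(d_k)               # full scores, kept for clarity only
    M = S.max(axis=1) - S.mean(axis=1)       # sparsity measure, Formula (21)
    N = int(np.ceil(np.log(L_K)))            # assumed top-N budget
    top = np.argsort(M)[-N:]                 # "active" queries, highest scores
    out = np.tile(V.mean(axis=0), (L_Q, 1))  # lazy queries output the mean of V
    e = np.exp(S[top] - S[top].max(axis=1, keepdims=True))
    out[top] = (e / e.sum(axis=1, keepdims=True)) @ V   # Formula (22) on Q-bar
    return out

Q, K, V = (np.random.randn(64, 16) for _ in range(3))
print(prob_sparse_attention(Q, K, V).shape)  # (64, 16)
```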
As shown in Figure 6, the structure used in WVETT-Net retains only the encoder structure for feature extractions of wireless traffic sequences. The high-dimensional features of a single time sample are obtained through the embedding layer, and positional encoding embeds the position information for the input sequences. Subsequently, the obtained features are weighted and summed after each multi-head ProbSparse self-attention computation that uses multiple sets of learnable mappings instead of a single attention function. The process is as follows:
$$\mathrm{head}_i=A(\bar{Q}_i,K_i,V_i)=\mathrm{Softmax}\left(\frac{\bar{Q}_i K_i^{T}}{\sqrt{d_k}}\right)V_i \tag{23}$$
$$\mathrm{MultiHead}(Q,K,V)=\mathrm{Concat}(\mathrm{head}_1,\ldots,\mathrm{head}_n)W^{o} \tag{24}$$
where $W^{o}$ is the trainable output projection matrix and $\mathrm{Concat}(\cdot)$ denotes the concatenation operation.
Next, the processed vectors are transformed nonlinearly by the feed-forward network to obtain more abstract learned features. Finally, feature mapping is performed through a fully connected layer to produce the results. Residual connections and normalization operations in each network layer address the gradient explosion problem and keep the outputs normalized.

3.2.4. WVETT-Net

A novel hybrid prediction model, named WVETT-Net, is proposed in this paper, combining WOA-VMD with TCN-ECA and Pe-Transformer-ECA modules. The flowchart of the combined model, covering the decomposition, prediction, and reconstruction steps, is shown in Figure 7. The VMD parameters are selected optimally by the WOA so that the wireless traffic sequences are decomposed adaptively; the VMD algorithm, which also achieves noise reduction, yields the IMF components. After each component is normalized, local and global features are extracted by the TCN and the Pe-Transformer, respectively, and the weights of the extracted features are then optimized by ECA. Finally, the prediction results are obtained through the corresponding fusion output. The specific steps for constructing and using the model are as follows (a condensed code sketch of the whole pipeline is given after the steps):
(1) Read the input wireless traffic dataset.
(2) Optimize the number of decomposition layers $K$ and the penalty factor $\alpha$ of VMD using the WOA.
(3) Decompose the wireless network traffic sequences using VMD. The original traffic sequences are denoted as $Z=\{z_1,z_2,\ldots,z_n\}$, where $n$ is the number of observations, and the IMF components of different frequencies are denoted as $U$.
(4) Normalize the traffic values of each IMF component to the interval $[0,1]$.
(5) Divide each IMF component into a training set and a test set at the appropriate ratio. A sliding window strategy is employed to predict a specified number of future steps from the input sequences.
(6) Construct a combined model for each IMF component, which allows training and prediction of the wireless network traffic at different time steps. The normalized training data $\bar{U}$ are input to the TCN and Pe-Transformer models in parallel: the Pe-Transformer processes and encodes $\bar{U}$ layer by layer to obtain a new feature $\bar{U}_{PeT}$ containing long-term information, while the TCN extracts a new feature $\bar{U}_{TCN}$ with short-term information.
(7) Integrate the extracted temporal features $\bar{U}_{PeT}$ and $\bar{U}_{TCN}$ using ECA, which reallocates the feature weights to obtain $\bar{U}_{EPeT}$ and $\bar{U}_{ETCN}$. The features from each module are fused and output through a fully connected layer.
(8) Manually adjust the WVETT-Net hyperparameters to find the best settings based on the model fit.
(9) Perform an inverse normalization operation on the predictions of each IMF component.
(10) Merge the inverse-normalized prediction sequences to obtain the final prediction result $Z_{pr}$.
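The following condensed, hypothetical sketch summarizes steps (1)-(10); woa_vmd, vmd, and build_and_train are placeholder callables standing in for the WOA-VMD preprocessing and the trained TCN-ECA/Pe-Transformer-ECA sub-models, and the window lengths mirror the 36-step input and 9-step output used in Section 4.6.1.

```python
import numpy as np

def sliding_windows(series, in_len, out_len):
    """Step (5): turn a 1-D series into (input window, future window) pairs."""
    X, y = [], []
    for s in range(len(series) - in_len - out_len + 1):
        X.append(series[s:s + in_len])
        y.append(series[s + in_len:s + in_len + out_len])
    return np.array(X), np.array(y)

def wvett_net_predict(Z, woa_vmd, vmd, build_and_train, in_len=36, out_len=9):
    K, alpha = woa_vmd(Z)                      # steps (1)-(2): WOA-tuned VMD
    Z_pr = np.zeros(out_len)
    for u in vmd(Z, K, alpha):                 # step (3): one sub-model per IMF
        lo, hi = u.min(), u.max()
        u_norm = (u - lo) / (hi - lo)          # step (4): scale to [0, 1]
        X, y = sliding_windows(u_norm, in_len, out_len)
        model = build_and_train(X, y)          # steps (6)-(8): hybrid sub-model
        pred = model(u_norm[-in_len:])         # forecast the next out_len steps
        Z_pr += pred * (hi - lo) + lo          # steps (9)-(10): denormalize, sum
    return Z_pr
```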

4. Experiments

In this section, the proposed WVETT-Net model is tested on two wireless network traffic datasets. The datasets, metrics, and WOA-VMD preprocessing analysis are presented first. Second, the prediction performance of WVETT-Net is compared with the baseline models and the real traffic values under the corresponding experimental setups. Third, ablation experiments validate the effectiveness of each module. Lastly, the time complexity of the different combined models is analyzed.

4.1. Dataset Description

The experiments in this paper are conducted on two datasets, called “Isp” and “Int”. The first dataset, “Isp”, consists of private ISP wireless network traffic collected in a European city every five minutes from June 2005 to July 2005, for a total of 14,772 records. The second dataset, “Int”, consists of wireless traffic from the UK Academic Network collected every five minutes from November 2004 to January 2005, totaling 19,888 records. The traffic in both datasets represents the size of the packets delivered on a wireless network link at a given time, making both suitable for time series traffic prediction. The specific traffic sizes and trend distributions are shown in orange in Section 4.6.2, where both datasets exhibit nonlinear, periodic, and bursty behavior.

4.2. Metrics

To evaluate the accuracy of the individual models on wireless network traffic, the root mean square error (RMSE), mean absolute error (MAE), and mean absolute percentage error (MAPE) are adopted as metrics. RMSE measures the error between the true and predicted traffic values, MAE is the mean absolute difference between the predicted and observed traffic values, and MAPE is the mean percentage of the relative error between the predicted and actual values. All three metrics are non-negative, and the closer they are to 0, the better the performance of the model.
$$\mathrm{RMSE}=\sqrt{\frac{1}{n}\sum_{i=1}^{n}\left(y_i-\hat{y}_i\right)^2} \tag{25}$$
$$\mathrm{MAE}=\frac{1}{n}\sum_{i=1}^{n}\left|y_i-\hat{y}_i\right| \tag{26}$$
$$\mathrm{MAPE}=\frac{100\%}{n}\sum_{i=1}^{n}\left|\frac{y_i-\hat{y}_i}{y_i}\right| \tag{27}$$
where $y_i$ represents the true value, $\hat{y}_i$ represents the predicted value, and $n$ represents the number of samples.
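These three metrics translate directly into NumPy:

```python
import numpy as np

def rmse(y, y_hat):
    return np.sqrt(np.mean((y - y_hat) ** 2))

def mae(y, y_hat):
    return np.mean(np.abs(y - y_hat))

def mape(y, y_hat):
    return 100.0 * np.mean(np.abs((y - y_hat) / y))  # reported in percent

y = np.array([100.0, 120.0, 90.0])       # illustrative true traffic values
y_hat = np.array([98.0, 125.0, 93.0])    # illustrative predictions
print(rmse(y, y_hat), mae(y, y_hat), mape(y, y_hat))
```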

4.3. WOA-VMD Analysis of Wireless Traffic Sequences

The WOA adaptively finds the optimal parameter combination $K$ and $\alpha$ for the VMD algorithm. In each iteration, the WOA updates the position of each solution according to its relationship with the current optimal solution and computes the new fitness values; over successive iterations, it gradually approaches the optimal parameter combination. The WOA parameters are initialized as follows: the dimension is 2, the whale population size is 5, the maximum number of iterations is 15, the quadratic penalty factor $\alpha$ lies in the interval [100, 100,000], and the number of decomposition modes $K$ lies in the interval [3, 10]. Figure 8 shows the optimal fitness curve, the decomposed mode number curve, and the penalty factor curve of the “Isp” traffic sequences.
As shown in Figure 8a, the envelope entropy is taken as the optimization objective because its magnitude measures the complexity of the pattern. As the iterations proceed, the envelope entropy gradually decreases, and the point of lowest complexity is found at its minimum. The minimum envelope entropy first appears at the twelfth iteration, with a value of 4.2134. From Figure 8b,c, the optimal mode number and penalty factor are read off at the iteration corresponding to the minimum envelope entropy: the optimal $K$ and $\alpha$ are 3 and 6969, respectively. These values are used to decompose the original “Isp” traffic data by VMD. The waveforms of the IMF components at different scales are shown in Figure 9. Three IMF components are obtained without spurious components or interference from mode aliasing: IMF1 has the largest average amplitude and a lower frequency, closest to the characteristics of the original data, while IMF2 and IMF3 show clear periodicity. These results indicate that the WOA-VMD decomposition is effective.
The same parameter-setting steps are applied to the “Int” traffic data, with the results shown in Figure 10. The minimum envelope entropy is 4.3074, and the optimal $K$ and $\alpha$ are 10 and 69,395, respectively.

4.4. Parameter Tuning and Settings

The experiments use Python 3.6 with TensorFlow 1.5.0 and Keras 2.1.6. An Intel Core i7-12700H CPU and an NVIDIA GeForce RTX 3060 GPU are used as the experimental platform. In both datasets, every fifth point of the original sampling sequences is used. Each sequence is split into training, validation, and test sets at a ratio of 4:1:1, with the validation set used for hyperparameter tuning. The dropout rate is 0.1, and the batch size and number of epochs for all models are set to 64 and 150, respectively.
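For illustration, the 4:1:1 split and [0, 1] scaling described above might be implemented as follows; the file name is a placeholder, and fitting the scaler on the training set only is an assumption, as the paper does not state it.

```python
import numpy as np

series = np.loadtxt("traffic.csv")        # hypothetical single-column file
n = len(series)
n_train, n_val = round(n * 4 / 6), round(n * 1 / 6)

train = series[:n_train]                  # chronological 4 : 1 : 1 split
val = series[n_train:n_train + n_val]
test = series[n_train + n_val:]

lo, hi = train.min(), train.max()         # assumed: scaler fitted on train only
scale = lambda s: (s - lo) / (hi - lo)
train, val, test = scale(train), scale(val), scale(test)
print(len(train), len(val), len(test))
```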
We select the hyperparameters of the prediction model by manual tuning and judge model accuracy by the goodness-of-fit: for each trial we set different parameter values, run the code, observe the change in goodness-of-fit, and keep the parameters with the best fit across a large number of trials. We compared the validation loss over epochs at different learning rates and set the learning rate to 0.01 based on model performance and training speed. We then compared the Adam, RMSProp, and SGD optimizers on the validation loss and selected Adam, which converges fastest with the lowest loss. In Section 4.3, the optimal hyperparameters of the VMD decomposition were found from the WOA-VMD optimization curves. Through training and validation, the TCN module uses three residual blocks, each with 64 convolutional kernels of size 3, and a dilation rate of 3; the Pe-Transformer module uses 3 encoder layers and 4 ProbSparse self-attention heads.

4.5. Baselines

(1) ARIMA [6]: ARIMA is a traditional linear regression model built through autoregression, differencing, and moving-average transformations of the traffic data to capture short-term dependencies.
(2) LSTM [7]: LSTM is a variant of the RNN that captures long-term dependencies in wireless traffic sequences.
(3) GRU [7]: The GRU is a simplified variant of LSTM that reduces the number of parameters to achieve higher computational efficiency with performance comparable to that of LSTM.
(4) TCN [12]: The TCN is a structural variant of the CNN in which multiple convolutional layers are stacked to efficiently capture local features.
(5) Transformer [13]: The Transformer specializes in capturing long-term dependencies based on an attention mechanism.
(6) Informer [14]: The Informer is an improved Transformer-based model suited to extracting global features of wireless traffic sequences.
(7) ST-LSTM [11]: ST-LSTM is an advanced model that combines a TCN and LSTM with a denoising module for wireless traffic prediction, capturing short-term and long-term dependencies, respectively.

4.6. Experimental Result Analysis

4.6.1. Comparison with Baseline Models

The experiments compare the performance of WVETT-Net and the baseline models on the two datasets, “Int” and “Isp”. Every fifth point of the original sequences is taken as a time step, i.e., neighboring samples are 25 min apart. In multistep wireless traffic prediction, a 900 min traffic sequence (36 steps) is used as input to predict the following 225 min (9 steps), and the prediction results at multiple time steps under the three metrics are reported in Table 1. The variation across prediction time steps is depicted in Figure 11, Figure 12 and Figure 13, where curves of different colors represent the different models.
Table 1 shows that the ARIMA model exhibits the worst overall performance under all metrics and the largest error growth as the time step increases, indicating that it cannot capture complex nonlinear relationships well and learns long-term dependent features poorly. Figure 11, Figure 12 and Figure 13 show that both the GRU and LSTM outperform the linear ARIMA model in wireless traffic prediction because their gating mechanisms and memory units handle long sequences and nonlinear features better. The TCN and Transformer achieve better predictions than these three models thanks to their convolutional structure and self-attention mechanism, respectively. The Informer gives the best results among the single models, indicating strong global feature extraction. The ST-LSTM model generally outperforms any single prediction model, suggesting better extraction of both short-term and long-term features. At certain time steps (e.g., step 5 on “Int”), the RMSE and MAE of WVETT-Net are somewhat larger than those of ST-LSTM; as a whole, however, WVETT-Net shows smaller growth in the error metrics and better predictive performance, reflecting that the proposed model understands the complex structure and patterns of traffic data better than the baseline models.

4.6.2. Accuracy Analysis

In Figure 14 and Figure 15, the predicted values are compared with the real values on the test sets of both datasets over a longer time horizon, where the blue curves show the real wireless traffic values and the orange curves show the predictions of WVETT-Net. The results reveal that WVETT-Net correctly tracks the various trends on the different datasets and predicts the test values with high accuracy. Although the prediction error grows when the traffic changes abruptly, the overall prediction performance of WVETT-Net remains relatively good.

4.6.3. Ablation Experiments

In Figure 16 and Figure 17, the contribution of each module of WVETT-Net to the prediction performance is examined by removing modules one at a time. With the other modules unchanged, the models with the WOA and with the whole WOA-VMD module removed are called VETT-Net and ETT-Net, respectively; the models with the ECA and Pe-Transformer modules removed are called WVTT-Net and WVTNE-Net, respectively; the model with the TCN module removed is called WVTPE-Net; and the model with the TCN removed and the Pe-Transformer replaced by a traditional Transformer is called WVTME-Net. In the ablation experiments, every reduced model performs worse than WVETT-Net in both RMSE and MAPE. The performance of VETT-Net and ETT-Net verifies that the VMD algorithm achieves noise reduction and that the WOA helps VMD achieve a better decomposition. The accuracy of WVTPE-Net and WVTNE-Net, which lack local and global feature capture, respectively, decreases to different degrees as the number of time steps increases. Comparing the error metrics of WVTME-Net and WVTPE-Net on the two datasets shows that the multi-head ProbSparse self-attention mechanism prevents overfitting and improves prediction. Moreover, WVTT-Net verifies the role of the ECA module in extracting important information. In summary, the contribution of each module to the accuracy has been verified.

4.6.4. Time Complexity Analysis

In Table 2, the computation times of several models are compared to evaluate the time complexity of WVETT-Net and to test whether the introduced multi-head ProbSparse module reduces the computational cost. The WOA-VMD (WV) module is added uniformly to all compared models to exclude nonexperimental variables, and WVEHT-Net denotes the model in which the multi-head ProbSparse self-attention is replaced with full multi-head self-attention. The results show that WVETT-Net has the lowest training and inference times, indicating that the proposed combined model has low time complexity and that the multi-head ProbSparse attention mechanism reduces the computational effort.

5. Conclusions

In this paper, we propose an improved hybrid prediction model based on WOA-VMD to address the inadequate temporal feature extraction of wireless network traffic. Ablation experiments on two datasets demonstrate that the WOA-VMD module in WVETT-Net sufficiently reduces the effect of noise and achieves the best mode decomposition. The combination of the TCN, Pe-Transformer, and ECA modules comprehensively extracts multilevel features from wireless traffic sequences, and the multi-head ProbSparse self-attention mechanism reduces the computational complexity while maintaining, or even improving, the original performance. Compared with existing commonly used single and combined models, WVETT-Net achieves better multistep prediction on both datasets, enabling more effective prediction of wireless network traffic. Nevertheless, WVETT-Net has several limitations: its prediction accuracy decreases notably in the face of sudden changes in network traffic, and, since it combines multiple models, its complexity may exceed that of traditional single models, affecting training and inference speed. Future work will consider additional external factors that influence wireless network traffic, such as weather and holidays, as well as the traffic characteristics of multiarea networks for multidimensional, highly accurate prediction, along with the application of transfer learning and federated learning.

Author Contributions

Conceptualization, J.G. and A.Z.; methodology, J.G. and W.Y.; software, J.L.; validation, C.T.; formal analysis, J.L.; investigation, A.Z.; resources, J.G.; data curation, J.L.; writing—original draft preparation, J.G.; writing—review and editing, J.G.; visualization, A.Z.; supervision, C.T.; project administration, W.Y. All authors have read and agreed to the published version of the manuscript.

Funding

This work was funded by the National Key Research and Development Program of China, grant number 2022YFB2901402 and number 2023YFB2904101.

Data Availability Statement

The data presented in this study are available upon request from the corresponding author.

Acknowledgments

The authors would like to thank their teachers for their support.

Conflicts of Interest

The authors declare no conflicts of interest.

References

1. Han, G.; Wang, H.; Miao, X.; Liu, L.; Jiang, J.; Peng, Y. A dynamic multipath scheme for protecting source-location privacy using multiple sinks in WSNs intended for IIoT. IEEE Trans. Ind. Inform. 2019, 16, 5527–5538.
2. Wu, J.; Li, Y.; Zhuang, H.; Pan, Z.; Wang, G.; Xian, Y. SMDP-based sleep policy for base stations in heterogeneous cellular networks. Digit. Commun. Netw. 2021, 7, 120–130.
3. Wu, Q.; Chen, X.; Zhou, Z.; Chen, L.; Zhang, J. Deep reinforcement learning with spatio-temporal traffic forecasting for data-driven base station sleep control. IEEE ACM Trans. Netw. 2021, 29, 935–948.
4. Zhu, Y.; Wang, S. Joint traffic prediction and base station sleeping for energy saving in cellular networks. In Proceedings of the ICC 2021—IEEE International Conference on Communications, Montreal, QC, Canada, 14–23 June 2021; pp. 1–6.
5. Gan, M.; Peng, H. Stability analysis of RBF network-based state-dependent autoregressive model for nonlinear time series. Appl. Soft Comput. 2012, 12, 174–181.
6. Moayedi, H.Z.; Masnadi-Shirazi, M.A. Arima model for network traffic prediction and anomaly detection. In Proceedings of the 2008 International Symposium on Information Technology, Kuala Lumpur, Malaysia, 26–28 August 2008; pp. 1–6.
7. Ramakrishnan, N.; Soni, T. Network traffic prediction using recurrent neural networks. In Proceedings of the 2018 17th IEEE International Conference on Machine Learning and Applications (ICMLA), Orlando, FL, USA, 17–20 December 2018; pp. 187–193.
8. Gao, Y.; Zhang, M.; Chen, J.; Han, J.; Li, D.; Qiu, R. Accurate load prediction algorithms assisted with machine learning for network traffic. In Proceedings of the 2021 International Wireless Communications and Mobile Computing (IWCMC), Harbin, China, 28 June–2 July 2021; pp. 1683–1688.
9. Shawel, B.S.; Debella, T.T.; Tesfaye, G.; Tefera, Y.Y.; Woldegebreal, D.H. Hybrid prediction model for mobile data traffic: A cluster-level approach. In Proceedings of the 2020 International Joint Conference on Neural Networks (IJCNN), Glasgow, UK, 19–24 July 2020; pp. 1–8.
10. Wang, Z.; Hu, J.; Min, G.; Zhao, Z.; Wang, J. Data-augmentation-based cellular traffic prediction in edge-computing-enabled smart city. IEEE Trans. Ind. Inform. 2020, 17, 4179–4187.
11. Bi, J.; Zhang, X.; Yuan, H.; Zhang, J.; Zhou, M. A hybrid prediction method for realistic network traffic with temporal convolutional network and LSTM. IEEE Trans. Autom. Sci. Eng. 2021, 19, 1869–1879.
12. Bai, S.; Kolter, J.Z.; Koltun, V. An empirical evaluation of generic convolutional and recurrent networks for sequence modeling. arXiv 2018, arXiv:1803.01271.
13. Vaswani, A.; Shazeer, N.; Parmar, N.; Uszkoreit, J.; Jones, L.; Gomez, A.N.; Kaiser, Ł.; Polosukhin, I. Attention is all you need. In Proceedings of the 31st International Conference on Neural Information Processing Systems (NIPS’17), Long Beach, CA, USA, 4–9 December 2017; pp. 6000–6010.
14. Zhou, H.; Zhang, S.; Peng, J.; Zhang, S.; Li, J.; Xiong, H.; Zhang, W. Informer: Beyond efficient transformer for long sequence time-series forecasting. Proc. AAAI Conf. Artif. Intell. 2021, 35, 11106–11115.
15. He, F.; Zhou, J.; Feng, Z.K.; Liu, G.; Yang, Y. A hybrid short-term load forecasting model based on variational mode decomposition and long short-term memory networks considering relevant factors with Bayesian optimization algorithm. Appl. Energy 2019, 237, 103–116.
16. Yu, Y.; Shang, Q.; Xie, T. A hybrid model for short-term traffic flow prediction based on variational mode decomposition, wavelet threshold denoising, and long short-term memory neural network. Complexity 2021, 2021, 7756299.
17. Mirjalili, S.; Lewis, A. The whale optimization algorithm. Adv. Eng. Softw. 2016, 95, 51–67.
18. Wang, Q.; Wu, B.; Zhu, P.; Li, P.; Zuo, W.; Hu, Q. ECA-Net: Efficient channel attention for deep convolutional neural networks. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Seattle, WA, USA, 13–19 June 2020; pp. 11534–11542.
19. Hachemi, M.L.; Ghomari, A.; Hadjadj-Aoul, Y.; Rubino, G. Mobile traffic forecasting using a combined FFT/LSTM strategy in SDN networks. In Proceedings of the 2021 IEEE 22nd International Conference on High Performance Switching and Routing (HPSR), Paris, France, 7–10 June 2021; pp. 1–6.
20. Guo, D.; Xia, X.; Zhu, L.; Zhang, Y. Dynamic modification neural network model for short-term traffic prediction. Procedia Comput. Sci. 2021, 187, 134–139.
21. Dauphin, Y.N.; Fan, A.; Auli, M.; Grangier, D. Language modeling with gated convolutional networks. In Proceedings of the 34th International Conference on Machine Learning, Sydney, NSW, Australia, 6–11 August 2017; pp. 933–941.
22. Zhang, C.; Patras, P. Long-term mobile traffic forecasting using deep spatio-temporal neural networks. In Proceedings of the Eighteenth ACM International Symposium on Mobile Ad Hoc Networking and Computing, Los Angeles, CA, USA, 26–29 June 2018; pp. 231–240.
23. Lin, C.Y.; Su, H.T.; Tung, S.L.; Hsu, W.H. Multivariate and propagation graph attention network for spatial-temporal prediction with outdoor cellular traffic. In Proceedings of the 30th ACM International Conference on Information & Knowledge Management, online, 1–5 November 2021; pp. 3248–3252.
24. Gao, Y.; Wei, X.; Zhou, L.; Lv, H. A deep learning framework with spatial-temporal attention mechanism for cellular traffic prediction. In Proceedings of the 2019 IEEE Globecom Workshops (GC Wkshps), Waikoloa, HI, USA, 9–13 December 2019; pp. 1–6.
25. Shen, W.; Zhang, H.; Guo, S.; Zhang, C. Time-wise attention aided convolutional neural network for data-driven cellular traffic prediction. IEEE Wirel. Commun. Lett. 2021, 10, 1747–1751.
26. Hu, J.; Shen, L.; Sun, G. Squeeze-and-excitation networks. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA, 18–23 June 2018; pp. 7132–7141.
27. Li, S.; Jin, X.; Xuan, Y.; Zhou, X.; Chen, W.; Wang, Y.X.; Yan, X. Enhancing the locality and breaking the memory bottleneck of transformer on time series forecasting. arXiv 2019, arXiv:1907.00235.
28. Beltagy, I.; Peters, M.E.; Cohan, A. Longformer: The long-document transformer. arXiv 2020, arXiv:2004.05150.
Figure 1. Flow chart of WOA-VMD.
Figure 2. Dilated causal convolutional structure of a TCN.
Figure 3. TCN residual block structure.
Figure 4. Working principle of the efficient channel attention module.
Figure 5. ProbSparse self-attention mechanism.
Figure 6. Pe-Transformer structure.
Figure 7. Structure of the WVETT-Net prediction model.
Figure 8. WOA-VMD decomposition optimization curves of the “Isp” traffic sequences.
Figure 9. Decomposed waveform of the “Isp” traffic sequences.
Figure 10. WOA-VMD decomposition optimization curves of the “Int” traffic sequences.
Figure 11. RMSE versus prediction time step on “Int” and “Isp”.
Figure 12. MAE versus prediction time step on “Int” and “Isp”.
Figure 13. MAPE versus prediction time step on “Int” and “Isp”.
Figure 14. Wireless network traffic prediction results on “Int”.
Figure 15. Wireless network traffic prediction results on “Isp”.
Figure 16. Ablation experiments on “Int”.
Figure 17. Ablation experiments on “Isp”.
Table 1. Performance of WVETT-Net and baseline models on “Int” and “Isp” (each cell: RMSE / MAE / MAPE).

| Dataset | Model | Step 1 | Step 5 | Step 9 |
|---|---|---|---|---|
| Int | ARIMA [6] | 47.69 / 35.59 / 0.81% | 79.20 / 51.85 / 2.25% | 131.09 / 81.93 / 3.15% |
| Int | LSTM [7] | 41.24 / 31.72 / 0.76% | 66.53 / 43.57 / 1.71% | 91.33 / 59.12 / 2.04% |
| Int | GRU [7] | 40.53 / 31.17 / 0.73% | 67.04 / 44.69 / 1.79% | 85.92 / 57.28 / 1.97% |
| Int | TCN [12] | 40.67 / 31.07 / 0.67% | 55.28 / 38.58 / 1.28% | 81.24 / 56.37 / 1.73% |
| Int | Transformer [13] | 40.99 / 31.15 / 0.71% | 55.23 / 39.83 / 1.25% | 72.33 / 55.19 / 1.66% |
| Int | Informer [14] | 40.64 / 30.15 / 0.65% | 54.23 / 37.75 / 1.22% | 71.35 / 51.21 / 1.63% |
| Int | ST-LSTM [11] | 41.77 / 30.44 / 0.54% | 43.47 / 31.96 / 1.09% | 61.23 / 41.65 / 1.28% |
| Int | WVETT-Net | 39.12 / 29.98 / 0.51% | 46.34 / 32.83 / 1.03% | 51.97 / 37.12 / 1.21% |
| Isp | ARIMA [6] | 65.60 / 41.51 / 0.91% | 113.48 / 87.29 / 3.19% | 192.35 / 128.23 / 4.11% |
| Isp | LSTM [7] | 51.97 / 32.87 / 0.77% | 96.24 / 70.12 / 2.27% | 123.07 / 95.06 / 3.09% |
| Isp | GRU [7] | 55.26 / 39.16 / 0.79% | 93.10 / 66.49 / 2.19% | 126.42 / 98.37 / 2.98% |
| Isp | TCN [12] | 44.62 / 28.66 / 0.73% | 81.85 / 56.27 / 1.84% | 116.38 / 88.56 / 2.79% |
| Isp | Transformer [13] | 45.94 / 31.43 / 0.76% | 79.42 / 61.98 / 1.87% | 112.79 / 83.27 / 2.66% |
| Isp | Informer [14] | 44.15 / 25.43 / 0.71% | 79.13 / 52.94 / 1.85% | 105.83 / 79.36 / 2.59% |
| Isp | ST-LSTM [11] | 25.83 / 16.75 / 0.69% | 58.64 / 42.56 / 1.38% | 87.95 / 64.25 / 2.26% |
| Isp | WVETT-Net | 16.32 / 10.88 / 0.65% | 54.79 / 39.13 / 1.31% | 77.87 / 57.68 / 2.17% |
Table 2. Comparison of the computation time on “Int” and “Isp”.

| Dataset | Model | Train (s/Epoch) | Inference (s/Epoch) |
|---|---|---|---|
| Int | WV-ST-LSTM | 18.7 | 1.2 |
| Int | WVEHT-Net | 14.5 | 0.7 |
| Int | WVETT-Net (ours) | 12.3 | 0.3 |
| Isp | WV-ST-LSTM | 15.6 | 1.0 |
| Isp | WVEHT-Net | 13.1 | 0.4 |
| Isp | WVETT-Net (ours) | 10.9 | 0.2 |

