Civil Aviation Passenger Traffic Forecasting: Application and Comparative Study of the Seasonal Autoregressive Integrated Moving Average Model and Backpropagation Neural Network

Gu, Weifan; Guo, Baohua; Zhang, Zhezhe; Lu, He

doi:10.3390/su16104110


¹ School of Energy Science and Engineering, Henan Polytechnic University, Jiaozuo 454003, China

² Jiaozuo Engineering Research Center of Road Traffic and Transportation, Henan Polytechnic University, Jiaozuo 454003, China

* Author to whom correspondence should be addressed.

Sustainability 2024, 16(10), 4110; https://doi.org/10.3390/su16104110

Submission received: 2 April 2024 / Revised: 11 May 2024 / Accepted: 13 May 2024 / Published: 14 May 2024

(This article belongs to the Section Sustainable Transportation)


Abstract

With the rapid development of China’s aviation industry, accurate prediction of civil aviation passenger volume is crucial to the industry’s sustainable development. However, current forecasts of civil aviation passenger traffic have not yet reached the desired accuracy, so improving prediction accuracy is particularly important. This paper explores and compares the effectiveness of the backpropagation (BP) neural network model and the SARIMA model in predicting civil aviation passenger traffic. First, using data from 2006 to 2019, the two models are applied separately to forecast civil aviation passenger traffic in 2019 and are then combined to forecast the same period. By comparing the mean relative error (MRE), mean square error (MSE), and root mean square error (RMSE), the prediction accuracies of the two single models and the combined model are evaluated, and the best prediction method is determined. Subsequently, using the data from 2006 to 2019, the best method is applied to forecast civil aviation passenger traffic from 2020 to 2023. Finally, the forecasts are compared with the actual data to assess the epidemic’s impact on civil aviation passenger traffic. This paper improves the prediction accuracy of civil aviation passenger volume, and the results have practical significance for understanding and evaluating the impact of the epidemic on the aviation industry.

Keywords: BP neural network; SARIMA; combined model; civil aviation passenger traffic; epidemic

1. Introduction

China’s civil aviation passenger volume has always been one of the important indicators of national economic development and people’s living standards. With the issuance of the “14th Five-Year Plan for Civil Aviation Development”, China will embark on a new journey of building a strong civil aviation country in many fields. At the same time, with the rapid development of China’s economy and the improvement of people’s living standards, civil aviation passenger traffic also shows a trend of continuous growth. However, since the outbreak of the global epidemic at the end of 2019, people have been worried about the various impacts that the epidemic may bring, so they have reduced their travel, which has caused an unprecedented impact on the aviation industry, and the passenger volume of civil aviation has been greatly reduced.

Therefore, it is particularly important to accurately predict civil aviation passenger traffic. Accurate forecasts not only promote the sustainable development of China’s aviation industry but also help airlines adjust flights and aircraft types in advance, thereby optimizing the passenger load factor and improving operational efficiency. It is equally important to understand the impact of the epidemic on passenger traffic, which enables airlines to adjust their long-term and short-term strategies according to the actual situation. Such adjustments include optimizing the route network and re-evaluating the aircraft procurement plan.

The primary task in predicting civil aviation passenger volume is model selection. Common time series analysis methods include the moving average method, the weighted moving average method, the simple exponential-smoothing method, and the ARIMA model. The ARIMA model is the classic choice: it is the most commonly used model in practical cases and one of the most widely used methods for univariate time series prediction, because it needs only the endogenous variable and no exogenous variables. Neural network models are also often used to predict passenger volume. The BP neural network is a commonly used artificial neural network with strong nonlinear modeling capabilities. It is trained with the backpropagation algorithm, which improves its prediction accuracy, and it can handle a large amount of data. Therefore, this paper chooses the seasonal ARIMA (SARIMA) model and the BP neural network model, analyzes the characteristics of the two models in depth, and combines them to achieve more accurate prediction results.

The main idea of this paper is to first apply the SARIMA model and the BP neural network model to predict and analyze the civil aviation passenger traffic from 2006 to 2019. Subsequently, the prediction residual of the SARIMA model is used as the input data of the BP neural network, and the BP neural network is used to represent the nonlinear characteristics of the civil aviation passenger volume to obtain more accurate prediction results. At the same time, this study also uses this method to predict the passenger volume of civil aviation during the epidemic period from 2020 to 2023, compares the predicted results with the actual data to analyze the specific impact of the epidemic on the passenger volume of civil aviation, and puts forward the corresponding suggestions accordingly. The main contributions of this paper are as follows:

(1): The SARIMA-BP combined model is used to predict the civil aviation passenger volume, improve the accuracy of civil aviation passenger volume prediction, make the airlines adjust their flights and models in advance, and improve operation efficiency.
(2): By predicting the passenger volume during the epidemic and comparing it with the actual passenger transport data, the impact of the epidemic on the passenger volume of civil aviation is demonstrated. These research results can provide some reference information for airlines to help them develop effective strategies and measures to cope with possible future challenges.

The rest of this paper is organized as follows: Section 2 presents the relevant literature. Section 3 describes the methods used to forecast civil aviation passenger traffic and data processing. Section 4 details how to use the SARIMA model, the BP neural network model, and the combined model. This section aims to explain how to use these models for forecasting and compare and analyze the results obtained. Section 5 presents the forecasting using this optimal methodology and analyses the impact of the epidemic on civil aviation passenger traffic by comparing it with the actual data. Section 6 presents the conclusions drawn. Section 7 contains some discussions summarizing the highlights and shortcomings of this paper.

2. Literature Review

Prediction methods are mainly divided into three categories: traditional time series analysis, non-traditional time series analysis, and machine learning. Traditional time series methods include various regression models, the moving average method, the autoregressive integrated moving average (ARIMA) method, the Holt–Winters method (also known as Winters’ method), and various exponential-smoothing methods. Non-traditional time series methods are relatively new and draw on the multi-disciplinary integration of statistics, system dynamics, and grey system theory.

Although traditional time series analysis provides an effective prediction solution to a certain extent, the amount of data generated in the flight-booking process is growing exponentially, and the data show nonlinear trends, high irregularity, and volatility. Traditional methods may therefore encounter difficulties in dealing with modern, highly complex big data. Machine learning methods, with their powerful nonlinear modeling ability, provide a new and effective way to handle these complex and volatile flight demand forecasting problems. The neural network trained with the error backpropagation algorithm is one such prediction model with a strong nonlinear mapping ability.

The prediction of civil aviation passenger volume has been widely studied. Scholars usually divide it into two methods: single-model prediction and combined-model prediction. There are many prediction methods for a single model. Yu et al. [1] used the GM (1,1) model to simulate the prediction of civil aviation passenger traffic and corrected it using the GM (1,1) residual model, proving the high accuracy of the prediction formula. Zhang et al. [2] used a BP neural network prediction model to forecast the passenger traffic of civil aviation in Beijing from four aspects: economy, tourism, competition, and airport operational capacity. The ELM prediction model was used to predict civil aviation passenger traffic by Chen et al. [3]. Wu et al. [4] used the LSTM prediction model to predict civil aviation passenger traffic. Their results show that the performance of the model is better than the existing fusion model and stable. Meng et al. [5] used a fuzzy diagonal regression neural network to forecast civil aviation passenger traffic. Ma et al. [6] used a multiple linear regression model to analyze the influencing factors of civil aviation passenger traffic in the Gansu province. Anupam et al. [7] used the NARX dynamic neural network to forecast civil aviation passenger traffic. Li used the SARIMA model and LSTM neural network for prediction, respectively, and the LSTM model was better in predicting the passenger traffic of civil aviation [8]. Kanavos et al. [9] developed an air travel demand estimation and forecasting model using the classical autoregressive integrated moving average (ARIMA), the seasonal approach (SARIMA), and a deep learning neural network (DLNN). In addition, many scholars [10,11,12,13,14] have also used the ARIMA model to forecast the passenger traffic of civil aviation.

Although individual-model prediction methods are straightforward to implement, they often have inherent shortcomings that lead to an insufficient prediction accuracy. Therefore, some scholars choose to use the combined model prediction method to improve the accuracy of their predictions. Chen et al. [15] utilized a combined SARIMA-LR model to forecast civil aviation passenger traffic and analyze the impact of the civil aviation industry during the epidemic. Gan et al. [16] employed a bi-directional LSTM model for prediction, resulting in a high prediction accuracy. Al-Sultan [17] considered a wide range of time series prediction models. An empirical analysis shows that the BSTS model is superior to other time series models in predicting complex time series. Hu [18] used the nonadditive Choquet fuzzy integral to combine the prediction of four commonly used univariate grey prediction models into combined prediction ones. Yao et al. [19] used a combined ARIMA-BP model to predict civil aviation passenger volume, but the modeling process was cumbersome. Yu et al. [20] used the ARIMA-BP combined model to forecast short-term traffic flows, which effectively reduced the error.

The COVID-19 pandemic has had a profound impact on the global development of civil aviation. Su et al. [21] examined the spatial distribution of outbreaks and civil aviation passenger throughput in China utilizing COVID-19 statistical data, alongside socioeconomic development data from various Chinese cities, and integrating the Moran index with econometric models. Deveci et al. [22] investigated the economic ramifications of COVID-19 on the civil aviation sector. Wojcik et al. [23] built a behavioral model of flu search based on survey data linked to users’ online browsing data. The research results of the above-selected parts of the literature are summarized in Table 1.

3. Research Methodology and Data

3.1. Data Source and Processing

This paper uses the monthly national civil aviation passenger traffic data published by the National Bureau of Statistics for January 2006 to December 2019. After collation and summary, the data are shown in Figure 1.

According to the data shown in Figure 1, it can be observed that the distribution of data points is relatively continuous, and there are no obvious outliers or anomalies, so there is no need for data cleaning. In addition, each month’s data are complete, and there are no missing values, so there is no need for data replenishment processing.

3.2. SARIMA Model

SARIMA is a time series forecasting model for data with seasonal patterns. It extends the ARIMA model so that time series with seasonal components can be handled. Three seasonal hyper-parameters $(P, D, Q)$ are added to $ARIMA(p, d, q)$, together with a seasonal cycle parameter $s$. The resulting model has seven parameters in two groups, three non-seasonal parameters $(p, d, q)$ and four seasonal parameters $(P, D, Q)_{s}$:

SARIMA(p, d, q)(P, D, Q)_{s}

(1)

where $p$ and $q$ are the maximum lag orders of the non-seasonal autoregressive and moving average operators, $P$ and $Q$ are the maximum lag orders of the seasonal autoregressive and seasonal moving average operators, $d$ is the number of non-seasonal differences, $D$ is the number of seasonal differences, and $s$ is the length of the seasonal cycle.

The general form of the model is

ϕ_{(p)} (B) Φ_{(P)} (B^{s}) (1 - B)^{d} (1 - B^{s})^{D} y_{t} = θ_{(q)} (B) Θ_{(Q)} (B^{s}) ϵ_{t}

(2)

where $B$ is the backshift operator. We performed $D$ seasonal differencing (de-periodization) and $d$ differencing (de-trending) on the time series $\{y_{t}\}$ to obtain the new series $\{x_{t}\}$, then modeled the differenced $\{x_{t}\}$ as follows:

ϕ_{(p)} (B) Φ_{(P)} (B^{s}) x_{t} = θ_{(q)} (B) Θ_{(Q)} (B^{s}) ϵ_{t}

(3)

where $ϕ_{(p)} (B)$ and $θ_{(q)} (B)$ are the autoregressive and moving average polynomials, $Φ_{(P)} (B^{s})$ and $Θ_{(Q)} (B^{s})$ are the seasonal autoregressive and seasonal moving average polynomials, $y_{t}$ is the observed value, and $ϵ_{t}$ is white noise.
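As a worked instance of the differencing step in Equation (2), assume $d = 1$, $D = 1$, and $s = 12$ (the monthly setting used later in this paper). Expanding the two differencing operators shows which past observations enter each value of the differenced series $x_{t}$:

```latex
\begin{aligned}
x_{t} &= (1 - B)(1 - B^{12})\, y_{t} \\
      &= (1 - B - B^{12} + B^{13})\, y_{t} \\
      &= y_{t} - y_{t-1} - y_{t-12} + y_{t-13}
\end{aligned}
```

Each differenced value therefore compares the month-over-month change in the current year with the same change one year earlier, since $x_{t} = (y_{t} - y_{t-1}) - (y_{t-12} - y_{t-13})$.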

3.3. BP Neural Network Model

The backpropagation neural network, usually called the BP network, has been widely used in various applications. It can learn and store a large number of input–output mapping relations. Its learning rule uses the steepest descent method to iteratively adjust the weights and thresholds of the network through backpropagation so as to minimize the sum of squared errors. Because it relies on steepest descent, the basic BP algorithm can suffer from slow convergence and low learning efficiency.

3.3.1. Fundamentals

The neuron model is shown in Figure 2.

A BP network consists of an input layer, a hidden layer, and an output layer. The input layer receives the input data, the hidden layer processes the information, and the output layer produces the result. The weights from the input layer to the hidden layer are denoted by $υ$, and the weights from the hidden layer to the output layer are denoted by $ω$.

In Figure 3, the model diagram depicts a neural network with a single hidden layer. The process of the BP neural network can be divided into two stages. The first stage involves the forward propagation of the signal, where the input data pass through the hidden layer and eventually reach the output layer. The second stage is the backward propagation of the error. The error is propagated from the output layer to the hidden layer and then to the input layer. This backward propagation allows for the adjustment of the weights and biases in the hidden layer and the weights in the input layer.

Backpropagation Algorithm

The neural network is trained by a backpropagation algorithm. The algorithm uses gradient descent to adjust the connection weights and biases by minimizing the error between the network output and the actual values. This process consists of iterative steps of forward propagation and backward updating of the weights.

Activation Functions

Common activation functions include Sigmoid, Tanh, and ReLU. They introduce nonlinearity so that the neural network can handle complex nonlinear relationships. A widely used choice is the Sigmoid (logistic) function, also known as the S-shaped growth curve, which works well in classifiers:

f (x) = \frac{1}{1 + e^{-x}}

(4)
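As a minimal sketch of Equation (4), the standard logistic function $1 / (1 + e^{-x})$ and its derivative can be written as follows. The function names are illustrative, and the branch on the sign of `x` is an implementation choice to avoid overflow, not something specified in this paper.

```python
import math


def sigmoid(x: float) -> float:
    """Logistic function 1 / (1 + e^(-x)), evaluated without overflow."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)  # for negative x, e^x is small, so this cannot overflow
    return z / (1.0 + z)


def sigmoid_derivative(x: float) -> float:
    """Derivative f'(x) = f(x) * (1 - f(x)), the factor used in backpropagation."""
    s = sigmoid(x)
    return s * (1.0 - s)
```

The derivative is expressed through the function value itself, which is why the Sigmoid is convenient for the weight updates in backpropagation.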

3.3.2. Training Process

Step 1 Input data: Input data from the training set is fed into the input layer of the network;

Step 2 Forward propagation: Calculate the output of each neuron through the forward propagation of the network;

Step 3 Calculate the error: Compare the network output with the actual value and calculate the error;

Step 4 Backpropagation: Backpropagate using the error information, calculate the gradient, and update the connection weights and bias according to the gradient;

Step 5 Repeat iteration: Adjust the network parameters through multiple iterations of the training process until the error converges to a satisfactory level.

Therefore, the BP neural network has a strong nonlinear fitting ability and is suitable for complex problems; it has a strong learning ability and a good processing ability for large-scale data sets. However, it is sensitive to the initial weights and learning rate, and it may require larger training data when dealing with some specific problems.
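The five training steps can be sketched in plain Python for a network with one hidden layer. This is an illustrative sketch, not the implementation used in this paper: the linear output neuron, the per-sample updates, the toy data, and all names (`train_bp`, `n_hidden`, `lr`) are assumptions.

```python
import math
import random


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def train_bp(samples, n_hidden=3, lr=0.2, epochs=3000, seed=0):
    """Train a one-hidden-layer network; return (predict, per-epoch MSE history)."""
    rng = random.Random(seed)
    n_in = len(samples[0][0])
    # v: input-to-hidden weights (last entry of each row is the bias)
    # w: hidden-to-output weights (last entry is the bias)
    v = [[rng.uniform(-1.0, 1.0) for _ in range(n_in + 1)] for _ in range(n_hidden)]
    w = [rng.uniform(-1.0, 1.0) for _ in range(n_hidden + 1)]

    def forward(x):
        h = [sigmoid(sum(vj[i] * x[i] for i in range(n_in)) + vj[n_in]) for vj in v]
        y = sum(w[j] * h[j] for j in range(n_hidden)) + w[n_hidden]
        return h, y

    history = []
    for _ in range(epochs):                      # Step 5: repeat
        total = 0.0
        for x, target in samples:                # Step 1: feed one training sample
            h, y = forward(x)                    # Step 2: forward propagation
            err = y - target                     # Step 3: output error
            total += err * err
            # Step 4: backpropagate the gradient of 0.5 * err^2 and update
            for j in range(n_hidden):
                delta_h = err * w[j] * h[j] * (1.0 - h[j])
                for i in range(n_in):
                    v[j][i] -= lr * delta_h * x[i]
                v[j][n_in] -= lr * delta_h
                w[j] -= lr * err * h[j]
            w[n_hidden] -= lr * err
        history.append(total / len(samples))
    return (lambda x: forward(x)[1]), history


# Toy data (hypothetical): learn y = x^2 on a few points in [0, 1]
data = [([x / 4.0], (x / 4.0) ** 2) for x in range(5)]
predict, mse_history = train_bp(data)
```

The hidden-layer gradient `delta_h` is computed before the corresponding output weight `w[j]` is updated, so each update uses the weights from the forward pass.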

3.4. SARIMA-BP Neural Network Forecasting Model

Due to the pronounced seasonal characteristics of civil aviation passenger traffic, this study first employs the seasonal ARIMA (SARIMA) model to describe its linear components. However, the SARIMA model uses differencing to isolate linear factors and does not adequately account for the nonlinear elements that influence time series fluctuations, so its predictive accuracy may be limited. The SARIMA model’s prediction error (residual) therefore serves as the input for the BP neural network, which characterizes the nonlinear aspects of civil aviation passenger traffic. In this way, the SARIMA model’s prediction residuals are corrected to enhance the prediction accuracy. The BP neural network learns the residual prediction model through training, and the final prediction result is as follows:

y_{t} = a_{t} + e_{t}

(5)

where $a_{t}$ is the prediction of the SARIMA model and $e_{t}$ is the corrected residual of the SARIMA model predicted by the BP neural network.
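The combination in Equation (5) can be sketched as two small helpers: one builds the residual series that the BP network is trained on, and the other adds the predicted residuals back to the SARIMA predictions. The function names and the numbers in the usage lines are hypothetical and serve only as an illustration.

```python
from typing import List, Sequence


def sarima_residuals(actual: Sequence[float], sarima_fit: Sequence[float]) -> List[float]:
    """Residuals (actual minus SARIMA fitted value) used to train the BP network."""
    return [y - a for y, a in zip(actual, sarima_fit)]


def combined_forecast(sarima_forecast: Sequence[float],
                      residual_forecast: Sequence[float]) -> List[float]:
    """Equation (5): final prediction = SARIMA prediction + predicted residual."""
    return [a + e for a, e in zip(sarima_forecast, residual_forecast)]


# Hypothetical values, for illustration only
residuals = sarima_residuals([10.0, 12.0], [9.5, 12.5])
final = combined_forecast([20.0, 21.0], [0.5, -0.25])
```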

3.5. Evaluating Indicator

To better evaluate the error and bias of the prediction results and the performance of the prediction method, this study uses five indicators: $E_{k}$, $MRE$, $R^{2}$, $MSE$, and $RMSE$. They are defined in Equations (6)–(10).

E_{k} = \left| \frac{y_{k} - T_{k}}{T_{k}} \right| \times 100\%

(6)

MRE = \frac{1}{n} \sum_{k = 1}^{n} E_{k}

(7)

R^{2} = 1 - \frac{\sum_{k = 1}^{n} (T_{k} - y_{k})^{2}}{\sum_{k = 1}^{n} (T_{k} - \bar{T})^{2}}

(8)

MSE = \frac{1}{n} \sum_{k = 1}^{n} (y_{k} - T_{k})^{2}

(9)

RMSE = \sqrt{\frac{1}{n} \sum_{k = 1}^{n} (y_{k} - T_{k})^{2}}

(10)

where $y_{k}$ is the predicted value of the model; $T_{k}$ is the true value; $\bar{T}$ is the mean of the true values; $E_{k}$ is the relative error; $MRE$ is the mean relative error; $R^{2}$ is the coefficient of determination; and $MSE$ is the mean square error, which evaluates the degree of change in the data. The smaller the value of $MSE$, the more accurately the prediction model describes the experimental data. $RMSE$ is the root mean square error, which measures the deviation between the predicted value and the true value and is sensitive to outliers in the data.
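Equations (6)–(10) can be computed together in a few lines. The sketch below assumes the denominator of $R^{2}$ uses the mean of the true values, as in the standard definition; the function name `evaluate` and the sample numbers are hypothetical.

```python
import math


def evaluate(pred, true):
    """Return E_k (percent), MRE, R^2, MSE and RMSE for paired predictions."""
    n = len(true)
    rel = [abs((y - t) / t) * 100.0 for y, t in zip(pred, true)]   # Eq. (6)
    mre = sum(rel) / n                                             # Eq. (7)
    mean_true = sum(true) / n
    ss_res = sum((t - y) ** 2 for y, t in zip(pred, true))
    ss_tot = sum((t - mean_true) ** 2 for t in true)
    r2 = 1.0 - ss_res / ss_tot                                     # Eq. (8)
    mse = ss_res / n                                               # Eq. (9)
    rmse = math.sqrt(mse)                                          # Eq. (10)
    return {"E": rel, "MRE": mre, "R2": r2, "MSE": mse, "RMSE": rmse}


# Hypothetical predictions and true values, for illustration only
scores = evaluate([110.0, 190.0, 300.0], [100.0, 200.0, 300.0])
```

Because $RMSE$ is the square root of $MSE$, the two always rank models in the same order; $RMSE$ is reported in the units of the data.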

4. Model Application and Analysis of Results

4.1. Forecasting Civil Aviation Passenger Traffic Based on the SARIMA Model

4.1.1. Stationarity Test

In the line graph of the original series (Figure 1), civil aviation passenger traffic grows over time, indicating that the time series has an obvious upward trend. The same fluctuation pattern recurs every 12 time intervals, which indicates that the time series of civil aviation passenger traffic has a strong periodicity with cycle length $S = 12$.

Because the civil aviation passenger traffic time series shows significant seasonal fluctuation, the effects of seasonality and trend must be eliminated before modeling. We label the original series as $X$. First, we perform a seasonal differencing of the series with a step size of 12, denoted as $D(X, 0, 12)$, as shown in Figure 4. Next, a first-order differencing with a step size of 1 is performed, denoted as $D(X, 1, 12)$, as shown in Figure 5. These two operations make the series closer to stationary and easier to analyze and model.

The ADF unit root test is then performed on the differenced sequence $D(X, 1, 12)$, and the test results are detailed in Table 2. The ADF test statistic is smaller (more negative) than the critical values at the 1%, 5%, and 10% levels. In addition, the p-value is 0.0000, which is well below the usual significance level of 0.05, so the unit root hypothesis is rejected. Combining Figure 5 with the unit root test, it can be seen that the sequence $D(X, 1, 12)$ is stationary.
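The two differencing steps can be sketched with a single helper applied twice. The toy series below (a quadratic trend plus a repeating 12-month pattern) is hypothetical and is chosen so that the effect of each step is easy to see; the comments map the variables to the notation $D(X, 0, 12)$ and $D(X, 1, 12)$ used above.

```python
def difference(series, lag=1):
    """Return x_t = y_t - y_(t-lag); the result is `lag` values shorter."""
    return [series[t] - series[t - lag] for t in range(lag, len(series))]


# Hypothetical monthly series: quadratic trend + fixed seasonal pattern
pattern = [0, 2, 5, 3, 1, 4, 8, 9, 4, 3, 1, 2]
x = [t * t + pattern[t % 12] for t in range(48)]

seasonal_diff = difference(x, lag=12)          # D(X, 0, 12): removes the seasonal pattern
both_diff = difference(seasonal_diff, lag=1)   # D(X, 1, 12): removes the remaining trend
```

In this toy case the seasonal difference still grows linearly with time, and only the additional first-order difference yields a constant series, which mirrors why both steps are applied before the ADF test.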

4.1.2. Model Identification

We identified the model using the Box–Jenkins model identification method. This method first assumes that the process generating the time series can be approximated by an ARMA model (if it is stationary) or an ARIMA model (if it is non-stationary). Two diagnostic charts help select the p and q parameters of ARMA or ARIMA: the autocorrelation function (ACF) and the partial autocorrelation function (PACF). The ACF plot summarizes the correlation between the observations and their lagged values. The PACF plot summarizes the correlation between the observations and their lagged values that is not explained by the intermediate lagged observations. If the ACF drops sharply to near 0 at a small lag k while the PACF decays gradually to 0, the MA model can be used. If the PACF drops sharply to near 0 at a small lag k while the ACF decays gradually to 0, the AR model can be used. If neither the ACF nor the PACF declines sharply but both eventually converge to 0, the ARMA model is more appropriate. A sharp decline here means a cliff-like drop; it does not require convergence to 0, and the values may rise again later.

The autocorrelation function (ACF) and partial autocorrelation function (PACF) of the $D(X, 1, 12)$ sequence are shown in Figure 6. The p-value is less than 0.05, so the sequence is not white noise and can be modeled. The autocorrelation coefficients converge to 0 after the second lag and show tailing, and the partial autocorrelation coefficients also show tailing, so the ARMA model is preliminarily selected.

4.1.3. Model Ordering and Parameter Estimation

In this section, we analyze the ACF and PACF plots to determine $p$ and $q$. The value of the autoregressive term $p$ is determined using the PACF plot: if all the bars after lag $k$ are close to zero, then $p = k$ can be chosen. In other words, the last lag at which the PACF is significantly non-zero is a candidate value for $p$. The value of the moving average term $q$ is determined using the ACF plot: if all the bars after lag $k$ are close to zero, then $q = k$ can be chosen, so the last lag at which the ACF is significantly non-zero is a candidate value for $q$.
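The reading of the ACF plot described above can be sketched numerically. The code computes the sample autocorrelation and lists the lags whose values fall outside the approximate 95% band $\pm 1.96 / \sqrt{n}$; this band is a common convention and an assumption here, and the function names and toy series are hypothetical. The PACF is read in the same way but requires an extra recursion and is omitted.

```python
import math


def acf(series, max_lag):
    """Sample autocorrelation r_0 .. r_max_lag of a series."""
    n = len(series)
    mean = sum(series) / n
    dev = [v - mean for v in series]
    c0 = sum(d * d for d in dev)
    return [sum(dev[t] * dev[t - k] for t in range(k, n)) / c0
            for k in range(max_lag + 1)]


def significant_lags(series, max_lag):
    """Lags (from 1) whose autocorrelation lies outside +/- 1.96 / sqrt(n)."""
    bound = 1.96 / math.sqrt(len(series))
    r = acf(series, max_lag)
    return [k for k in range(1, max_lag + 1) if abs(r[k]) > bound]


# Hypothetical series: a 12-month pattern repeated for six years
pattern = [0, 2, 5, 3, 1, 4, 8, 9, 4, 3, 1, 2]
toy = pattern * 6
r = acf(toy, 12)
candidates = significant_lags(toy, 12)
```

For a purely periodic series such as this one, the autocorrelation at lag 12 is large, which is the numerical counterpart of the seasonal spike seen in an ACF plot.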

It can be seen from Figure 6 that the model parameters AR can be taken as 2, 11, and 12, and MA can be taken as 2, 3, 11, and 12. Since the autocorrelation function (ACF) of the time series shows a significant correlation at the first lag point after each seasonal cycle, a seasonal moving average term is needed to help the model capture this seasonal effect, so SMA takes 1. Through model debugging, we obtain the model parameters in Table 3.

The $SARIMA(12, 1, 12)(0, 1, 1)_{12}$ model is more appropriate, and the $D(X, 1, 12)$ sequence is modeled as follows:

\begin{matrix} (1 + 0.2749 B^{2} - 0.3387 B^{11} + 0.2264 B^{12}) (1 - B) (1 - B^{12}) X_{t} \\ = (1 - 0.8547 B + 0.3141 B^{11} + 0.4920 B^{12}) (1 + 0.5239 B^{12}) ϵ_{t} \end{matrix}

(11)

4.1.4. Model Testing

Residual Analysis: The autocorrelation plot of the model’s residuals (Figure 7) shows no obvious pattern or trend, which is consistent with white noise.
Ljung–Box Test: According to the residual autocorrelation plot in Figure 8, the p-value is greater than the significance level (usually 0.05), which indicates that there is no significant autocorrelation in the residual series.
AIC Comparison: The fitting performance of different SARIMA models is compared using information criteria such as the Akaike Information Criterion (AIC), and the $SARIMA(12, 1, 12)(0, 1, 1)_{12}$ model has the minimum AIC.

Prediction Performance: The model is trained using historical data and then used to make predictions of future data. Table 4 shows the prediction results and the relative error. Figure 8 and Table 4 show that the relative error is small and that the predictive performance of the model is good.
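The Ljung–Box statistic used in the model-testing step can be computed directly from the residual autocorrelations as $Q = n(n + 2) \sum_{k=1}^{h} r_{k}^{2} / (n - k)$. The sketch below returns only $Q$; turning it into a p-value requires a chi-square distribution with degrees of freedom reduced by the number of fitted ARMA parameters, which is left out here. The function names and the toy residuals are hypothetical.

```python
def autocorr(series, k):
    """Sample autocorrelation of a series at lag k."""
    n = len(series)
    mean = sum(series) / n
    dev = [v - mean for v in series]
    c0 = sum(d * d for d in dev)
    return sum(dev[t] * dev[t - k] for t in range(k, n)) / c0


def ljung_box_q(residuals, max_lag):
    """Ljung-Box Q statistic over lags 1 .. max_lag."""
    n = len(residuals)
    return n * (n + 2) * sum(
        autocorr(residuals, k) ** 2 / (n - k) for k in range(1, max_lag + 1)
    )


# Hypothetical residuals with strong lag-1 dependence (alternating signs)
q = ljung_box_q([1.0, -1.0, 1.0, -1.0, 1.0, -1.0, 1.0, -1.0], max_lag=1)
```

A large $Q$ relative to the chi-square critical value indicates remaining autocorrelation, whereas residuals that behave like white noise give a small $Q$.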

4.2. Passenger Traffic Prediction Based on BP Neural Network Modeling

4.2.1. BP Neural Network Design

This paper uses the sequence values from the previous 12 periods to predict the value of the next period. Specifically, the sequence values from periods 1–12 serve as the input to the network, while the sequence value of the 13th period is the network’s output. Likewise, the sequence values from periods 2–13 serve as the input, with the sequence value of the 14th period being the output, and this pattern continues. According to the “Rule of Thumb,” the number of hidden-layer neurons is typically calculated as 2/3 of the number of input-layer neurons plus 1/3 of the number of output-layer neurons, which gives $2/3 \times 12 + 1/3 \times 1 \approx 8.3$ and suggests either 8 or 9 neurons. An empirical approach is then used to determine the appropriate number of hidden-layer neurons, which, in this case, is set to nine. The final network configuration consists of 12 input-layer neurons, 9 hidden-layer neurons, and 1 output-layer neuron, with the Sigmoid function selected as the activation function. The training process involves 5000 iterations, with an error threshold of 0.000001 and a learning rate of 0.01.
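The sliding-window construction and the rule of thumb for the hidden layer can be sketched as follows. The function names are hypothetical, and the series of period numbers 1 to 15 is used only to make the window boundaries visible.

```python
def make_windows(series, width=12):
    """Pair each run of `width` consecutive values with the value that follows it."""
    count = len(series) - width
    inputs = [list(series[i:i + width]) for i in range(count)]
    targets = [series[i + width] for i in range(count)]
    return inputs, targets


def hidden_rule_of_thumb(n_in, n_out):
    """2/3 of the input neurons plus 1/3 of the output neurons."""
    return 2 * n_in / 3 + n_out / 3


# Period numbers 1..15 stand in for the actual traffic values
inputs, targets = make_windows(list(range(1, 16)), width=12)
suggested = hidden_rule_of_thumb(12, 1)
```

With 168 monthly observations (January 2006 to December 2019), the same construction would yield 156 input–output pairs.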

4.2.2. BP Neural Network Prediction Results

After training on the sample data, the network produces output values; their fit with the actual values is illustrated in Table 5 and Figure 9. The relative error between the network’s output and the actual values is small, suggesting that the neural network can be effectively applied to predicting China’s civil aviation passenger traffic.

4.3. SARIMA-BP Neural Network Prediction Model for Civil Aviation Passenger Traffic Volume

The results demonstrate that individual prediction methods exhibit limited accuracy. Therefore, the SARIMA-BP combined model is employed to forecast civil aviation passenger traffic volume. Residuals are derived from the predictions of the seasonal ARIMA model and serve as the desired output for the BP neural network. The original civil aviation passenger traffic data are then used for training, and the resulting data are fed into the BP neural network for learning and modeling to obtain the predicted residual sequence values. Finally, MATLAB 2023b outputs the prediction results of the combined SARIMA-BP model. As depicted in Figure 10, the predicted values closely align with the true values, which significantly reduces the prediction error and improves the model’s prediction accuracy.

4.4. Comparison and Analysis of Results

The relative errors of the three models were compared and analyzed and the results of the comparison are shown below (see Figure 11). The evaluation indicators (MRE, R², MSE, RMSE) of the three models are compared in Table 6.

Observing Figure 11 reveals that all the relative errors of the combined model are below 5 percent, whereas the individual prediction models exhibit some significant relative error values.

It can be observed from Table 6 that the prediction results of the combined model are in good agreement with the actual civil aviation passenger volume data. The average relative error is 1.6906%, and the R² value is as high as 0.9816, which is very close to 1, indicating that the model fits well. In addition, the mean square error (MSE) and root mean square error (RMSE) of the model are also significantly lower than other comparison models, which further proves its superiority. The SARIMA-BP model skillfully combines the advantages of the two models and effectively utilizes the prediction information of each model. This combination model greatly improves the accuracy of the prediction, thereby enhancing the reliability of the prediction results. Therefore, it was decided to use the SARIMA-BP model to predict the civil aviation passenger volume during the epidemic period (2020–2023).

5. Analysis of the Impact of the Epidemic on Passenger Transport Volume

We compared the forecast of civil aviation passenger traffic during the epidemic period (2020–2023) with the actual data, as shown in Figure 12.

We observed the severe impact of the epidemic on the aviation industry. Overall, civil aviation passenger traffic lost approximately 1347.2 million passengers. Losses peaked in February 2022 at 87.62 percent, or approximately 55.8 million passengers. In February 2020, at the beginning of the outbreak, the losses were also significant, at 85.11 percent, or approximately 49.81 million passengers. Over time, and especially from the beginning of 2023, civil aviation passenger traffic gradually recovered: the smallest loss, 13.37 percent or about 9.64 million passengers, occurred in July 2023, followed by a gradual return to normal levels.
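The loss figures above relate a forecast to the actual traffic. A minimal sketch of that arithmetic is given below, assuming the loss percentage is measured relative to the forecast value (the paper does not state the formula explicitly); the function name and the numbers in the usage line are hypothetical.

```python
def loss_vs_forecast(forecast, actual):
    """Return (passengers lost, loss as a percentage of the forecast)."""
    lost = forecast - actual
    return lost, 100.0 * lost / forecast


# Hypothetical month: 60 million forecast, 9 million actual passengers
lost, percent = loss_vs_forecast(60.0, 9.0)
```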

Presently, with the risk of the epidemic receding and civil aviation passenger traffic growing steadily, people’s willingness to travel abroad has increased significantly. Because road and railway transportation have their own constraints, airlines have the opportunity to attract more passengers by launching promotional activities and improving cabin comfort. In addition, airlines can open new routes according to changes in market demand, or optimize or even discontinue existing routes, to meet the needs of passengers more effectively and enhance their market competitiveness.

6. Conclusions

Based on the comparative study of the BP neural network model and the SARIMA model in predicting civil aviation passenger volume as well as the results of combining the two models for simultaneous prediction, we have drawn the following conclusions.

Firstly, when predicting the passenger volume of civil aviation in 2019, we found that the SARIMA-BP combination model performed best, with a higher prediction accuracy than either the BP neural network model or the SARIMA model alone. This shows that the accuracy and stability of prediction can be improved by combining prediction methods so that the strengths of each single model complement those of the other.

Secondly, to predict civil aviation passenger volume from 2020 to 2023, we used the SARIMA-BP combination model, which had been validated as the best method. Comparison with the actual data shows that the epidemic significantly affected the aviation industry and caused substantial losses in civil aviation passenger traffic. As reported in Section 5, the losses peaked in February 2022 and were also severe in February 2020, during the initial outbreak of the epidemic. However, over time, especially in early 2023, the passenger volume of civil aviation gradually rebounded and eventually returned to normal levels. Airlines can adjust their long-term and short-term strategies according to the actual situation, for example by optimizing and adjusting the route network and re-evaluating aircraft procurement plans.

In summary, this study demonstrates the effectiveness of combination models in predicting civil aviation passenger volume and provides an in-depth analysis of the epidemic’s impact on the aviation industry. These findings offer valuable insights for airlines and government departments, enabling them to develop effective response strategies and measures to address similar crises which may arise in the future. Future research could focus on exploring alternative prediction models or integrating multiple methods to enhance the precision and stability of predictions, thereby better adapting to the ever-changing market environment.

7. Discussion

Civil aviation passenger volume shows a significant linear growth trend. The SARIMA model predicts time series with regular growth accurately, while the BP neural network predicts nonlinear sequences well. Combining the two models can therefore further improve the prediction accuracy. The research literature shows that a combined prediction model usually provides a higher accuracy than a single model, and the studies listed in Table 7 report the same advantage for combined models.
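One simple way to combine two forecasts is a weighted average. The sketch below assumes inverse-MSE weighting, which is a common textbook scheme and is not necessarily the combination rule used for the SARIMA-BP model in this paper; all function names and sample values are hypothetical.

```python
def mse(actual, predicted):
    """Mean square error of paired observations."""
    return sum((y - p) ** 2 for y, p in zip(actual, predicted)) / len(actual)


def inverse_mse_weights(actual, pred_a, pred_b):
    """Weight each model by the inverse of its MSE on a validation window."""
    mse_a, mse_b = mse(actual, pred_a), mse(actual, pred_b)
    if mse_a == 0 and mse_b == 0:
        return 0.5, 0.5
    if mse_a == 0:
        return 1.0, 0.0
    if mse_b == 0:
        return 0.0, 1.0
    inv_a, inv_b = 1.0 / mse_a, 1.0 / mse_b
    total = inv_a + inv_b
    return inv_a / total, inv_b / total


def combine(pred_a, pred_b, w_a, w_b):
    """Weighted average of two forecast series."""
    return [w_a * a + w_b * b for a, b in zip(pred_a, pred_b)]


# Hypothetical validation data, not taken from the paper.
actual = [100.0, 110.0, 120.0]
sarima_pred = [102.0, 108.0, 123.0]
bp_pred = [97.0, 113.0, 118.0]

w_sarima, w_bp = inverse_mse_weights(actual, sarima_pred, bp_pred)
combined = combine(sarima_pred, bp_pred, w_sarima, w_bp)
```

In this toy example the two models err in opposite directions, so the weighted average has a lower MSE than either input. That is not guaranteed in general, and weights fitted and evaluated on the same window give an optimistic picture; a held-out period is the safer check.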

In this paper, the combination model of SARIMA and a BP neural network is used to predict the passenger volume of civil aviation, and the prediction accuracy is improved. However, this paper has the following shortcomings:

This paper does not try a variety of model combinations in the prediction.
In the prediction of civil aviation passenger volume, this paper does not take economic, demographic, and other external factors into account.
The amount of data used in this paper is relatively limited: it includes only monthly data, not annual data.

In future research, we can consider introducing more external factors and expanding the types and quantities of data to improve the accuracy of prediction. In addition, the combined-model method can also be applied to the prediction of highway and railway passenger volume.

Author Contributions

Methodology, Z.Z.; software, H.L.; writing—original draft preparation, W.G.; writing—review and editing, B.G. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Institutional Review Board Statement

This study was carried out according to the ‘Helsinki Declaration’ and approved by the Institutional Review Board of the International Academy of Sciences.

Informed Consent Statement

Informed consent was obtained from all subjects involved in the study.

Data Availability Statement

Data available in a publicly accessible repository that does not issue DOIs. Publicly available datasets were analyzed in this study. These data can be found here: https://xxgk.mot.gov.cn/jigou/?gk=5 (accessed on 12 May 2024).

Conflicts of Interest

The authors declare no conflicts of interest.

References

1. Yu, F. Application of the Remnant Error Model GM(1,1) in Forecasting the Number of Passengers for Civil Aviation Transportation. J. Xihua Univ. (Nat. Sci. Ed.) 2006, 25, 29–30.
2. Zhang, L.Y.; Guo, M. A BP neural network based passenger traffic prediction for civil aviation in Beijing. Hebei Enterp. 2020, 35–36.
3. Chen, C.C.; Li, C.; Liu, C.L. Research on Passenger Traffic Forecasting Based on ELM Model. Logist. Sci-Tech 2020, 43, 105–107.
4. Wu, X.; Xiang, Y.; Mao, G.; Du, M.; Yang, X.; Zhou, X. Forecasting air passenger traffic flow based on the two-phase learning model. J. Supercomput. 2021, 77, 4221–4243.
5. Meng, J.J.; Yang, Z.Q. Civil aviation passenger traffic volume forecasting based on fuzzy diagonal regression neural networks. In Proceedings of the Multiconference on “Computational Engineering in Systems Applications”, Beijing, China, 4–6 October 2006; IEEE: Piscataway, NJ, USA, 2006; pp. 1771–1775.
6. Ma, Y.; Tang, Y.T. Analysis of influencing factors of civil aviation passenger traffic in Gansu Province based on multiple linear regression model. Gansu Sci-Tech 2020, 36, 46–49.
7. Anupam, A.; Lawal, I.A. Forecasting air passenger travel: A case study of Norwegian aviation industry. J. Forecast. 2024, 43, 661–672.
8. Li, S.R. Prediction of Air Passenger Traffic of SARIMA Model and LSTM Neural Network. Henan Sci-Tech 2021, 40, 18–21.
9. Kanavos, A.; Kounelis, F.; Iliadis, L.; Makris, C. Deep learning models for forecasting aviation demand time series. Neural. Comput. Appl. 2021, 33, 16329–16343.
10. Zhao, X.X.; Zhao, H.X. Empirical Analysis and Predictions of Civil Aviation Passenger Traffic Based on ARIMA model. In Proceedings of the 2015 3rd International Conference on Machinery, Materials and Information Technology Applications, Qingdao, China, 28–29 November 2015; Atlantis Press: Zhengzhou, China, 2015; pp. 1869–1873.
11. Zheng, Y. ARIMA adjustment and regression analysis on time series in aviation industry. J. Qiqihar Univ. (Nat. Sci. Ed.) 2010, 26, 82–85.
12. Pan, L.; Zhu, H.Q. An empirical study of civil aviation passenger traffic based on the product seasonal model. China Mark. 2014, 2014, 115–117.
13. Chudy-Laskowska, K.; Pisula, T. Seasonal forecasting for air passenger trafic. In Proceedings of the 4th International Multidisciplinary Scientific Conference on Social Sciences and Arts SGEM 2017, Albena, Bulgaria, 24–30 August 2017; pp. 681–692.
14. Wang, T. Modeling and Prediction of Civil Aeronautic Passenger Capacity by Using ARIMA Models. J. Wuyi Univ. (Nat. Sci. Ed.) 2007, 21, 38–42.
15. Chen, B.; Liu, J.; Ruan, Z.; Yue, M.; Long, H.; Yao, W. Freight traffic of civil aviation volume forecast based on hybrid ARIMA-LR model. In Proceedings of the International Conference on Smart Transportation and City Engineering (STCE 2022), Chongqing, China, 12–14 August 2022; SPIE: Bellingham, WA, USA, 2022; pp. 682–689.
16. Gan, G.Y.; You, J.G.; Zhang, T. Forecast of civil aviation passenger volume based on bidirectional LSTM. Mod. Electron. Tech. 2022, 45, 175–180.
17. Al-Sultan, A.; Al-Rubkhi, A.; Alsaber, A.; Pan, J. Forecasting air passenger traffic volume: Evaluating time series models in long-term forecasting of Kuwait air passenger data. Adv. Appl. Stat. 2021, 70, 69–89.
18. Hu, Y. Air passenger flow forecasting using nonadditive forecast combination with grey prediction. J. Air Transp. Manag. 2023, 112, 102439.
19. Yao, Y.; Tao, J.; Li, Y. Prediction of Civil Aviation Passenger Transport Volume Based on ARIMA-BP Combined Model. Comput. Technol. Dev. 2015, 25, 147–151.
20. Yu, G.; Zhang, C. Switching ARIMA model based forecasting for traffic flow. In Proceedings of the 2004 IEEE International Conference on Acoustics, Speech, and Signal Processing, Montreal, QC, Canada, 17–21 May 2004; IEEE: Piscataway, NJ, USA, 2004; p. 429.
21. Su, M.; Hu, B.; Luan, W.; Tian, C. Effects of COVID-19 on China’s civil aviation passenger transport market. Res. Transp. Econ. 2022, 96, 101217.
22. Deveci, M.; Çiftçi, M.E.; Akyurt, İ.Z.; Gonzalez, E.D.S. Impact of COVID-19 pandemic on the Turkish civil aviation industry. Sustain. Oper. Comput. 2022, 3, 93–102.
23. Wojcik, S.; Bijral, A.S.; Johnston, R.; Lavista Ferres, J.M.; King, G.; Kennedy, R.; Vespignani, A.; Lazer, D. Survey data and human computation for improved flu tracking. Nat. Commun. 2021, 12, 194.

Figure 1. The trend in China’s civil aviation passenger traffic from January 2006 to December 2019.

Figure 2. Neuron model.

Figure 3. BP neural network model.

Figure 4. Sequence plot of the seasonal difference sequence D(X, 0, 12).

Figure 5. Sequence diagram of the first-order difference sequence D(X, 1, 12).

Figure 6. Autocorrelation and partial autocorrelation plot for the sequence D(X, 1, 12).

Figure 7. Residual autocorrelation plot.

Figure 8. SARIMA model fitting effect diagram.

Figure 9. BP neural network model fitting effect diagram.

Figure 10. Combined model prediction result fitting effect plot.

Figure 11. Relative error comparison plot.

Figure 12. Comparison between predicted and actual data from 2020 to 2023.

Table 1. The research results of some works in the literature.

Model Method	Model	Reference No.	Research Object	Research Conclusions
Single-model prediction	BP neural network	[2]	Beijing civil aviation passenger volume	In the short-term forecasting of civil aviation passenger traffic, the BP neural network model can be selected.
	ELM	[3]	Annual civil aviation passenger volume	The predicted value obtained by the ELM model is closer to the real value.
	LSTM	[4,8]	Annual civil aviation passenger volume	The LSTM neural network has a good nonlinear time series prediction ability.
	ARIMA	[10,11,12,13,14]	Annual civil aviation passenger volume	The ARIMA model has a good fitting effect on the original data sequence.
Combined-model prediction	SARIMA-LR	[15]	Monthly civil aviation passenger traffic	The combined model improves the prediction accuracy.
	Bi-directional LSTM	[16]	Civil aviation passenger of the Kunming–Xishuangbanna route	The prediction accuracy of the model is high and feasible.
	Nonadditive forecast combination—Grey	[18]	Annual civil aviation passenger volume	It is noticeably superior to other single models.
	ARIMA-BP	[20]	Short-term forecasting of traffic flows	It effectively reduces the error.

Table 2. Unit root test for the sequence D(X, 1, 12).

		T Statistic	p-Value
ADF test value		−4.315059	0.0000
Significance level	1% level	−2.581233
	5% level	−1.943074
	10% level	−1.615231
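
The reading of Table 2 can be sketched as a simple decision rule: the null hypothesis of a unit root is rejected at a given significance level when the ADF statistic is below (more negative than) the critical value for that level. The function name `adf_decision` is illustrative; the numbers are copied from Table 2.

```python
def adf_decision(test_statistic, critical_values):
    """Map each significance level to True if the unit-root null is rejected."""
    return {level: test_statistic < cv for level, cv in critical_values.items()}


# Values from Table 2 for the sequence D(X, 1, 12).
table2_statistic = -4.315059
table2_critical = {"1%": -2.581233, "5%": -1.943074, "10%": -1.615231}

decision = adf_decision(table2_statistic, table2_critical)
# The statistic lies below all three critical values, so the differenced
# sequence is treated as stationary at the 1%, 5% and 10% levels.
```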

Table 3. Model parameter estimates.

Variable	Regression Coefficient	Standard Error	T Statistic	p-Value
AR(2)	−0.274916	0.094763	−2.901080	0.0043
AR(11)	0.338720	0.100099	3.383835	0.0009
AR(12)	−0.226425	0.082362	−2.749142	0.0067
MA(2)	0.250647	0.088377	2.836128	0.0052
MA(11)	−0.314189	0.084246	−3.729411	0.0003
MA(12)	−0.492060	0.101420	−4.851709	0.0000
SMA(1)	−0.523928	0.073160	−7.161437	0.0000
SIGMASQ	7376.725	887.7838	8.309146	0.0000
Akaike information criterion (AIC)				11.91021
Schwarz criterion (SC)				12.06729
Hannan-Quinn criterion (H-Q)				11.97401

Table 4. SARIMA model prediction results.

Time	True Value	Predicted Value	Relative Error
2019.6	5341.4	5317.860	0.441%
2019.7	5930.4	5812.355	1.9910%
2019.8	6123.8	6096.753	0.4420%
2019.9	5475.4	5598.248	2.2436%
2019.10	5698.2	5780.196	1.4390%
2019.11	5305.8	5370.382	1.2172%
2019.12	5276	5461.593	3.5177%
Mean Relative Error			2.3070%

Table 5. BP neural network model prediction results.

Time	True Value	Predicted Value	Relative Error
2019.6	5341.4	5358.826	0.3262%
2019.7	5930.4	5779.102	2.5510%
2019.8	6123.8	6237.522	1.857%
2019.9	5475.4	5411.537	1.166%
2019.10	5698.2	5806.595	1.9023%
2019.11	5305.8	5406.492	1.8978%
2019.12	5276	5196.375	1.5090%
Mean Relative Error			2.6002%

Table 6. Comparison of the prediction results of the three models.

Model	MRE	R²	MSE	RMSE
SARIMA Model	2.3070%	0.9734	7356.311	85.76894
BP Neural Network Model	2.6002%	0.9751	9475.743	97.34343
SARIMA-BP Model	1.6906%	0.9816	4663.478	68.28966

Table 7. Method comparison.

Reference No.	Model	Research Object	Research Conclusions
[15]	SARIMA-LR	Monthly civil aviation passenger traffic	The combined model improves the prediction accuracy.
[18]	Nonadditive forecast combination—Grey	Annual civil aviation passenger volume	It is noticeably superior to other single models.
[20]	ARIMA-BP	Short-term forecasting of traffic flows	The prediction accuracy is improved compared to that of a single model.

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

© 2024 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

