*Article*

## **Passenger Flow Forecasting in Metro Transfer Station Based on the Combination of Singular Spectrum Analysis and AdaBoost-Weighted Extreme Learning Machine**

#### **Wei Zhou 1,2,3, Wei Wang 1,2,3,\* and De Zhao 1,2,3**


Received: 19 May 2020; Accepted: 19 June 2020; Published: 23 June 2020

**Abstract:** The metro system plays an important role in urban public transit, and passenger flow forecasting is fundamental to helping operators establish an intelligent transport system (ITS). The forecasting results can provide necessary information for the travel decisions of passengers and the operational decisions of managers. In order to investigate the inner characteristics of passenger flow and make a more accurate prediction with less training time, a novel model (i.e., SSA-AWELM), a combination of singular spectrum analysis (SSA) and AdaBoost-weighted extreme learning machine (AWELM), is proposed in this paper. SSA is developed to decompose the original data into the three components of trend, periodicity, and residue. AWELM is developed to forecast each component separately. The three predicted results are summed as the final outcomes. In the experiments, the dataset was collected from the automatic fare collection (AFC) system of the Hangzhou metro in China. We extracted three weeks of passenger flow to carry out multistep prediction tests and a comparison analysis. The results indicate that the proposed SSA-AWELM model can reduce both prediction errors and training time. In particular, compared with the prevalent deep-learning long short-term memory (LSTM) neural network, SSA-AWELM reduced the testing errors by 22% and the training time by 84%, on average. This demonstrates that SSA-AWELM is a promising approach for passenger flow forecasting.

**Keywords:** automatic fare collection system; passenger flow forecasting; time series decomposition; singular spectrum analysis; ensemble learning; extreme learning machine

## **1. Introduction**

As an important part of urban public transit, metro transit has developed rapidly and attracted large numbers of passengers in recent years. It is a great challenge for operators and decision-makers to optimize the metro schedules and organize the passengers in the stations effectively. Accurate and timely short-term passenger flow forecasting is the foundation of intelligent transport systems (ITS) [1]. The prediction results not only offer evidence for passenger guidance to prevent congestion and trampling [2] but also provide necessary information for the metro schedule coordination scheme to match the metro capacity with the passenger flow demand.

As the connections between different metro lines, transfer stations are crucial in metro networks. Some researchers have utilized complex network theory to investigate the characteristics of metro networks such as those of Beijing [3], Shanghai [4], Guangzhou [5], and some other cities [6]. The findings of their studies indicated that transfer stations played the most significant role in the networks, and some of them [3,4] suggested that transfer stations should receive more attention. In addition, the passenger flow in a transfer station is usually much larger than that in a regular station, and it increases more rapidly during the morning and evening rush hours. This is because transfer stations are usually located in areas with large travel demands, for instance, transportation hubs and business districts. Therefore, in order to avoid pedestrian congestion and to give operators early warnings of bursts of passenger flow, it is vital to forecast the passenger flow in a transfer station accurately and in a timely manner.

The passenger flow is defined as the number of boarding or alighting pedestrians at the target station during a constant interval in the prediction tasks [7,8]. In previous studies, passenger flow data have mainly been collected in two ways, as follows:


The task of passenger flow prediction is quite similar to traffic flow prediction [7,8,12,16]; the two differ only in the input data of the models. Therefore, many practical models of traffic flow prediction can be referred to as well. In the studies to date, passenger/traffic flow prediction approaches are roughly classified into four categories, as listed below:


the deep-learning models require significant resources and training time [32]. In addition, these models are usually regarded as a "black box" [23] and lack interpretability of the results [32].

In recent studies, the combination with time series decomposition approaches has become a novel research interest for hybrid models seeking better predictive performance. The principle of this kind of model is that a complicated time series can be simplified by disaggregating the sequence into multiple frequency components. The decomposed components are forecasted separately, and then these predicted results are summed as the final outcomes. Widely used time series decomposition methods include wavelet decomposition (WD) [25,33], empirical mode decomposition (EMD) [2,26,34], Seasonal and Trend decomposition using Loess (STL) [35,36], singular spectrum analysis (SSA) [37–39], and so on. Sun et al. [25] and Liu et al. [33] employed the WD approach to decompose the original passenger flow into several high-frequency and low-frequency sequences, and these sequences were then forecasted based on least squares SVR by Sun et al. [25] and on an extreme learning machine (ELM) by Liu et al. [33], respectively. Chen et al. [2], Wei and Chen [26], and Chen and Wei [34] all proposed that the passenger flow could be regarded as a nonlinear and nonstationary signal, and they utilized EMD to decompose the original passenger flow into nine intrinsic mode function (IMF) components and one residue. Wei and Chen [26] predicted the disaggregated components through an ANN, while Chen et al. [2] predicted them through LSTM. Qin et al. [35] utilized STL to disaggregate the monthly air passenger flow into three subseries: seasonal, trend, and residual series. Then, they developed the Echo State Network (ESN) to forecast each decomposed series. Chen et al. [36] also employed STL to decompose the daily metro ridership, and LSTM was used in the prediction stage. As for the SSA method, to the best of our knowledge, it has never been introduced to analyze passenger flow to date, although it has been developed for traffic flow prediction. Mao et al. [37], Shang et al. [38], and Guo et al. [39] all applied this method to analyze traffic flow time series and obtained several components with different amplitudes and frequencies. Then, they reconstructed these components into a smoothed part and a residue. In this way, SSA can be regarded as a filter to remove noise from the original sequence. During the forecasting stage, the denoised data were predicted by an ELM [38] and a grey system model [39], respectively. Overall, these studies clearly indicate that hybridization with time series decomposition approaches can obviously improve predictive accuracy. However, all the aforementioned studies failed to investigate the potential characteristics of passenger flow from the decomposed results.

In this study, a novel hybrid model (i.e., SSA-AWELM), SSA combined with an AdaBoost-weighted extreme learning machine (AWELM), is proposed to achieve more accurate prediction results for the metro passenger flow. The experimental data, recorded by the sensors in turnstiles, are collected from an AFC system. The main contributions of this paper are briefly described as follows:


The rest of this paper is organized as follows: In Section 2, the problem is defined, and the proposed method is formulated. In Section 3, the procedures of data collection, data preprocessing, and design of the experiment are elaborated. The results and findings are analyzed and discussed in Section 4. At last, the conclusions are drawn in Section 5.

## **2. Materials and Methods**

In this section, the AFC system is briefly introduced, and the passenger flow forecasting problem is explained in detail. In particular, the SSA-AWELM model is formulated to improve prediction performance.

#### *2.1. Automatic Fare Collection Systems*

Automatic fare collection (AFC) systems are built on the Internet of Things (IoT) and wireless sensor networks (WSN). As displayed in Figure 1, a typical AFC system consists of five hierarchical levels, from top to bottom: the clearing center (CC), line centers (LC), station computers (SC), station equipment, and smart tickets and cards [40]. A passenger taps a smart ticket or card, which has an integrated circuit (IC) chip (a type of microsensor) inside, against a turnstile when boarding or alighting; meanwhile, the sensor in the turnstile responds and records the necessary information. The information is then transmitted to the SC, the LC, and, finally, the CC. In addition, there are a few differences between boarding and alighting. When a passenger alights and passes a turnstile, the sensor computes the traveled mileage and charges the fare automatically, and this transaction can be completed in milliseconds.

**Figure 1.** Brief structure of an automatic fare collection (AFC) system: (**a**) metro station and (**b**) computer cluster.

The AFC system is not only employed by operators to collect fares from passengers conveniently. For researchers, what matters most is that data mining on the recorded information can assist in analyzing operational quality, since the records include the personal identification, boarding/alighting station, boarding/alighting time, and other useful information. Based on AFC systems, the passenger boarding and alighting information is recorded automatically by the sensors in turnstiles, and the recorded data can be accessed easily. This makes it possible to realize real-time predictions of the metro passenger flow.

#### *2.2. Passenger Flow Forecasting Problem*

As mentioned in Section 1, the passenger flow is the sum of boarding or alighting pedestrians during a constant interval (i.e., 5 min, 10 min, etc.) at the target station. Suppose *xt* denotes the entrance or exit passenger flow at time *t*; it is obvious that *xt* varies with time. The passenger flow forecasting problem can be treated as a time series forecasting task, and the passenger flow time series exhibits temporal dependence. In other words, the passenger flow is highly related to the historical data. Therefore, the research problem addressed in this paper is to forecast *xt* from the historical passenger flow data {*xt*−1, *xt*−2, *xt*−3, ... , *xt*−*n*}, which is formulated as follows:

$$\hat{x}\_t = E(x\_{t-1}, x\_{t-2}, \dots, x\_{t-n}) \tag{1}$$

where $\hat{x}\_t$ represents the predicted value at time *t*, *E*(·) represents an established prediction model, and *n* represents the order of time lag.

Although single-step passenger flow forecasting has been widely studied, multistep forecasting is necessary in order to provide travelers and managers with further information about the passenger flow. In our study, the iterated strategy, which is widely used in time series prediction [41,42], is adopted for multistep passenger flow forecasting. As Equation (2) expresses, based on the model established for single-step prediction, the iterated strategy feeds a predicted value back into the same model to forecast the value at the next time step, and it continues in this manner until reaching the maximum prediction horizon. The iterated strategy has two outstanding advantages: the model needs to be trained only once, and the number of prediction steps is unlimited.

$$\begin{aligned} \hat{x}\_{t+1} &= E(x\_t, x\_{t-1}, \dots, x\_{t-n+1}) \\ \hat{x}\_{t+2} &= E(\hat{x}\_{t+1}, x\_t, \dots, x\_{t-n+2}) \\ &\ \ \vdots \end{aligned} \tag{2}$$
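The iterated strategy of Equation (2) can be sketched as follows. This is a minimal illustration; `predict_one` stands in for any trained one-step model (such as the AWELM developed later), and the mean-of-window "model" below is only a toy placeholder.

```python
import numpy as np

def iterated_forecast(predict_one, history, horizon):
    """Multistep forecasting via the iterated strategy: each one-step
    prediction is fed back as an input for the next step."""
    window = list(history)              # most recent n observations, oldest first
    preds = []
    for _ in range(horizon):
        x_next = predict_one(np.array(window))
        preds.append(x_next)
        window = window[1:] + [x_next]  # slide the lag window forward
    return preds

# Toy one-step "model": the average of the lag window (a stand-in only)
mean_model = lambda w: float(np.mean(w))
print(iterated_forecast(mean_model, [1.0, 2.0, 3.0], 2))
```

Note that only one model is trained; the same `predict_one` is reused at every horizon step, which is exactly the advantage described above.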

#### *2.3. The Proposed Hybrid Model*

## 2.3.1. Singular Spectrum Analysis

Singular spectrum analysis (SSA) is a time series analysis approach without any statistical assumptions [43]. It can decompose the original data into several components and has been widely used to decompose time series, including traffic flow [37–39]. In this study, this approach is implemented to analyze the passenger flow. Suppose *Y*(*t*) (*t* = 1, 2, ... , *N*) denotes the original passenger flow sequence with length *N*. The SSA procedure consists of four steps, as follows:

## **Step 1: Embedding**

The original sequence *Y*(*t*) is transformed into the trajectory matrix $F \in \mathbb{R}^{L \times K}$, which is calculated as the following equation:

$$F = \begin{bmatrix} f\_1 & f\_2 & \cdots & f\_K \\ f\_2 & f\_3 & \cdots & f\_{K+1} \\ \vdots & \vdots & \ddots & \vdots \\ f\_L & f\_{L+1} & \cdots & f\_N \end{bmatrix} \tag{3}$$

where *L* is the window length, *K* = *N* − *L* + 1, and $f\_i$ is the *i*th (1 ≤ *i* ≤ *N*) value of the original sequence.
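The embedding of Equation (3) can be sketched as a Hankel matrix construction (a minimal NumPy illustration):

```python
import numpy as np

def trajectory_matrix(y, L):
    """Embed series y (length N) into an L x K Hankel trajectory matrix,
    K = N - L + 1, as in Equation (3)."""
    N = len(y)
    K = N - L + 1
    return np.array([y[i:i + K] for i in range(L)])

F = trajectory_matrix([1, 2, 3, 4, 5], L=3)
print(F)  # rows: [1 2 3], [2 3 4], [3 4 5]
```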

#### **Step 2: Singular Value Decomposition (SVD)**

The SVD algorithm is conducted to decompose the trajectory matrix *F*, computed as follows:

$$F = \mathbf{U} \cdot \boldsymbol{\Sigma} \cdot \mathbf{V}^{\mathrm{T}} = \sum\_{i=1}^{d} \sqrt{\lambda\_i} \mathbf{U}\_i \mathbf{V}\_i^{\mathrm{T}} \tag{4}$$

where **Σ** is a diagonal matrix whose diagonal elements ($\sqrt{\lambda\_1} \ge \sqrt{\lambda\_2} \ge \dots \ge \sqrt{\lambda\_d} \ge 0$) are the singular values of *F*. The vectors $U\_i$ and $V\_i$, which are the *i*th columns of the matrices *U* and *V*, represent the left and right singular vectors, respectively. *d* represents the number of singular values, and it is also the rank of the trajectory matrix *F*. The collection $(U\_i, \sqrt{\lambda\_i}, V\_i)$ is denoted as the *i*th eigentriple of the SVD.

Every eigentriple can reconstruct an elementary matrix $F\_i$ of the trajectory matrix *F*:

$$F\_i = \sqrt{\lambda\_i} \mathbf{U}\_i \mathbf{V}\_i^T \tag{5}$$

Thus, the sum of all elementary matrices $F\_i$ is identical to the trajectory matrix *F*. The contribution of the elementary matrix $F\_i$ is measured by the corresponding eigenvalue (equal to the square of the singular value) as the following equation:

$$\eta\_i = \frac{\lambda\_i}{\sum\_{i=1}^d \lambda\_i} \tag{6}$$

## **Step 3: Grouping**

The index set *D* = {1, 2, ... , *d*} is divided into *M* disjoint subsets $I\_1, I\_2, \dots, I\_M$. Every index subset $I\_m$ (*m* = 1, 2, ... , *M*) is regarded as one group, and the elementary matrices $F\_i$ ($i \in I\_m$) in each group are summed. In previous papers, the w-correlation method [43] has been prevalent for splitting the result set. However, this method is conducted from the perspective of signal analysis and lacks interpretability for passenger flow. In this study, the elementary matrices $F\_i$ are grouped into the three parts of trend $F\_T$, periodicity $F\_P$, and residue $F\_R$, expressed as Equation (7); this process is detailed in Section 4.1.

$$F = F\_T + F\_P + F\_R \tag{7}$$

## **Step 4: Diagonal averaging**

The grouped matrices $F\_i$ ($F\_i \in \{F\_T, F\_P, F\_R\}$) are transformed into one-dimensional time series by diagonal averaging. Assume $f\_{ij}$ ($1 \le i \le L$, $1 \le j \le K$) is an element of matrix $F\_i$, $L^\* = \min(L, K)$, $K^\* = \max(L, K)$, and $f^\*\_{ij} = f\_{ij}$ if $K > L$; otherwise, $f^\*\_{ij} = f\_{ji}$. Then, every element $y\_t$ of the time series $Y\_i(t)$ is computed as the following equation:

$$y\_t = \begin{cases} \frac{1}{t} \sum\_{m=1}^{t} f\_{m,t-m+1}^\* & 1 \le t < L^\* \\ \frac{1}{L^\*} \sum\_{m=1}^{L^\*} f\_{m,t-m+1}^\* & L^\* \le t \le K^\* \\ \frac{1}{N-t+1} \sum\_{m=t-K^\*+1}^{N-K^\*+1} f\_{m,t-m+1}^\* & K^\* < t \le N \end{cases} \tag{8}$$

As such, the original passenger flow *Y*(*t*) is disaggregated into three components of trend *T*(*t*), periodicity *P*(*t*), and residue *R*(*t*).
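The four steps above can be sketched compactly as follows. This is a minimal NumPy illustration; the grouping indices are hand-picked here for a synthetic rank-4 signal (linear trend plus one sinusoid), whereas Section 4.1 chooses them from the eigenvalue spectrum.

```python
import numpy as np

def ssa_decompose(y, L, groups):
    """SSA sketch: embed (Step 1), SVD (Step 2), group eigentriples (Step 3),
    and diagonally average each group back into a series (Step 4).
    `groups` is a list of index lists over the eigentriples."""
    y = np.asarray(y, dtype=float)
    N = len(y)
    K = N - L + 1
    F = np.array([y[i:i + K] for i in range(L)])        # Step 1: trajectory matrix
    U, s, Vt = np.linalg.svd(F, full_matrices=False)    # Step 2: SVD
    series = []
    for idx in groups:                                  # Step 3: grouping
        Fg = sum(s[i] * np.outer(U[:, i], Vt[i, :]) for i in idx)
        # Step 4: diagonal averaging, i.e. the mean over each anti-diagonal
        comp = np.array([np.mean(Fg[::-1, :].diagonal(t - L + 1)) for t in range(N)])
        series.append(comp)
    return series

t = np.arange(40, dtype=float)
y = 0.5 * t + np.sin(2 * np.pi * t / 10)        # trend + periodic signal
trend, periodic = ssa_decompose(y, L=10, groups=[[0, 1], [2, 3]])
print(np.allclose(trend + periodic, y))          # rank-4 signal: 4 triples suffice
```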

## 2.3.2. AdaBoost Ensemble Learning

As a strategy of ensemble learning, AdaBoost was originally proposed by Freund and Schapire [44] for classification problems. Drucker [45] extended the algorithm to regression problems, and it was further improved by Solomatine and Shrestha [46,47]. By integrating a few homogeneous models (called base learners), this method can improve on the performance of the base learners. In this study, the AdaBoost algorithm is utilized to help the ELM predict the passenger flow more accurately.

Suppose a dataset $\{(x\_i, y\_i)\}\_{i=1}^{N}$ with *N* samples, and let *T* be the maximum number of iterations. The specific steps of AdaBoost are presented as follows:

**Step 1:** Initialize the distribution of sample weights:

$$\Gamma\_1 = \begin{bmatrix} \gamma\_{1,1}, \gamma\_{1,2}, \dots, \gamma\_{1,N} \end{bmatrix}^T, \text{where } \gamma\_{1,n} = \frac{1}{N}, n = 1, 2, \dots, N \tag{9}$$

**Step 2:** For the training process of each iteration, *t* = 1, 2, ... , *T*.

**Step 2.1:** Use the dataset with a distribution of **Γ***t* to train the WELM and obtain the base learner *Et*(*x*).

**Step 2.2:** Calculate the absolute relative error of each sample and the error rate of *Et*(*x*):

$$\varepsilon\_t = \sum\_{n:\,\left|\frac{E\_t(\mathbf{x}\_n) - y\_n}{y\_n}\right| > \varphi} \gamma\_{t,n} \tag{10}$$

where $\left|(E\_t(\mathbf{x}\_n) - y\_n)/y\_n\right|$ represents the absolute relative error of each sample; $\varepsilon\_t$ is the error rate of $E\_t(\mathbf{x})$; and *n* = 1, 2, ... , *N* is the index of the sample. The condition $\left|(E\_t(\mathbf{x}\_n) - y\_n)/y\_n\right| > \varphi$ means that a sample contributes to the error rate only when its error exceeds the preset threshold ϕ, which will be discussed at the end of the present subsection. More details are described in [47].

**Step 2.3:** Calculate the coefficient for updating the sample weights:

$$\beta\_t = \varepsilon\_t^k \tag{11}$$

where *k* is the power coefficient of the error rate $\varepsilon\_t$, which must be preset. According to the study of Solomatine and Shrestha [47], *k* is selected from 1 (linear law), 2 (square law), and 3 (cubic law). A high value of *k* may cause the algorithm to become unstable; thus, *k* is set to 1 in our study.

**Step 2.4:** Update the distribution of sample weights:

$$\gamma\_{t+1,n} = \frac{\gamma\_{t,n}}{Z\_t} \times \begin{cases} \beta\_t, & \text{if } \left|\frac{E\_t(\mathbf{x}\_n) - y\_n}{y\_n}\right| \le \varphi \\ 1, & \text{otherwise} \end{cases}, \quad n = 1, 2, \dots, N \tag{12}$$

where $Z\_t$ is a normalization factor, such that $\sum\_{n=1}^{N} \gamma\_{t+1,n} = 1$.

**Step 3:** Update *t* = *t* + 1 and loop **Step 2.1** to **2.4** until reaching the maximum iteration number *T*. Finally, the output is computed as:

$$\hat{y}(\mathbf{x}) = \frac{1}{\sum\_{t=1}^{T} \ln\frac{1}{\beta\_t}} \left[ \sum\_{t=1}^{T} \left( \ln\frac{1}{\beta\_t} \right) E\_t(\mathbf{x}) \right] \tag{13}$$

The AdaBoost algorithm is sensitive to the threshold ϕ. If ϕ is too low, the model will underfit; on the other hand, too high a value of ϕ will cause overfitting. In our study, the threshold ϕ is set adaptively to the median of the absolute relative errors of the samples during each iteration, expressed as the following equation:

$$\varphi = \operatorname{median}\{\varepsilon\_1, \varepsilon\_2, \dots, \varepsilon\_N\} \tag{14}$$

As presented in the above steps, AdaBoost is an iterative process: the base learner is trained and the distribution of sample weights is updated during each iteration. Thus, if the base learner is complex and computationally expensive, the total time consumed by AdaBoost grows linearly with the number of iterations. In this study, ELM, which is known for its fast training speed, is adopted as the base learner. This model is elaborated in the next subsection.
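Steps 1–3 can be sketched as follows. This is a minimal illustration: `train_base` stands in for the WELM of the next subsection, and the constant learner used at the bottom is only a trivial placeholder; a tiny floor on the error rate is added so that `ln(1/beta)` stays finite, which is a numerical safeguard rather than part of the original formulation.

```python
import numpy as np

def adaboost_regress(train_base, X, y, T, k=1):
    """AdaBoost regression sketch (Steps 1-3): train T weighted base learners,
    downweight well-predicted samples via the adaptive median threshold of
    Eq. (14), and combine learners with ln(1/beta_t) weights (Eq. (13))."""
    N = len(y)
    gamma = np.full(N, 1.0 / N)                   # Step 1: uniform sample weights
    learners, betas = [], []
    for _ in range(T):
        E = train_base(X, y, gamma)               # Step 2.1: weighted base learner
        are = np.abs((E(X) - y) / y)              # absolute relative errors
        phi = np.median(are)                      # Eq. (14): adaptive threshold
        eps = gamma[are > phi].sum()              # Eq. (10): error rate
        beta = max(eps, 1e-12) ** k               # Eq. (11), floored for stability
        gamma = gamma * np.where(are <= phi, beta, 1.0)  # Eq. (12)
        gamma /= gamma.sum()                      # normalization factor Z_t
        learners.append(E)
        betas.append(beta)
    w = np.log(1.0 / np.array(betas))
    def predict(Xq):                              # Eq. (13): weighted combination
        return sum(wi * E(Xq) for wi, E in zip(w, learners)) / w.sum()
    return predict

# Trivial stand-in base learner (the paper uses WELM): predicts the weighted mean.
const_learner = lambda X, y, g: (lambda Xq: np.full(len(Xq), np.dot(g, y)))
model = adaboost_regress(const_learner, np.zeros((4, 1)),
                         np.array([1.0, 2.0, 3.0, 4.0]), T=3)
```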

## 2.3.3. Weighted Extreme Learning Machine

The extreme learning machine (ELM) is a kind of single-hidden-layer feed-forward network (SLFN) proposed by Huang et al. [48]. Compared with traditional ANN models, ELM does not need to tune the input weights and hidden-layer biases during training. After initialization, the input weights and hidden biases are fixed, and only the output weights are optimized. Therefore, the training process of ELM is faster than that of a traditional ANN. Since weighted samples are used to train the base learners of AdaBoost, a weighted extreme learning machine (WELM) is developed in this study.

Assume a weighted dataset $\{(x\_i, y\_i, \gamma\_i)\}\_{i=1}^{N}$ with *N* samples, where $x\_i = [x\_{i,1}, x\_{i,2}, \dots, x\_{i,P}]^{\mathrm{T}} \in \mathbb{R}^{P \times 1}$, $y\_i = [y\_{i,1}, y\_{i,2}, \dots, y\_{i,Q}]^{\mathrm{T}} \in \mathbb{R}^{Q \times 1}$, and $\gamma\_i$ represent the input vector, output vector, and sample weight, respectively. The output of an ELM with *H* hidden neurons is expressed as:

$$f(\mathbf{x}\_i) = \sum\_{h=1}^{H} \beta\_h\, g(\mathbf{w}\_h \mathbf{x}\_i + b\_h), \quad i = 1, 2, \dots, N \tag{15}$$

where $w\_h = [w\_{h,1}, w\_{h,2}, \dots, w\_{h,P}]^{\mathrm{T}}$ represents the connection weights from the input layer to the *h*th hidden neuron; $b\_h$ represents the bias of the *h*th hidden neuron; $\beta\_h = [\beta\_{h,1}, \beta\_{h,2}, \dots, \beta\_{h,Q}]^{\mathrm{T}}$ represents the connection weights from the *h*th hidden neuron to the output layer; and $g(\cdot)$ is the activation function. The *sigmoid* function, formulated as $g(x) = 1/(1 + e^{-x})$, is adopted in this study. Since $w\_h$ and $b\_h$ are assigned at initialization and kept fixed, Equation (15) can be simplified as:

$$H\beta = \mathbf{Y} \tag{16}$$

where $\beta = [\beta\_1, \beta\_2, \dots, \beta\_H]^{\mathrm{T}}$; $Y = [y\_1, y\_2, \dots, y\_N]^{\mathrm{T}}$; and *H* is the output matrix of the hidden layer, expressed as:

$$H = \begin{bmatrix} g(\mathbf{w}\_1 \mathbf{x}\_1 + b\_1) & \cdots & g(\mathbf{w}\_H \mathbf{x}\_1 + b\_H) \\ \vdots & \ddots & \vdots \\ g(\mathbf{w}\_1 \mathbf{x}\_N + b\_1) & \cdots & g(\mathbf{w}\_H \mathbf{x}\_N + b\_H) \end{bmatrix} \tag{17}$$

The purpose of ELM is to optimize β with the objective of minimizing the mean square error cost function, expressed as $\min\_{\beta} \|H\beta - Y\|^2$. Furthermore, when the samples are weighted with **Γ**, the loss of every sample is multiplied by the corresponding sample weight:

$$\min\_{\boldsymbol{\beta}} \left\| \operatorname{diag}(\boldsymbol{\Gamma})^{1/2} (H\boldsymbol{\beta} - Y) \right\|^2 \tag{18}$$

where diag(**Γ**) is the diagonal matrix with the diagonal of **Γ**, and the solution of Equation (18) is:

$$\boldsymbol{\beta} = \left( H^{\mathrm{T}} \operatorname{diag}(\boldsymbol{\Gamma}) H \right)^{-1} H^{\mathrm{T}} \operatorname{diag}(\boldsymbol{\Gamma})\, Y \tag{19}$$

Overall, the output weights β of the WELM can be computed directly according to Equation (19). This differs from the training process of a traditional ANN, which iteratively updates the connection weights and neuron biases, and it is the reason why ELM costs much less computing time than a traditional ANN.
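Equations (15)–(19) can be sketched as follows. This is a hedged illustration: the input-weight scale and the tiny ridge term added to the normal equations of Equation (19) are numerical-stability choices of this sketch, not part of the original formulation.

```python
import numpy as np

def train_welm(X, Y, gamma, n_hidden, seed=0):
    """WELM sketch: random input weights and biases stay fixed; the output
    weights beta solve the weighted least squares of Eq. (19)."""
    rng = np.random.default_rng(seed)
    P = X.shape[1]
    W = rng.normal(scale=4.0, size=(n_hidden, P))   # fixed input weights w_h
    b = rng.normal(scale=4.0, size=n_hidden)        # fixed biases b_h
    sigmoid = lambda z: 1.0 / (1.0 + np.exp(-z))
    H = sigmoid(X @ W.T + b)                        # hidden-layer output, Eq. (17)
    G = np.diag(gamma)                              # diag(Gamma)
    # Eq. (19) with a small ridge term for numerical stability (sketch only)
    beta = np.linalg.solve(H.T @ G @ H + 1e-8 * np.eye(n_hidden), H.T @ G @ Y)
    return lambda Xq: sigmoid(Xq @ W.T + b) @ beta

# Fit one period of a sine with uniform sample weights
X = np.linspace(0, 1, 50).reshape(-1, 1)
Y = np.sin(2 * np.pi * X[:, 0])
f = train_welm(X, Y, np.full(50, 1.0 / 50), n_hidden=20)
```

Because β has this closed form, "training" is a single linear solve, which is the speed advantage over iterative ANN training described above.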

## 2.3.4. The Hybrid Model


A model combining singular spectrum analysis with the AdaBoost-weighted extreme learning machine, symbolized as SSA-AWELM, is proposed to forecast the passenger flow in this paper. The flow chart of this hybrid model is displayed in Figure 2, and its specific procedure is described as follows:

**Figure 2.** The flow chart of the hybrid singular spectrum analysis-AdaBoost-weighted extreme learning machine (SSA-AWELM) model: (**a**) SSA-AWELM and (**b**) AWELM.

**Step 1: SSA for decomposition.** The original passenger flow is decomposed into several components by the SSA approach, and these components are grouped into the three parts of trend, periodicity, and residue.

**Step 2: AWELM for components forecasting.** The WELM improved by AdaBoost (AWELM) is implemented to model and predict the three components, separately.

**Step 3: Integration for final forecasting results.** The final outcomes of forecasting the passenger flow are calculated by summing the predicted results of the three components.
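The three steps above can be sketched as a small pipeline. This is an illustration of the structure only: the stand-in `split` and `naive` functions below are trivial placeholders for the SSA decomposition and the AWELM forecaster.

```python
import numpy as np

def hybrid_forecast(y, decompose, fit_forecaster, horizon):
    """SSA-AWELM outline: decompose the series (Step 1), forecast each
    component with its own model (Step 2), and sum the forecasts (Step 3)."""
    components = decompose(y)                        # Step 1
    total = np.zeros(horizon)
    for comp in components:                          # Step 2: one model per part
        model = fit_forecaster(comp)
        total += model(horizon)                      # Step 3: sum the predictions
    return total

# Trivial stand-ins: split into mean + remainder; forecast by repeating the last value.
split = lambda y: [np.full_like(y, y.mean()), y - y.mean()]
naive = lambda comp: (lambda h: np.full(h, comp[-1]))
y = np.array([1.0, 2.0, 3.0, 4.0])
print(hybrid_forecast(y, split, naive, horizon=2))  # → [4. 4.]
```

The key design point is that the component forecasts are additive, so summing them reverses the additive decomposition of Equation (7).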

## **3. Empirical Study**

## *3.1. Data Collection*

In this paper, the passengers' alighting and boarding dataset is collected from the AFC system of the Hangzhou metro in China. The dataset is available online and provided by Ali Tianchi [49]. It records detailed information on passengers passing the turnstiles from the 1st to the 26th of January 2019. The dataset includes seven fields, which are listed in Table 1, and some samples of the dataset are provided in Table 2.

**Table 1.** Data fields collected from the automatic fare collection (AFC) system of Hangzhou metro.



**Table 2.** Some samples of the collected data.

## *3.2. Data Preprocessing*

The preprocessing step obtains passenger flow time series from the raw AFC dataset. In this study, the passenger flow data of the Qianjiang Road Station (Q.R. Sta.) and Jinjiang Station (J. Sta.) are selected for the experiments. As displayed in Figure 3, the Q.R. Sta. is a transfer station between Line 2 and Line 4, located in the Qianjiang New Town Central Business District (CBD). The Jinjiang Station is a transfer station between Line 1 and Line 4, located in Wangjiang New Town.

**Figure 3.** The locations of the studied metro transfer stations: (**a**) the Hangzhou metro network, (**b**) the Qianjiang Road Station (Q.R. Sta.), and (**c**) the Jinjiang Station (J. Sta.).

According to previous studies [11,50], the raw recorded data are usually aggregated into 5-min intervals to obtain the passenger flow sequence. In order to keep complete cycle periods in the sequence data, three continuous weeks, from the 6th to the 26th of January, were selected from the AFC dataset. The time range was restricted to 6:00–23:00 according to the operating hours of the Hangzhou metro system, though a few records in the AFC dataset fell outside this range. In total, there were 204 samples on average in one day and 4284 samples overall. Furthermore, the exit and entrance passenger flow sequences were computed separately. Hence, four experimental datasets were established and used to test the proposed model.

The extracted passenger flow sequences are presented in Figure 4. Both the exit and entrance passenger flows on weekdays have distinct peaks in the morning (about 8:00–9:00) and evening (about 18:00–19:00) rush hours, while these patterns disappear on the weekends. Moreover, the peak patterns of the exit and entrance passenger flows on weekdays are different. Taking the Q.R. Sta. as an example, the exit passenger flow in the morning rush hour (about 500 pedestrians per 5 min) is approximately 2.5 times that in the evening rush hour (about 200 pedestrians per 5 min). On the contrary, the entrance passenger flow in the evening rush hour (about 300 pedestrians per 5 min) is approximately 1.5 times that in the morning rush hour (about 200 pedestrians per 5 min). These results indicate that most passengers in this station are commuters. This finding agrees with the location of the station, i.e., in the Qianjiang New Town CBD, surrounded by numerous office buildings.

**Figure 4.** Passenger flow of the study metro transfer stations: (**a**) exit passenger flow of the Q.R. Sta., (**b**) entrance passenger flow of the Q.R. Sta., (**c**) exit passenger flow of the J. Sta, and (**d**) entrance passenger flow of the J. Sta.

The four datasets are all split into training sets (the 6th to the 19th of January) and testing sets (the 20th to the 26th of January). Grid search and 5-fold cross-validation are used to evaluate the training performance and determine the hyper-parameters of the models. Then, the models with the determined hyper-parameters are evaluated on the testing sets.

#### *3.3. Comparison Models and Evaluation Measures*

In order to demonstrate the contributions of the proposed SSA-AWELM model, the classical time series model ARIMA and four neural-network-based models, namely ANN, LSTM, ELM, and AWELM, are tested as benchmarks. They are listed as follows:

• **ARIMA:** ARIMA is a classical statistical model for time series forecasting. It is widely used to predict traffic flow and passenger flow in early studies [17]. The performance of ARIMA is affected by three parameters: autoregressive order *p*, difference order *d,* and moving average order *q*. Generally, *d* is set based on the stationarity test, and the *p* and *q* are selected from the range of [0,12] based on the Bayesian information criterion (BIC) [51].


To make sure that every model achieves its best performance, the well-established grid search and 5-fold cross-validation methods are adopted to determine the hyper-parameters. The number of hidden-layer neurons in the four neural network models is selected from 2 to 50 with a step of 2, and the number of base learners of AWELM is selected from 1 to 20 with a step of 1. The determined hyper-parameters of each model are displayed in Appendix A (see Table A1). In addition, the input and output sizes of the models are set to 12 and 1, respectively, during training, and the horizon of the multistep-ahead prediction is set to 6. In other words, the passenger flow data of the last hour is used to forecast the next half-hour.

In order to accelerate learning and convergence during model training, the min-max normalization approach (expressed as Equation (20)) is employed to scale the input data into the range [0,1] before feeding it into the models. To obtain the final prediction results, the outputs of the models are rescaled by the reverse min-max normalization (expressed as Equation (21)).

$$\mathbf{x}' = \frac{\mathbf{x} - \min(\mathbf{x})}{\max(\mathbf{x}) - \min(\mathbf{x})} \tag{20}$$

$$\mathbf{x} = \mathbf{x}' \times (\max(\mathbf{x}) - \min(\mathbf{x})) + \min(\mathbf{x}) \tag{21}$$
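Equations (20) and (21) can be sketched as a scale/inverse-scale pair (a minimal NumPy illustration):

```python
import numpy as np

def minmax_scale(x):
    """Eq. (20): scale to [0, 1]; also return (min, max) for later inversion."""
    lo, hi = np.min(x), np.max(x)
    return (x - lo) / (hi - lo), (lo, hi)

def minmax_inverse(x_scaled, lo, hi):
    """Eq. (21): map scaled values back to the original range."""
    return x_scaled * (hi - lo) + lo

x = np.array([10.0, 20.0, 40.0])
xs, (lo, hi) = minmax_scale(x)
print(xs)                          # 0, 1/3, 1
print(minmax_inverse(xs, lo, hi))  # recovers 10, 20, 40
```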

In order to evaluate the performances among models, two common measures are introduced in this study. They are the mean absolute error (MAE) and root mean square error (RMSE), computed as follows:

$$\text{MAE} = \frac{1}{N} \sum\_{n=1}^{N} \left| y\_n - \hat{y}\_n \right| \tag{22}$$

$$\text{RMSE} = \sqrt{\frac{1}{N} \sum\_{n=1}^{N} \left( y\_n - \hat{y}\_n \right)^2} \tag{23}$$

where *yn* and *y*ˆ*n* are the true value and predicted value, respectively, and *N* is the number of samples.
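Equations (22) and (23) translate directly into code (a minimal NumPy illustration):

```python
import numpy as np

def mae(y, y_hat):
    """Eq. (22): mean absolute error."""
    return np.mean(np.abs(np.asarray(y) - np.asarray(y_hat)))

def rmse(y, y_hat):
    """Eq. (23): root mean square error."""
    return np.sqrt(np.mean((np.asarray(y) - np.asarray(y_hat)) ** 2))

print(mae([1, 2, 3], [1, 2, 5]))   # one error of 2 over 3 samples -> 2/3
print(rmse([1, 2, 3], [1, 2, 5]))  # sqrt(4/3), larger because RMSE penalizes big errors
```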

Besides the aforementioned two measures, the Diebold–Mariano (DM) test [52] is implemented to test the statistical significance of the differences between the proposed model and the benchmark models. The null hypothesis is that the prediction accuracy of the tested model $E\_T(x)$ is equal to that of the reference model $E\_R(x)$. In this study, the squared error is adopted to measure the model loss, expressed as $e\_i = (\hat{y}\_i - y\_i)^2$. Then, the DM statistic is defined as follows:

$$\mathrm{DM} = \frac{\bar{g}}{\sqrt{\hat{V}\_g / N}} \tag{24}$$

where $\bar{g} = \frac{1}{N}\sum\_{n=1}^{N} g\_n$, $g\_n = (\hat{y}\_{T,n} - y\_n)^2 - (\hat{y}\_{R,n} - y\_n)^2$, $\hat{V}\_g = \gamma\_0 + 2\sum\_{k=1}^{P-1} \gamma\_k$, and $\gamma\_k$ is the autocovariance at lag *k*, expressed as $\gamma\_k = \frac{1}{N}\sum\_{i=k+1}^{N} (g\_i - \bar{g})(g\_{i-k} - \bar{g})$. $\hat{y}\_{T,n}$ and $\hat{y}\_{R,n}$ respectively represent the predicted values of models $E\_T(x)$ and $E\_R(x)$, *P* is the prediction horizon, and *N* is the size of the testing data.
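Under the squared-error loss above, the DM statistic can be sketched as follows (a minimal NumPy illustration; the autocovariance correction uses lags 1 to *P* − 1 as in the definition above):

```python
import numpy as np

def dm_statistic(y, yhat_T, yhat_R, P):
    """Diebold-Mariano statistic of Eq. (24) with squared-error loss and
    autocovariance correction up to lag P - 1."""
    y, yhat_T, yhat_R = map(np.asarray, (y, yhat_T, yhat_R))
    g = (yhat_T - y) ** 2 - (yhat_R - y) ** 2        # loss differential g_n
    N = len(g)
    g_bar = g.mean()
    autocov = lambda k: np.sum((g[k:] - g_bar) * (g[:N - k] - g_bar)) / N
    V = autocov(0) + 2 * sum(autocov(k) for k in range(1, P))
    return g_bar / np.sqrt(V / N)

# Example: model T has larger errors than reference R, so DM should be positive
rng = np.random.default_rng(1)
y_true = rng.normal(size=200)
dm = dm_statistic(y_true,
                  y_true + rng.normal(scale=0.5, size=200),   # worse model T
                  y_true + rng.normal(scale=0.1, size=200),   # better model R
                  P=6)
print(dm > 0)
```

A large positive DM value rejects the null hypothesis in favor of the reference model; values near zero indicate no significant difference.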

## **4. Results Analysis**

## *4.1. Analysis of SSA Decomposition*

As mentioned in Section 2.3.1, the window length *L* is the only parameter that must be determined before decomposition. According to previous studies [37–39], if the time series shows obvious periodicity, the window length *L* can be set to one period length. Thus, *L* = 204, because the passenger flow cycles daily (see Figure 4) and 204 samples on average are collected in one day (as noted in Section 3.2). The original passenger flow can then be disaggregated into 204 components. These components are grouped into the three parts of trend, periodicity, and residue, inspired by the study [53] that used SSA to analyze the variation of electricity prices. To facilitate the analysis, taking the dataset of the Q.R. Sta. as an example, the eigenvalues of the decomposed components are plotted in Figure 5.

**Figure 5.** Eigenvalues of the decomposed components (Q.R. Sta.): (**a**) exit passenger flow and (**b**) entrance passenger flow.

Taking Figure 5a as an example, the first eigenvalue is significantly larger than the others, so the corresponding component is extracted separately as the trend part. Moreover, the eigenvalue curve declines slowly after the 23rd component, which is therefore regarded as the "break point". The components from the 2nd to the 23rd are then reconstructed into the periodic part, and the remaining components from the 24th to the 204th are reconstructed into the residual part. The entrance passenger flow in Figure 5b is treated in the same way, with the "break point" at the 13th component: the components from the 2nd to the 13th are reconstructed into the periodic part, and the remaining components from the 14th to the 204th are reconstructed into the residual part. Finally, the obtained trend, periodicity, and residue of the original passenger flow are displayed in Figure 6.
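The embed–decompose–group–reconstruct procedure described above can be sketched with the basic SSA steps (trajectory matrix, SVD, grouping of eigentriples, diagonal averaging). The function `ssa_decompose` and its `groups` argument are illustrative names, assuming 0-based eigentriple indices (so the paper's "2nd to 23rd" becomes indices 1–22):

```python
import numpy as np

def ssa_decompose(series, L, groups):
    """Basic SSA: embed, SVD, group eigentriples, diagonally average.

    `groups` is a list of index lists over the eigentriples, e.g.
    [[0], range(1, 23), range(23, L)] for trend/periodicity/residue.
    """
    x = np.asarray(series, float)
    N = x.size
    K = N - L + 1
    # Step 1: trajectory (Hankel) matrix of shape L x K
    X = np.column_stack([x[i:i + L] for i in range(K)])
    # Step 2: singular value decomposition
    U, s, Vt = np.linalg.svd(X, full_matrices=False)
    out = []
    for idx in groups:
        idx = list(idx)
        # Step 3: sum the rank-one matrices of the grouped eigentriples
        Xg = (U[:, idx] * s[idx]) @ Vt[idx, :]
        # Step 4: anti-diagonal averaging maps the matrix back to a series
        comp = np.array([np.mean(Xg[::-1, :].diagonal(k))
                         for k in range(-L + 1, K)])
        out.append(comp)
    return out
```

By construction, if the groups cover all eigentriple indices, the components sum back to the original series exactly.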

As shown in Figure 6, each component reveals a different pattern of the original passenger flow. The trend represents the overall tendency, and the periodicity represents the variation within a day. Furthermore, it can be seen from the trend that the passenger flow on weekdays is larger than that on weekends. In the periodic component, the passenger flow shows distinct peaks in the morning and evening rush hours on weekdays, but these are not obvious on weekends. The peak patterns differ between the exit and entrance passenger flows: the exit passenger flow in the morning rush hour is much larger than that in the evening rush hour, whereas the entrance passenger flow shows the opposite pattern. As for the residue, it fluctuates irregularly and can be treated as noise.

**Figure 6.** Decomposition results: (**a**) exit passenger flow of the Qianjiang Road Station, (**b**) entrance passenger flow of the Q.R. Sta., (**c**) exit passenger flow of the J. Sta., and (**d**) entrance passenger flow of the J. Sta.

## *4.2. Analysis of Hyper-Parameters*

The performance of SSA-AWELM depends heavily on the AWELM forecasting model of each component, and AWELM has two hyper-parameters: the number of base learners (i.e., WELMs) *T* and the number of hidden neurons of each WELM *H*. The well-established grid search and five-fold cross-validation methods are adopted to determine *T* and *H*: *H* is selected from 2 to 50 with a step of 2, and *T* is selected from 1 to 20 with a step of 1. Taking the dataset of the Q.R. Sta. as an example, the hyper-parameter selection process is displayed in Figure 7, where a log transformation is applied to the MSE to distinguish different values clearly. It can be seen that AWELM is sensitive to the number of hidden neurons *H* but insensitive to the number of base learners *T*. The determined hyper-parameters *H* and *T* of AWELM are provided in Table A1 (see Appendix A).
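The grid search with five-fold cross-validation can be sketched as follows. This is a simplified stand-in, not the authors' implementation: a plain ELM (random hidden layer, analytic least-squares output weights) substitutes for the WELM base learner, and only the *H* grid is searched; `grid_search_hidden` and `_elm_fit_predict` are hypothetical names:

```python
import numpy as np

def _elm_fit_predict(Xtr, ytr, Xte, hidden, seed=0):
    """Minimal ELM regressor: random hidden layer, analytic output weights."""
    rng = np.random.default_rng(seed)
    W = rng.standard_normal((Xtr.shape[1], hidden))
    b = rng.standard_normal(hidden)
    beta = np.linalg.pinv(np.tanh(Xtr @ W + b)) @ ytr  # least-squares solve
    return np.tanh(Xte @ W + b) @ beta

def grid_search_hidden(X, y, candidates=range(2, 51, 2), folds=5, seed=0):
    """Pick the hidden-neuron count H with the lowest mean k-fold CV MSE."""
    idx = np.random.default_rng(seed).permutation(len(y))
    splits = np.array_split(idx, folds)
    best_H, best_mse = None, np.inf
    for H in candidates:
        fold_mse = []
        for i in range(folds):
            te = splits[i]
            tr = np.concatenate([splits[j] for j in range(folds) if j != i])
            pred = _elm_fit_predict(X[tr], y[tr], X[te], H)
            fold_mse.append(np.mean((pred - y[te]) ** 2))
        if np.mean(fold_mse) < best_mse:
            best_H, best_mse = H, float(np.mean(fold_mse))
    return best_H, best_mse
```

Extending the search to the (*T*, *H*) grid used in the paper only requires a second loop over the base-learner count.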

## *4.3. Analysis of Forecasting Results*

For the sake of a comparison analysis, the average evaluation measures of the forecasting results across all six prediction horizons are presented in Table 3, and scatter plots of the true and predicted values are displayed in Figure 8. From Table 3, it is worth noting that the proposed SSA-AWELM performs best among all the models, followed by LSTM, ANN, AWELM, ELM, and ARIMA. Compared to LSTM, the RMSE and MAE of SSA-AWELM are reduced by 22.5% and 21.3% on average in the case of the Q.R. Sta. and by 23.6% and 20.0% on average in the case of the J. Sta. AWELM performs slightly better than ELM, which indicates that the AdaBoost algorithm can reduce the prediction errors, but only to a limited extent. As expected, ARIMA is always inferior to the other models because it is a linear model. In addition, it can be seen in Figure 8 that the scatter points of SSA-AWELM lie closest to the expectation line, and the corresponding coefficient of determination *R*<sup>2</sup> is the largest. These findings show that the proposed SSA-AWELM is an effective approach to improve the accuracy of passenger flow forecasting. Furthermore, to compare the time consumption of the different models, the training times are provided in Table A2 (see Appendix A).

**Figure 7.** The process of hyper-parameter selection for SSA-AWELM (Q.R. Sta.): (**a**) exit passenger flow and (**b**) entrance passenger flow.

**Table 3.** Average evaluation measures across all six prediction horizons. RMSE: root mean square error and MAE: mean absolute error. Q.R. Sta.: Qianjiang Road Station and J. Sta.: Jinjiang Road Station. ARIMA: Auto Regressive Integrated Moving Average. ANN: Artificial Neural Network. LSTM: Long Short-Term Memory neural network. ELM: Extreme Learning Machine. AWELM: AdaBoost-Weighted Extreme Learning Machine. SSA-AWELM: the proposed model combining Singular Spectrum Analysis with AdaBoost-Weighted Extreme Learning Machine.



**Figure 8.** Prediction results: (**a**) exit passenger flow of the Q.R. Sta., (**b**) entrance passenger flow of the Q.R. Sta., (**c**) exit passenger flow of the J. Sta., and (**d**) entrance passenger flow of the J. Sta. ARIMA: Auto Regressive Integrated Moving Average. ANN: Artificial Neural Network. LSTM: Long Short-Term Memory neural network. ELM: Extreme Learning Machine. AWELM: AdaBoost-Weighted Extreme Learning Machine. SSA-AWELM: the proposed model combining Singular Spectrum Analysis and AdaBoost-Weighted Extreme Learning Machine.

## *4.4. Analysis of Multistep-Ahead Forecasting*

In order to analyze the multistep forecasting errors, the evaluation measures for each prediction horizon are displayed in Figure 9, and the DM test results of the comparison between the proposed SSA-AWELM and the benchmarks are presented in Table 4. From Figure 9, it can be seen that the prediction errors of every model increase with the prediction horizon. This is caused by the cumulative errors, which stem from feeding predicted values back into the models for multistep-ahead forecasting and are therefore inevitable. What stands out in Figure 9 is that the proposed SSA-AWELM always performs best in every prediction horizon, and its errors increase the slowest in comparison with the other models. This indicates that SSA-AWELM can improve the robustness and restrict the propagation of the cumulative error during multistep-ahead forecasting. A reasonable explanation for this finding is that SSA decomposes the original passenger flow into the three components of trend, periodicity, and residue, each of which holds individual characteristics that can be modeled more easily than the original complex data. Furthermore, AWELM performs slightly better than ELM, which suggests that AdaBoost can improve the accuracy of ELM but only to a limited extent; combining with AdaBoost alone cannot promote the forecasting accuracy significantly. From Table 4, generally speaking, the proposed SSA-AWELM almost always outperforms the other models at a highly significant level. There are some exceptions when compared with LSTM for the exit passenger flow: in these situations, SSA-AWELM still performs better than LSTM, but not always at a highly significant level. This might be because LSTM has the advantage of capturing more temporal characteristics of the exit passenger flow. Overall, these findings suggest that the proposed SSA-AWELM is outstanding for multistep-ahead predictions and is a robust approach for passenger flow forecasting.
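The iterated scheme that causes the cumulative errors discussed above can be sketched as follows; `recursive_forecast` and `predict_one` are illustrative names, assuming a one-step model over a fixed lag window:

```python
import numpy as np

def recursive_forecast(predict_one, history, horizon):
    """Iterated multistep forecasting: each one-step prediction is fed
    back into the lag window, so its error propagates to later steps.

    `predict_one` maps a 1-D lag window to the next value;
    `history` holds the most recent observations (window length = lags).
    """
    window = list(np.asarray(history, float))
    preds = []
    for _ in range(horizon):
        y_next = float(predict_one(np.asarray(window)))
        preds.append(y_next)
        window = window[1:] + [y_next]  # slide the window forward
    return preds
```

Because step *h* consumes *h* − 1 predicted values instead of observations, any one-step bias compounds with the horizon, which is why the error curves in Figure 9 rise for every model.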


**Figure 9.** Evaluation of the multistep predictions: (**a**) exit passenger flow of the Q.R. Sta., (**b**) entrance passenger flow of the Q.R. Sta., (**c**) exit passenger flow of the J. Sta., and (**d**) entrance passenger flow of the J. Sta.

**Table 4.** Diebold–Mariano (DM) test results of the comparison between the proposed SSA-AWELM and benchmarks.


\*\*\* represents the rejection of the null hypothesis at the 0.01 level, \*\* represents the rejection of the null hypothesis at the 0.05 level, and \* represents the rejection of the null hypothesis at the 0.1 level.

## **5. Conclusions**

This paper studied passenger flow forecasting and proposed a novel model, SSA-AWELM. In the model, SSA was developed to decompose the original data into the three components of trend, periodicity, and residue; then, AWELM was developed to forecast each component separately. The three predicted results were summed as the final outcomes. In order to demonstrate the effectiveness of the proposed model, the passenger flows of two transfer stations, extracted from an AFC system, were utilized to carry out prediction tests and a comparison analysis. The main conclusions are drawn and listed as follows:


The proposed method has two limitations that will be addressed in future work. One is that the testing cases cover only two transfer stations with large travel demands, and the other is that the passenger flows were collected under regular conditions. Thus, in further studies, more cases, including regular stations, will be tested and discussed. In addition, passenger flows during special incidents, such as extreme weather, passenger flow control, etc., will be the focus of extending the proposed model.

**Author Contributions:** Conceptualization, W.Z., W.W., and D.Z.; methodology, W.Z.; software, W.Z.; validation, W.Z.; data curation, W.Z. and D.Z.; writing—original draft preparation, W.Z.; writing—review and editing, W.Z., W.W., and D.Z.; visualization, W.Z. and D.Z.; supervision, W.W. and D.Z.; project administration, W.W.; and funding acquisition, W.W. and D.Z. All authors have read and agreed to the published version of the manuscript.

**Funding:** This research was funded by the National Natural Science Foundation of China, grant numbers 51878166 and 71701047.

**Acknowledgments:** The authors are grateful to Ali Tianchi for opening the AFC datasets.

**Conflicts of Interest:** The authors declare no conflicts of interest.

## **Appendix A**


**Table A1.** The determined hyper-parameters of the experimental models.

*H* represents the neuron number of the hidden layer, and *T* represents the number of base learners.


**Table A2.** The training time of the experimental models.

The experiments were conducted in the following environment: programming language: *Python*; main packages: *Statsmodels*, *Scikit-learn*, *Keras*, and *TensorFlow*; OS: Windows 10 (64-bit); RAM: 8 GB; CPU: Intel Core i5-8300H @ 2.30 GHz; GPU: NVIDIA GeForce GTX 1650 Ti.
