1. Introduction
In recent years, with the continuous improvement of the metro network, the metro has become one of the basic facilities supporting traffic operation in large Chinese cities. By the end of 2021, 283 [1] metro lines were in operation in 50 cities nationwide, with a total operating length of 9206.8 km. Because the metro is punctual, fast, and comfortable, it attracts a large number of passengers: the national urban metro carried 23.71 billion passenger trips in 2021, an increase of about 35% over 2020. As more residents choose the metro as their preferred mode for daily travel, its share of public transportation is also growing. The ratio of national metro passenger volume to total public transportation passenger volume rose from 38.7% in 2020 to 43.4% in 2021, and in eight cities (Beijing, Shanghai, Guangzhou, Chengdu, Shenzhen, Nanjing, Nanning, and Hangzhou) the metro accounts for more than 50% of public transportation passenger volume. This indicates that demand for the metro in China is growing and that the metro plays an extremely important role in the transportation of many large cities.
However, the huge passenger flow brings more challenges to daily metro operation. Crowded carriages and waiting areas, together with congested walking facilities for entering, exiting, and transferring, have led to a series of problems such as reduced efficiency, service level, and safety for passengers entering, exiting, and transferring within stations. Studying walking time in metro stations helps operators make preliminary assessments of the level of walking service in stations and improve station services. In addition, passenger flow is the basis for developing and coordinating operation plans, and walking time, as a highly variable part of the travel time chain, is an important input for improving the accuracy of passenger flow allocation. Finally, companies that provide route planning services, such as Baidu and Gaode, need walking time parameters to improve the reference times in their route plans and thus provide more accurate planning services.
In the past, walking time was usually obtained by manual investigation. Du et al. and Zhao [2,3] designed a sampling survey method based on the factors influencing passenger walking distance and walking speed to investigate the travel time of transfer passengers at transfer stations, and concluded that the transfer travel time approximately follows a lognormal distribution. Zhou et al. [4] obtained the relationship between transfer walking speed and passenger volume for different types of facilities by surveying station passenger flow and facility types, and then calculated the parameters of a transfer time model by curve fitting.
With the accumulation of automatic fare collection (AFC) data, scholars found that the detailed travel records of a large number of passengers provide enough evidence to mine potential passenger behaviors, which makes walking time imputation possible. Among the methods that mine AFC data, scholars have mainly sought to reduce the number of unknowns by combining travel time chains, and the methods fall into three categories.
The first category obtains part of the data through manual surveys. Sun et al. [5] proposed a method that splits travel time into the timestamps of its constituent elements and combines the travel time in AFC data with partial data from field surveys. Zhou et al. [6] extrapolated the walking time of all stations from the manually surveyed access walking time of one station, based on the travel time chain relationships of each station combination in the transportation network.
The second category uses the train operation diagram to derive exit walking time data. Tong et al. [7] designed a sampling survey method to investigate the travel time of transfer passengers in two directions at a Chongqing Metro transfer station; after verifying that the transfer travel time follows a normal or lognormal distribution, they used maximum likelihood estimation to estimate the transfer travel time parameters in the other directions. Scholars [8,9,10] argue that most passengers do not linger in the station after arrival, so the difference between the arrival time of the train and the exit time of the passenger can be used to infer the passenger's exit walking time. Paul et al. [11] calculated walking time ratios for access, exit, and transfer from manual surveys, matched exit walking times with train schedules using AFC transaction data, and then used the ratios to derive the access and transfer walking times. Similarly, Eltved et al. [12] estimated the distribution of transfer time from bus stop to train platform by matching smart card data with automatic vehicle location data. Yan [13], assuming that passenger arrivals follow a uniform distribution, calculated the distribution of exit travel time from train arrival data and passengers' exit times, and calculated the distribution parameters of access travel time from the origin-destination (OD) travel times of trips without transfers, given the train departure interval.
The third category reduces the number of variables by combining travel time chain components. Wu [14] divided travel time into two components, a fixed on-train time and a random access, exit, and waiting time, and assumed that all ODs follow the same normal distribution in order to find the mean and variance of the random component; the transfer time distribution was then derived by inputting the distributions of access and exit time and of waiting time at the first station. Based on the assumption that stations with similar layouts have equal access and exit walking times, Liang [15] and Jia et al. [16] used the travel times of passengers who did not wait for trains, in ODs with a unique transfer path, to derive access and exit walking times at different times of day, and verified that these walking times follow a normal distribution. Zhao [17] calculated the minimum access and exit walking time of each station in the Nanjing Metro by selecting the minimum travel time between ODs, for which the waiting time is taken to be zero, under the constraint that the sum of access walking times over all stations in the network equals the sum of exit walking times; the minimum transfer walking time was then obtained.
Furthermore, passenger flow is often considered an important influencing factor in walking time studies. Zeng et al. [18] argue that traffic objects interact with one another. Harris [19] collected passenger walking times in London urban rail stations with a tracking survey and then studied walking time under different passenger flows by constructing a passenger walking simulation model. Lam et al. [20] studied the pedestrian flow characteristics of different pedestrian facilities in Hong Kong and examined how facility location and through-flow affect walking time, based on the three elements of traffic flow. Feng et al. [21] plotted the relationship between passenger flow speed and density in the collecting-distributing area and on upward stairs of the Beijing subway. Wang et al. [22] studied the relationship between walking speed and passenger flow for pedestrian facilities such as upward and downward stairs in one-way transfer passages of urban rail stations in Shanghai. Li et al. [23] studied the relationship between passenger flow per unit time and walking time in one-way transfer stations in Shanghai and fitted a correlation function between the two. Therefore, this paper investigates the distribution characteristics of passenger flow at stations and quantifies the effect of passenger flow density on walking time.
To sum up, existing studies on walking time estimation split travel time into components and compute the remaining components once some of the data are known, which provides a theoretical basis for the travel time split in this paper. To reduce the number of unknown variables in the travel time composition, reasonable assumptions are usually made about waiting time or about access and exit walking time, which also provides a reference for the treatment of travel time composition in this paper. It is worth noting that although methods based on the minimum travel time simplify the calculation, they cannot reflect the influence of passenger flow on travel time. Meanwhile, studies of the influence of passenger flow show a strong correlation between travel time and passenger flow per unit time. It is therefore necessary to consider passenger flow when calculating station travel time, but manual investigation is costly, so this paper uses AFC data to quantify this influence. Based on the above literature, this paper proposes a method for projecting walking time in stations that considers the influence of passenger flow density. Given known train departure intervals and on-train times, a walking time projection model is constructed based on splitting the passenger travel time chains of multi-station combinations and on the influence of station passenger flow density on walking time, and finally an example analysis is conducted. The details are as follows:
- (1) Travel time chain analysis. The travel time chain usually consists of access and exit walking time, waiting time, and on-train time; for trips that require a transfer, it also includes transfer walking time and transfer waiting time. The train departure interval and on-train time can be obtained from the official website with a Python crawler, which provides the basic data for the walking time model constructed later.
- (2) Influence of passenger flow density. Walking time within the same station is mainly affected by passenger flow density, and passenger flow in the station is unevenly distributed in space and time. On this basis, the influence of passenger flow on walking time at stations is analyzed to determine a passenger flow threshold, which provides the judgment data for the walking time model constraints.
- (3) Design of the walking time projection method. A regression model is first built by splitting the passenger travel time chains of multi-station combinations; walking time constraints based on the impact of passenger flow density are then added to resolve the rank deficiency of the model, yielding the walking time imputation model.
- (4) Example validation. Based on the swipe data of the five lines with the highest daily average passenger flow in Guangzhou in 2018, an example validation analysis is conducted, and the model is verified in terms of the accuracy of the results and the validity of the constraints.
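The travel time chain in item (1) can be illustrated with a minimal sketch for a trip without transfers. It assumes, purely for illustration, that passengers arrive uniformly within a headway so that the expected waiting time is half the departure interval; this assumption, the function name `residual_walking_time`, and the numbers are hypothetical and are not the model developed in this paper.

```python
def residual_walking_time(travel_time_s, on_train_s, headway_s):
    """Access + exit walking time left over for a no-transfer trip.

    Assumption (illustrative only): passengers arrive uniformly within a
    headway, so the expected waiting time is half the departure interval.
    """
    expected_wait_s = headway_s / 2.0
    walk_s = travel_time_s - on_train_s - expected_wait_s
    if walk_s < 0:
        raise ValueError("components exceed the recorded gate-to-gate time")
    return walk_s


# Made-up trip: 1500 s gate to gate, 1080 s on the train, 240 s headway.
print(residual_walking_time(1500, 1080, 240))  # 300.0
```

The residual still mixes access and exit walking time; separating them requires combining several station pairs, which is what the multi-station travel time chain splitting is meant to achieve.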
3. Impact of Station Passenger Flow on Walking Time
3.1. Metro Traffic Data Selection and Pre-Processing
The metro data set used in this paper consists of smart card swipe records (AFC data) from 23 April 2018 to 11 May 2018. Passenger flow is the main factor influencing travel time: the greater the passenger flow, the more obvious the influence of passenger flow density on walking time. Therefore, this paper selects the five lines of the Guangzhou Metro with the highest passenger flow (Line 1, Line 2, Line 3, the North Extension Line of Line 3, and Line 5) as the research object.
The data inevitably contain anomalies, which affect the accuracy of the walking time and therefore need to be processed. Pre-processing of AFC data consists of three main steps, as follows:
Step 1: Remove records with missing key fields. Four fields are selected as key fields: access station, exit station, access swipe time, and exit swipe time. If any key field is missing, the passenger's OD travel time cannot be computed, so such records are removed.
Step 2: Remove records outside operating hours. Since AFC data record all entries and exits, they also include maintenance personnel entering and exiting stations outside operating hours; these records need to be removed.
Step 3: Remove unreasonable records. According to the official regulations of the Guangzhou Metro, staying in the metro for more than 270 min is regarded as abnormal and incurs a fine. Records exceeding this threshold are therefore considered unreasonable and are excluded.
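The three steps can be sketched as a single filter over the records. This is a minimal illustration rather than the authors' implementation: the field names, the record layout (a list of dictionaries with ISO-formatted timestamps), and the operating window of 6:00 to 23:59 are assumptions, and the rejection of non-positive travel times is an extra safeguard not stated in the steps.

```python
from datetime import datetime, time

# Hypothetical field names; the real AFC schema is not given in the text.
KEY_FIELDS = ("entry_station", "exit_station", "entry_time", "exit_time")
SERVICE_START = time(6, 0)       # assumed operating window
SERVICE_END = time(23, 59, 59)
MAX_STAY_MIN = 270               # stay limit quoted in Step 3


def clean_afc_records(records):
    """Apply Steps 1-3 to a list of dict records and return the valid ones."""
    cleaned = []
    for rec in records:
        # Step 1: drop records with a missing key field.
        if any(not rec.get(field) for field in KEY_FIELDS):
            continue
        t_in = datetime.fromisoformat(rec["entry_time"])
        t_out = datetime.fromisoformat(rec["exit_time"])
        # Step 2: drop swipes outside the operating window.
        if not all(SERVICE_START <= t.time() <= SERVICE_END for t in (t_in, t_out)):
            continue
        # Step 3: drop over-long stays (and, as an added safeguard,
        # non-positive travel times).
        stay_s = (t_out - t_in).total_seconds()
        if stay_s <= 0 or stay_s > MAX_STAY_MIN * 60:
            continue
        cleaned.append({**rec, "travel_time_s": stay_s})
    return cleaned


sample = [
    {"entry_station": "A", "exit_station": "B",
     "entry_time": "2018-04-23 08:00:00", "exit_time": "2018-04-23 08:35:00"},
    {"entry_station": "A", "exit_station": "",          # missing key field
     "entry_time": "2018-04-23 08:00:00", "exit_time": "2018-04-23 08:35:00"},
    {"entry_station": "A", "exit_station": "B",         # non-operating hours
     "entry_time": "2018-04-23 03:10:00", "exit_time": "2018-04-23 03:40:00"},
    {"entry_station": "A", "exit_station": "B",         # 300 min stay
     "entry_time": "2018-04-23 07:00:00", "exit_time": "2018-04-23 12:00:00"},
]
print(len(clean_afc_records(sample)))  # 1
```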
3.2. Analysis of the Relationship between Station Passenger Flow and Walking Time
As mentioned in the introduction, travel time is often influenced by passenger flow; this section quantifies that influence.
3.2.1. Distribution Characteristics of Passenger Flow in Time and at Access and Exit Stations
The six stations of the Guangzhou Metro with the highest daily average passenger flow were selected as the research object, and the distribution of passenger flow was analyzed by time period and by direction (access and exit). Because travel purposes and travel time characteristics differ by day type, working days and weekends are studied separately.
Comparing Figure 2a,b shows that the time distribution of passenger flow differs between the two day types. On weekdays, travel is concentrated in the morning and evening peaks around 8:00 and 18:00, and the peak can exceed 6000 passengers per 30 min. On weekends, travel is more dispersed; the largest number of passengers travel around 18:00, but the peak only reaches about 4000 passengers per 30 min.
In addition, the balance between inbound and outbound flow differs: it is fairly even on weekends, whereas on weekdays the imbalance between inbound and outbound flow in the same period is more pronounced. For example, at Zhujiang New Town Station, commuters mainly arrive during the morning rush hour, and the inbound passenger flow at the station is much larger than the outbound passenger flow during the morning rush hour, with a maximum difference of 3523 passengers. During the evening rush hour, as commuters return to their places of residence, the passenger flow at the exit is greater than that at the entrance, with a maximum difference of 1168 passengers.
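The 30 min inbound and outbound counts behind this comparison can be obtained by binning swipe times. The sketch below is one possible way to do it under assumed field names; it pools all days into the same time-of-day bins, and the three trips are made up.

```python
from collections import Counter
from datetime import datetime


def half_hour_bin(timestamp):
    """Label a timestamp with the start of its 30 min period, e.g. '08:30'."""
    t = datetime.fromisoformat(timestamp)
    return f"{t.hour:02d}:{(t.minute // 30) * 30:02d}"


def count_flows(records):
    """Count inbound and outbound swipes per (station, 30 min period)."""
    inbound, outbound = Counter(), Counter()
    for rec in records:
        inbound[(rec["entry_station"], half_hour_bin(rec["entry_time"]))] += 1
        outbound[(rec["exit_station"], half_hour_bin(rec["exit_time"]))] += 1
    return inbound, outbound


# Made-up trips for illustration only.
trips = [
    {"entry_station": "A", "entry_time": "2018-04-23 08:05:00",
     "exit_station": "B", "exit_time": "2018-04-23 08:31:00"},
    {"entry_station": "A", "entry_time": "2018-04-23 08:20:00",
     "exit_station": "B", "exit_time": "2018-04-23 08:44:00"},
    {"entry_station": "B", "entry_time": "2018-04-23 08:40:00",
     "exit_station": "A", "exit_time": "2018-04-23 09:02:00"},
]
inbound, outbound = count_flows(trips)
key = ("B", "08:30")
# Inbound minus outbound flow at station B in the 08:30 period.
print(inbound[key] - outbound[key])  # -1
```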
3.2.2. Total Distribution Characteristics of Passenger Flow at All Stations
The five stations of the Guangzhou Metro with the highest daily average passenger flow are selected as the research object, and the cumulative frequency distribution of access and exit passenger flow is plotted; the results are shown in Figure 3.
It can be seen that, on both weekdays and weekends, the passenger flow at most stations is mostly within 1500 passengers per 30 min. These periods can be regarded as the normal operating periods of the station, in which passenger density in the station is low and walking time is little affected by passenger flow. Therefore, this paper quantifies the relationship between the two and finds the passenger flow threshold that significantly affects walking time, so as to provide a basis for deciding when the walking time constraint of the model applies.
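A cumulative frequency of this kind can be computed directly from the 30 min flow counts. The following sketch uses made-up flows and a hypothetical helper name; it only shows how the share of station periods below a given flow level is obtained.

```python
def cumulative_frequency(flows, limits):
    """Return {limit: share of station periods with flow <= limit}."""
    if not flows:
        raise ValueError("flows must not be empty")
    n = len(flows)
    return {limit: sum(f <= limit for f in flows) / n for limit in limits}


# Made-up 30 min flows for illustration only.
flows = [300, 800, 1200, 1400, 1600, 2900, 3500, 6100]
print(cumulative_frequency(flows, [1500, 3000]))  # {1500: 0.5, 3000: 0.75}
```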
3.2.3. Identify Passenger Flow Thresholds That Affect Walking Time
Because station passenger flow is unevenly distributed across time periods and between the access and exit directions, walking time within the same station may vary by time period and by direction. Xie [24], in a simulation study of passenger flow and walking time at Wanjiali Square Station of the Changsha Metro, found that once the access volume reaches 6000 passengers per hour, walking time increases with passenger flow because of the restricted area of the station. With reference to this value, this paper sets the initial passenger flow threshold to 3000 passengers per 30 min. To further verify the reasonableness of this value, the validity of the results is compared for thresholds of 1500, 2000, 2500, 3500, and 4000 passengers per 30 min, in steps of 500 passengers.
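The unit conversion and the candidate thresholds can be written out as follows. This is a small sketch under assumptions: the data layout (a mapping from station period to access and exit flow) and the helper name are hypothetical, and a station period is treated as heavy when either direction exceeds the threshold.

```python
REFERENCE_PER_HOUR = 6000   # access volume from the cited simulation study
PERIOD_MIN = 30
INITIAL_THRESHOLD = REFERENCE_PER_HOUR * PERIOD_MIN // 60   # 3000 per 30 min
CANDIDATE_THRESHOLDS = list(range(1500, 4001, 500))


def heavy_flow_periods(flows, threshold=INITIAL_THRESHOLD):
    """Station periods whose access or exit flow exceeds the threshold.

    flows maps (station, period) -> (access_flow, exit_flow); this layout
    is an assumption made for the example.
    """
    return sorted(k for k, (acc, ext) in flows.items() if max(acc, ext) > threshold)


# Made-up flows for illustration only.
flows = {
    ("A", "08:00"): (3400, 900),
    ("A", "14:00"): (700, 650),
    ("B", "18:00"): (1200, 3100),
}
print(INITIAL_THRESHOLD)          # 3000
print(CANDIDATE_THRESHOLDS)       # [1500, 2000, 2500, 3000, 3500, 4000]
print(heavy_flow_periods(flows))  # [('A', '08:00'), ('B', '18:00')]
```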
5. Example Validation Analysis
The above calculation method is applicable to all metro stations. In this paper, the lines with high passenger flow in the Guangzhou Metro are selected as the research object: 5 lines with 87 stations, of which 7 are interchange stations within the studied network. The operation period, 6:00–23:59, is divided into 38 time periods of 30 min each, so in theory there are 3306 station time periods. Because some time periods have too few OD trips, 2547 valid station time periods are finally obtained.
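The counts quoted above can be checked with a few lines of arithmetic. The constants are taken as stated in the text; the coverage ratio is derived here and is not reported in the paper.

```python
STATIONS = 87     # stations on the five studied lines
PERIODS = 38      # 30 min periods per day, as stated in the text
VALID = 2547      # station time periods with enough OD trips

theoretical = STATIONS * PERIODS
coverage = VALID / theoretical
print(theoretical, f"{coverage:.1%}")  # 3306 77.0%
```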
5.1. Accuracy Validation of Model Results
This subsection verifies whether the results of the model are accurate. Route planning services provide a reference value for the transfer walking time, so this value can be crawled and compared with the transfer walking time calculated by the model. The verification is carried out for both day types: working days and weekends.
Analysis of the mean transfer walking time. Taking the Gongyuanqian transfer station as an example, the OD travel data with exactly one transfer, at Gongyuanqian, were used to derive the transfer walking time for each time period at this station; the weekday results are shown in Table 2.
For weekdays, results were obtained for 24 time periods, with an average transfer walking time of 45.94 s. According to Baidu Maps, the transfer walking distance at this station is 71 m and the walking time is 1 min. Because the time resolution is 1 min, the displayed value may cover 0–59 s; 45.94 s lies in this interval, so the calculated walking time can be considered consistent with the reference. For weekends, because of the small amount of data, results are obtained for only six periods; their average, 39.12 s, is also consistent with the reference value. However, the weekend results are too few for an analysis of the distribution. Therefore, only the distribution of the weekday results is verified below, which does not mean that the weekend results are unreliable.
Analysis of the transfer walking time distribution. The results were analyzed with SPSS to obtain a frequency histogram, as shown in Figure 5.
The histogram suggests that the station transfer walking time may follow a normal distribution. A Kolmogorov-Smirnov (KS) normality test of the Gongyuanqian transfer walking time was then run in SPSS; the significance level is 0.2, which is greater than 0.05, so the data can be considered to follow a normal distribution. A Q-Q plot was also drawn (see Figure 6); the points lie close to the straight line, which further supports normality. This result is consistent with existing studies on walking time distribution, which also supports the reasonableness of the results.
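For readers without SPSS, the normality check can be approximated with the standard library. The sketch below computes the one-sample KS statistic against a normal distribution fitted to the sample and an asymptotic p-value from the Kolmogorov series. It is an approximation under stated assumptions: it does not apply the Lilliefors-type correction that statistical packages may use when the mean and standard deviation are estimated from the data, so its p-values tend to be too large, and the sample values are made up rather than taken from Table 2.

```python
import math
from statistics import mean, stdev


def normal_cdf(x, mu, sigma):
    """Cumulative distribution function of N(mu, sigma^2)."""
    return 0.5 * (1.0 + math.erf((x - mu) / (sigma * math.sqrt(2.0))))


def ks_normal(sample):
    """KS statistic D and asymptotic p-value against a fitted normal."""
    xs = sorted(sample)
    n = len(xs)
    mu, sigma = mean(xs), stdev(xs)
    d = 0.0
    for i, x in enumerate(xs):
        cdf = normal_cdf(x, mu, sigma)
        # Largest gap between the empirical step function and the fitted CDF.
        d = max(d, (i + 1) / n - cdf, cdf - i / n)
    lam = (math.sqrt(n) + 0.12 + 0.11 / math.sqrt(n)) * d
    if lam < 0.2:
        return d, 1.0   # the series below converges slowly; p is ~1 here
    series = sum((-1) ** (k - 1) * math.exp(-2.0 * k * k * lam * lam)
                 for k in range(1, 101))
    return d, min(max(2.0 * series, 0.0), 1.0)


# Made-up transfer walking times in seconds, for illustration only.
walk_times = [30, 35, 38, 40, 42, 43, 44, 45, 46, 47, 48, 49, 50, 52, 54, 57, 62]
d, p = ks_normal(walk_times)
print(round(d, 3), p > 0.05)
```

A p-value above 0.05 means that normality is not rejected at the 5% level, which is the decision rule used in the paragraph above.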
5.2. Model Validity Verification
5.2.1. Model Validity Verification
Because the stations with large passenger flow are excluded from the constraints, it is necessary to verify the validity of the results of such stations.
Validation idea: Select the station periods whose access or exit passenger flow exceeds 3000 passengers per 30 min, and classify them as “access passenger flow greater than exit passenger flow” or “access passenger flow less than exit passenger flow”. If the walking time difference has the same direction as the passenger flow difference, the hypothesis is considered valid for that station period. The proportion of valid station periods is calculated for each class, and the two proportions are averaged.
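The validation idea can be sketched as below. The sketch assumes that the walking difference is the access walking time minus the exit walking time and that the passenger flow difference is the access flow minus the exit flow; the text does not define these quantities explicitly, so the definitions, the field names, and the sample values are hypothetical.

```python
def direction_validity(station_periods, threshold=3000):
    """Share of heavy-flow station periods where the walking time difference
    has the same sign as the passenger flow difference, per class and averaged.
    """
    groups = {"access>exit": [], "access<exit": []}
    for sp in station_periods:
        flow_diff = sp["access_flow"] - sp["exit_flow"]
        # Keep only heavy-flow periods with a clear flow direction.
        if max(sp["access_flow"], sp["exit_flow"]) <= threshold or flow_diff == 0:
            continue
        walk_diff = sp["access_walk_s"] - sp["exit_walk_s"]
        label = "access>exit" if flow_diff > 0 else "access<exit"
        groups[label].append(flow_diff * walk_diff > 0)
    ratios = {k: sum(v) / len(v) for k, v in groups.items() if v}
    average = sum(ratios.values()) / len(ratios) if ratios else None
    return ratios, average


# Made-up station periods for illustration only.
periods = [
    {"access_flow": 3400, "exit_flow": 900, "access_walk_s": 95, "exit_walk_s": 60},
    {"access_flow": 3600, "exit_flow": 1200, "access_walk_s": 70, "exit_walk_s": 80},
    {"access_flow": 1000, "exit_flow": 3200, "access_walk_s": 50, "exit_walk_s": 120},
    {"access_flow": 700, "exit_flow": 650, "access_walk_s": 55, "exit_walk_s": 54},
]
print(direction_validity(periods))  # ({'access>exit': 0.5, 'access<exit': 1.0}, 0.75)
```

Averaging the two class proportions gives each class equal weight regardless of how many station periods it contains; for example, class proportions of 0.83 and 1.00 average to 0.915.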
The results for station periods with heavy passenger flow on weekdays and weekends are verified below. The calculation results are as follows:
As can be seen from Table 3, on weekdays there are 29 station periods in the first category, in which access passenger flow is higher, and in 83% of them the walking time difference has the same direction as the passenger flow difference. In the second category there are four station periods, and in all of them the walking time difference has the same direction as the passenger flow difference. On average, therefore, 91.5% of the station periods show a walking time difference in the same direction as the passenger flow difference, so it is reasonable to assume that passenger flow density affects walking time in the station. The average delay is 44.54 s for the first category and 113.66 s for the second. The lower delay associated with access passenger flow may arise because flow restriction and security checks before entry buffer the short-term accumulation of passengers.
For weekends, however, the average validity rate is 0.625, which may be caused by the small amount of weekend data. In terms of passenger flow, even under large passenger flow on weekends, passenger flow basically does not delay the travel time.
5.2.2. Comparison of the Effectiveness of Multiple Passenger Flow Thresholds
In the model, whether the station walking time constraint applies is decided by a passenger flow threshold. This section compares several large passenger flow thresholds (1500, 2000, 2500, 3000, 3500, and 4000 passengers per 30 min) in terms of their effectiveness for the model constraints. The validity results for the different passenger flow thresholds on weekdays are as follows:
As can be seen from Table 4, the validity of the constraints increases with the large passenger flow threshold, while the number of station time periods that meet the condition decreases. When the threshold reaches 3000 passengers per 30 min, the accuracy of the constraints increases most markedly, reaching 92%. If the threshold is raised further, the accuracy continues to increase, but only slightly, and too few station time periods meet the condition to reflect the actual impact of passenger flow on walking time. Therefore, this paper considers that the statement “when the access or exit passenger flow of a station reaches 3000 passengers per 30 min, it will have an impact on the walking time” has some reference value. The same verification on weekends yields lower validity but the same trend, so it is not repeated here.
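The trade-off between validity and the number of qualifying station time periods can be explored with a simple loop over candidate thresholds. The sketch below uses made-up data and a pooled validity rate (not the class-averaged rate), and it assumes the same sign convention for the flow and walking time differences as the validation idea in Section 5.2.1; none of the numbers reproduce Table 4.

```python
def validity_by_threshold(station_periods, thresholds):
    """Return (threshold, qualifying periods, validity rate) for each threshold.

    station_periods holds tuples of
    (access_flow, exit_flow, access_walk_s, exit_walk_s).
    """
    rows = []
    for thr in thresholds:
        hits = [
            (acc - ext) * (aw - ew) > 0
            for acc, ext, aw, ew in station_periods
            if max(acc, ext) > thr and acc != ext
        ]
        rate = sum(hits) / len(hits) if hits else None
        rows.append((thr, len(hits), rate))
    return rows


# Made-up station periods for illustration only.
data = [
    (1600, 900, 58, 60),
    (2200, 1000, 56, 61),
    (2700, 1500, 70, 64),
    (3200, 1100, 95, 60),
    (1000, 3600, 50, 120),
    (4100, 1300, 110, 65),
]
rows = validity_by_threshold(data, range(1500, 4001, 500))
for row in rows:
    print(row)
```

In this toy data the validity rate rises as the threshold increases while the number of qualifying station periods falls, which is the pattern the paragraph above describes.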