6.1. Estimation Method
Because the response variable is irregularly distributed, quantile regression (QR) is applied to model departure delay at different percentiles of its distribution, while OLS regression is taken as a benchmark. The resulting coefficients give the marginal effects of the explanatory variables on departure delay. The OLS model equation is represented as follows:
$y_i = \beta_0 + \sum_{k=1}^{K} \beta_k x_{ik} + \varepsilon_i$ (3)
The QR model equation is represented as:
$y_i = \beta_0^{(p)} + \sum_{k=1}^{K} \beta_k^{(p)} x_{ik} + \varepsilon_i^{(p)}$ (4)
where $y_i$ represents the departure delay of flight $i$; $\beta_0$ indicates the intercept; $\beta_1, \dots, \beta_K$ denote the slopes of the covariates $x_{i1}, \dots, x_{iK}$; and $\varepsilon_i$ refers to the error term. In the QR model specification, the superscript $(p)$ denotes the $p$th quantile. Equation (3) gives one coefficient for each variable. In OLS regression, $\varepsilon_i$ is assumed to be independent and identically distributed (IID) across observations with zero mean and constant variance. Our preliminary investigation, based on the White test, suggests that error heteroskedasticity could be present. To mitigate this issue, OLS was performed together with the White heteroskedasticity-consistent estimator [26]. For Equation (4), a separate equation was specified and estimated for each quantile. Because severe delays impose large costs on the civil aviation industry, three quantiles were added above the 90th quantile to investigate what drives the most severe delays. Therefore, twelve quantiles are taken into consideration in the paper: the 10th, 20th, 30th, 40th, 50th, 60th, 70th, 80th, 90th, 93rd, 96th, and 99th percentiles.
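To make the role of the White heteroskedasticity-consistent estimator concrete, the following is a minimal sketch for a regression with a single covariate. It is an illustration only, not the paper's implementation: the function name `ols_slope_with_se`, the HC0 variant of the estimator, and the synthetic data are all assumptions made here.

```python
import math

def ols_slope_with_se(x, y):
    """Simple OLS of y on x; returns intercept, slope, classical SE and White (HC0) SE of the slope."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    dx = [xi - x_bar for xi in x]          # deviations of x from its mean
    sxx = sum(d * d for d in dx)
    slope = sum(d * (yi - y_bar) for d, yi in zip(dx, y)) / sxx
    intercept = y_bar - slope * x_bar
    resid = [yi - intercept - slope * xi for xi, yi in zip(x, y)]

    # Classical SE: assumes one common error variance for every observation.
    s2 = sum(e * e for e in resid) / (n - 2)
    se_classical = math.sqrt(s2 / sxx)

    # White HC0 SE: each observation contributes its own squared residual,
    # so no constant-variance assumption is needed.
    se_hc0 = math.sqrt(sum((d * e) ** 2 for d, e in zip(dx, resid))) / sxx
    return intercept, slope, se_classical, se_hc0

# Synthetic data (hypothetical, not from the paper): the noise grows with x.
x = [1, 2, 3, 4, 5, 6, 7, 8]
y = [1.0, 2.1, 2.8, 4.4, 4.3, 7.0, 5.6, 9.9]
b0, b1, se_c, se_w = ols_slope_with_se(x, y)
print(f"slope={b1:.3f}  classical SE={se_c:.3f}  White HC0 SE={se_w:.3f}")
```

The point estimates are identical under both approaches; only the standard errors, and hence the significance tests, change.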
In this paper, quantile regression is adopted to capture the tail of the delay distribution, which is a prominent feature of the data and a serious problem in airline operations. Unlike OLS regression, the QR estimator remains consistent in the presence of heteroskedasticity or non-normally distributed errors. Therefore, QR can be used to analyze how the coefficients change across quantiles, even in the skewed tails.
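The mechanism behind QR can be sketched with the standard check (pinball) loss: minimizing it over a constant yields a sample quantile, just as minimizing squared error yields the mean. The snippet below is a minimal illustration with hypothetical delay values; the function names are assumptions made here, and the paper's own estimation was carried out in R.

```python
def check_loss(u, tau):
    """Check (pinball) loss: weight tau on positive residuals, (1 - tau) on negative ones."""
    return u * tau if u >= 0 else u * (tau - 1)

def total_check_loss(q, ys, tau):
    """Total check loss when the constant q is used to predict every value in ys."""
    return sum(check_loss(y - q, tau) for y in ys)

def best_constant(ys, tau):
    """Constant minimizing the total check loss; a minimizer always exists among the data points."""
    return min(sorted(ys), key=lambda q: total_check_loss(q, ys, tau))

# Hypothetical departure delays in minutes, with one extreme value in the right tail.
delays = [0, 2, 5, 8, 12, 20, 180]
mean_delay = sum(delays) / len(delays)

print("mean:", round(mean_delay, 2))
print("tau=0.5 minimizer:", best_constant(delays, 0.5))
print("tau=0.9 minimizer:", best_constant(delays, 0.9))
```

The single extreme delay pulls the mean far above the median, while the 0.9 minimizer lands on the tail observation itself, which is why separate quantile models are informative for skewed delay data.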
6.2. OLS Estimation Results
The estimation results are shown in Table 6. In estimating the departure delay model specified above, all coefficients obtained from the OLS model had the anticipated signs and were statistically significant.
Figure 4 shows the impact of all the dummy variables. The signs of their coefficients were as expected.
Several delay propagation-related factors, including ArrDelay_PreFlt, Dif_TT and FirstFlight, capture the impact of the previous flight through the chain effect. The previous arrival delay of an aircraft may propagate to the next flight leg if the turnaround time is limited. As shown in Table 6, each minute of ArrDelay_PreFlt leads to 0.575 min (p < 0.01) of departure delay, which suggests that a 100-min arrival delay tends to cause the next departure to suffer a delay of almost 1 hour (57.5 min). Another crucial delay propagation-related factor is the ground buffer time, denoted Dif_TT in this paper. It explains how the previous arrival delay of an aircraft is absorbed during the turnaround. Beyond the MCT of each flight, the buffer time is used to recover part of the previous arrival delay. One minute of buffer time reduces the departure delay by 0.081 min (p < 0.01). Even if a flight is allowed an hour of buffer time, the departure delay is reduced by only about 4.9 min (60 × 0.081 ≈ 4.86) on average. If a flight is the first of the day, it inherits no previous arrival delay. In the absence of delay propagation-related effects, a flight is more likely to be punctual. The estimation shows that first flights of the day have 3.178 min less departure delay on average.
The CumDepDelay variable captures the effect of airport congestion on departure delays. The departure delays accumulated at an airport over the previous hour might also propagate to a scheduled flight. Although these flights are not operated by the same aircraft, they share the same airport and terminal area. A more congested airport likely means a longer queue for take-off. The estimation shows that each minute of accumulated airport delay adds 0.029 min (p < 0.01) of departure delay, which implies that 100 min of CumDepDelay adds 2.9 min to a flight's departure delay. According to the CAAC rules, flights are prohibited from taking off in convective weather conditions. As shown in Table 6, the ConvecWeather effect has the anticipated sign. It causes an additional 15.519 min (p < 0.01) of departure delay and is the dominant variable among all the dummies, which is consistent with the strict requirements of the CAAC.
The DayTime variable indicates whether a flight is scheduled to depart in the daytime, since flights operating in the busy period could suffer more delays. Each flight operated in the daytime has an extra 7.479 min of delay (p < 0.01) on average. One reason is passenger behavior: people tend to choose daytime flights, for example for business travel. Another time variable, weekday, adds 2.776 min (p < 0.01) of departure delay. One potential reason is that higher demand for business travel on weekdays leads to more severe airspace congestion.
The coefficient of AirlineBase confirms the view that having a base at an airport reduces departure delay: airlines based at the airport have departure delays that are 5.285 min shorter on average. Wide-body aircraft experience a longer departure delay than narrow-body aircraft, by 11.281 min on average. They require more ground services and carry more passengers, both of which may extend the delay.
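The arithmetic behind these interpretations can be reproduced directly from the OLS coefficients quoted in this section. The sketch below is illustrative only: the signs follow the direction of each effect described in the text, the intercept is not reported in the text and is therefore omitted, and the example flight is hypothetical.

```python
# OLS coefficients as quoted in the text (minutes of departure delay per unit of the covariate).
# Negative signs encode the effects the text describes as reductions.
OLS_COEF = {
    "ArrDelay_PreFlt": 0.575,   # per minute of previous arrival delay
    "Dif_TT": -0.081,           # per minute of ground buffer time
    "FirstFlight": -3.178,      # dummy: first flight of the day
    "CumDepDelay": 0.029,       # per minute of accumulated airport departure delay
    "ConvecWeather": 15.519,    # dummy: convective weather
    "DayTime": 7.479,           # dummy: daytime departure
    "Weekday": 2.776,           # dummy: weekday
    "AirlineBase": -5.285,      # dummy: airline is based at the airport
    "WideBody": 11.281,         # dummy: wide-body aircraft
}

def delay_contributions(flight):
    """Per-covariate contribution (in minutes) to the fitted departure delay, intercept excluded."""
    return {name: OLS_COEF[name] * value for name, value in flight.items()}

# Hypothetical flight: 100 min inbound delay, 60 min buffer, 100 min of airport
# accumulated delay, and convective weather present.
example = {"ArrDelay_PreFlt": 100, "Dif_TT": 60, "CumDepDelay": 100, "ConvecWeather": 1}
parts = delay_contributions(example)
for name, minutes in parts.items():
    print(f"{name:16s} {minutes:+8.3f} min")
print(f"{'total (no intercept)':16s} {sum(parts.values()):+8.3f} min")
```

Because the model is linear in levels, the contributions simply add up; the comparison makes clear how small the buffer-time recovery is relative to the propagated arrival delay.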
6.3. QR Estimation Results
QR extends the linear model to estimate rates of change in all parts of the distribution of a response variable. In this paper, the departure delay distribution is analyzed from the 10th percentile to the 99th percentile. Four quantiles are placed in the long-delay range (from the 90th to the 99th), since understanding the factors that affect extreme delays is important for preventing them.
Table 7 presents the quantile regression estimates for each of the twelve quantile models considered in this paper. Based on Table 7, the changes in the coefficients across quantiles are shown in Figure 5.
In this paper, quantile regression is performed in R [27]. In each plot of Figure 5, the regression coefficient at a given quantile indicates the effect on departure delay of a unit change in that variable, holding all other variables fixed, and is shown with 95% confidence interval bands.
Figure 5 presents a summary of the quantile regression results for departure delay. For each of the nine factors, 12 distinct quantile regressions are estimated for p ranging from 0.10 to 0.99, plotted as the solid curve with filled dots. For each covariate, these point estimates can be interpreted as the impact of a one-unit change in the covariate on departure delay, with the other covariates held unchanged. The horizontal axis of each plot is the delay quantile, and the vertical axis is the covariate effect. The orange line in each plot indicates the OLS estimate of the conditional mean effect, and the orange shaded area represents its conventional 95% confidence interval. The blue line shows how the coefficient changes with the quantile, and the blue shaded area indicates a 95% pointwise confidence band for the QR estimates.
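How a dummy-variable coefficient can differ across quantiles, and from the mean effect, can be sketched in the simplest possible case. When the model contains only an intercept and one dummy, the QR coefficient at each quantile equals the difference between the two groups' sample quantiles, and the OLS coefficient equals the difference in group means. The data below are synthetic and the function names are assumptions made here; the paper's models include all nine covariates and were fitted in R.

```python
import math

def order_stat_quantile(values, tau):
    """Sample tau-quantile as an order statistic (a minimizer of the check loss)."""
    ordered = sorted(values)
    # round() guards against floating-point noise in len * tau before taking the ceiling.
    k = max(1, math.ceil(round(len(ordered) * tau, 9)))
    return ordered[k - 1]

def dummy_effect(group0, group1, tau):
    """QR coefficient of a single dummy: difference of the two group quantiles."""
    return order_stat_quantile(group1, tau) - order_stat_quantile(group0, tau)

# The twelve quantile levels used in the paper.
TAUS = [0.10, 0.20, 0.30, 0.40, 0.50, 0.60, 0.70, 0.80, 0.90, 0.93, 0.96, 0.99]

# Synthetic departure delays in minutes (hypothetical): the dummy mainly stretches the right tail.
no_weather = [0, 2, 3, 5, 6, 8, 10, 15, 20, 30]
weather = [1, 3, 5, 8, 12, 20, 35, 55, 80, 120]

mean_effect = sum(weather) / len(weather) - sum(no_weather) / len(no_weather)
print(f"mean (OLS-type) effect: {mean_effect:.1f} min")
for tau in TAUS:
    print(f"tau={tau:.2f}  effect={dummy_effect(no_weather, weather, tau)} min")
```

In this toy setting the effect is small at low quantiles and large in the upper tail, so a single mean effect overstates the impact on short delays and understates it on long ones.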
6.4. Results Comparison
The quantile regression estimation results are listed in Table 7. The three delay propagation-related variables listed in Table 7 show different tendencies. The ArrDelay_PreFlt effect rises throughout, but at a slower pace, especially once it exceeds the average (OLS) effect. It reaches its maximum, a marginal effect of about 70% on departure delay, after the 80th quantile of the response variable. Moreover, ArrDelay_PreFlt is statistically significant in all quantile models (p < 0.01), indicating that previous arrival delays affect departure delay throughout the distribution. Dif_TT is significant at the 10% level at or below the 80th quantile (p < 0.1). Nevertheless, its delay reduction effect weakens as the quantile increases. Notably, the Dif_TT effect turns positive in the right tail. Moreover, the coefficients after the 90th quantile are insignificant, suggesting that additional buffer does not improve delay recovery for long delays. Allocating more buffer is therefore not an effective way to improve the on-time performance of flights facing long delays. FirstFlight matters mainly for short and median delays, with significant estimates from the 10th to the 80th quantile (p < 0.01, p < 0.05 or p < 0.1, depending on the quantile). The significant estimates are negative, which matches common sense: first flights of the day suffer less delay.
Turning to the other variables, airport congestion is represented by CumDepDelay. Severe delays at an airport are thought to be related to unexpected situations such as airspace congestion, terminal area congestion, or air traffic management procedures. Furthermore, the significance of all QR estimates (p < 0.01) reveals that flights are readily affected by the state of airport traffic. After the QR and OLS curves cross, the impact of CumDepDelay increases at a faster pace, except at the extreme delay quantile.
Convective weather has a large impact from the 70th to the 96th percentile, with the QR estimate reaching the mean (OLS) effect between the 70th and the 80th quantile. Thunderstorms contribute almost an hour (52.5 min, p < 0.05) to flight departure delay at the 90th percentile of the conditional distribution, where the effect reaches its maximum. However, the ConvecWeather effect is insignificant below the 70th quantile. This conforms to our expectation that convective weather is more likely to contribute to severe delays. Notably, the estimate at the 99th quantile is not significant, indicating that convective weather is not a significant contributing factor for extreme delays. One possible explanation is that other unobserved factors, such as equipment malfunction or military activities, also contribute to extreme delays.
With respect to temporal characteristics, the DayTime variable identifies flights operating in the daytime (from 6 a.m. to 8 p.m.). Its coefficients are positive at all quantiles but show no significant variation across them. The weekday effect, by contrast, is much more moderate than that of other variables for short delays (and statistically insignificant), but it increases significantly after the 90th quantile. It is worth noting that this effect strengthens markedly and tends to be one of the major causes of extreme delay (at the 99th quantile).
AirlineBase indicates whether a flight can obtain better ground service. As anticipated, the AirlineBase effect is negative and decreases continuously across the quantiles (p < 0.01), except for the insignificant estimates at the 10th and 99th quantiles. Meanwhile, the coefficients in the right tail show a steep decrease, which conforms to the fact that having a base at an airport helps airlines reduce risks when emergencies occur. The WideBody effect also exhibits an increasing trend, from 2.8 min to 30 min (p < 0.01 or p < 0.05, depending on the quantile), although, as with AirlineBase, no statistical significance was found for extreme delays.
With QR analysis, the variables that affect departure delay at different quantiles can be analyzed, which helps airlines study how each cause affects departure delay specifically. Moreover, it provides airlines with a different perspective on preventing flight delays. For short delays (less than 15 min), delay propagation-related factors play an important role from the 10th to the 60th quantile. DayTime, AirlineBase, and WideBody are also non-negligible factors in these delays. All variables are significant from the 70th to the 80th quantile, where convective weather and aircraft type begin to exert much more influence. Flight buffer and FirstFlight are no longer effective for long delays (above 65 min). In addition, the weekday impact grows steadily, while the daytime effect is almost insignificant.