*Article* **Time-Series Prediction of Intense Wind Shear Using Machine Learning Algorithms: A Case Study of Hong Kong International Airport**

**Afaq Khattak 1,\*, Pak-Wai Chan <sup>2</sup> , Feng Chen 1,\* and Haorong Peng <sup>3</sup>**


**Abstract:** Machine learning algorithms are applied to predict intense wind shear from data collected by the Doppler LiDAR at Hong Kong International Airport. Forecasting intense wind shear in the vicinity of airport runways is vital for intelligent management and timely flight operation decisions. To predict the time series of intense wind shear, Bayesian optimized machine learning models, namely adaptive boosting, light gradient boosting machine, categorical boosting, extreme gradient boosting, random forest, and natural gradient boosting, are developed in this study. Time-series prediction refers to a model that predicts future values based on past values. On the testing set, the Bayesian optimized Extreme Gradient Boosting (XGBoost) model outperformed the other models in terms of mean absolute error (1.764), mean squared error (5.611), root mean squared error (2.368), and R-square (0.859). Afterwards, the XGBoost model is interpreted using the SHapley Additive exPlanations (SHAP) method. The XGBoost-based importance and the SHAP method reveal that the month of the year and the encounter location of the most intense wind shear were the most influential features. August is more likely to have a high number of intense wind-shear events. The majority of the intense wind-shear events occurred on the runway and within one nautical mile of the departure end of the runway.

**Keywords:** wind shear; time-series modeling; machine learning; Bayesian optimization

### **1. Introduction**

Wind shear is a potentially hazardous meteorological occurrence characterized by sudden changes in wind speed and/or direction. If this event occurs below 500 m (1600 feet) above the ground, it is classified as low-level wind shear; if its magnitude exceeds 30 knots, it is known as intense wind shear [1]. It is one of the most worrisome phenomena for an aircraft because it creates violent turbulence and eddies as well as dramatic shifts in the aircraft's horizontal and vertical progression, which can ultimately result in missed approaches, touching down short of the runway (loss of lift), or deviation from the true flight path during landing descent, as depicted in Figure 1. Intense wind shear has two potentially dangerous effects on landing aircraft: aberration of the flight path and deviation from the set approach speed [2]. Due to unanticipated changes in wind speed or direction, the pilot may experience immense pressure during the landing phase, when the engine power is low and the airspeed is close to stall speed.

Numerous airports around the world have reaped substantial benefits from the availability of precise, high-resolution, remote sensing technologies such as the Terminal Doppler Weather Radar (TDWR) [3] and the Doppler Light Detection and Ranging (LiDAR) [4,5]. By a significant margin, the most prevalent methods for detecting wind shear are TDWR, ground-based anemometer networks, and wind profilers. Since the mid-1990s, this method has proved effective for alerting airports to wind shear, particularly during the passage of tropical cyclones and thunderstorms. Clear weather prevents the TDWR system from providing accurate wind data. However, certain wind-shear events are associated with airflow reaching the airport from rugged terrain. To address these circumstances, a method of detection independent of humidity was needed. For this purpose, the LiDAR system has been added to the TDWR as a supplement in order to detect and warn of wind shear in clear skies. Doppler LiDAR can detect return signals from aerosols and provide precise Doppler wind measurements when the air is clear. Although these tracking or observation-based technologies are effective at detecting wind shear in the vicinity of an airport, they are unable to predict when the next wind-shear event will occur, or which risk factors contribute to its occurrence [6]. Forecasting intense wind shear in the vicinity of the airport runway and identifying the factors that contribute to its occurrence are of the utmost importance, as intense wind shear can cause significant challenges for departing and approaching flights.

**Citation:** Khattak, A.; Chan, P.-W.; Chen, F.; Peng, H. Time-Series Prediction of Intense Wind Shear Using Machine Learning Algorithms: A Case Study of Hong Kong International Airport. *Atmosphere* **2023**, *14*, 268. https://doi.org/10.3390/atmos14020268

Academic Editors: Duanyang Liu, Hongbin Wang and Shoupeng Zhu

Received: 27 December 2022 Revised: 25 January 2023 Accepted: 27 January 2023 Published: 28 January 2023

**Copyright:** © 2023 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

**Figure 1.** Intense wind shear effect on landing aircraft.

The development of a framework for the prediction of intense wind shear requires a substantial amount of historical data on wind-shear events. Although numerous researchers in the power and energy domain have attempted to forecast wind speed due to the demand for wind energy electricity generation and advancements in wind energy competitiveness [7–9], few researchers have attempted to forecast wind-shear events in the vicinity of airport runways [10,11]. For time-series modeling, several statistical and mathematical techniques have been employed in the past, such as autoregressive integrated moving average (ARIMA) [12–14], Kolmogorov–Zurbenko filters [15,16], exponential smoothing [17,18], and others. These often result in good forecasting accuracy. However, machine learning algorithms have recently been applied in various domains due to their high forecasting precision and improved operational efficiency [19–24]. Therefore, in this study, we propose the development of time-series prediction models of intense wind shear using machine learning algorithms. The study employed Doppler LiDAR data from 2017 to 2020 and machine learning algorithms including the Adaptive Boosting (AdaBoost) [25], Light Gradient Boosting Machine (LightGBM) [26], Categorical Boosting (CatBoost) [27], Extreme Gradient Boosting (XGBoost) [28], Random Forest [29], and Natural Gradient Boosting (NGBoost) [30] methods, optimized via a Bayesian optimization approach [31], as shown in Figure 2.

In addition to evaluating the performance of models in order to select the optimal model, crucial factors that contribute to the occurrence of intense wind shear are also revealed. Researchers in the field of civil aviation safety should seize this opportunity as understanding the complex interactions between multiple risk factors that determine the occurrence of intense wind shear is essential for aviation and meteorological applications.

**Figure 2.** Framework for the time-series prediction of intense wind-shear event.

### **2. Data and Methods**

### *2.1. Study Location*

Hong Kong International Airport (HKIA) is among the airports most susceptible to wind-shear events in the world, and from 1998 to 2015 a significant number of intense wind-shear events were documented. Wind-shear events occur once every 400 to 500 flights, according to HKIA-based pilot flight reports [32]. The airport is situated on Lantau Island, surrounded on three sides by open sea water and by mountains to the south that reach heights of more than 900 m above sea level. As is illustrated in Figure 3, the mountainous terrain to the south of the HKIA exacerbates wind shear by disrupting the flow of air and producing turbulence along the HKIA flight paths. Previously, HKIA had two runways: the north and south runways. With the newly constructed third runway, the former north runway is now designated as the central runway. The runways are oriented at 070° and 250°. There are a total of eight possible configurations because each runway can be utilized for takeoffs and landings in either direction. For instance, runway '07LA' indicates landing ('A' refers to arrival), with a heading angle of 070° (abbreviated to '07') utilizing the left runway (hence 'L'). This designation corresponds to an aircraft landing on the North Runway from the western side of the HKIA. Similarly, an aircraft taking off from the South Runway toward the west would use runway 25LD.
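The runway naming convention described above can be decoded mechanically. The following is a minimal Python sketch of such a decoder; the function name `parse_runway_code` and the returned field names are illustrative assumptions and are not part of the HKIA data format. It also assumes that the letter 'C' denotes the central runway, as suggested by codes such as 07CA that appear in the LiDAR records discussed later.

```python
import re

# Two heading digits, a runway side letter, and an operation letter.
RUNWAY_CODE = re.compile(r"^(?P<heading>\d{2})(?P<side>[LCR])(?P<op>[AD])$")

SIDES = {"L": "left", "C": "centre", "R": "right"}   # 'C' is an assumption
OPERATIONS = {"A": "arrival", "D": "departure"}


def parse_runway_code(code):
    """Decode a code such as '07LA' into heading, runway side, and operation."""
    match = RUNWAY_CODE.match(code)
    if match is None:
        raise ValueError(f"unrecognised runway code: {code!r}")
    return {
        "heading_deg": int(match["heading"]) * 10,  # '07' abbreviates 070 degrees
        "side": SIDES[match["side"]],
        "operation": OPERATIONS[match["op"]],
    }


arrival = parse_runway_code("07LA")
departure = parse_runway_code("25LD")
```

Decoding the codes into separate fields is one possible way to expose runway orientation and operation type as distinct model inputs.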

**Figure 3.** HKIA and surrounding terrain.

### *2.2. Data Processing from Doppler LiDAR*

The Doppler LiDAR at the HKIA detects the magnitude and reports the location of occurrence of wind-shear events. Figure 4 depicts a radial velocity plot obtained from a Plan Position Indicator (PPI) scan of the HKIA's south runway LiDAR at an elevation angle of 3° from the horizon. To the west and south of the location, three nautical miles (5.6 km) west-southwest of the western end of the south runway, there was a large area of winds blowing in the opposite direction (colored green in Figure 4) to the dominant east–southeast airflow.

**Figure 4.** Wind shear detection by LIDAR.

The development of our time-series prediction models required a substantial amount of intense wind shear data. Therefore, we first extracted the 2017 to 2020 wind shear data from LiDAR and filtered it to obtain only intense wind-shear events, i.e., wind shear with a magnitude greater than or equal to 30 knots. The filtration produced 3781 intense wind shear data points, which are presented in Table 1. Previous research [11] on wind shear prediction utilized hourly data from pilot reports and weather reports, which resulted in lower accuracy due to the transient and sporadic nature of wind shear. In several instances at the HKIA, the Doppler LiDAR reported intense wind shear at intervals as short as 1 min; consequently, we have considered these instances. As an example, from Table 1, we can observe that on 29 March 2019 intense wind-shear events of 37 knots and 39 knots were detected at 10:12 PM and 10:14 PM (at a 2 min interval) on runways 07CA and 07RA, respectively. The encounter locations are designated as either RWY, MD, or MF, as is shown in Figure 5. The rectangle in gray denotes the runway (RWY). On the right side of the runway, the rectangles indicate the distance in nautical miles to the final approach (1-MF is equal to 1 nautical mile to the final approach). Likewise, the rectangles on the left indicate the distance from the runway's departure end. For instance, 2-MD indicates two nautical miles from the runway's edge at the departure end.
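The filtering step and the encounter-location codes described above can be sketched as follows. This is a minimal illustration, not the authors' processing pipeline: the record layout (`time`, `runway`, `magnitude_kt`) and the function names are assumptions, the first two records use the values quoted in the text, and the third record is hypothetical and included only to show the filter at work.

```python
from datetime import datetime

INTENSE_THRESHOLD_KT = 30  # knots; threshold stated in the text


def filter_intense(events, threshold_kt=INTENSE_THRESHOLD_KT):
    """Keep only events whose magnitude is at least the threshold."""
    return [e for e in events if e["magnitude_kt"] >= threshold_kt]


def parse_location(code):
    """Split an encounter-location code into (zone, distance in nautical miles)."""
    if code == "RWY":
        return ("RWY", 0)
    distance, zone = code.split("-")
    if zone not in ("MD", "MF"):
        raise ValueError(f"unknown encounter location: {code!r}")
    return (zone, int(distance))


events = [
    {"time": datetime(2019, 3, 29, 22, 12), "runway": "07CA", "magnitude_kt": 37},
    {"time": datetime(2019, 3, 29, 22, 14), "runway": "07RA", "magnitude_kt": 39},
    # Hypothetical sub-threshold record, added only for illustration.
    {"time": datetime(2019, 3, 29, 22, 20), "runway": "07RA", "magnitude_kt": 22},
]
intense = filter_intense(events)
```

Splitting a code such as 2-MD into a zone and a distance is one possible encoding; the numerical coding actually used for the encounter location in the models is not specified in the text.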


**Figure 5.** Schematic diagram for the representation of intense wind shear encounter locations.

### *2.3. Machine Learning Regression Algorithms*

In this study, six machine learning regression algorithms were employed for the time-series prediction of intense wind-shear events, including LightGBM, XGBoost, NGBoost, AdaBoost, CatBoost, and RF. The fundamentals of the regression algorithms are described as follows:

### 2.3.1. Light Gradient Boosting Machine (LightGBM) Regression

LightGBM is a gradient boosting framework that is based on decision trees. Its primary distinction from the XGBoost model is that it employs histogram-based schemes to expedite the training phase while lowering memory usage, and that it implements a leaf-wise growth strategy with depth constraints. The fundamental concept of the histogram-based scheme is to partition continuous, floating-point feature values into *k* bins and build a histogram with *k* bins. It does not require the additional storage of presorted results and only needs to store the bin index of each feature value after partitioning, for which 8-bit integers are usually adequate, thereby lowering memory consumption to 1/8 of the original. This coarse partitioning has little effect on the model's precision: because each decision tree is a weak learner, it matters little whether the split point is exact. The regularization effect of the coarser split points can also help prevent over-fitting.
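The histogram idea can be illustrated with a short Python sketch. It assumes simple equal-width bins, which is a simplification chosen for clarity and not necessarily how LightGBM constructs its bins; the function names and the sample magnitudes are hypothetical.

```python
import bisect


def fit_bin_edges(values, k):
    """Return the k - 1 interior edges of k equal-width bins."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / k
    return [lo + width * i for i in range(1, k)]


def to_bin(value, edges):
    """Map a continuous value to its integer bin index (0 .. k - 1)."""
    return bisect.bisect_right(edges, value)


magnitudes = [30.0, 31.0, 33.0, 37.0, 39.0, 45.0]  # hypothetical values in knots
edges = fit_bin_edges(magnitudes, k=4)
bins = [to_bin(v, edges) for v in magnitudes]
histogram = [bins.count(b) for b in range(4)]
```

With at most 256 bins, every index fits in a single byte, which is the source of the memory saving mentioned above; candidate split points are then searched over bin boundaries instead of over every distinct feature value.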

Several hyperparameters must be adjusted for the LightGBM regression model to prevent overfitting, reduce model complexity, and achieve generalized performance. These hyperparameters are *n\_estimators*, which is the number of boosted trees to fit; *num\_leaves*, which is the maximum number of tree leaves for the base learners; *learning\_rate*, which controls the step size of each boosting update; *reg\_alpha*, which is the L1 regularization term on weights; and *reg\_lambda*, which is the L2 regularization term on model weights.

### 2.3.2. Extreme Gradient Boosting (XGBoost) Regression

XGBoost is a variant of tree-based boosting. Fundamentally, XGBoost learns the functional relationship, Γ, between the input factors *x* and the response *y* via an iterative procedure in which individual trees are trained sequentially on the residuals from the preceding tree. The mathematical expression for the tree-based estimates is given by Equation (1).

$$\hat{Y} = \Gamma(X) = \frac{1}{n} \sum\_{k=1}^{n} \Gamma\_k(X) \tag{1}$$

where *Ŷ* represents the predictions and *n* denotes the total number of trees. The regularized objective function, Ψ(Ω), is minimized to learn the set of functions Γ<sub>*k*</sub> employed in the model, as shown by Equations (2) and (3).

$$\Psi(\Omega) = \sum\_{i} \lambda(\hat{y}\_{i}, y\_{i}) + \sum\_{k} \Pi(\Gamma\_{k}) \tag{2}$$

$$
\Pi(\Gamma\_k) = \phi T + \frac{1}{2}l||\omega||^2 \tag{3}
$$

where *λ* represents the differentiable convex loss function that measures the difference between the prediction and the actual response. The term Π is an additional regularization term that penalizes the growth of further trees in the model to reduce complexity and over-fitting. The term *φ* represents the complexity penalty per leaf, *T* is the total number of leaves in a tree, *l* is the coefficient of the L2 penalty, and *ω* is the vector of leaf weights. Likewise, for the XGBoost regression model, hyperparameters including *n\_estimators*, *num\_leaves*, *learning\_rate*, *reg\_alpha*, and *reg\_lambda* must be optimized to prevent overfitting and reduce model complexity.
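To make Equations (2) and (3) concrete, the following Python sketch evaluates the regularized objective for a given set of predictions and trees. It is an illustration under stated assumptions: the squared error is used as the loss *λ*, each tree is represented only by its list of leaf weights, and the function names and numerical values are hypothetical.

```python
def squared_loss(y_hat, y):
    """One possible differentiable convex loss (an assumption for this sketch)."""
    return (y_hat - y) ** 2


def tree_penalty(leaf_weights, phi, l):
    """Equation (3): phi * T + 0.5 * l * ||omega||^2 for a single tree."""
    n_leaves = len(leaf_weights)
    return phi * n_leaves + 0.5 * l * sum(w * w for w in leaf_weights)


def regularized_objective(y_pred, y_true, trees, phi, l):
    """Equation (2): data-fit term plus the penalty summed over all trees."""
    data_term = sum(squared_loss(p, t) for p, t in zip(y_pred, y_true))
    penalty = sum(tree_penalty(weights, phi, l) for weights in trees)
    return data_term + penalty


# Hypothetical predictions (knots) and two small trees given by their leaf weights.
objective = regularized_objective(
    y_pred=[36.0, 40.0],
    y_true=[37.0, 39.0],
    trees=[[1.0, -1.0], [0.5, 0.5, 0.0]],
    phi=0.1,
    l=2.0,
)
```

In this example the data-fit term equals 2.0 and the two tree penalties equal 2.2 and 0.8, so adding a tree lowers the objective only if the reduction in loss outweighs its penalty.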

### 2.3.3. Natural Gradient Boosting (NGBoost) Regression

NGBoost is a supervised learning technique with generic probabilistic prediction capabilities. A probabilistic prediction generates a complete probability distribution over the whole outcome space, allowing users to evaluate the uncertainty in the model's predictions. In conventional point prediction, the object of interest is an estimate of a scalar function, Φ(*y*|*x*), in which *x* represents a vector of input factors and *y* is the response, and uncertainty estimates are not considered. In a probabilistic prediction context, on the other hand, a forecast in the form of a probability distribution, Θ<sub>*θ*</sub>(*y*|*x*), is generated by predicting its parameters *θ*. Because NGBoost is designed to be scalable and modular with respect to the base estimator (for instance, decision trees), the probability distribution (for instance, normal or Laplace), and the scoring rule (for instance, maximum likelihood estimation), it can perform probabilistic forecasts with flexible, tree-based models. As is depicted in Figure 6, the input vector *x* in the NGBoost model is forwarded to the base estimators (decision trees) to generate a probability distribution, Θ<sub>*θ*</sub>(*y*|*x*), over the whole outcome space, *y*. The models are then improved using a scoring rule, *S*(Θ<sub>*θ*</sub>, *y*), such as the maximum likelihood estimation function, which yields calibrated uncertainty estimates and point predictions. Prior to evaluation, the NGBoost regression model parameters *n\_estimators* and *learning\_rate* must be optimized.
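The role of the scoring rule can be illustrated with the negative log-likelihood of a normal distribution, which is the score associated with maximum likelihood estimation. The sketch below is a stand-alone illustration, not the NGBoost implementation; the function name and the numerical values are hypothetical.

```python
import math


def normal_nll(y, mu, sigma):
    """Negative log-likelihood of observation y under a Normal(mu, sigma) forecast."""
    return 0.5 * math.log(2 * math.pi * sigma ** 2) + (y - mu) ** 2 / (2 * sigma ** 2)


observed = 37.0  # knots (hypothetical observation)

# An accurate forecast is rewarded for being sharp (small sigma) ...
sharp_and_right = normal_nll(observed, mu=37.0, sigma=1.0)
vague_and_right = normal_nll(observed, mu=37.0, sigma=5.0)

# ... whereas an inaccurate forecast is penalized heavily for overconfidence.
sharp_and_wrong = normal_nll(observed, mu=30.0, sigma=1.0)
vague_and_wrong = normal_nll(observed, mu=30.0, sigma=5.0)
```

Because the score depends on both the predicted mean and the predicted spread, minimizing it encourages forecasts whose stated uncertainty matches their actual errors.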

**Figure 6.** Mechanism of NGBoost regression algorithm.

### 2.3.4. Categorical Boosting (CatBoost) Regression

CatBoost is a gradient-boosting decision tree technique. It is capable of handling categorical factors and employing them in the training phase rather than in the preprocessing phase. CatBoost's advantage is that it uses a new scheme to determine the leaf values while choosing the tree structure, which aids in reducing over-fitting and enables the use of the entire training data set, i.e., it orders the instances randomly and computes the mean value of the instances. For regression problems, the average of the acquired data is used as the a priori estimate. The parameters for the CatBoost regression model that must be optimized prior to evaluation are *n\_estimators*, *max\_depth*, and *learning\_rate*.

### 2.3.5. Adaptive Boosting (AdaBoost) Regression

Adaptive Boosting Regression is a straightforward ensemble learning model which creates a powerful regressor by integrating several weak learners, resulting in a high-accuracy model. The core concept is to establish the weights of the weak regressors and train on the dataset at each iteration such that reliable predictions of unusual observations may be made. The working principle of AdaBoost is provided below:


training dataset as *π<sub>k</sub>* = *π*<sub>*k*−1</sub> exp(−*ψ<sub>k</sub>h*(*x<sub>k</sub>*)) / *Ω*;

• The final output over all the iterations *t* = 1, 2, ..., *T* is returned as *f*(*X*) = ∑<sub>*t*=1</sub><sup>*T*</sup> *π<sub>t</sub>h<sub>t</sub>*(*X*) and *H*(*X*) = sign(*f*(*X*)).

The AdaBoost model uses a decision stump as a weak learner. The critical hyperparameters that need to be tuned during the learning process are *n\_estimators* and *learning\_rate*. *n\_estimators* is the number of decision stumps to train iteratively, and *learning\_rate* controls the contribution of each learner. A trade-off between *n\_estimators* and *learning\_rate* is required.
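The reweighting step listed above can be sketched in Python. The sketch follows the standard binary AdaBoost update, in which the exponent also contains the true label; this, together with the formula used for the learner weight, is an assumption that is not stated in the text. The function names and the toy labels are hypothetical.

```python
import math


def learner_weight(weighted_error):
    """Standard binary AdaBoost learner weight (assumption): 0.5 * ln((1 - e) / e)."""
    return 0.5 * math.log((1 - weighted_error) / weighted_error)


def update_weights(weights, labels, predictions, psi):
    """Scale each sample weight by exp(-psi * y * h(x)) and normalize by Omega."""
    raw = [w * math.exp(-psi * y * h) for w, y, h in zip(weights, labels, predictions)]
    omega = sum(raw)  # normalization constant so the new weights sum to one
    return [r / omega for r in raw]


weights = [0.25, 0.25, 0.25, 0.25]
labels = [1, 1, -1, -1]
predictions = [1, 1, -1, 1]  # the weak learner misclassifies the last sample

error = sum(w for w, y, h in zip(weights, labels, predictions) if y != h)
psi = learner_weight(error)
new_weights = update_weights(weights, labels, predictions, psi)
```

After the update, the misclassified sample carries more weight than the correctly classified ones, so the next weak learner concentrates on it.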

### 2.3.6. Random Forest (RF) Regression

The RF is an ensemble of tree-based predictors in which each tree is trained with values of an independently sampled random vector that has the same distribution for all trees in the forest. The *k*th tree is conceptually trained using an independent random vector, *ζ<sub>k</sub>*, with the same distribution as the previous random vectors, *ζ*<sub>*k*−1</sub>, resulting in a tree, *ψ*(*X*, *ζ<sub>k</sub>*), in which *X* is the input vector of different factors. When a large number of trees are grown in a forest, their mean predictions are obtained, which improves the accuracy of predictions and prevents over-fitting. Mathematically, it can be illustrated as Equation (4).

$$\hat{Y} = \frac{1}{l} \sum\_{k=1}^{l} \psi\_k(X) \tag{4}$$

where *Ŷ* represents the predicted response and *l* is the total number of generated trees (1 ≤ *k* ≤ *l*). The mean squared generalization error of any tree *ψ*(*X*) is *E*<sub>*X*,*Y*</sub>(*Y* − *ψ*(*X*))<sup>2</sup> for the input vector (*X*) and the response (*Y*). As the number of trees in the forest approaches infinity, the mean squared generalization error of the forest almost surely converges as follows, where Λ<sub>*k*</sub> denotes the average over the *k* trees:

$$E\_{X,Y}(Y - \Lambda\_k \psi(X, \zeta\_k))^2 \to E\_{X,Y}(Y - E\_{\zeta} \psi(X, \zeta))^2 \tag{5}$$

A few crucial hyperparameters must be tuned during the learning phase in order to achieve an optimized prediction score for the RF model. These hyperparameters are the *n\_estimators*, which is the number of trees in the forest, and the *max\_depth*, which is the maximum number of levels, or branches between the root node and the deepest leaf node.
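The averaging in Equation (4) can be illustrated with a deliberately small forest of one-split regression trees. This is a sketch under simplifying assumptions: a single input feature, depth-one trees, and bootstrap resampling playing the role of the random vector; the function names and the monthly values are hypothetical.

```python
import random


def fit_stump(xs, ys):
    """Fit a depth-one regression tree on one feature by minimizing squared error."""
    best = None
    for threshold in sorted(set(xs))[1:]:  # both sides of each split are non-empty
        left = [y for x, y in zip(xs, ys) if x < threshold]
        right = [y for x, y in zip(xs, ys) if x >= threshold]
        left_mean, right_mean = sum(left) / len(left), sum(right) / len(right)
        sse = sum((y - left_mean) ** 2 for y in left) + sum((y - right_mean) ** 2 for y in right)
        if best is None or sse < best[0]:
            best = (sse, threshold, left_mean, right_mean)
    if best is None:  # every x in the sample is identical: predict the mean
        mean = sum(ys) / len(ys)
        return lambda x: mean
    _, threshold, left_mean, right_mean = best
    return lambda x: left_mean if x < threshold else right_mean


def fit_forest(xs, ys, n_trees=25, seed=0):
    """Train each tree on its own bootstrap sample of the data."""
    rng = random.Random(seed)
    n = len(xs)
    trees = []
    for _ in range(n_trees):
        idx = [rng.randrange(n) for _ in range(n)]
        trees.append(fit_stump([xs[i] for i in idx], [ys[i] for i in idx]))
    return trees


def predict_forest(trees, x):
    """Equation (4): the forest prediction is the mean of the tree predictions."""
    return sum(tree(x) for tree in trees) / len(trees)


months = list(range(1, 13))
magnitudes = [31, 32, 30, 33, 35, 36, 38, 42, 40, 34, 32, 31]  # hypothetical, knots
forest = fit_forest(months, magnitudes)
august_prediction = predict_forest(forest, 8)
```

In this sketch `n_trees` corresponds to *n\_estimators*, and the fixed depth of one corresponds to the smallest useful value of *max\_depth*.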

### *2.4. Principle of Bayesian Optimization*

Hyperparameters are the structural parameters of a machine learning model, and adapting a model to different situations requires adjusting them [33,34]. In this study, a Bayesian hyperparameter optimization method is implemented. The goal is to establish the mapping *y* = *f*(*x*, *θ*), in which *y* is the response, *x* is the input vector, and the vector *θ* determines the size of the mapping. The core principle of Bayesian optimization is to build a model of the loss function from the hyperparameter settings evaluated so far and to use this model to search efficiently for the optimal set of hyperparameters. Treating the hyperparameter vector *θ* of a tree-based machine learning model as a point in the multidimensional search space, the hyperparameter that minimizes the loss function value, *f*(*θ*), is sought in the set *A* ⊆ *X*<sup>*d*</sup>, as shown by Equation (6).

$$\theta^\* = \underset{\theta \in A}{\operatorname{argmin}} f(\theta) \tag{6}$$

Usually, there is no prior information about the model's structure; therefore, it is assumed that the observations of the loss are corrupted by Gaussian noise, as shown by Equation (7).

$$y(\theta) = f(\theta) + \varepsilon \text{, and } \varepsilon \sim \mathcal{N}\left(0, \sigma\_{noise}^2\right) \tag{7}$$

The Bayesian framework involves two fundamental choices. First, a hypothesis function, *p*(*f*|D) (also known as a prior function), must be chosen to represent the hypothesis about the function to be optimized. Second, an acquisition function is derived from the posterior model to determine the subsequent test point. Using the prior function, *p*(*f*|D), the Bayesian framework constructs a model of the loss function based on the observed data sample, D. Based on its characteristics, the model *p*(*f*|D) is used to choose between exploration and exploitation.
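The loop described above (fit a model of the loss to the points observed so far, then let an acquisition function pick the next test point) can be sketched for a single hyperparameter. The sketch below is a generic illustration and not the procedure used by the Hyperopt package: it assumes a Gaussian-process surrogate with a squared-exponential kernel, a lower-confidence-bound acquisition function, and a search over a fixed grid. All function names, the kernel length scale, and the toy loss are hypothetical.

```python
import math
import random


def rbf(a, b, length=0.2):
    """Squared-exponential kernel between two scalar hyperparameter values."""
    return math.exp(-0.5 * ((a - b) / length) ** 2)


def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[p] = M[p], M[c]
        for r in range(c + 1, n):
            f = M[r][c] / M[c][c]
            for k in range(c, n + 1):
                M[r][k] -= f * M[c][k]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (M[r][n] - sum(M[r][k] * x[k] for k in range(r + 1, n))) / M[r][r]
    return x


def posterior(thetas, losses, theta_new, noise=1e-3):
    """Posterior mean and standard deviation of the surrogate at theta_new."""
    K = [[rbf(a, b) + (noise if i == j else 0.0) for j, b in enumerate(thetas)]
         for i, a in enumerate(thetas)]
    k_star = [rbf(a, theta_new) for a in thetas]
    mean = sum(k * w for k, w in zip(k_star, solve(K, losses)))
    var = 1.0 - sum(k * w for k, w in zip(k_star, solve(K, k_star)))
    return mean, math.sqrt(max(var, 0.0))


def bayes_opt(loss, lo, hi, n_init=3, n_iter=8, kappa=2.0, seed=0):
    """Minimize loss(theta) on [lo, hi] with a surrogate-guided search."""
    rng = random.Random(seed)
    thetas = [rng.uniform(lo, hi) for _ in range(n_init)]
    losses = [loss(t) for t in thetas]
    grid = [lo + (hi - lo) * i / 50 for i in range(51)]
    for _ in range(n_iter):
        mean_loss = sum(losses) / len(losses)
        centred = [v - mean_loss for v in losses]

        def acquisition(t):
            m, s = posterior(thetas, centred, t)
            return m - kappa * s  # low predicted loss or high uncertainty

        nxt = min(grid, key=acquisition)
        thetas.append(nxt)
        losses.append(loss(nxt))
    best = min(range(len(thetas)), key=losses.__getitem__)
    return thetas[best], losses[best]
```

The parameter `kappa` sets the balance between sampling where the surrogate predicts a low loss and sampling where it is still uncertain. In practice the loss would be a cross-validated error of the model trained with hyperparameter `theta`, and the noise term corresponds to the observation noise in Equation (7).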

### *2.5. Performance Assessment*

The generalization capacity of the machine learning regression models was quantified using four different metrics: the mean absolute error (MAE), mean squared error (MSE), root mean squared error (RMSE), and the R-square (*R*<sup>2</sup>, coefficient of determination). According to Equation (8), the MAE is the average of the absolute values of the individual prediction errors across all instances. As shown in Equation (9), the MSE is the average squared difference between the observed and predicted values. According to Equation (10), the RMSE is the square root of the MSE. A regression model's ability to accurately predict values is indicated by *R*<sup>2</sup>, which typically ranges from 0 to 1 and is provided by Equation (11).

$$\text{MAE} = \frac{1}{\Phi} \sum\_{\chi=1}^{\Phi} \left| y\_{\chi} - \hat{y}\_{\chi} \right| \tag{8}$$

$$\text{MSE} = \frac{1}{\Phi} \sum\_{\chi=1}^{\Phi} \left( y\_{\chi} - \hat{y}\_{\chi} \right)^{2} \tag{9}$$

$$\text{RMSE} = \sqrt{\frac{1}{\Phi} \sum\_{\chi=1}^{\Phi} \left( y\_{\chi} - \hat{y}\_{\chi} \right)^{2}} \tag{10}$$

$$R^2 = 1 - \frac{\sum\_{\chi=1}^{\Phi} \left( y\_{\chi} - \hat{y}\_{\chi} \right)^2}{\sum\_{\chi=1}^{\Phi} \left( y\_{\chi} - y\_{avg} \right)^2} \tag{11}$$

where Φ is the total number of observations, *χ* indexes the observations, *y<sub>χ</sub>* represents the actual observation value, *ŷ<sub>χ</sub>* represents the predicted value, and *y<sub>avg</sub>* is the mean of the actual observation values.
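The four metrics can be computed directly from their definitions. The following Python sketch uses only the standard library; the function names and the sample values are hypothetical and serve only as a worked example.

```python
import math


def mae(y_true, y_pred):
    """Mean absolute error."""
    return sum(abs(a - b) for a, b in zip(y_true, y_pred)) / len(y_true)


def mse(y_true, y_pred):
    """Mean squared error."""
    return sum((a - b) ** 2 for a, b in zip(y_true, y_pred)) / len(y_true)


def rmse(y_true, y_pred):
    """Root mean squared error: the square root of the MSE."""
    return math.sqrt(mse(y_true, y_pred))


def r_square(y_true, y_pred):
    """Coefficient of determination."""
    y_avg = sum(y_true) / len(y_true)
    ss_res = sum((a - b) ** 2 for a, b in zip(y_true, y_pred))
    ss_tot = sum((a - y_avg) ** 2 for a in y_true)
    return 1 - ss_res / ss_tot


observed = [30.0, 35.0, 40.0, 45.0]   # hypothetical magnitudes in knots
predicted = [32.0, 34.0, 41.0, 43.0]  # hypothetical model output

scores = {
    "MAE": mae(observed, predicted),
    "MSE": mse(observed, predicted),
    "RMSE": rmse(observed, predicted),
    "R2": r_square(observed, predicted),
}
```

For these sample values the MAE is 1.5, the MSE is 2.5, and *R*<sup>2</sup> is 0.92. The same relation between the MSE and the RMSE offers a quick consistency check on reported results: the square root of the XGBoost MSE of 5.611 is approximately 2.369, in line with the reported RMSE of 2.368 up to rounding.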

### **3. Results and Discussion**

The LiDAR data of 2017 to 2020 from the Hong Kong Observatory and the aviation weather forecast department at HKIA were used to train and test six different machine learning regression models with the goal of determining how well these models can predict the occurrence of intense wind-shear events. Figure 7a depicts the total LiDAR-obtained intense wind-shear data from 1 January 2017 to 31 December 2020. The data from 1 January 2017 to 31 December 2019 are the training set, which is depicted by the black line in Figure 7b, while the data from 1 January 2020 to 31 December 2020 are the test set, which is depicted by the green line. The vertical red line with dashes divides the training data from the test data.
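A chronological split of this kind keeps every test observation later in time than every training observation. The Python sketch below illustrates the idea; the record layout and function name are assumptions, the first record uses the event quoted earlier in the text, and the remaining records are hypothetical.

```python
from datetime import datetime

TEST_START = datetime(2020, 1, 1)  # first instant of the test period


def chronological_split(records, cutoff=TEST_START):
    """Put records before the cutoff in the training set and the rest in the test set."""
    train = [r for r in records if r["time"] < cutoff]
    test = [r for r in records if r["time"] >= cutoff]
    return train, test


records = [
    {"time": datetime(2019, 3, 29, 22, 12), "magnitude_kt": 37},
    # Hypothetical records placed on either side of the cutoff.
    {"time": datetime(2019, 12, 31, 23, 59), "magnitude_kt": 31},
    {"time": datetime(2020, 1, 1, 0, 0), "magnitude_kt": 33},
    {"time": datetime(2020, 8, 15, 14, 30), "magnitude_kt": 41},
]
train_set, test_set = chronological_split(records)
```

Unlike a random shuffle, this split prevents information from the test year from leaking into training, which matters when the goal is to predict future values from past values.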

**Figure 7.** LiDAR data: (**a**) 2017–2020 intense wind shear data; (**b**) splitting data into train and test sets.

The statistical information of the intense wind shear dataset is shown in Table 2. The machine learning models, coupled with Bayesian optimization and a 5-fold cross validation, provide the predicted results based on the optimal hyperparameters. The Hyperopt Python package was used for the implementation of Bayesian optimization. The optimal hyperparameters with search space are shown in Table 3. Table 4 shows the comparison of the prediction performance of the machine learning regression algorithms. The predicted intense wind shear values, based on the machine learning regression algorithms, are plotted in Figure 8, and the residual errors of the machine learning models are shown by the scatter plots (Figure 9). In addition, feature importance and contribution are illustrated by Figure 10, and the effect of important factors is shown by Figure 11.

**Table 2.** Statistical information of intense wind shear from HKIA-based LIDAR.


**Table 3.** Optimal hyperparameters of machine learning regression algorithms.


Table 4 demonstrates that the Bayesian optimized-XGBoost model outperforms the other machine learning models with a minimum MAE value of 1.764, an MSE value of 5.611, an RMSE value of 2.368, and a maximum R-square value of 0.859. The AdaBoost model, with an MAE of 1.863, MSE of 6.815, RMSE of 2.610, and an R-square value of 0.549, performs the worst. In addition, an analysis of Figure 8 reveals that XGBoost appears to provide a better fit of the actual test intense wind shear time series and a smaller residual error, represented by red dots closer to the horizontal line, when compared to the other forecasting results (Figure 9).

**Table 4.** Performance assessment of Bayesian optimized machine learning models.


**Figure 8.** Predictions using machine learning models: (**a**) prediction of intense wind shear by XGBoost; (**b**) prediction of intense wind shear by LightGBM; (**c**) prediction of intense wind shear by CatBoost; (**d**) prediction of intense wind shear by Random Forest; (**e**) prediction of intense wind shear by NGBoost; and (**f**) prediction of intense wind shear by AdaBoost.

**Figure 9.** Residual analysis by machine learning regression models; (**a**) NGBoost; (**b**) LightGBM; (**c**) CatBoost; (**d**) Random Forest; (**e**) XGBoost; and (**f**) AdaBoost.

The importance and contribution of the factors are depicted in Figure 10 and are based on the importance score that was determined by the Bayesian optimized-XGBoost model and the XGBoost-based SHAP contribution plot, respectively. In both cases, it was observed that the month of year was the most significant feature, with an importance score of 0.33, followed by the location of intense wind shear (0.19), the hour of the day (0.18), and runway orientation (0.16). Figure 10b revealed that months of the year coded by lower values are less likely to be associated with intense wind shear than those with medium values. Similarly, encounter locations represented by higher values are more likely to be associated with intense wind shear. In the following section, each important feature that plays a role in the occurrence of intense wind shear is discussed in more detail.

**Figure 10.** Importance and contribution plots: (**a**) XGBoost-based feature importance plot and (**b**) XGBoost-based SHAP contribution plot.

Figure 11a,b depict the scatter plots of two significant factors. Figure 11a illustrates that the highest number of intense wind-shear events were recorded in August. The intense wind shear in August might be due to cross-mountain airflow, which occurs over the HKIA in August and September, during the south-west monsoon, or during passages of tropical cyclones. These terrain-disrupted airflows cause a number of intense wind-shear events, which negatively impact HKIA's flight safety and operations. This is also consistent with previous studies [11,35].

On the RWY and at 1-MD from the edge of the RWY, a large number of intense wind-shear events are observed, as shown in Figure 11b. Fewer intense wind-shear events are observed as the distance from the RWY increases. To the best of our knowledge, none of the previous studies have pinpointed the location where intense wind shear is most prevalent. Our research indicates that the RWY and 1-MD from the edge of the RWY are crucial to the occurrence of intense wind shear. Pilots must maintain vigilance at 1-MD during takeoff.

**Figure 11.** Effect of factors on the intense wind shear: (**a**) month of year and (**b**) encounter location of intense wind shear.

### **4. Conclusions and Recommendations**

This study is a first attempt at developing a time-series prediction model of intense wind-shear events based on HKIA-based LiDAR data. Six state-of-the-art machine learning regression algorithms, optimized via the Bayesian optimization approach, were employed in this regard. The HKIA-based LiDAR data from 2017 to 2020 was used as the input. From this study, the following conclusions can be drawn:


For aviation authorities and researchers interested in aviation safety, the methodology put forth in this study can be used to conduct an extensive investigation of intense wind shear. The study covered in this paper was the time-series prediction of intense wind shear using six machine learning models coupled with a Bayesian optimization approach. Future research might use an amalgamation of a stacking ensemble and various other machine learning ensemble algorithms with a number of additional risk factors, such as the impact of atmospheric pressure and temperature. In addition, the causes of the occurrence of wind shear (weather- or terrain-induced) could be used in future research.

**Author Contributions:** Conceptualization, A.K. and P.-W.C.; data curation, P.-W.C.; formal analysis, A.K.; funding acquisition, F.C.; investigation, P.-W.C.; methodology, A.K.; project administration, F.C.; resources, H.P.; software, F.C.; supervision, P.-W.C.; validation, F.C.; visualization, H.P.; writing original draft, A.K.; writing—review & editing, H.P. All authors have read and agreed to the published version of the manuscript.

**Funding:** This research was supported by National Natural Science Foundation of China (U1733113), the Shanghai Municipal Science and Technology Major Project (2021SHZDZX0100), the Research Fund for International Young Scientists (RFIS) of the National Natural Science Foundation of China (NSFC) (Grant No. 52250410351) and the National Foreign Expert Project (Grant No. QN2022133001L).

**Institutional Review Board Statement:** Not applicable.

**Informed Consent Statement:** Not applicable.

**Data Availability Statement:** The data are not publicly available due to restrictions.

**Acknowledgments:** We are grateful to the Hong Kong International Airport Observatory for providing us with LiDAR data.

**Conflicts of Interest:** The authors declare no conflict of interest.

### **References**


**Disclaimer/Publisher's Note:** The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
