Article

Identification of Airline Turbulence Using WOA-CatBoost Algorithm in Airborne Quick Access Record (QAR) Data

Zibo Zhuang, Haosen Li, Jingyuan Shao, Pak-Wai Chan and Hongda Tai

1 Institute of Aviation Meteorology, Civil Aviation University of China, Tianjin 300300, China
2 School of Safety Science and Engineering, Civil Aviation University of China, Tianjin 300300, China
3 Flight School, Civil Aviation University of China, Tianjin 300300, China
4 Aviation Weather Services, Hong Kong Observatory, Hong Kong 999077, China
* Author to whom correspondence should be addressed.
Appl. Sci. 2024, 14(11), 4419; https://doi.org/10.3390/app14114419
Submission received: 18 April 2024 / Revised: 17 May 2024 / Accepted: 20 May 2024 / Published: 23 May 2024

Featured Application

The proposed method can be used to determine whether an aircraft encountered turbulence during or after flight, rather than relying on EDR estimation to ascertain turbulence encounters. By integrating swarm intelligence and machine learning in a data-driven approach to turbulence identification, the method addresses challenges encountered in previous turbulence identification work and demonstrates practical applicability for improving aviation safety.

Abstract

Turbulence is a significant operational aviation safety hazard during all phases of flight. There is an urgent need for a method of airline turbulence identification in aviation systems so that turbulence hazards to aircraft can be avoided during flight. Integrating flight data and machine learning significantly enhances the efficacy of turbulence identification. Nevertheless, existing studies encounter issues including unstable model performance, challenges in data feature extraction, and parameter optimization. Hence, it is imperative to propose a superior approach to enhance the accuracy of turbulence identification along flight routes. This paper presents a combined swarm intelligence and machine learning model based on data mining for identifying airline turbulence. Building on the theory of swarm-intelligence-based optimization, the optimal parameters of Categorical Boosting (CatBoost) are obtained by introducing the whale optimization algorithm (WOA), and the corresponding WOA-CatBoost fusion model is established. The Recursive Feature Elimination (RFE) algorithm is then used to eliminate data with lower feature weights and extract the effective features of the data; its combination with the WOA brings robust optimization effects, whereby the accuracy of CatBoost increased by 11%. The WOA-CatBoost model can perform accurate turbulence identification from QAR data, comparable to established EDR approaches, and outperforms traditional machine learning models. This finding highlights the effectiveness of combining swarm intelligence and machine learning algorithms in turbulence monitoring systems to improve aviation safety.

1. Introduction

Turbulence is a common hazard that poses a serious threat to the safety of civil aviation. It can cause abrupt vibrations in the aircraft, appearing as vertical or lateral shaking or as spontaneous tremors during flight. These tremors can cause serious structural damage to the aircraft and injure both passengers and flight crew [1,2]. An estimated 63,000 encounters with moderate or greater turbulence and 5000 encounters with severe or greater turbulence occur each year [3]. To promptly prevent turbulence-related accidents, it is crucial to explore the accurate identification of turbulence. Turbulence observations are routinely given as pilot reports (PIREPs). However, a pilot’s subjective evaluation of the aircraft’s reaction to turbulence encounters determines the turbulence severity in PIREPs, which can introduce uncertainty into the turbulence information [4,5,6]. Hence, the exploration of high-precision turbulence identification models is of considerable importance.
The eddy dissipation rate (EDR) index is a metric recommended by the International Civil Aviation Organisation (ICAO) for determining the degree of aviation turbulence [7]. The identification of turbulence using EDR algorithms relies on a model driven by conventional intuition and hypotheses [8,9], and the uncertainty introduced by these hypotheses may impact the accuracy of identification [10]. At a turbulence modeling symposium in aeronautics held in 2017, Duraisamy et al. [8] focused on turbulence modeling in a predictive context for complex problems rather than on descriptive modeling. They also suggested that hidden relationships can be extracted through data-driven models to generate turbulence models that are unaffected by human intuition; data-driven approaches can thus yield useful turbulence identification models [10]. Current methods for identifying turbulence are based on hypotheses about how turbulence causes are represented in specific datasets (such as Doppler radar data) and are essentially physical in nature [11,12]. They can be hampered by sparse or inaccurate prediction datasets, poor temporal and spatial resolution, a dearth of in situ reporting, and so on [13]. The purpose of the QAR, an airborne flight data recorder, is to facilitate rapid and simple access to unprocessed flight data [14]; with its completeness, high precision, real-time availability, and fine granularity, it provides the environmental, equipment, and operating parameters of the aircraft during flight and serves as a solid data basis for turbulence-related research [1]. Thus, there is a chance to advance the state of the art in turbulence identification by using in-flight data [13].
Modern machine learning algorithms have become highly efficient at identifying patterns in flight data. Routine flights usually gather a large amount of data that can be utilized for many different purposes [13]. Since 2002, the aircraft meteorological data relay (AMDAR) has included turbulence monitoring data, including EDR, according to the World Meteorological Organisation (WMO) [15]. Currently, the majority of domestic and international efforts are focused on EDR estimation. For example, in [1], Huang et al. attempted to identify EDR using QAR data recorded from flights. In [2], Kim et al. proposed that QAR data can be used as an additional source for measuring EDR. However, although the current EDR estimation algorithms exhibit overall consistency, potential estimation biases may arise due to disparities in the sources of flight datasets [16]. Therefore, we ask whether it is feasible to bypass EDR estimation and leverage machine learning algorithms to learn the patterns inducing turbulence directly from QAR data.
In recent years, this conjecture has been supported by several machine learning algorithms applied to the detection and prediction of turbulence in flight data [13,17,18]. In [17], Wu et al. proposed utilizing QAR data to establish a random forest model for detecting flight turbulence, which demonstrated higher accuracy with balanced samples. However, the model tended to exhibit lower recognition accuracy when faced with imbalanced positive and negative samples. In addition to the instability of the model’s performance, the authors did not conduct feature selection on the QAR data. This is a serious issue, as the success of data-driven modeling depends strongly on feature selection during data processing [8]. In [18], Mizuno et al. proposed a machine learning approach utilizing Support Vector Classification (SVC) for predicting flight turbulence. The authors employed feature selection methods to reduce the dimensionality of the original data, enhancing the efficiency of data analysis. However, the computational accuracy of SVC remained to be improved, as the study did not address a method for optimizing the SVC kernel parameters. In response to this issue, many researchers have opted for swarm-intelligence-based methods to optimize the parameters of machine learning models [19,20,21,22,23,24,25,26,27,28,29,30,31,32].
This paper adopts a QAR data-driven optimization approach based on data mining, utilizing machine learning algorithms to train a turbulence identification model referenced against EDR. Before training, we apply an RFE method to mine valuable features for turbulence identification from the QAR data and utilize these features for machine learning. Subsequently, to establish a robust turbulence identification model, we use the rapidly converging WOA to optimize the parameters of the highly stable CatBoost model in a QAR data-driven manner. In the experimental part, QAR data with the valuable subset of features are used to train the model, and the trained model is then applied to QAR data from other flights to verify its general applicability.

2. Related Work

The list of acronyms used in this article can be viewed in Abbreviations.

2.1. CatBoost Classifier

CatBoost is a Gradient Boosting Decision Tree (GBDT) algorithm specifically designed for handling classification problems, and it employs unique principles to enhance the model’s performance when dealing with categorical and numerical feature data [33,34,35,36,37,38,39,40,41,42]. The workflow of the CatBoost classifier algorithm is as follows:
  • Initialize model parameters: initially, the parameters of the CatBoost classifier need to be set;
  • Build initial tree: CatBoost employs a GBDT as the fundamental classifier. During the training process, an initial decision tree is constructed as the base model;
  • Iterative optimization: Through iterative optimization, the initial base model is progressively improved. At each iteration, the CatBoost algorithm calculates residuals (i.e., the differences between predicted values and actual values) and then constructs a new tree to reduce these residuals;
  • Feature scaling: at each iteration, CatBoost scales the features based on their distribution to enhance the stability and generalization ability of the model;
  • Ensemble of trees: by integrating the predictions of multiple tree models, during the training process, each new tree model is integrated with the previous tree models, as shown in Figure 1;
  • Early stopping strategy: it monitors the performance of the validation set during training and stops training early when the model’s performance no longer improves, thus avoiding overfitting;
  • Model evaluation: after training, the trained CatBoost model can be evaluated using test data to understand its performance on unseen data;
  • Model application: the trained CatBoost model can be applied to practical classification tasks to predict class labels for unknown data.

Kernel Parameters

In machine learning models, the number of “iterations” has been shown to be one of the most critical hyperparameters directly affecting model performance. When the value of this hyperparameter is very large or very small, it leads to overfitting or underfitting, respectively [43]. To mitigate this issue, the “early stopping” parameter is employed. Early stopping is a technique for mitigating overfitting whereby the training process halts when the model fails to improve over successive iterations; it helps prevent overfitting of the data and enhances the robustness of the model, thereby improving its performance on unseen data.
Additionally, in the CatBoost model, the number and depth of trees (base model) directly influence classification accuracy [41,42,44]. Increasing the number and depth of trees typically increases the model’s complexity, enabling it to capture complex relationships and patterns in the data more effectively. This may result in better performance on training data but a decrease in classification performance on unknown data. Therefore, when using CatBoost (Version: 1.2), it is necessary to optimize the number and depth of trees to find the optimal balance.
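To make these kernel parameters concrete, the following is a minimal, hypothetical usage sketch of the CatBoost classifier workflow described above; the parameter values are illustrative placeholders rather than the tuned values reported later, and X_train, y_train, X_test, y_test are assumed to be preprocessed QAR feature matrices with 0/1 turbulence labels.

```python
# Minimal CatBoost classifier sketch; X_train/X_test and y_train/y_test are
# assumed to be preprocessed QAR feature matrices and 0/1 turbulence labels.
from catboost import CatBoostClassifier

model = CatBoostClassifier(
    iterations=8000,            # number of trees (kernel parameter tuned later by the WOA)
    depth=6,                    # tree depth (kernel parameter tuned later by the WOA)
    learning_rate=0.005,        # regulates the learning progress of the model
    early_stopping_rounds=10,   # stop when the evaluation metric no longer improves
    verbose=False,
)

# eval_set supplies the validation data that the early-stopping strategy monitors
model.fit(X_train, y_train, eval_set=(X_test, y_test))
pred = model.predict(X_test)                    # class labels: 0 = no turbulence, 1 = turbulence
accuracy = (pred.flatten() == y_test).mean()    # simple accuracy check on unseen data
```

Passing eval_set to fit is what the early-stopping strategy monitors, so the three parameters optimized later (iterations, depth, early_stopping_rounds) all appear explicitly here.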

2.2. Whale Optimization Algorithm

The WOA exhibits rapid convergence and effectiveness in solving complex optimization problems [45,46]. The detailed description of the WOA involves the assumption Xi that represents the position of an individual whale within the population, and X* denotes the optimal solution sought. Given that the optimal positions in the search space are unknown a priori, the WOA algorithm posits that the current best candidate solution approximates the optimal solution. Upon defining the best individual, the remaining individuals endeavor to update their positions toward the identified best individual. The following equation mathematically expresses this behavior:
$$ D = \left| C \cdot X^{*}(t) - X(t) \right| \quad (1) $$
$$ X(t+1) = X^{*}(t) - A \cdot D \quad (2) $$
Here, t represents the iteration count, A and C denote coefficient vectors, X* is the position vector of the best solution obtained so far, X is the position vector of the current individual, |·| signifies the absolute value, and · represents element-wise multiplication. If a position with a better fitness value is identified, the position of X* is updated in each iteration.
The vectors A and C are calculated according to the following expression:
$$ A = 2a \cdot r - a \quad (3) $$
$$ C = 2 \cdot r \quad (4) $$
The parameter a linearly decreases from two to zero during the iteration process (in the exploration phase), and r is a random vector in the range [0, 1]. Figure 2a illustrates the fundamental principles of the equation. The position of an individual whale (X, Y) can be updated based on the current best position (X*, Y*). By adjusting the values of vectors A and C , the position of the best individual can be updated relative to the current position. The possible updated positions of individual whales in the search space are depicted in Figure 2b.
Through the definition of a random vector r , arbitrary positions can be attained within the search space between the critical points illustrated in Figure 2. Thus, Equation (2) allows any whale individual to update its position near the current optimal solution, thereby simulating the encircling of prey.
To mathematically model the bubble-net behavior of whales, a shrinking encircling mechanism and a spiral updating of positions are used, and each of these behaviors is modeled separately. The shrinking encircling mechanism is realized by decreasing the value of a in Equation (3): A is a random value within the interval [−a, a], where a linearly decreases from two to zero during the iteration process. By setting the random value A within [−1, 1], the new position of an individual whale can be defined at any location between its original position and the current best individual’s position. Figure 3 illustrates the potential positions from (X, Y) toward (X*, Y*), showing the possible locations in the 2D plane for 0 ≤ A ≤ 1.
Next is the spiral updating of movement positions, as illustrated in Figure 4. Initially, it is necessary to compute the distance between the whale at position (X, Y) and the small fish (best position) located at (X*, Y*). Subsequently, a spiral equation is established to simulate the spiral motion of the whale between its position and the prey. The equation is as follows:
$$ X(t+1) = D' \cdot e^{bl} \cdot \cos(2\pi l) + X^{*}(t) \quad (5) $$
Here, $D' = \left| X^{*}(t) - X(t) \right|$ represents the distance between the individual whale and the small fish (the best solution obtained so far), b is a constant defining the logarithmic spiral shape, l is a random number in the range [−1, 1], and · denotes element-wise multiplication.
To prevent the concurrent manifestation of these two behaviors, it is postulated that a selection between contracting encircling circles and updating positions in a spiral manner occurs with a 50% probability during optimization. The mathematical equation is expressed as follows:
$$ X(t+1) = \begin{cases} X^{*}(t) - A \cdot D, & \text{if } p < 0.5 \\ D' \cdot e^{bl} \cdot \cos(2\pi l) + X^{*}(t), & \text{if } p \ge 0.5 \end{cases} \quad (6) $$
where p is a random number in the range [0, 1].
Variations in A are leveraged for prey search, prompting whales to search randomly based on each other’s positions. Consequently, the search mechanism for selecting whale individuals is determined by the random value A, and the positions of whale individuals in the current stage are updated based on a randomly chosen whale individual. This corresponds to the scenario where |A| > 1, which enables the WOA to perform a global search. The mathematical equations are articulated as follows:
$$ D = \left| C \cdot X_{rand} - X \right| \quad (7) $$
$$ X(t+1) = X_{rand} - A \cdot D \quad (8) $$
Here, $X_{rand}$ denotes a position vector randomly selected from the current population (a random whale).
At each iteration, whale individuals update their positions based on either a randomly selected whale individual or the position of the current best whale individual. When |A| > 1, the position is updated using the randomly chosen whale individual, whereas when |A| < 1, the position is updated using the position of the current best whale individual. Furthermore, based on the value of p, the WOA can transition between the spiral movement and contraction behaviors. Ultimately, the WOA concludes the iteration process upon satisfying the termination criteria.
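To illustrate how Equations (1)–(8) fit together, below is a minimal Python sketch of the WOA loop written for this article’s context; it is not the authors’ implementation, and the function name whale_optimize, the bound handling via clipping, the spiral constant b = 1, and the random seed are assumptions.

```python
import numpy as np

def whale_optimize(fitness, lb, ub, n_whales=10, max_iter=150, b=1.0, seed=0):
    """Minimal WOA sketch following Equations (1)-(8); `fitness` is maximized.
    lb and ub are per-dimension lower/upper bounds of the search space."""
    rng = np.random.default_rng(seed)
    lb, ub = np.asarray(lb, dtype=float), np.asarray(ub, dtype=float)
    dim = lb.size
    X = rng.uniform(lb, ub, size=(n_whales, dim))      # initial whale positions
    fits = np.array([fitness(x) for x in X])
    best = X[np.argmax(fits)].copy()                   # X*: best solution found so far
    best_fit = fits.max()

    for t in range(max_iter):
        a = 2.0 * (1.0 - t / max_iter)                 # a decreases linearly from 2 to 0
        for i in range(n_whales):
            A = 2.0 * a * rng.random(dim) - a          # Eq. (3)
            C = 2.0 * rng.random(dim)                  # Eq. (4)
            p = rng.random()
            l = rng.uniform(-1.0, 1.0)
            if p < 0.5:
                if np.all(np.abs(A) < 1.0):            # encircle the current best, Eqs. (1)-(2)
                    D = np.abs(C * best - X[i])
                    X[i] = best - A * D
                else:                                  # random global search, Eqs. (7)-(8)
                    X_rand = X[rng.integers(n_whales)]
                    D = np.abs(C * X_rand - X[i])
                    X[i] = X_rand - A * D
            else:                                      # spiral position update, Eq. (5)
                D_prime = np.abs(best - X[i])
                X[i] = D_prime * np.exp(b * l) * np.cos(2.0 * np.pi * l) + best
            X[i] = np.clip(X[i], lb, ub)               # keep the whale inside the bounds
            f = fitness(X[i])
            if f > best_fit:                           # update X* when the fitness improves
                best, best_fit = X[i].copy(), f
    return best, best_fit
```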

3. Method

3.1. WOA-CatBoost for Turbulence Classification

3.1.1. Fitness Definition

As previously mentioned, inappropriate iteration counts can lead to both overfitting and underfitting of the model. To mitigate this issue, the WOA was employed to optimize the “early_stopping_rounds” parameter of the CatBoost model. Additionally, the depth and number of trees (the base models) directly influence the classification accuracy. Therefore, it was necessary to optimize the kernel parameters of CatBoost: “depth” and “iterations”. To realize the interaction between the WOA and the CatBoost model, the next step was to define the fitness, which measures the accuracy of the WOA-CatBoost identifications at each iteration. The fitness is defined as the prediction accuracy of the CatBoost model, that is, Fitness = accuracy.
The fitness is thus determined by the prediction accuracy of the CatBoost model: the higher the fitness value, the higher the classification accuracy of the CatBoost model.
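A minimal sketch of such a fitness function, assuming the three kernel parameters named above and preloaded training/test matrices; the function name catboost_fitness and the rounding of the WOA’s continuous positions to integers are illustrative assumptions, not the authors’ code.

```python
from catboost import CatBoostClassifier

def catboost_fitness(params, X_train, y_train, X_test, y_test):
    """Hypothetical fitness: train CatBoost with one candidate parameter vector
    produced by the WOA and return the test accuracy (Fitness = accuracy)."""
    depth, iterations, early_stopping_rounds = params
    model = CatBoostClassifier(
        depth=int(round(depth)),                                  # tree depth
        iterations=int(round(iterations)),                        # number of trees
        early_stopping_rounds=int(round(early_stopping_rounds)),  # early-stopping patience
        verbose=False,
    )
    model.fit(X_train, y_train, eval_set=(X_test, y_test))
    pred = model.predict(X_test).flatten()
    return float((pred == y_test).mean())                         # accuracy in [0, 1]
```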

3.1.2. The Procedures of the WOA-CatBoost Model

This section provides a detailed description of the WOA-CatBoost model. For simplicity, the basic parameters are listed in Table 1. The optimization steps for WOA-CatBoost are outlined as follows:
Step 1. Data preparation: Regarding the dataset, a subset of features extracted from the QAR data of the Sanya to Jinan flight was used as the training set. Since QAR records are sequential time-series data whose characteristics and structure lead to low independence between adjacent records, there was no need to divide them randomly. Subsequently, the feature subset of the QAR data from the Chongqing to Jinan flight was used as a test set to determine the optimal parameters. The training set was utilized to build the foundational CatBoost model, while the test set was employed to assess the classification accuracy of the CatBoost model.
Step 2. WOA parameter setting and initialization: Define the WOA parameters, comprising the whale population size and maximum iteration count, specify the parameter ranges for adjustment, assign an initial position to each whale, and generate the initial population.
Step 3. Increment the iteration counter: t = t + 1.
Step 4. CatBoost model training and prediction: “early_stopping_rounds”, “depth”, “iterations” are used to calculate the fitness using the WOA. The foundational CatBoost model is established on the training set and then used to make predictions on the test set. Then, the values of prediction accuracy are obtained.
Step 5. Evaluate fitness: After training the CatBoost model on the training set and predicting on the test set, return the fitness as fitness = accuracy, which is the global fitness of whales in the WOA. If the current fitness value exceeds the previous one, the current fitness value is the optimal solution.
Step 6. Update individual positions: All the whales move according to the fitness value and update their positions following Formulas (2), (5) and (8). Each whale transitions to a new location, generating a new parameter at that position.
Step 7. Parameter optimization results checking: The new parameter is utilized for training the CatBoost model and then used to determine the prediction accuracy of the model. Return the new fitness value. If the new fitness value is higher than the previous one, replace the fitness value with the new one and keep the corresponding parameter. Otherwise, continue searching for a higher fitness value using the previous one until the condition of Step 8 is met.
Step 8. Termination condition check: If the current iteration count is less than the maximum iteration count, return to Step 3; otherwise, proceed to the next step.
Step 9. Save the trained model. The workflow diagram of the WOA-CatBoost model is depicted in Figure 5.
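Putting Steps 1–9 together, a hypothetical driver could connect the WOA sketch from Section 2.2 with the fitness sketch from Section 3.1.1 as follows; the variable names are assumptions, and the parameter bounds simply mirror the ranges reported later in Section 4.3.

```python
from functools import partial
from catboost import CatBoostClassifier

# Step 1: X_train/y_train come from the Sanya-Jinan feature subset and
# X_test/y_test from the Chongqing-Jinan feature subset (assumed preloaded).
fitness = partial(catboost_fitness,
                  X_train=X_train, y_train=y_train,
                  X_test=X_test, y_test=y_test)

# Steps 2-8: WOA search over (depth, iterations, early_stopping_rounds);
# bounds follow the ranges described in Section 4.3.
lb = [1, 7500, 5]
ub = [10, 9000, 20]
best_params, best_fit = whale_optimize(fitness, lb, ub, n_whales=10, max_iter=150)

# Step 9: retrain with the best parameters found and save the model.
depth, iterations, early_stopping_rounds = (int(round(v)) for v in best_params)
final_model = CatBoostClassifier(depth=depth, iterations=iterations,
                                 early_stopping_rounds=early_stopping_rounds,
                                 verbose=False)
final_model.fit(X_train, y_train, eval_set=(X_test, y_test))
final_model.save_model("woa_catboost_turbulence.cbm")   # hypothetical file name
```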

4. Experimental Framework

4.1. Data Description

The experimental data were entirely derived from the actual flight data of an aviation corporation. This article utilized as original training data the QAR data of a flight from Sanya to Jinan on 22 January 2023, while the testing data were obtained from the QAR data of a flight from Chongqing to Jinan on 1 February 2023. As reported by pilots, turbulence was encountered during both flights. The training dataset comprised 11,140 QAR records. After the preprocessing described in the subsequent sections, in which feature extraction yielded a subset of 23 parameters, 11,100 time-series records with 23 feature parameters remained for training, as shown in Table 2. Subsequently, a linear interpolation method was employed to align the obtained EDR values with the corresponding QAR records.
Each row of time-series data was then assigned one of two labels based on its EDR value, indicating the presence or absence of turbulence: if turbulence was encountered during the aircraft’s movement, the row was labeled “1”; otherwise, it was labeled “0”. Subsequently, the flight segments where turbulence occurred on the two flight routes were plotted on a map of China. Figure 6 illustrates the airline regions where turbulence occurred during the flights. The flight tracks of the two segments are marked on the map, with blue segments denoting turbulence-free air travel and red segments indicating turbulence encountered by the aircraft.
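As a small illustration of this labeling rule, the following hypothetical pandas snippet applies the EDR threshold of 0.1 (introduced in Section 4.2.2) to a toy stand-in for the time-aligned QAR records:

```python
import pandas as pd

# Toy stand-in for the time-aligned QAR records (the real data carry 23 features).
qar = pd.DataFrame({"time_s": [0, 1, 2, 3],
                    "EDR":    [0.02, 0.08, 0.15, 0.30]})

# Label each record: 1 = turbulence (EDR >= 0.1), 0 = no turbulence.
qar["label"] = (qar["EDR"] >= 0.1).astype(int)
print(qar)
```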

4.2. Data Preprocessing

4.2.1. Feature Selection Using RFE Algorithm

Feature selection is used to select small and informative feature subsets [47]. On-board QAR data encompass various types of feature parameter data, exhibiting a mixed data type structure that includes both continuous numerical and discrete numerical variables. Before applying the CatBoost classifier to the QAR data, feature selection needs to be performed. Feature selection, as a data preprocessing strategy, has been proven to be effective and efficient in preparing data (especially high-dimensional data) for various data mining and machine learning problems [48].
Among the numerous features in QAR data, some contribute significantly to classification, while others may be redundant. If redundant features are present in the dataset and used for model training, they can lead to longer computation times and reduced classification accuracy. Feature selection currently relies primarily on three methods: wrapper methods, embedded methods, and filter methods. Wrapper methods, particularly the RFE algorithm, demonstrate excellent performance in feature selection compared with the other two methods [49,50].
RFE is an advanced method that obtains the weights of each feature and progressively eliminates features with lower contributions through multiple iterations, thereby completing the feature selection process. After standardizing the QAR data, the specific implementation steps of using RFE to find the best subset of features for the CatBoost model were as follows:
  • Input all features into the CatBoost model and obtain the model’s performance evaluation metric (accuracy).
  • Based on the weights or importance of features, selectively remove the feature with the lowest performance evaluation metric ranking (the feature with the minimum weight) from the feature set, resulting in a new feature set.
  • Retrain the model and compute the model’s performance evaluation metric.
  • Repeat steps 2 and 3, removing one feature at a time, until the number of features reaches the predetermined value, or it is no longer possible to remove features.
During this process, we can record the performance evaluation metric after each feature selection iteration to choose the optimal feature set. In the end, we selected a feature subset containing 23 features, as depicted in Figure 7, which illustrates the pie-chart proportion of feature weights. Based on the findings from the pie chart, certain identification feature parameters unrelated to turbulence, such as departure and destination, were excluded as they did not contribute significantly to the relevance of our identification results.
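A hypothetical sketch of this RFE procedure using scikit-learn’s RFE wrapper around CatBoost, which exposes feature importances after fitting; the estimator settings and variable names X and y are illustrative, not the authors’ exact configuration.

```python
from catboost import CatBoostClassifier
from sklearn.feature_selection import RFE

# X: standardized QAR feature DataFrame, y: 0/1 turbulence labels (assumed preloaded).
estimator = CatBoostClassifier(iterations=200, depth=6, verbose=False)
selector = RFE(estimator, n_features_to_select=23, step=1)   # drop the weakest feature each round
selector.fit(X, y)

selected_features = X.columns[selector.support_]   # the retained 23-feature subset
ranking = selector.ranking_                        # 1 = kept; larger ranks were eliminated earlier
print(list(selected_features))
```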

4.2.2. EDR Data Interpolation

This paper used the EDR values identified for turbulence classification as a reference. To establish the training dataset, we considered data points with EDR values equal to or greater than 0.1, indicative of turbulence. The dataset was initially classified by representing turbulence (“1”) and no turbulence (“0”). Additionally, since EDR values were calculated based on a series of continuous QAR data at discrete time points, they did not directly correspond to QAR data. To address this issue, linear interpolation was employed to expand the EDR dataset. Linear interpolation involves numerically filling values along a straight-line direction between two adjacent data points. This process connects two neighboring data points with a straight line and interpolates values at unknown points along that line. By assuming a linear connection between data points, linear interpolation provides a straightforward approach for interpolating values in the known interval.
Mathematically, linear interpolation can be expressed as follows. Given two points (x0, y0) and (x1, y1), if x0 < x < x1, the interpolated value y lies on the line connecting the two points and can be calculated using the following equation:
$$ y = y_0 + (x - x_0)\,\frac{y_1 - y_0}{x_1 - x_0} \quad (9) $$
In this equation, y represents the value to be interpolated, x corresponds to the desired input value, and subscripts 0 and 1 refer to the known data points. By substituting the values of x and the known points, the interpolated value within the desired range can be determined. The data after linear interpolation are shown in Figure 8.
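As a minimal illustration of Equation (9) and of aligning sparse EDR estimates with the denser QAR timestamps, the following sketch uses hypothetical array names and illustrative values:

```python
import numpy as np

def linear_interpolate(x, x0, y0, x1, y1):
    """Direct form of Equation (9) for a single point with x0 < x < x1."""
    return y0 + (x - x0) * (y1 - y0) / (x1 - x0)

# Aligning sparse EDR estimates with denser QAR timestamps via piecewise-linear interpolation.
edr_time  = np.array([0.0, 60.0, 120.0])    # seconds (illustrative)
edr_value = np.array([0.05, 0.12, 0.08])
qar_time  = np.arange(0.0, 121.0, 1.0)      # 1 Hz QAR records
edr_on_qar = np.interp(qar_time, edr_time, edr_value)
```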

4.3. Model Training

To enhance the performance of CatBoost classification, we introduced the WOA to boost the CatBoost classifier. The training process of this model is illustrated in Figure 9. Before model training, the critical parameters of the WOA were initialized as follows: the population size was set to 10, and the maximum number of iterations was set to 150. When initializing the population, the learning rate ranged from 0.0001 to 0.01, the tree depth ranged from 1 to 10, the number of trees ranged from 7500 to 9000, and early_stopping_rounds ranged from 5 to 20. Before applying the WOA, we defined the prediction accuracy of the evaluation model as the fitness criterion. The parameter values were set within these ranges, enabling each whale individual to search for optimal parameter values at every iteration. Subsequently, we evaluated the fitness, using the accuracy returned by the evaluation model in each round as the individual’s fitness. If the new fitness was better than the previous fitness, it replaced the previous value; if it was worse, the search continued over further iterations until the optimal fitness, together with the corresponding optimal value of each parameter, was found. This process aimed to find the global optimal solution through multiple iterations to enhance model performance.
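For reference, the search configuration described in this paragraph can be summarized as a small settings block; this is only a restatement, assuming that the “number of trees” maps to CatBoost’s iterations parameter.

```python
# WOA and search-space settings as described above.
woa_settings = {"population_size": 10, "max_iterations": 150}
parameter_bounds = {
    "learning_rate":         (0.0001, 0.01),
    "depth":                 (1, 10),
    "iterations":            (7500, 9000),   # number of trees
    "early_stopping_rounds": (5, 20),
}
```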

5. Results

5.1. Evaluation Indicators

This article introduced a set of evaluation metrics to assess the performance of the experimental results, namely, accuracy, precision, recall, and F1 score. The specific formulas associated with these evaluation metrics are as follows:
$$ \text{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN} \quad (10) $$
$$ \text{Precision} = \frac{TP}{TP + FP} \quad (11) $$
$$ \text{Recall} = \frac{TP}{TP + FN} \quad (12) $$
$$ F1 = \frac{2 \cdot \text{Precision} \cdot \text{Recall}}{\text{Precision} + \text{Recall}} \quad (13) $$
The abbreviation “TP” (true positive) stands for instances correctly predicted as aviation turbulence by the classifier when they are indeed turbulent. “FP” (false positive) represents instances incorrectly predicted as aviation turbulence when they are non-turbulent. “TN” (true negative) indicates instances correctly identified as non-turbulent when they are indeed non-turbulent. “FN” (false negative) denotes instances incorrectly predicted as non-turbulent when they are turbulent. Another evaluation tool is the Receiver Operating Characteristic (ROC) curve, which uses the True Positive Rate (TPR) and False Positive Rate (FPR) as key indicators. It characterizes classification performance by plotting the FPR on the X-axis and the TPR on the Y-axis. The TPR represents the proportion of actual positive instances correctly classified, while the FPR indicates the proportion of negative instances incorrectly predicted as positive. The corresponding formulas are as follows:
$$ TPR = \frac{TP}{TP + FN} \quad (14) $$
$$ FPR = \frac{FP}{FP + TN} \quad (15) $$
The ROC curve of an ideal machine learning model would closely follow the Y-axis, although achieving this is unrealistic for any model. The Area Under the Curve (AUC) is the region enclosed by the ROC curve and the X-axis. It is always bounded between zero and one and serves as a visual metric to evaluate the classifier’s performance; a higher AUC indicates stronger classification performance for the machine learning model.
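A hypothetical evaluation snippet computing these indicators with scikit-learn, assuming 0/1 ground-truth labels y_true, predicted labels y_pred, and predicted turbulence probabilities y_score:

```python
from sklearn.metrics import (accuracy_score, f1_score, precision_score,
                             recall_score, roc_auc_score, roc_curve)

# y_true: 0/1 labels, y_pred: predicted labels, y_score: predicted probability of class 1.
metrics = {
    "accuracy":  accuracy_score(y_true, y_pred),
    "precision": precision_score(y_true, y_pred),
    "recall":    recall_score(y_true, y_pred),      # equal to the TPR
    "f1":        f1_score(y_true, y_pred),
    "auc":       roc_auc_score(y_true, y_score),
}
fpr, tpr, _ = roc_curve(y_true, y_score)            # ROC points (FPR on x, TPR on y)
```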

5.2. Fitness Value Curve

According to the fitness curve depicted in Figure 10, as the number of iterations increased from 0 to 150, the fitness values gradually decreased and stabilized after a certain number of iterations. After roughly 50 iterations, the curve achieved good stability and convergence, indicating that the WOA continuously searched for optimal values of parameters such as learning_rate, depth, iterations, and early_stopping_rounds during the iteration process. Subsequently, using these optimal parameters, the CatBoost model was trained, and the performance of WOA-CatBoost was compared with that of other machine learning models.

5.3. Comparison Experiments

When evaluating the performance of the WOA-CatBoost model, we conducted comparisons with other machine learning models, including CatBoost, Extra Trees, random forest, Logistic Regression, and SVM, all utilizing default parameters. For the WOA-CatBoost model, we experimented with setting the whale population size in the range from 0 to 50 at intervals of 10. Multiple experimental results indicated that the whale population size had a negligible impact on the model’s classification accuracy, as the whales consistently moved toward the optimal position, leading the entire population to converge to the best location. However, increasing the number of whales significantly extended the program’s runtime. Consequently, we set the whale population size to 10 in subsequent experiments. By training each model on the training set and making classification predictions on the test set, we obtained evaluation metrics such as accuracy, precision, recall, and F1. The results of the comparison related to these four metrics are depicted in Figure 11a. The WOA-CatBoost algorithm demonstrated the highest accuracy, indicated by the red bar, providing intuitive evidence that the WOA-CatBoost model outperforms other classifiers in terms of identification performance.
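A sketch of such a baseline comparison with default parameters is shown below; it assumes the same preloaded training and test subsets, and its outputs would not be expected to reproduce Table 3 exactly.

```python
import numpy as np
from catboost import CatBoostClassifier
from sklearn.ensemble import ExtraTreesClassifier, RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, f1_score, precision_score, recall_score
from sklearn.svm import SVC

# Baseline classifiers with default parameters, mirroring the comparison in Table 3.
baselines = {
    "CatBoost": CatBoostClassifier(verbose=False),
    "Extra Trees": ExtraTreesClassifier(),
    "Random forest": RandomForestClassifier(),
    "Logistic Regression": LogisticRegression(),
    "SVM": SVC(),
}
for name, clf in baselines.items():
    clf.fit(X_train, y_train)
    pred = np.ravel(clf.predict(X_test))        # CatBoost may return a column vector
    print(f"{name}: acc={accuracy_score(y_test, pred):.3f} "
          f"prec={precision_score(y_test, pred):.3f} "
          f"rec={recall_score(y_test, pred):.3f} f1={f1_score(y_test, pred):.3f}")
```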
Secondly, the QAR data of the Sanya to Jinan route were used to identify the turbulence occurring along that route with the trained model, as shown in Figure 11b; the corresponding turbulence segments were identified by using label 1 to represent time segments where turbulence occurred and label 0 to represent time segments where no turbulence occurred. The trained model was then applied to the test dataset (the Chongqing to Jinan route), and each time segment where turbulence occurred was labelled, as shown in Figure 11c.
Evaluation metrics values are listed in Table 3. WOA-CatBoost achieved the highest accuracy of 0.96, precision of 0.96, recall of 0.90, and F1 of 0.93. These results demonstrated the superior performance of WOA-CatBoost compared to CatBoost, Extra Trees, random forest, Logistic Regression, and SVM. Simultaneously, the correctly classified values (TP + TN) and misclassified values (FP + FN) of WOA-CatBoost indicated that the proposed model outperformed the other five models.
Additionally, based on the confusion matrix (a) and (c) in Figure 12, the overall accuracy for the training and testing data was 99% and 96%, respectively. The WOA-CatBoost classifier generated a 90% TPR and 10% FPR for the testing dataset. These results effectively validated the proficiency of the hybrid model based on WOA-CatBoost in turbulence identification and classification.
To further compare and assess the stability and robustness of the models, the QAR dataset was also processed with the other algorithms (CatBoost, Extra Trees, random forest, Logistic Regression, SVM), and the corresponding ROC curves were compared with that of WOA-CatBoost. From Figure 12b, it can be observed that these popular algorithms exhibited good performance in the QAR data classification task on the training dataset. However, in Figure 12d, significant differences are apparent on the testing dataset, indicating poorer performance for these popular algorithms. The Logistic Regression classifier had the minimum AUC value of 0.68, while the other algorithms surpassed it. WOA-CatBoost achieved the highest AUC value of 0.98, suggesting that the proposed WOA-CatBoost outperformed the other classifiers. In other words, the other machine learning algorithms performed well on the training set but poorly on the testing set, indicating their weaker robustness. In contrast, the CatBoost model demonstrated superior overall performance on both the training and testing sets, indicating that CatBoost itself is a stable model. The combination with the WOA, through which we determined the optimal hyperparameters of CatBoost experimentally, further improved the recognition accuracy of CatBoost and enhanced its robustness.

5.4. Application

In this study, our objective was to establish a more robust turbulence identification model. To further evaluate the effectiveness and applicability of the model, we applied the proposed turbulence model to a new flight route to identify turbulence during flight: the QAR data from Jinan to Harbin, dated 10 July 2023.
This flight route was entirely different from the two previously selected routes. Consistent with the previous data processing methods, an identical number of feature parameters composed the feature subset, as shown in Figure 13a. The proposed turbulence identification model achieved an accuracy of 98%, remaining superior to the other algorithmic models and further enhancing the identification accuracy of the stable CatBoost model, as shown in Figure 13b. The number of instances correctly labeled by the model, i.e., the sum of true positives and true negatives, was 7727; the generated TPR was 92% and the FPR was 0.03%, as shown in Figure 13c. The proposed model could identify most of the turbulence occurrence times at which the EDR was greater than 0.1, as shown in Figure 13d. This further confirmed the effectiveness and general applicability of our approach in accurately identifying turbulence across different flight routes on new data and suggests that the proposed model may be generally applicable to all flights, comparable to the EDR method in characterizing turbulence.

6. Discussion and Conclusions

This article introduced a data-driven method for identifying turbulence rather than relying on the EDR estimation method. By combining methodologies from swarm intelligence and machine learning, we optimized the kernel parameters of CatBoost using the WOA. By searching for an optimal balance of parameter values, we effectively alleviated overfitting, thereby enhancing the robustness and accuracy of the model. The model’s applicability was confirmed by validation on test data and new data, demonstrating superior robustness and identification accuracy compared with other machine learning techniques. Given the widespread use of QAR devices in aircraft, the proposed approach can be used to determine whether an aircraft encountered turbulence during or after a flight, instead of relying on the estimation of EDR. Although the proposed method demonstrated high accuracy in turbulence identification, there are still aspects that require improvement. Due to limitations in the dataset, it is currently only possible to identify the presence or absence of turbulence along flight routes, without determining the severity level, such as weak, moderate, or severe. Therefore, in the future, we will gather additional datasets with turbulence severity labels to enhance the effectiveness of supervised model training for multi-class classification, thereby further enabling the identification of turbulence severity levels.

Author Contributions

Conceptualization, Z.Z. and H.L.; data curation, H.L. and J.S.; formal analysis, H.L.; funding acquisition, Z.Z.; investigation, Z.Z.; methodology, H.L.; project administration, Z.Z. and H.T.; resources, Z.Z.; software, H.L.; supervision, P.-W.C.; validation, Z.Z., J.S., P.-W.C. and H.T.; visualization, H.L.; writing—original draft, Z.Z., H.L. and P.-W.C.; writing—review and editing, Z.Z., J.S. and P.-W.C. All authors have read and agreed to the published version of the manuscript.

Funding

This research was funded by the Natural Science Foundation of Tianjin Municipality, China (No. 21JCYBJC00740), Meteorological Soft Science Project (No. 2023ZZXM29), and Jiangsu Provincial Key Research and Development Program (No. BE2021685).

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

The QAR flight data used in this study belong to the airlines and may only be made available by the authors on request, with restrictions.

Conflicts of Interest

The authors declare no conflicts of interest.

Abbreviations

Acronym: Full Name
WOA: Whale optimization algorithm
CatBoost: Categorical Boosting
QAR: Quick Access Record
EDR: Eddy Dissipation Rate
RFE: Recursive Feature Elimination algorithm
GBDT: Gradient Boosting Decision Tree
ROC: Receiver Operating Characteristic
TPR: True Positive Rate
FPR: False Positive Rate
AUC: Area Under the Curve
SVM: Support Vector Machine

References

  1. Huang, R.; Sun, H.; Wu, C.; Wang, C.; Lu, B. Estimating Eddy Dissipation Rate with QAR Flight Big Data. Appl. Sci. 2019, 9, 5192. [Google Scholar] [CrossRef]
  2. Kim, S.-H.; Kim, J.; Kim, J.-H.; Chun, H.-Y. Characteristics of the Derived Energy Dissipation Rate Using the 1-Hz Commercial Aircraft Quick Access Recorder (QAR) Data. Atmos. Meas. Tech. Discuss. 2021, 15, 2277–2298. [Google Scholar] [CrossRef]
  3. Sharman, R.; Tebaldi, C.; Wiener, G.; Wolff, J. An Integrated Approach to Mid- and Upper-Level Turbulence Forecasting. Weather Forecast. 2006, 21, 268–287. [Google Scholar] [CrossRef]
  4. Schwartz, B. The Quantitative Use of PIREPs in Developing Aviation Weather Guidance Products. Weather Forecast. 1996, 11, 372–384. [Google Scholar] [CrossRef]
  5. Bass, E.J. Turbulence Assessment and Decision-Making on the Flight Deck and in the Cabin. Hum. Factors Aerosp. Saf. 2001, 1, 267–294. [Google Scholar]
  6. Sharman, R.D.; Cornman, L.B.; Meymaris, G.; Pearson, J.; Farrar, T. Description and Derived Climatologies of Automated In Situ Eddy-Dissipation-Rate Reports of Atmospheric Turbulence. J. Appl. Meteorol. Climatol. 2014, 53, 1416–1432. [Google Scholar] [CrossRef]
  7. ICAO. Annex3: Meteorological Service for International Air Navigation. 2010. Available online: https://www.icao.int/airnavigation/IMP/Documents/Annex%203%20-%2075.pdf (accessed on 17 April 2024).
  8. Duraisamy, K.; Spalart, P.R.; Rumsey, C.L. Status, Emerging Ideas and Future Directions of Turbulence Modeling Research in Aeronautics. Turbulence Modeling Symposium, Ann Arbor, Michigan, American (July 2017). Available online: https://ntrs.nasa.gov/search?q=Status,%20Emerging%20Ideas%20and%20Future%20Directions%20of%20Turbulence%20Modeling%20Research%20in%20Aeronautics (accessed on 17 April 2024).
  9. Zhuang, Z.; Lin, K.; Zhang, H.; Chan, P.W. Detection of Turbulence Anomalies Using a Symbolic Classifier Algorithm in Airborne Quick Access Record (QAR) Data Analysis. Adv. Atmos. Sci. 2024. [Google Scholar] [CrossRef]
  10. Duraisamy, K.; Iaccarino, G.; Xiao, H. Turbulence Modeling in the Age of Data. Annu. Rev. Fluid Mech. 2019, 51, 357–377. [Google Scholar] [CrossRef]
  11. Haverdings, H.; Chan, P.W. Quick Access Recorder Data Analysis Software for Windshear and Turbulence Studies. J. Aircr. 2010, 47, 1443–1447. [Google Scholar] [CrossRef]
  12. Cotter, A.; Williams, J.; Goodrich, R.; Craig, J. A Random Forest Turbulence Prediction Algorithm. In Proceedings of the 5th AMS Conference on Artificial Intelligence Applications to Environmental Science, San Antonio, TX, USA, 14–18 January 2007. [Google Scholar]
  13. Emara, M.; Santos, M.; Chartier, N.; Ackley, J.; Puranik, T.; Payan, A.; Kirby, M.; Pinon, O.; Mavris, D. Machine Learning Enabled Turbulence Prediction Using Flight Data For Safety Analysis. In Proceedings of the 32nd Congress of the International Council of the Aeronautical Sciences, Shanghai, China, 6–10 September 2021. [Google Scholar]
  14. Sun, H.; Jiao, Y.; Han, J.; Wang, C. A Novel Temporal-Spatial Analysis System for QAR Big Data. In Proceedings of the 2017 IEEE 17th International Conference on Communication Technology (ICCT), Chengdu, China, 27–30 October 2017. [Google Scholar]
  15. WMO. Aircraft Meteorological Data Relay (AM-DAR) Reference Manual. 2003. Available online: https://library.wmo.int/viewer/32136/download?file=wmo_958_en.pdf&type=pdf&navigator=1#1.1%20WHAT%20IS%20AMDAR? (accessed on 17 April 2024).
  16. Lee, J.C.W.; Leung, C.Y.Y.; Kok, M.H.; Chan, P.W. A Comparison Study of EDR Estimates from the NLR and NCAR Algorithms. Atmosphere 2022, 13, 132. [Google Scholar] [CrossRef]
  17. Wu, M.; Sun, H.; Wang, C.; Lu, B. Detecting and Analysing Spatial-Temporal Aggregation of Flight Turbulence with the QAR Big Data. In Proceedings of the 2018 26th International Conference on Geoinformatics, Kunming, China, 28–30 June 2018. [Google Scholar]
  18. Mizuno, S.; Ohba, H.; Ito, K. Machine Learning-Based Turbulence-Risk Prediction Method for the Safe Operation of Aircrafts. J. Big Data 2021, 9, 29. [Google Scholar] [CrossRef]
  19. Tuba, E.; Tuba, M.; Simian, D. Adjusted Bat Algorithm for Tuning of Support Vector Machine Parameters. In Proceedings of the 2016 IEEE Congress on Evolutionary Computation (CEC), Vancouver, BC, Canada, 24–29 July 2016. [Google Scholar]
  20. Tharwat, A.; Gabel, T.; Hassanien, A.E. Parameter Optimization of Support Vector Machine Using Dragonfly Algorithm. In Proceedings of the International Conference on Advanced Intelligent Systems and Informatics, Cairo, Egypt, 9–11 September 2017; Advances in Intelligent Systems and Computing. Springer: Cham, Switzerland, 2018; pp. 309–319. [Google Scholar]
  21. Aljarah, I.; Al-Zoubi, A.M.; Faris, H.; Hassonah, M.A.; Mirjalili, S.; Saadeh, H. Simultaneous Feature Selection and Support Vector Machine Optimization Using the Grasshopper Optimization Algorithm. Cogn. Comput. 2018, 10, 478–495. [Google Scholar] [CrossRef]
  22. Barman, M.; Dev Choudhury, N.B. A Similarity Based Hybrid GWO-SVM Method of Power System Load Forecasting for Regional Special Event Days in Anomalous Load Situations in Assam, India. Sustain. Cities Soc. 2020, 61, 102311. [Google Scholar] [CrossRef]
  23. Huang, C.L.; Dun, J.F. A Distributed PSO–SVM Hybrid System with Feature Selection and Parameter Optimization. Appl. Soft Comput. 2008, 8, 1381–1391. [Google Scholar] [CrossRef]
  24. Sarafrazi, S.; Nezamabadi-Pour, H. Facing the Classification of Binary Problems with a GSA-SVM Hybrid System. Math. Comput. Model. 2013, 57, 270–278. [Google Scholar] [CrossRef]
  25. Yang, D.; Liu, Y.; Li, S.; Li, X.; Ma, L. Gear Fault Diagnosis Based on Support Vector Machine Optimized by Artificial Bee Colony Algorithm. Mech. Mach. Theory 2015, 90, 219–229. [Google Scholar] [CrossRef]
  26. Li, C.; An, X.; Li, R. A Chaos Embedded GSA-SVM Hybrid System for Classification. Neural Comput. Appl. 2015, 26, 713–721. [Google Scholar] [CrossRef]
  27. Dong, Z.; Zheng, J.; Huang, S.; Pan, H.; Liu, Q. Time-Shift Multi-Scale Weighted Permutation Entropy and GWO-SVM Based Fault Diagnosis Approach for Rolling Bearing. Entropy 2019, 21, 621. [Google Scholar] [CrossRef] [PubMed]
  28. Kose, U. A Hybrid SVM-WOA Approach for Intelligent Fault Diagnosis Applications. In Proceedings of the 2019 Innovations in Intelligent Systems and Applications Conference (ASYU), Izmir, Turkey, 31 October–2 November 2019. [Google Scholar]
  29. Kong, D.; Chen, Y.; Li, N.; Duan, C.; Lu, L.; Chen, D. Tool Wear Estimation in End Milling of Titanium Alloy Using NPE and a Novel WOA-SVM Model. IEEE Trans. Instrum. Meas. 2020, 69, 5219–5232. [Google Scholar] [CrossRef]
  30. Zhang, F.; Fleyeh, H.; Bales, C. A Hybrid Model Based on Bidirectional Long Short-Term Memory Neural Network and Catboost for Short-Term Electricity Spot Price Forecasting. J. Oper. Res. Soc. 2022, 73, 301–325. [Google Scholar] [CrossRef]
  31. Lan, C.; Song, B.; Zhang, L.; Fu, L.; Guo, X.; Sun, C. State Prediction of Hydro-Turbine Based on WOA-RF-Adaboost. Energy Rep. 2022, 8, 13129–13137. [Google Scholar] [CrossRef]
  32. Luo, J.; Gong, Y. Air Pollutant Prediction Based on ARIMA-WOA-LSTM Model. Atmos. Pollut. Res. 2023, 14, 101761. [Google Scholar] [CrossRef]
  33. Jabeur, S.B.; Gharib, C.; Mefteh-Wali, S.; Arfi, W.B. CatBoost Model and Artificial Intelligence Techniques for Corporate Failure Prediction. Technol. Forecast. Soc. Change 2021, 166, 120658. [Google Scholar] [CrossRef]
  34. Li, H.X. Research on Credit Risk of P2P Lending Based on CatBoost Algorithm. Finance 2019, 9, 137–141. [Google Scholar] [CrossRef]
  35. Ibrahim, A.A.; Ridwan, R.L.; Muhammed, M.M.; Abdulaziz, R.O.; Saheed, G.A. Comparison of the CatBoost Classifier with Other Machine Learning Methods. Int. J. Adv. Comput. Sci. Appl. 2020, 11, 738–748. [Google Scholar] [CrossRef]
  36. Izotova, A.; Valiullin, A. Comparison of Poisson Process and Machine Learning Algorithms Approach for Credit Card Fraud Detection. Procedia Comput. Sci. 2021, 186, 721–726. [Google Scholar] [CrossRef]
  37. Huang, G.; Wu, L.; Ma, X.; Zhang, W.; Fan, J.; Yu, X.; Zeng, W.; Zhou, H. Evaluation of CatBoost Method for Prediction of Reference Evapotranspiration in Humid Regions. J. Hydrol. 2019, 574, 1029–1041. [Google Scholar] [CrossRef]
  38. Daoud, E. Comparison between XGBoost, LightGBM and CatBoost Using a Home Credit Dataset. Int. J. Comput. Inf. Eng. 2019, 13, 6–10. [Google Scholar]
  39. Postnikov, E.B.; Esmedljaeva, D.A.; Lavrova, A.I. A CatBoost Machine Learning for Prognosis of Pathogen’s Drug Resistance in Pulmonary Tuberculosis. In Proceedings of the 2020 IEEE 2nd Global Conference on Life Sciences and Technologies (LifeTech), Kyoto, Japan, 10–12 March 2020. [Google Scholar]
  40. Kang, Y.; Jang, E.; Im, J.; Kwon, C.; Kim, S. Developing a New Hourly Forest Fire Risk Index Based on Catboost in South Korea. Appl. Sci. 2020, 10, 8213. [Google Scholar] [CrossRef]
  41. Hancock, J.T.; Khoshgoftaar, T.M. CatBoost for Big Data: An Interdisciplinary Review. J. Big Data 2020, 7, 94. [Google Scholar] [CrossRef]
  42. Prokhorenkova, L.; Gusev, G.; Vorobev, A.; Dorogush, A.; Gulin, A. CatBoost: Unbiased Boosting with Categorical Features. Arxiv Learn. In Proceedings of the 32nd International Conference on Neural Information Processing Systems, Montréal, QC, Canada, 3–8 December 2018. [Google Scholar]
  43. Jani, D.; Varadarajan, V.; Parmar, R.; Bohara, M.H.; Garg, D.; Ganatra, A.; Kotecha, K. An Efficient Gait Abnormality Detection Method Based on Classification. J. Sens. Actuator Netw. 2022, 11, 31. [Google Scholar] [CrossRef]
  44. CatBoost.ai. Available online: https://catboost.ai/en/docs/concepts/python-reference_catboostclassifier (accessed on 17 April 2024).
  45. Mirjalili, S.; Lewis, A. The Whale Optimization Algorithm. Adv. Eng. Softw. 2016, 95, 51–67. [Google Scholar] [CrossRef]
  46. Li, Y.L.; Wang, S.Q.; Chen, Q.R. Comparative study of several new swarm intelligence optimization algorithms. Comput. Eng. Appl. 2020, 179, 685–695. [Google Scholar]
  47. Nguyen, B.H.; Xue, B.; Zhang, M. A Survey on Swarm Intelligence Approaches to Feature Selection in Data Mining. Swarm Evol. Comput. 2020, 54, 100663. [Google Scholar] [CrossRef]
  48. Li, J.; Cheng, K.; Wang, S.; Morstatter, F.; Trevino, R.P.; Tang, J.; Liu, H. Feature Selection: A Data Perspective. ACM Comput. Surv. 2018, 50, 1–45. [Google Scholar] [CrossRef]
  49. Yang, R.; Wang, P.; Qi, J. A Novel SSA-CatBoost Machine Learning Model for Credit Rating. J. Intell. Fuzzy Syst. 2023, 44, 2269–2284. [Google Scholar] [CrossRef]
  50. Rodriguez-Galiano, V.F.; Luque-Espinar, J.A.; Chica-Olmo, M.; Mendes, M.P. Feature Selection Approaches for Predictive Modelling of Groundwater Nitrate Pollution: An Evaluation of Filters, Embedded and Wrapper Methods. Sci. Total Environ. 2018, 624, 661–672. [Google Scholar] [CrossRef]
Figure 1. The workflow of the CatBoost classifier algorithm.
Figure 2. Optimal solutions in 2D and 3D. (a) describes 2D position vectors and their possible next locations (X* is the best fitness obtained so far); (b) 3D position vectors and their possible next locations.
Figure 3. Shrinking encircling mechanism.
Figure 4. Behavior of whale spirals when updating the position.
Figure 5. Flowchart of the WOA-CatBoost model.
Figure 6. Map of airline turbulence intelligence areas.
Figure 7. Proportion of each feature.
Figure 8. Linear interpolation of EDR data.
Figure 9. Training process of turbulence identification model.
Figure 10. Fitness value curves for the WOA-CatBoost model.
Figure 11. Model performance comparison and turbulence identification on training and testing datasets. (a) Model evaluation metric comparison; (b) turbulence identification from Sanya to Jinan; (c) turbulence identification from Chongqing to Jinan.
Figure 12. Confusion matrix and ROC curve for the training and testing datasets. (a) Confusion matrix for the training dataset; (b) ROC curves for various algorithms on the training data; (c) confusion matrix for the testing dataset; (d) ROC curves for various algorithms on the testing data.
Figure 13. Evaluation and process of turbulence identification by the model using QAR data from Jinan to Harbin. (a) Workflow of the identification process; (b) ROC curves of different algorithms for new data; (c) confusion matrix for new data; (d) identification results and EDR values for new data.
Table 1. Essential parameters of WOA-CatBoost.
Model | Parameter | Explanation
CatBoost | early_stopping_rounds | When no improvement is observed over a set number of consecutive iterations, training is terminated to prevent overfitting
CatBoost | learning_rate | Regulates the learning progress of the model
CatBoost | depth | The maximum depth of the trees the model can generate
CatBoost | iterations | Quantity of trees
WOA | population_size | Whale population size
WOA | iterations | Iteration count
WOA | fitness | Individual fitness
Table 2. Parameters used for turbulence identification.
Parameter | Unit | Parameter | Unit
AOA1 | deg | Pitch angle | deg
AOA2 | deg | Roll angle | deg
Gross weight | kilogram | Wind direction | deg
Altitude | feet | Ground speed | knots
Latitude | deg | Instruction air speed | knots
Longitude | deg | Mach | /
Radio height | feet | True air speed | knots
Total air temp | °C | Wind speed | knots
Static air temp | °C | Vertical acceleration | G
Display heading | deg | Lateral acceleration | G
Drift angle | deg | Longitudinal acceleration | G
Vertical speed | feet/min | / | /
Table 3. Model performance comparison.
Model | Accuracy | Precision | Recall | F1 | TP + TN | FP + FN
WOA-CatBoost | 0.95644 | 0.95916 | 0.89886 | 0.92803 | 6718 | 306
CatBoost | 0.86062 | 0.74675 | 0.83827 | 0.78987 | 6045 | 979
Extra Trees | 0.77919 | 0.64098 | 0.66697 | 0.65372 | 5528 | 1496
Random forest | 0.76224 | 0.59865 | 0.72574 | 0.65610 | 5354 | 1670
Logistic Regression | 0.70857 | 0.52884 | 0.61822 | 0.57005 | 4977 | 2047
SVM | 0.67298 | 0.48678 | 0.85558 | 0.62052 | 4727 | 2297