Data Mining and Machine Learning Techniques for Aerodynamic Databases: Introduction, Methodology and Potential Benefits

Andrés-Pérez, Esther

doi:10.3390/en13215807



Theoretical and Computational Aerodynamics Branch, Flight Physics Department, Spanish National Institute for Aerospace Technology (INTA), Ctra. Ajalvir, km. 4, 28850 Torrejón de Ardoz, Spain

Energies 2020, 13(21), 5807; https://doi.org/10.3390/en13215807

Submission received: 31 July 2020 / Revised: 28 September 2020 / Accepted: 15 October 2020 / Published: 6 November 2020

(This article belongs to the Special Issue Methods and Numerical Applications in Fluid Mechanics)


Abstract

Machine learning and data mining techniques are nowadays used in many business sectors to exploit data in order to detect trends, discover features and patterns, or even predict the future. However, in the field of aerodynamics, the application of these techniques is still in its initial stages. This paper explores the benefits that machine learning and data mining techniques can offer to aerodynamicists in order to extract knowledge from CFD data and to make quick predictions of aerodynamic coefficients. For this purpose, three aerodynamic databases (NACA0012 airfoil, RAE2822 airfoil and 3D DPW wing) have been used, and the results show that machine learning and data mining techniques also have a huge potential in this field.

Keywords:

machine learning; data mining; aerodynamic analysis; computational fluid dynamics; surrogate modeling; linear regression; support vector regression

1. Introduction

In the field of aerodynamics, complex steady flows are simulated daily in industry by computational fluid dynamics (CFD), since CFD tools have already reached an acceptable level of maturity. These simulations are usually performed over full-aircraft configurations or several aircraft components, where meshes of hundreds of millions of points are required in order to capture the flow features precisely. In addition, simulations are performed for different parameters to properly explore the design space. This implies a high computational cost that may, in certain situations, still be infeasible. To overcome this limitation, the CFD solver could be replaced by a surrogate model that produces a fast prediction of the aerodynamic features based on previous simulations or wind-tunnel data. Machine learning techniques commonly used in the areas of artificial intelligence (AI) and data mining (DM) can provide valuable support to reduce the computational cost required for aerodynamic analysis.

The objective of this paper is to investigate the application of machine learning and data-driven approaches to aerodynamic analysis. While these techniques have been broadly used in other sectors such as finance or risk analysis, their application in the aeronautical sector is still in its infancy. The novelty of this paper lies in assessing the feasibility and potential benefits of applying these techniques to the aerodynamic analysis of aeronautical configurations. The application test cases have been selected amongst those commonly used in the literature for validation purposes, in order to be able to quickly generate the databases required for testing the methods and to provide comparable results. For this purpose, the paper covers all the aspects required in any machine learning project, such as data analysis, feature scaling, model construction, and accuracy measurement.

The main motivation for this research is to analyze the potential of data mining and machine learning techniques for the fast prediction of aerodynamic features based on previous and existing aerodynamic data. These techniques may have a huge impact on the aerodynamic design process for achieving novel aeronautical configurations, since they would allow different promising shapes to be evaluated quickly, therefore reducing the cost of the overall design stage.

The paper is structured as follows: Section 2 presents a review of the state-of-the-art methods in the technical fields involved in this research, focusing on machine learning and data-driven approaches for aerodynamic analysis. Section 3 describes the methodology followed for data analysis, model construction and validation, together with the numerical results. Finally, Section 4 presents the conclusions. The complete databases are provided in the appendix at the end of this paper to allow other researchers to further exploit the data with other techniques.

2. Brief Review of the State of the Art

This section will review the state of the art in the technical fields involved in this research, namely machine learning and data-driven approaches for aerodynamic analysis.

In the last five years, there has been an increasing interest in the development of techniques to handle aerodynamic data coming from different sources, such as CFD simulations, wind tunnel experiments or even flight test data. The ability to handle this vast amount of heterogeneous data is a crucial factor in enabling machine learning methods to be applied in the aeronautic industry.

Table 1 reviews the most recent state-of-the-art studies in the scientific literature:

In summary, over the last 5 years, machine learning techniques have been used to predict aerodynamic features, to accelerate or improve the precision of turbulence models, to speed up the shape design optimization process and to quantify and manage uncertainties in the flow fields, amongst others.

All the papers found, however, focus more on the application and do not provide an overall view of the steps and requirements needed to properly handle and prepare the data for machine learning techniques. This paper aims to fill this gap and provides, through simple examples, an overall scheme of all the steps needed to obtain proper models by machine learning.

3. Methodology and Results

Figure 1 shows the main steps of a typical machine learning process:

In this section, each of the steps in Figure 1 will be explained and applied to the selected aerodynamic databases. All the research performed in this paper used Python 3.8.2 with the scikit-learn 0.22.1, pandas 1.0.1 and matplotlib 3.1.3 libraries, all included in the Anaconda distribution. CFD computations for generating the databases were performed with the DLR TAU code (release 2019.1.0, Spalart–Allmaras turbulence model, convergence criteria based on minimum residuals).

In order to allow other researchers to perform experiments on existing aerodynamic databases and avoid repeating the CFD simulations required to build such databases, Appendix A provides all the required data for free use within the research community.

3.1. Data Polishing and Statistical Analysis

The first step was to quickly explore what the databases looked like. As mentioned previously, the complete databases for all the tests performed in this paper can be found in Appendix A.

An important requirement for machine learning databases is that they contain no empty values. If empty values exist, they should be replaced by the mean, by zero or by another value, depending on the specific case, so that they do not affect the surrogate model performance. Therefore, each variable was checked for empty values and, as can be observed in Table 2, none were found in the databases.

As can be observed in Table 2, all three databases are composed of 4 columns (Mach, AoA, lift coefficient and drag coefficient). The data size varies with the configuration: the NACA0012 database has 185 rows (each row corresponds to one CFD computation), the RAE2822 database has 122 samples, and the DPW database includes 100 samples.
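The empty-value check described above could be sketched with pandas as follows. This is only an illustration under stated assumptions: the twelve rows are copied from the NACA0012 columns of Table A1, while the CSV layout, the variable names and the artificially introduced gap are hypothetical and not part of the original study.

```python
import io

import pandas as pd

# Twelve NACA0012 rows copied from Table A1 (the full database has 185 rows).
# The CSV layout and column names are assumptions made for this sketch.
CSV = """Mach,AoA,Cl,Cd
0.1,0,0.00034864,0.01355065
0.1,2,0.23511923,0.0140823
0.1,4,0.46693354,0.01738756
0.1,6,0.69169693,0.02392192
0.1,8,0.90439914,0.03453483
0.1,10,1.09908405,0.05002468
0.2,0,0.00147839,0.00764385
0.2,2,0.23940088,0.0081628
0.2,4,0.47400585,0.01080112
0.2,6,0.69976032,0.01649634
0.2,8,0.91003064,0.02619192
0.2,10,1.09515088,0.04101942
"""
df = pd.read_csv(io.StringIO(CSV))

# Number of empty (NaN) entries in each column.
missing_per_column = df.isnull().sum()
print(df.shape)
print(missing_per_column)

# Hypothetical case: if a gap existed, it could be replaced by the column mean.
df_with_gap = df.copy()
df_with_gap.loc[0, "Cd"] = float("nan")
df_filled = df_with_gap.fillna(df_with_gap.mean())
print(df_filled.isnull().sum().sum())
```

Whether the mean, zero or another value is the right replacement depends on the specific case, as noted above.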

Next, some statistics of the aerodynamic data in the databases can be examined (Table 3, Table 4 and Table 5):

In these tables, the rows labeled “count”, “mean”, “min”, and “max” correspond to the number of samples, the mean value of the parameter, and its minimum and maximum values, respectively. The row labeled “std” shows the standard deviation of the values of each parameter. The rows labeled 25%, 50%, and 75% show the percentiles, i.e., the value below which the given percentage of observations falls.

From these statistics, there is one important aspect to consider. The AoA values have a high standard deviation and this will have to be considered further when deciding how to scale the training data to not affect the model performance.
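A summary with exactly these row labels can be obtained with the pandas `describe` method. The sketch below uses a twelve-row NACA0012 subset copied from Table A1, so the numbers it prints are not those of Table 3; the CSV layout and variable names are assumptions.

```python
import io

import pandas as pd

# Twelve NACA0012 rows copied from Table A1; layout assumed for illustration.
CSV = """Mach,AoA,Cl,Cd
0.1,0,0.00034864,0.01355065
0.1,2,0.23511923,0.0140823
0.1,4,0.46693354,0.01738756
0.1,6,0.69169693,0.02392192
0.1,8,0.90439914,0.03453483
0.1,10,1.09908405,0.05002468
0.2,0,0.00147839,0.00764385
0.2,2,0.23940088,0.0081628
0.2,4,0.47400585,0.01080112
0.2,6,0.69976032,0.01649634
0.2,8,0.91003064,0.02619192
0.2,10,1.09515088,0.04101942
"""
df = pd.read_csv(io.StringIO(CSV))

# Rows of the result: count, mean, std, min, 25%, 50%, 75%, max.
stats = df.describe()
print(stats)

# The spread of AoA is much larger than that of the other columns,
# which is why feature scaling is needed later on.
print(stats.loc["std"])
```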

It is also possible to plot the histograms of each of the considered parameters, to better understand the type of data to deal with. The following Figure 2, Figure 3 and Figure 4 show the histograms of each variable in the database for the three cases considered:

As can be deduced from the figures above, all three databases were generated with a Latin Hypercube Sampling (LHS) method in the parameters AoA and Mach. This is why the histogram bars have the same height, except for those cases where the CFD solver did not converge (those with the highest angles of attack, which did not meet the minimum residual convergence criteria) and which were eliminated from the database.

In the case of Cd, the histogram shows a strong concentration of values near 0, and the Cl histogram shows the main concentration for values between 0.7 and 1.25, especially in the two airfoil databases.
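The bar heights of such histograms can also be computed numerically. The following sketch counts the AoA occurrences in a twelve-row NACA0012 subset copied from Table A1; the bin count and the CSV layout are assumptions chosen for this illustration.

```python
import io

import numpy as np
import pandas as pd

# Twelve NACA0012 rows copied from Table A1; layout assumed for illustration.
CSV = """Mach,AoA,Cl,Cd
0.1,0,0.00034864,0.01355065
0.1,2,0.23511923,0.0140823
0.1,4,0.46693354,0.01738756
0.1,6,0.69169693,0.02392192
0.1,8,0.90439914,0.03453483
0.1,10,1.09908405,0.05002468
0.2,0,0.00147839,0.00764385
0.2,2,0.23940088,0.0081628
0.2,4,0.47400585,0.01080112
0.2,6,0.69976032,0.01649634
0.2,8,0.91003064,0.02619192
0.2,10,1.09515088,0.04101942
"""
df = pd.read_csv(io.StringIO(CSV))

# Six equal-width bins over the AoA range: each bin holds the same number
# of samples, which is what equal bar heights in the histogram mean.
counts_aoa, edges_aoa = np.histogram(df["AoA"], bins=6)
print(counts_aoa)

# The Cd values are unevenly spread and cluster at the low end of the range.
counts_cd, edges_cd = np.histogram(df["Cd"], bins=5)
print(counts_cd)
```

With matplotlib installed, `df.hist()` would draw the corresponding plots for all columns at once.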

3.2. Splitting Training and Test Sets

The next step was to split the database into training and test sets. In this example, the split was done with a pure random sampling method, taking 80% of the initial samples for the training set and the remaining 20% for the test set. This step could be improved by using other splitting methods (such as cross-validation, for instance, as in the work performed in [21]), but since the purpose of this paper is to give a general overview of the machine learning process for aerodynamic analysis, this random sampling technique was considered sufficient.
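A random 80/20 split of this kind could be sketched with the scikit-learn `train_test_split` function. The data are a twelve-row NACA0012 subset copied from Table A1; the `random_state` value is an assumption added only to make the example reproducible.

```python
import io

import pandas as pd
from sklearn.model_selection import train_test_split

# Twelve NACA0012 rows copied from Table A1; layout assumed for illustration.
CSV = """Mach,AoA,Cl,Cd
0.1,0,0.00034864,0.01355065
0.1,2,0.23511923,0.0140823
0.1,4,0.46693354,0.01738756
0.1,6,0.69169693,0.02392192
0.1,8,0.90439914,0.03453483
0.1,10,1.09908405,0.05002468
0.2,0,0.00147839,0.00764385
0.2,2,0.23940088,0.0081628
0.2,4,0.47400585,0.01080112
0.2,6,0.69976032,0.01649634
0.2,8,0.91003064,0.02619192
0.2,10,1.09515088,0.04101942
"""
df = pd.read_csv(io.StringIO(CSV))

# Pure random sampling: 80% of the rows for training, 20% for testing.
train_set, test_set = train_test_split(df, test_size=0.2, random_state=42)
print(len(train_set), len(test_set))
```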

3.3. Exploring the Training Set

Now, the data that will be used to build the model are explored in more detail, as can be observed in Figure 5:

In addition, since the datasets are of a manageable size, it is also possible to compute Pearson’s correlation coefficient r for every pair of variables, as displayed in Table 6, Table 7 and Table 8:
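The pairwise Pearson coefficients can be computed with the pandas `corr` method, as sketched below on a twelve-row NACA0012 subset copied from Table A1. The values printed therefore differ from those of Table 6, and the CSV layout is an assumption.

```python
import io

import pandas as pd

# Twelve NACA0012 rows copied from Table A1; layout assumed for illustration.
CSV = """Mach,AoA,Cl,Cd
0.1,0,0.00034864,0.01355065
0.1,2,0.23511923,0.0140823
0.1,4,0.46693354,0.01738756
0.1,6,0.69169693,0.02392192
0.1,8,0.90439914,0.03453483
0.1,10,1.09908405,0.05002468
0.2,0,0.00147839,0.00764385
0.2,2,0.23940088,0.0081628
0.2,4,0.47400585,0.01080112
0.2,6,0.69976032,0.01649634
0.2,8,0.91003064,0.02619192
0.2,10,1.09515088,0.04101942
"""
df = pd.read_csv(io.StringIO(CSV))

# Symmetric 4 x 4 matrix of Pearson coefficients r, one per pair of variables.
corr = df.corr(method="pearson")
print(corr)

# Correlation of each variable with the lift coefficient, strongest first.
print(corr["Cl"].sort_values(ascending=False))
```

In this low-AoA subset the lift coefficient is almost perfectly correlated with the AoA, while Mach and AoA are uncorrelated because the subset is a regular grid.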

In addition, one can also plot every numerical variable against every other numerical variable. In this case, since there are 4 attributes, 4 × 4 = 16 plots are obtained for each database, as can be observed in Figure 6, Figure 7 and Figure 8:

Since the plots on the main diagonal would only show straight lines, a histogram of each attribute is displayed there instead (note that these histograms look different from the ones shown previously, since now only the training dataset is considered).
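A grid of this kind can be produced with the pandas `scatter_matrix` function, which places histograms on the main diagonal when `diagonal="hist"` is passed. The sketch below is an assumption about how such figures could be generated, not the original plotting script; it uses a twelve-row NACA0012 subset copied from Table A1 and a non-interactive matplotlib backend.

```python
import io

import matplotlib

matplotlib.use("Agg")  # non-interactive backend, so no display is required

import pandas as pd
from pandas.plotting import scatter_matrix

# Twelve NACA0012 rows copied from Table A1; layout assumed for illustration.
CSV = """Mach,AoA,Cl,Cd
0.1,0,0.00034864,0.01355065
0.1,2,0.23511923,0.0140823
0.1,4,0.46693354,0.01738756
0.1,6,0.69169693,0.02392192
0.1,8,0.90439914,0.03453483
0.1,10,1.09908405,0.05002468
0.2,0,0.00147839,0.00764385
0.2,2,0.23940088,0.0081628
0.2,4,0.47400585,0.01080112
0.2,6,0.69976032,0.01649634
0.2,8,0.91003064,0.02619192
0.2,10,1.09515088,0.04101942
"""
df = pd.read_csv(io.StringIO(CSV))

# One subplot per pair of attributes; histograms replace the diagonal plots.
axes = scatter_matrix(df, diagonal="hist", figsize=(8, 8))
print(axes.shape)

# Hypothetical output file name.
axes[0, 0].get_figure().savefig("naca0012_scatter_matrix.png")
```

In practice, the training set rather than the full table would be passed to `scatter_matrix`.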

From these pictures, it can be observed that, for predicting the lift coefficient, the AoA has a very strong importance, while the Mach number is less important. However, for the prediction of the Cd, both parameters have almost the same importance, especially in the 2D cases. This aspect is less clear in the DPW test case, where the Mach number seems to be less important than the AoA for Cd prediction.

3.4. Preparing the Data for Machine Learning Algorithms

The first step here would be to handle the missing values in the database but, as mentioned before, the databases do not have missing values, since the cases in which the solver did not converge were not incorporated in the database.

Next, it is necessary to apply one of the most important transformations to the data, namely feature scaling. Many machine learning algorithms do not behave well when the parameters have different ranges, and this is the case here since, for instance, the scales of the AoA and the Cd are very different, as mentioned previously. In this research, standardization [22] was applied to each column in the training database (zero mean and unit variance). The selected scaling method does not have a relevant impact on the results; what is really crucial is that all features are scaled in order to help machine learning methods provide accurate predictions.
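Standardization to zero mean and unit variance could be sketched with the scikit-learn `StandardScaler` class. The example fits the scaler on the training features only and then reuses the same transformation on the test features; restricting the scaling to the two input features, the `random_state` value and the variable names are assumptions of this sketch.

```python
import io

import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

# Twelve NACA0012 rows copied from Table A1; layout assumed for illustration.
CSV = """Mach,AoA,Cl,Cd
0.1,0,0.00034864,0.01355065
0.1,2,0.23511923,0.0140823
0.1,4,0.46693354,0.01738756
0.1,6,0.69169693,0.02392192
0.1,8,0.90439914,0.03453483
0.1,10,1.09908405,0.05002468
0.2,0,0.00147839,0.00764385
0.2,2,0.23940088,0.0081628
0.2,4,0.47400585,0.01080112
0.2,6,0.69976032,0.01649634
0.2,8,0.91003064,0.02619192
0.2,10,1.09515088,0.04101942
"""
df = pd.read_csv(io.StringIO(CSV))

X = df[["Mach", "AoA"]]
X_train, X_test = train_test_split(X, test_size=0.2, random_state=42)

# Learn the mean and standard deviation from the training data only.
scaler = StandardScaler().fit(X_train)
X_train_scaled = scaler.transform(X_train)
X_test_scaled = scaler.transform(X_test)  # same transformation, not refitted

# After scaling, each training column has zero mean and unit variance.
print(np.round(X_train_scaled.mean(axis=0), 12))
print(np.round(X_train_scaled.std(axis=0), 12))
```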

3.5. Model Construction

So far, the problem has been established, the data have been obtained and examined, a training set and a test set have been sampled, and feature scaling has been performed to adapt the data for machine learning algorithms. In this subsection, a machine learning model is selected and trained.

Since this paper aims to provide a global overview of the machine learning process, a deep evaluation of different models and the tuning of the model parameters are out of scope. Instead, only two regression models are selected and used with the default parameters defined in the scikit-learn package (only the SVR epsilon was changed from its default value of 0.1, for the Cd prediction):

Linear regression [23].
Support vector regression (gamma = “scale”, C = 1.0, epsilon = 0.1 for Cl prediction and 0.01 for Cd prediction) [24].
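The two models listed above could be constructed and trained as in the following sketch for the lift coefficient. The SVR parameters are those given in the list; chaining the scaler and the regressor in a pipeline, the `random_state` value and the twelve-row NACA0012 subset (copied from Table A1) are assumptions of this illustration.

```python
import io

import pandas as pd
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVR

# Twelve NACA0012 rows copied from Table A1; layout assumed for illustration.
CSV = """Mach,AoA,Cl,Cd
0.1,0,0.00034864,0.01355065
0.1,2,0.23511923,0.0140823
0.1,4,0.46693354,0.01738756
0.1,6,0.69169693,0.02392192
0.1,8,0.90439914,0.03453483
0.1,10,1.09908405,0.05002468
0.2,0,0.00147839,0.00764385
0.2,2,0.23940088,0.0081628
0.2,4,0.47400585,0.01080112
0.2,6,0.69976032,0.01649634
0.2,8,0.91003064,0.02619192
0.2,10,1.09515088,0.04101942
"""
df = pd.read_csv(io.StringIO(CSV))

X = df[["Mach", "AoA"]]
y_cl = df["Cl"]
X_train, X_test, y_train, y_test = train_test_split(
    X, y_cl, test_size=0.2, random_state=42
)

# Each pipeline standardizes the features before fitting the regressor.
linear_model = make_pipeline(StandardScaler(), LinearRegression())
svr_model = make_pipeline(
    StandardScaler(), SVR(gamma="scale", C=1.0, epsilon=0.1)
)

linear_model.fit(X_train, y_train)
svr_model.fit(X_train, y_train)

print(linear_model.predict(X_test))
print(svr_model.predict(X_test))
```

For the drag coefficient, the same construction would be used with `epsilon=0.01` and `df["Cd"]` as the target.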

3.6. Model Validation

Once the model is trained, the final step is to use it for predicting new values and validate its performance.

First, the linear regression model was tested. Table 9 shows the typical regression metrics for model comparison. Figure 9 shows the comparison between true vs. predicted coefficient values with linear regression method.

Then, the support vector regression model was tested. The following Table 10 shows the typical regression metrics for model comparison. Figure 10 shows the comparison between true vs. predicted coefficient values with SVR method.

As can be observed in the figures above, the SVR model behaves better than the linear regression model, as expected. The R² metrics show reasonable accuracy values for the constructed model. Of course, a more robust cross-validation strategy, as well as the optimization of the SVR hyperparameters, could have been performed, but the objective of this paper was only to give an overall view of the whole machine learning process for the analysis of aerodynamic databases and the prediction of aerodynamic coefficients.
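A validation step of this kind could be sketched as follows. The text does not list which regression metrics appear in Table 9 and Table 10, so the choice of R², mean squared error and mean absolute error is an assumption, as are the pipeline construction and the `random_state` value. With only twelve NACA0012 rows (copied from Table A1), the printed values are not comparable to the results of the paper.

```python
import io

import pandas as pd
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVR

# Twelve NACA0012 rows copied from Table A1; layout assumed for illustration.
CSV = """Mach,AoA,Cl,Cd
0.1,0,0.00034864,0.01355065
0.1,2,0.23511923,0.0140823
0.1,4,0.46693354,0.01738756
0.1,6,0.69169693,0.02392192
0.1,8,0.90439914,0.03453483
0.1,10,1.09908405,0.05002468
0.2,0,0.00147839,0.00764385
0.2,2,0.23940088,0.0081628
0.2,4,0.47400585,0.01080112
0.2,6,0.69976032,0.01649634
0.2,8,0.91003064,0.02619192
0.2,10,1.09515088,0.04101942
"""
df = pd.read_csv(io.StringIO(CSV))

X_train, X_test, y_train, y_test = train_test_split(
    df[["Mach", "AoA"]], df["Cl"], test_size=0.2, random_state=42
)

models = {
    "linear": make_pipeline(StandardScaler(), LinearRegression()),
    "svr": make_pipeline(StandardScaler(), SVR(gamma="scale", C=1.0, epsilon=0.1)),
}

# Train each model, predict the unseen test samples and score the predictions.
scores = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    y_pred = model.predict(X_test)
    scores[name] = {
        "R2": r2_score(y_test, y_pred),
        "MSE": mean_squared_error(y_test, y_pred),
        "MAE": mean_absolute_error(y_test, y_pred),
    }
    print(name, scores[name])
```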

4. Conclusions

This paper focuses on exploring the benefits that machine learning and data mining techniques can offer to aerodynamicists in order to extract knowledge from the CFD data and to make quick predictions of aerodynamic coefficients. The main objective of this paper has been to introduce all the steps in a typical machine learning process and apply these steps to aerodynamic databases. For this purpose, three aerodynamic databases (NACA0012 airfoil, RAE2822 airfoil and 3D DPW wing) have been used and results have demonstrated the feasibility and potential benefits of applying machine learning and data-driven techniques for aerodynamic analysis of aeronautical configurations.

As future work, there is still further potential to be exploited: a smarter generation of the samples in the initial dataset (not LHS), the use of more robust model validation strategies such as cross-validation, the combination of multi-fidelity data within the aerodynamic database (e.g., CFD, wind tunnel and flight testing data), and the comparison of different regression models and the tuning of their parameters. In addition, the use of these models for uncertainty quantification is another topic for future research.
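As an indication of what a more robust validation strategy could look like, the sketch below applies k-fold cross-validation to the SVR model with scikit-learn. It was not part of the study: the number of folds, the scoring metric, the `random_state` value and the twelve-row NACA0012 subset (copied from Table A1) are all assumptions.

```python
import io

import pandas as pd
from sklearn.model_selection import KFold, cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVR

# Twelve NACA0012 rows copied from Table A1; layout assumed for illustration.
CSV = """Mach,AoA,Cl,Cd
0.1,0,0.00034864,0.01355065
0.1,2,0.23511923,0.0140823
0.1,4,0.46693354,0.01738756
0.1,6,0.69169693,0.02392192
0.1,8,0.90439914,0.03453483
0.1,10,1.09908405,0.05002468
0.2,0,0.00147839,0.00764385
0.2,2,0.23940088,0.0081628
0.2,4,0.47400585,0.01080112
0.2,6,0.69976032,0.01649634
0.2,8,0.91003064,0.02619192
0.2,10,1.09515088,0.04101942
"""
df = pd.read_csv(io.StringIO(CSV))
X, y = df[["Mach", "AoA"]], df["Cl"]

# The scaler sits inside the pipeline, so it is refitted on each training fold.
model = make_pipeline(StandardScaler(), SVR(gamma="scale", C=1.0, epsilon=0.1))

# Four folds of three samples each; every sample is used for testing once.
cv = KFold(n_splits=4, shuffle=True, random_state=0)
neg_mse = cross_val_score(model, X, y, cv=cv, scoring="neg_mean_squared_error")
mse_per_fold = -neg_mse
print(mse_per_fold, mse_per_fold.mean())
```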

Finally, it is important to mention that all databases used in this paper are freely available for the scientific community.

Funding

This research received no external funding.

Conflicts of Interest

The author declares no conflict of interest.

Appendix A. Databases

In order to allow other researchers to perform experiments on existing aerodynamic databases and avoid the repetition of the CFD simulations required to build such databases, this appendix provides all the required data for free use within the research community (databases will be provided on request by email).

Table A1. Aerodynamic databases for the NACA0012, RAE2822 and DPW wing test cases.

NACA0012				RAE2822				DPW wing
Mach	AoA	Cl	Cd	Mach	AoA	Cl	Cd	Mach	AoA	Cl	Cd
0.1	0	0.00034864	0.01355065	0.1	0	0.20478228	0.01150837	0.1	0	0.34490435	0.01532178
0.1	1	0.11786291	0.01348484	0.1	1	0.31598134	0.01169231	0.1	1	0.43094063	0.01811268
0.1	2	0.23511923	0.0140823	0.1	2	0.42641901	0.0120037	0.1	2	0.51661595	0.02160981
0.1	3	0.35163278	0.01537563	0.1	3	0.5356642	0.01246302	0.1	3	0.60181206	0.02580588
0.1	4	0.46693354	0.01738756	0.1	4	0.64335269	0.01310139	0.1	4	0.68640261	0.03069294
0.1	5	0.58047163	0.02021214	0.1	5	0.74901551	0.01395388	0.1	5	0.77025437	0.03626244
0.1	6	0.69169693	0.02392192	0.1	6	0.85205249	0.01506802	0.1	6	0.85323022	0.04250429
0.1	7	0.79989887	0.02866703	0.1	7	0.95153298	0.01652279	0.1	7	0.9351778	0.04940593
0.1	8	0.90439914	0.03453483	0.1	8	1.04615466	0.01843026	0.1	8	1.01593228	0.05695184
0.1	9	1.00444087	0.04161181	0.1	9	1.13396707	0.02096989	0.1	9	1.09530372	0.06512261
0.1	10	1.09908405	0.05002468	0.1	10	1.2116608	0.02446154	0.1	10	1.17308781	0.07389458
0.1	11	1.18733623	0.05986405	0.1	11	1.27319621	0.0295291	0.1	11	1.24904355	0.0832391
0.1	12	1.26794121	0.07127074	0.1	12	1.30757569	0.0374886	0.1	12	1.32291412	0.09312363
0.1	13	1.33977189	0.08428013	0.1	13	1.29496163	0.05107973	0.1	13	1.39439736	0.10351022
0.1	14	1.40154664	0.09897053	0.2	0	0.21152056	0.01002395	0.1	14	1.46316246	0.11435581
0.1	15	1.45193494	0.11539774	0.2	1	0.32545539	0.01018193	0.1	15	1.52884459	0.1256121
0.1	16	1.48977686	0.13355583	0.2	2	0.43869458	0.01044382	0.2	0	0.35784439	0.01123022
0.1	17	1.514051	0.15344376	0.2	3	0.55085001	0.01082799	0.2	1	0.44564584	0.01414844
0.1	18	1.52400049	0.17501892	0.2	4	0.66160695	0.01136067	0.2	2	0.53315257	0.01776694
0.1	19	1.51933408	0.19821272	0.2	5	0.77065115	0.01206581	0.2	3	0.62025489	0.02208108
0.1	20	1.50055195	0.22290925	0.2	6	0.8774021	0.0129811	0.2	4	0.70683211	0.0270855
0.2	0	0.00147839	0.00764385	0.2	7	0.98111069	0.01416774	0.2	5	0.79274664	0.03277378
0.2	1	0.12058562	0.00766263	0.2	8	1.08062166	0.01571391	0.2	6	0.8778417	0.03913748
0.2	2	0.23940088	0.0081628	0.2	9	1.17425782	0.01775307	0.2	7	0.96193191	0.04616506
0.2	3	0.35741302	0.00917954	0.2	10	1.25932828	0.02051788	0.2	8	1.04480182	0.05384133
0.2	4	0.47400585	0.01080112	0.2	11	1.33095237	0.02444987	0.2	9	1.12620348	0.06214616
0.2	5	0.58835445	0.01320961	0.2	12	1.38026534	0.03047922	0.2	10	1.20584063	0.07105415
0.2	6	0.69976032	0.01649634	0.2	13	1.39096829	0.04067699	0.2	11	1.28337244	0.08053246
0.2	7	0.80734698	0.02077916	0.2	14	0.77538343	0.19726277	0.2	12	1.35838641	0.09053913
0.2	8	0.91003064	0.02619192	0.3	0	0.21838947	0.00934052	0.2	13	1.43040181	0.10102278
0.2	9	1.00649057	0.03288686	0.3	1	0.33591982	0.00949857	0.2	14	1.49883792	0.11192032
0.2	10	1.09515088	0.04101942	0.3	2	0.45274708	0.00975815	0.2	15	1.5629855	0.12315464
0.2	11	1.17424983	0.05073368	0.3	3	0.56849657	0.01013881	0.3	0	0.36874228	0.00990418
0.2	12	1.24183166	0.06216799	0.3	4	0.68283083	0.0106665	0.3	1	0.45872605	0.01298114
0.2	13	1.29599939	0.07537635	0.3	5	0.79542628	0.01136966	0.3	2	0.54843083	0.01678315
0.2	14	1.33476649	0.0904007	0.3	6	0.90561674	0.01229446	0.3	3	0.63774344	0.02130707
0.2	15	1.35644989	0.10722269	0.3	7	1.01227207	0.01352695	0.3	4	0.72652433	0.02654955
0.2	16	1.35983594	0.12575123	0.3	8	1.11348966	0.01523225	0.3	5	0.81461785	0.03250621
0.2	17	1.34467883	0.14581214	0.3	9	1.20455267	0.01781221	0.3	6	0.90182815	0.03917107
0.2	18	1.31234511	0.16717247	0.3	10	1.26909232	0.02285614	0.3	7	0.98789493	0.04653312
0.2	19	1.26688364	0.18955789	0.3	11	1.26600957	0.03513813	0.3	8	1.07251145	0.05457734
0.2	20	1.21601526	0.21263805	0.3	14	0.80931463	0.19002002	0.3	9	1.15527777	0.06328186
0.3	0	0.00166224	0.00567437	0.3	15	0.79469274	0.22054701	0.3	10	1.23566974	0.07261592
0.3	1	0.12394694	0.00575906	0.4	0	0.22697387	0.00893865	0.3	11	1.31298013	0.08253557
0.3	2	0.24590258	0.00628682	0.4	1	0.34981432	0.00910732	0.3	12	1.38623968	0.09297645
0.3	3	0.36697165	0.00729793	0.4	2	0.47193074	0.00938471	0.3	13	1.45408922	0.10384448
0.3	4	0.48580126	0.0090936	0.4	3	0.59295454	0.00979211	0.3	14	1.51463925	0.1150027
0.3	5	0.60172345	0.01173898	0.4	4	0.71248721	0.01036197	0.3	15	1.56524945	0.12625974
0.3	6	0.7135568	0.01540221	0.4	5	0.83001058	0.01114133	0.4	0	0.38199054	0.00953887
0.3	7	0.81978422	0.02026493	0.4	6	0.94398001	0.01225255	0.4	1	0.47511661	0.01284081
0.3	8	0.91833746	0.02658771	0.4	7	1.04697298	0.01430937	0.4	2	0.56797754	0.01691617
0.3	9	1.00674362	0.03461385	0.4	8	1.10537962	0.02061118	0.4	3	0.66045687	0.02176407
0.3	10	1.08132847	0.0447432	0.4	13	0.85517366	0.15764748	0.4	4	0.75238654	0.02738474
0.3	11	1.1392022	0.05709218	0.5	0	0.23870972	0.00871551	0.4	5	0.8435564	0.03377798
0.3	12	1.17834998	0.0714785	0.5	1	0.36960458	0.00890705	0.4	6	0.9336623	0.04094201
0.3	13	1.19689698	0.08773689	0.5	2	0.49980719	0.00922257	0.4	7	1.0222786	0.04887145
0.3	14	1.194207	0.10557819	0.5	3	0.62894137	0.00968923	0.4	8	1.10869795	0.05755978
0.3	15	1.1710816	0.12463177	0.5	4	0.75646706	0.01036055	0.4	9	1.1916454	0.06699707
0.3	16	1.1313085	0.14443285	0.5	5	0.87993837	0.01144269	0.4	10	1.26828659	0.07718435
0.3	17	1.08254594	0.16463531	0.5	6	0.97111817	0.01540752	0.4	11	1.33268237	0.0881008
0.3	18	1.03479689	0.18515759	0.5	7	1.00968497	0.02454478	0.4	12	1.37858719	0.09946716
0.3	19	0.99460818	0.20597369	0.5	8	0.98585454	0.04116198	0.4	13	1.40149934	0.11080493
0.3	20	0.96193387	0.22729269	0.6	0	0.25598625	0.00864509	0.5	0	0.40001585	0.00975765
0.4	0	0.00156924	0.00475901	0.6	1	0.39996208	0.00888127	0.5	1	0.49793815	0.01340333
0.4	1	0.12902259	0.00490616	0.6	2	0.5435357	0.00927495	0.5	2	0.59569143	0.0179093
0.4	2	0.25614837	0.00550179	0.6	3	0.68639769	0.0098805	0.5	3	0.69311377	0.02328386
0.4	3	0.38183434	0.00672044	0.6	4	0.82031306	0.01163186	0.5	4	0.789923	0.02954492
0.4	4	0.5044046	0.00885155	0.6	5	0.94888813	0.01607704	0.5	5	0.88528512	0.03675215
0.4	5	0.62258142	0.01207326	0.6	6	1.05501693	0.02528943	0.5	6	0.9752414	0.04521391
0.4	6	0.73415927	0.01667064	0.6	7	1.1061645	0.03897731	0.5	7	1.05347332	0.05529114
0.4	7	0.83619501	0.02298766	0.6	8	1.02831852	0.05725626	0.5	8	1.11687337	0.0668394
0.4	8	0.92427571	0.03152067	0.6	9	0.84746188	0.09203749	0.5	9	1.16481554	0.07950771
0.4	9	0.99410253	0.04254277	0.6	10	0.80219426	0.1243578	0.5	10	1.19615115	0.09280839
0.4	10	1.04336401	0.0558185	0.6	11	0.79361849	0.15172702	0.5	11	1.20795335	0.10614303
0.4	11	1.06901774	0.07115748	0.6	12	0.79538765	0.17612524	0.6	0	0.42662636	0.01053514
0.4	12	1.07033681	0.08818781	0.6	13	0.81117798	0.20116541	0.6	1	0.53215968	0.01475246
0.4	13	1.04875427	0.10636047	0.6	14	0.82352243	0.22422017	0.6	2	0.63769192	0.02003221
0.4	14	1.00953571	0.12504717	0.6	15	0.84541693	0.24918325	0.6	3	0.74031446	0.02685975
0.4	15	0.96219326	0.1437808	0.7	0	0.28552023	0.0088277	0.6	4	0.83611646	0.03596358
0.4	16	0.91714303	0.1626428	0.7	1	0.4552778	0.00918366	0.6	5	0.92496205	0.04759207
0.4	17	0.87976015	0.18190757	0.7	2	0.62759227	0.00981429	0.6	6	1.00605016	0.06171506
0.4	18	0.85118012	0.20176305	0.7	3	0.80939577	0.01123075	0.6	7	1.07851749	0.07813513
0.4	19	0.83142716	0.22208854	0.7	4	0.97088706	0.02100958	0.6	8	1.14035887	0.09635543
0.4	20	0.81730939	0.24280637	0.7	5	1.04109807	0.03806899	0.6	9	1.1904048	0.1158151
0.5	0	0.0013092	0.00431248	0.7	6	1.01928409	0.05606353	0.6	10	1.2265724	0.13570755
0.5	1	0.13699071	0.00452484	0.7	7	0.96864203	0.07417348	0.6	11	1.24690584	0.15516688
0.5	2	0.27220791	0.00527327	0.7	8	0.909723	0.09332397	0.7	0	0.47478709	0.01242688
0.5	3	0.40529214	0.00680007	0.7	9	0.85474593	0.11439034	0.7	1	0.59416051	0.01841556
0.5	4	0.53365078	0.0094989	0.7	10	0.8136909	0.1390286	0.7	2	0.71263573	0.02739059
0.5	5	0.65500918	0.0136707	0.7	11	0.81763773	0.16551528	0.7	3	0.82867544	0.04034131
0.5	6	0.76592039	0.01976437	0.7	12	0.83099549	0.1904208	0.7	4	0.94051449	0.05765871
0.5	7	0.85968697	0.02859707	0.7	13	0.85202205	0.21635245	0.7	5	1.04452939	0.07898422
0.5	8	0.93110545	0.04048238	0.7	14	0.87184003	0.24104815	0.7	6	1.13844811	0.10354212
0.5	9	0.97859639	0.05500783	0.8	0	0.25379077	0.02261738	0.7	7	1.22049898	0.13032914
0.5	10	0.99966192	0.07142576	0.8	1	0.37291229	0.02757891	0.8	0	0.59842495	0.03070903
0.5	11	0.99461587	0.08896353	0.8	2	0.47440306	0.03524511	0.8	1	0.73936297	0.0501211
0.5	12	0.96501081	0.10691509	0.8	3	0.56312584	0.04564638	0.8	2	0.86822118	0.07409706
0.5	13	0.91613681	0.12481628	0.8	4	0.6460511	0.05832932	0.8	3	0.98333082	0.10115296
0.5	14	0.86503783	0.14251852	0.8	5	0.71874703	0.07377648	0.8	4	1.08530449	0.13027077
0.5	15	0.81917669	0.1606226	0.8	6	0.77900244	0.09125873	0.8	5	1.17487716	0.16068126
0.5	16	0.78575895	0.17937346	0.8	7	0.83005988	0.11019433
0.5	17	0.76234041	0.19901087	0.8	8	0.87349327	0.12991298
0.6	0	0.0009119	0.00416125	0.8	9	0.91336673	0.15089809
0.6	1	0.15034584	0.00446971	0.8	10	0.94395678	0.17216155
0.6	2	0.29936634	0.005434	0.8	11	0.96521429	0.19428368
0.6	3	0.44558056	0.00743859	0.8	12	0.97446734	0.21664782
0.6	4	0.58566612	0.01106919	0.8	13	0.97990796	0.24185741
0.6	5	0.71641357	0.01769764	0.8	14	0.99465858	0.26980803
0.6	6	0.83164647	0.02927649	0.8	15	1.02677325	0.29960937
0.6	7	0.92507028	0.0454219	0.9	0	-0.03959858	0.12059924
0.6	8	0.99671461	0.06461165	0.9	1	0.06822316	0.11970889
0.6	9	1.05085356	0.08561563	0.9	2	0.18051904	0.12310501
0.6	10	1.08157905	0.10677951	0.9	3	0.29714686	0.13081809
0.6	11	1.08386342	0.12648953	0.9	4	0.41716614	0.14252143
0.6	12	1.06537418	0.14454872	0.9	5	0.54251249	0.15831269
0.6	13	1.01799052	0.15965823	0.9	6	0.66775734	0.17707051
0.6	14	0.9276694	0.17099549	0.9	7	0.80270165	0.19788185
0.6	15	0.82413283	0.18302397	0.9	8	0.97275077	0.2228966
0.6	16	0.75017574	0.20017763	0.9	9	1.09793814	0.2519245
0.6	17	0.73720931	0.21962102	0.9	11	1.28626327	0.31933105
0.6	18	0.72817532	0.24000652	0.9	12	1.36381377	0.35620579
0.6	19	0.72512852	0.26104437	0.9	13	1.43461078	0.39464213
0.7	0	0.00042779	0.00437586
0.7	1	0.17780983	0.00485724
0.7	2	0.35841207	0.00684577
0.7	3	0.53344121	0.01439492
0.7	4	0.68918301	0.02921294
0.7	5	0.82633986	0.04998559
0.7	6	0.94556538	0.07494196
0.7	7	1.04722633	0.10254214
0.7	8	1.13100158	0.13150703
0.7	9	1.19559795	0.16047414
0.7	10	1.24248574	0.18862919
0.7	11	1.27330107	0.21537239
0.7	12	1.29015172	0.24036068
0.7	13	1.29612152	0.26357329
0.7	14	1.28629569	0.28347853
0.7	15	1.23736911	0.2941774
0.7	16	1.21579887	0.30991999
0.7	17	1.06062252	0.29430817
0.7	18	0.95511579	0.29199597
0.7	19	0.77880954	0.29346789
0.7	20	0.78938387	0.31550949
0.8	0	-0.00039134	0.01274311
0.8	1	0.26636627	0.02143281
0.8	2	0.52301696	0.0437993
0.8	3	0.74242164	0.07303372
0.8	4	0.92167763	0.10571765
0.8	5	1.06355444	0.13960275
0.8	6	1.17559163	0.17372668
0.8	7	1.26644313	0.20800433
0.8	8	1.34263925	0.2425298
0.8	9	1.40730755	0.27724884
0.8	10	1.46190791	0.31196262
0.8	11	1.50691821	0.34627601
0.8	12	1.54324379	0.37997316
0.8	13	1.5717274	0.4128407
0.8	14	1.59277154	0.44462681
0.8	15	1.60857777	0.47559143
0.8	16	1.61758798	0.50501055
0.8	17	1.61780059	0.53199262
0.8	18	1.61357461	0.55722501
0.8	19	1.59712192	0.57775063
0.8	20	1.55842117	0.58879609
0.9	0	0.00199354	0.1153863
0.9	1	0.10756093	0.11727129
0.9	2	0.2127823	0.12281292
0.9	3	0.31806577	0.13199071
0.9	4	0.42396218	0.14466949
0.9	5	0.53078849	0.16072812
0.9	6	0.63790546	0.17989125
0.9	7	0.74522719	0.20218415
0.9	8	0.85139603	0.22794191
0.9	9	0.95385464	0.25707932
0.9	10	1.04954578	0.28910502
0.9	11	1.13714946	0.32341146
0.9	12	1.21713735	0.35959466
0.9	13	1.29028096	0.39740223
0.9	14	1.35744052	0.43663143
0.9	15	1.41910384	0.47707385
0.9	16	1.47571004	0.51856196
0.9	17	1.52760376	0.56095865
0.9	18	1.57508078	0.60414505
0.9	19	1.61839437	0.64799499
0.9	20	1.6577327	0.69238254

References

Hanna, B.N.; Dinh, N.T.; Youngblood, R.W.; Bolotnov, I.A. Machine-learning based error prediction approach for coarse-grid Computational Fluid Dynamics (CG-CFD). Prog. Nucl. Energy 2020, 118, 103140. [Google Scholar] [CrossRef]
Ti, Z.; Deng, X.; Yang, H. Wake modeling of wind turbines using machine learning. Appl. Energy 2020, 257, 114025. [Google Scholar] [CrossRef]
Zhao, Y.; Akolekar, H.D.; Weatheritt, J.; Michelassi, V.; Sandberg, R.D. RANS turbulence model development using CFD-driven machine learning. J. Comput. Phys. 2020, 411, 109413. [Google Scholar] [CrossRef] [Green Version]
Zhu, L.; Zhang, W.; Kou, J.; Liu, Y. Machine learning methods for turbulence modeling in subsonic flows around airfoils. Phys. Fluids 2019, 31, 015105. [Google Scholar] [CrossRef]
Holland, J.R.; Baeder, J.D.; Duraisamy, K. Towards integrated field inversion and machine learning with embedded neural networks for rans modeling. In Proceedings of the AIAA Scitech 2019 Forum, San Diego, CA, USA, 7–11 January 2019. [Google Scholar]
Paulete-Periáñez, C.; Esther, A.-P.; Carlos, L. Surrogate modelling for aerodynamic coefficients prediction in aeronautical configurations. In Proceedings of the 8th European Conference for Aeronautics and Space Sciences (EUCASS), Madrid, Spain, 1–4 July 2019. [Google Scholar] [CrossRef]
Venturi, S.; Sharma Priyadarshini, M.; Panesi, M. A Machine Learning Framework for the Quantification of the Uncertainties Associated with Ab-Initio Based Modeling of Non-Equilibrium Flows. In Proceedings of the AIAA Scitech 2019 Forum, San Diego, CA, USA, 7–11 January 2019. [Google Scholar]
Buisson, B.; Lakehal, D. Towards an integrated machine-learning framework for model evaluation and uncertainty quantification. Nucl. Eng. Des. 2019, 354, 110197. [Google Scholar] [CrossRef]
Yan, X.; Zhu, J.; Kuang, M.; Wang, X. Aerodynamic shape optimization using a novel optimizer based on machine learning techniques. Aerosp. Sci. Technol. 2019, 86, 826–835. [Google Scholar] [CrossRef]
Wu, J.; Xiao, H.; Paterson, E. Physics-informed machine learning approach for augmenting turbulence models: A comprehensive framework. Phys. Rev. Fluids 2018, 3, 074602. [Google Scholar] [CrossRef] [Green Version]
Dupuis, R.; Jean-Christophe, J.; Pierre, S. Aerodynamic Data Predictions for Transonic Flows via a Machine-Learning-based Surrogate Model. In Proceedings of the 2018 AIAA/ASCE/AHS/ASC Structures, Structural Dynamics, and Materials Conference, Orlando, FL, USA, 8–12 January 2018. [Google Scholar]
Zhang, Y.; Woong, J.S.; Dimitri, N.M. Application of convolutional neural network to predict airfoil lift coefficient. In Proceedings of the 2018 AIAA/ASCE/AHS/ASC Structures, Structural Dynamics, and Materials Conference, Orlando, FL, USA, 8–12 January 2018. [Google Scholar]
Yan, C. Wind Turbine Wakes: From Numerical Modeling to Machine Learning. Ph.D. Thesis, University of Delaware, Newark, DE, USA, 2018.
Secco, N.R.; De Mattos, B.S. Artificial neural networks to predict aerodynamic coefficients of transport airplanes. Aircr. Eng. Aerosp. Technol. 2017, 89, 211–230.
Kou, J.; Zhang, W. Multi-kernel neural networks for nonlinear unsteady aerodynamic reduced-order modeling. Aerosp. Sci. Technol. 2017, 67, 309–326.
Morshedizadeh, M. Condition Monitoring of Wind Turbines Using Intelligent Machine Learning Techniques. Ph.D. Thesis, University of Windsor, Windsor, ON, Canada, 2017.
Andrés-Pérez, E.; Carro-Calvo, L.; Salcedo-Sanz, S.; Martin-Burgos, M.J. Aerodynamic Shape Design by Evolutionary Optimization and Support Vector Machines. In Application of Surrogate-based Global Optimization to Aerodynamic Design; Springer: Cham, Switzerland, 2016; pp. 1–24.
Viúdez-Moreiras, D.; Andrés-Pérez, E.; González-Juárez, D.; Burgos, M.J.M. Performance Comparison of Kriging and SVR Surrogate Models Applied to the Objective Function Prediction within Aerodynamic Shape Optimization. In Proceedings of the VII European Congress on Computational Methods in Applied Sciences and Engineering (ECCOMAS Congress 2016), Crete, Greece, 5–10 June 2016.
Wang, Q.; Qian, W.; He, K. Unsteady aerodynamic modeling at high angles of attack using support vector machines. Chin. J. Aeronaut. 2015, 28, 659–668.
Fossati, M. Evaluation of Aerodynamic Loads via Reduced-Order Methodology. AIAA J. 2015, 53, 2389–2405.
Chen, S.; Chen, K.; Xu, C.; Lan, L. Flexible ranking extreme learning machine based on matrix-centering transformation. In Proceedings of the 2018 International Joint Conference on Neural Networks (IJCNN); Institute of Electrical and Electronics Engineers (IEEE): New York, NY, USA, 2018; pp. 1–8.
Yao, J.; Moawad, A. Vehicle energy consumption estimation using large scale simulations and machine learning methods. Transp. Res. Part C Emerg. Technol. 2019, 101, 276–296.
Dube, P.; Hiravennavar, S. Machine Learning Approach to Predict Aerodynamic Performance of Underhood and Underbody Drag Enablers; SAE Technical Paper 2020-01-0684; SAE International: Warrendale, PA, USA, 2020.
Chen, S.; Gao, Z.; Zhu, X.; Du, Y.; Pang, C. Unstable unsteady aerodynamic modeling based on least squares support vector machines with general excitation. Chin. J. Aeronaut. 2020.

Figure 1. Main steps of a machine learning process (preprocessing–model construction–model validation).

Figure 2. Histograms of the variables in the NACA0012 database (X-axis shows the value of the parameter while Y-axis shows the number of occurrences of that value in the database).

Figure 3. Histograms of the variables in the RAE2822 database (X-axis shows the value of the parameter while Y-axis shows the number of occurrences of that value in the database).

Figure 4. Histograms of the variables in the DPW database (X-axis shows the value of the parameter while Y-axis shows the number of occurrences of that value in the database).

Figure 5. Training set in the NACA0012 (a), RAE2822 (b) and DPW (c) databases (left: Mach versus AoA distribution of the database samples, middle: lift coefficient curves of the database samples, right: drag coefficient curves of the database samples).

Figure 6. Exploring training set in the NACA0012 database (correlation matrix plot).

Figure 7. Exploring training set in the RAE2822 database (correlation matrix plot).

Figure 8. Exploring training set in the DPW database (correlation matrix plot).

Figure 9. True vs. predicted coefficient values with the linear regression method: NACA0012 (a), RAE2822 (b) and DPW (c) databases (left: lift coefficient, right: drag coefficient).

Figure 10. True vs. predicted coefficient values with the SVR method: NACA0012 (a), RAE2822 (b) and DPW (c) databases (left: lift coefficient, right: drag coefficient).

Table 1. Summary of the recent state-of-the-art studies in the scientific literature.

Ref.	Year of Publication	Main Use of Machine Learning	Summary of the Main Advance Proposed
[1]	2020	To predict the distribution of a coarse grid CFD local error and correct the fluid-flow variables.	Authors propose a surrogate model trained to predict the distribution of a coarse grid CFD local error. The proposed surrogate model is built using ML regression algorithms. They tested artificial neural network (ANN) and random forest (RF) algorithms, and the selected test case was a three-dimensional turbulent flow inside a lid-driven cavity.
[2]	2020	To perform wake modeling of wind turbines.	A combined framework with CFD and machine learning techniques is presented to improve the turbine wake predictions. In particular, an ANN model in combination with a reduced-order turbine model, ADM-R (actuator disk model with rotation), is proposed and shown to be capable of handling large amounts of data with complex relations between the parameters involved. The selected test case was a standalone Vestas V80 2 MW wind turbine.
[3]	2020	To develop a machine learning framework for Reynolds-averaged Navier–Stokes (RANS) models.	Authors propose a new data-driven machine learning method to model RANS equations. The proposed CFD-driven machine learning approach was applied to model development for wake mixing in turbomachines.
[4]	2019	To develop surrogate models to augment the capability of the current turbulence models.	Authors propose a mapping approach between the turbulent eddy viscosity and the mean flow variables by ANNs. The study includes several test cases using well-known airfoils, such as the NACA0012. The data-driven turbulence model is applied to predict eddy viscosity, lift/drag coefficients, and skin friction distributions.
[5]	2019	To develop surrogate models to augment the capability of the current turbulence models.	A method based on neural networks is applied to reduce the error of RANS simulations by using the data-augmented turbulence model approach in an integrated way. In addition, a new layered approach for the NN is proposed to reduce the required training times.
[6]	2019	To develop surrogate models for fast prediction of aerodynamic coefficients of aeronautical configurations.	Authors propose to use support vector machines as surrogate models for quick prediction of aerodynamic coefficients. The research also included tests on different aeronautical configurations such as the NACA0012, RAE2822 and DPW wing.
[7,8]	2019	To develop machine learning techniques to predict uncertainties.	In these papers, data-driven models are built for uncertainty quantification in aerodynamics.
[9]	2019	To develop machine learning methods for aerodynamic shape optimization	Authors propose a new optimizer based on machine learning techniques, in particular reinforcement learning, transfer learning and deep neural networks. The proposed approach is tested for a typical aerodynamic shape optimization of missile control surfaces with computational fluid dynamics (CFD).
[10]	2018	To accelerate RANS simulations using a data-driven method	Authors demonstrated that machine learning can be used to improve the RANS modeled Reynolds stresses by leveraging data from high-fidelity simulations.
[11]	2018	To predict aerodynamic data in transonic flows	Authors propose a local decomposition method to improve the accuracy of aerodynamic fields in transonic conditions. Tests were performed on the AS28G aircraft configuration.
[12]	2018	To develop machine learning methods for prediction of airfoil lift coefficient	A convolutional neural network is developed to learn the lift coefficients of several airfoils of different shapes and parameters such as Mach, Reynolds and AoA.
[13]	2018	To predict wind turbine wakes	Authors propose to use a deep neural network with transfer learning ability for efficient prediction of wind turbine wakes and efficiency.
[14]	2017	To predict aerodynamic coefficients of transport airplanes	An artificial neural network model is proposed to predict aerodynamic coefficients of transport airplanes. The proposed model is able to efficiently predict both lift and drag coefficients in wing-fuselage configurations.
[15]	2017	To develop machine learning methods for nonlinear unsteady aerodynamic reduced-order modeling	Authors propose a multi-kernel neural networks approach to improve the accuracy and generalization capability through linearly combining the Gaussian and wavelet basis functions as the hidden basis functions.
[16]	2017	To develop machine learning methods for condition monitoring of wind turbines	A surrogate model is proposed to monitor wind turbine conditions and to be able to detect possible anomalies in turbine performance which have the potential to result in unexpected failure.
[17]	2016	To reduce the computational cost of aerodynamic shape design process	In order to reduce the computational cost of the aerodynamic shape design process of aeronautical configurations, authors propose to use evolutionary optimization methods in combination with support vector machines to speed up the design stage while preserving a certain level of accuracy.
[18]	2016	To predict the objective function within aerodynamic shape optimization	Authors present a comparison of Kriging and Support Vector Machines for Regression (SVR) surrogate models applied to the objective function prediction within an aerodynamic shape optimization framework.
[19]	2015	To develop machine learning methods for unsteady aerodynamics modeling	Authors propose to use support vector machines for unsteady aerodynamic modeling at high angles of attack.
[20]	2015	To develop machine learning methods to model aerodynamic loads	Author proposes to integrate Centroidal Voronoi tessellation, leave-one-out cross validation, proper orthogonal decomposition, and multidimensional interpolation for the evaluation of steady aerodynamic loads.

Table 2. Initial information of the aerodynamic databases.

NACA0012	RAE2822	DPW
Range Index: 185 entries, 0 to 184	Range Index: 122 entries, 0 to 121	Range Index: 100 entries, 0 to 99
Data columns (total 4 columns):	Data columns (total 4 columns):	Data columns (total 4 columns):
Mach: 185 non-null float64	Mach: 122 non-null float64	Mach: 100 non-null float64
AoA: 185 non-null int64	AoA: 122 non-null int64	AoA: 100 non-null int64
Cl: 185 non-null float64	Cl: 122 non-null float64	Cl: 100 non-null float64
Cd: 185 non-null float64	Cd: 122 non-null float64	Cd: 100 non-null float64
dtypes: float64 (3), int64 (1)	dtypes: float64 (3), int64 (1)	dtypes: float64 (3), int64 (1)

Table 3. Statistics of the aerodynamic database (NACA0012).

Variable	Mach	AoA	Cl	Cd
count	185.00	185.00	185.00	185.00
mean	0.499459	9.800000	0.897379	0.144013
std	0.261579	5.978949	0.438446	0.151947
min	0.100000	0.000000	−0.000391	0.004161
25%	0.300000	5.000000	0.622582	0.020779
50%	0.500000	10.000000	0.953855	0.105718
75%	0.700000	15.000000	1.216015	0.202184
max	0.900000	20.000000	1.657733	0.692383

Table 4. Statistics of the aerodynamic database (RAE2822).

Variable	Mach	AoA	Cl	Cd
count	122.00	122.00	122.00	122.00
mean	0.505738	6.573770	0.788143	0.076774
std	0.265448	4.340694	0.323649	0.090517
min	0.100000	0.000000	−0.039599	0.008645
25%	0.300000	3.000000	0.553919	0.011459
50%	0.500000	6.000000	0.821918	0.024456
75%	0.700000	10.000000	0.992458	0.130592
max	0.900000	15.000000	1.434611	0.394642

Table 5. Statistics of the aerodynamic database (DPW).

Variable	Mach	AoA	Cl	Cd
count	100.00	100.00	100.00	100.00
mean	0.388000	6.260000	0.946160	0.058803
std	0.213333	4.296158	0.331897	0.038622
min	0.100000	0.000000	0.344904	0.009539
25%	0.200000	3.000000	0.679916	0.026364
50%	0.400000	6.000000	0.968587	0.049764
75%	0.600000	10.000000	1.198574	0.088710
max	0.800000	15.000000	1.565249	0.160681
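
The rows of Tables 3–5 (count, mean, std, min, quartiles, max) can be recomputed for any column with standard-library Python. The following is a minimal sketch, not the code used in the study: the helper name `describe` and the sample values are hypothetical, and it assumes that "std" is the sample standard deviation and that the quartiles use linear interpolation, which the paper does not state explicitly.

```python
import statistics


def describe(values):
    """Return count, mean, sample std, min, quartiles and max of a sequence."""
    # method="inclusive" interpolates linearly between the observed points.
    q1, q2, q3 = statistics.quantiles(values, n=4, method="inclusive")
    return {
        "count": len(values),
        "mean": statistics.fmean(values),
        "std": statistics.stdev(values),  # sample standard deviation (n - 1)
        "min": min(values),
        "25%": q1,
        "50%": q2,
        "75%": q3,
        "max": max(values),
    }


# Hypothetical angle-of-attack samples, NOT taken from the paper's databases.
aoa = [0, 5, 10, 15, 20]
summary = describe(aoa)
for name, value in summary.items():
    print(f"{name:>5}: {value:.6f}")
```

Applying the same helper to each of the four columns (Mach, AoA, Cl, Cd) would give one table per database in the layout shown above.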

Table 6. Pearson correlation coefficients in the NACA0012 database.

Variable	AoA	Mach	Cl	Cd
AoA	1.0	0.013024	0.788449	0.673338
Mach	0.013024	1.0	0.101251	0.570344
Cl	0.788449	0.101251	1.0	0.652496
Cd	0.673338	0.570344	0.652496	1.0

Table 7. Pearson correlation coefficients in the RAE2822 database.

Variable	AoA	Mach	Cl	Cd
AoA	1.0	0.018619	0.770069	0.650465
Mach	0.018619	1.0	−0.100818	0.559170
Cl	0.770069	−0.100818	1.0	0.262417
Cd	0.650465	0.559170	0.262417	1.0

Table 8. Pearson correlation coefficients in the DPW database.

Variable	AoA	Mach	Cl	Cd
AoA	1.0	−0.339471	0.962521	0.819541
Mach	−0.339471	1.0	−0.121352	0.080954
Cl	0.962521	−0.121352	1.0	0.892737
Cd	0.819541	0.080954	0.892737	1.0
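
The Pearson coefficient between two variables x and y is the covariance divided by the product of the standard deviations, so a correlation matrix such as those in Tables 6–8 is symmetric with a unit diagonal. A minimal sketch of the computation is given below. The function names and the sample values are hypothetical and are not taken from the paper's databases.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient between two equally long sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    spread_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    spread_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (spread_x * spread_y)


def correlation_matrix(columns):
    """Nested dict holding the pairwise Pearson coefficients of all columns."""
    names = list(columns)
    return {a: {b: pearson(columns[a], columns[b]) for b in names} for a in names}


# Hypothetical samples for illustration only.
data = {
    "AoA": [0, 2, 4, 6, 8],
    "Cl": [0.0, 0.2, 0.4, 0.6, 0.8],
    "Cd": [0.010, 0.011, 0.014, 0.019, 0.026],
}
corr = correlation_matrix(data)
for row in corr:
    print(row, [round(corr[row][col], 6) for col in corr])
```

Because the coefficient does not depend on the order of its arguments, the entry in row Cd, column AoA must equal the entry in row AoA, column Cd. This gives a quick consistency check for any tabulated matrix.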

Table 9. Performance metrics for the linear regression model (MAE: mean absolute error, MSE: mean squared error, R²: coefficient of determination).

Test case	NACA0012			RAE2822			DPW
Variable	MAE	MSE	R²	MAE	MSE	R²	MAE	MSE	R²
Cl	0.213361	0.064331	0.599226	0.170544	0.039926	0.725196	0.030881	0.002242	0.981537
Cd	0.048828	0.003615	0.773458	0.037094	0.002124	0.753398	0.011685	0.000197	0.853932

Table 10. Performance metrics for the SVR model (MAE: mean absolute error, MSE: mean squared error, R²: coefficient of determination).

Test Case	NACA0012			RAE2822			DPW
Variable	MAE	MSE	R²	MAE	MSE	R²	MAE	MSE	R²
Cl	0.087786	0.010171	0.936636	0.080280	0.008804	0.939404	0.038026	0.002941	0.975785
Cd	0.008265	0.000168	0.989469	0.009824	0.000133	0.984611	0.006637	0.000059	0.956424
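
The three metrics reported in Tables 9 and 10 follow directly from their definitions: MAE is the mean of |y − ŷ|, MSE is the mean of (y − ŷ)², and R² is 1 minus the ratio of the residual sum of squares to the total sum of squares. The sketch below illustrates them under these standard definitions. The function names and the true/predicted values are hypothetical and do not reproduce any number in the tables.

```python
def mae(y_true, y_pred):
    """Mean absolute error."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def mse(y_true, y_pred):
    """Mean squared error."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


def r2(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


# Hypothetical lift-coefficient values for illustration only.
cl_true = [0.2, 0.4, 0.6, 0.8]
cl_pred = [0.25, 0.35, 0.65, 0.75]
print(f"MAE = {mae(cl_true, cl_pred):.6f}")
print(f"MSE = {mse(cl_true, cl_pred):.6f}")
print(f"R2  = {r2(cl_true, cl_pred):.6f}")
```

MAE and MSE carry the units of the coefficient (and its square), so they are not directly comparable between Cl and Cd, whereas R² is dimensionless and allows comparison across variables and test cases.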

Publisher’s Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.

© 2020 by the author. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).

