Article

Data Mining and Machine Learning Techniques for Aerodynamic Databases: Introduction, Methodology and Potential Benefits

by
Esther Andrés-Pérez
Theoretical and Computational Aerodynamics Branch, Flight Physics Department, Spanish National Institute for Aerospace Technology (INTA), Ctra. Ajalvir, km. 4, 28850 Torrejón de Ardoz, Spain
Energies 2020, 13(21), 5807; https://doi.org/10.3390/en13215807
Submission received: 31 July 2020 / Revised: 28 September 2020 / Accepted: 15 October 2020 / Published: 6 November 2020
(This article belongs to the Special Issue Methods and Numerical Applications in Fluid Mechanics)

Abstract

Machine learning and data mining techniques are now used in many business sectors to exploit data in order to detect trends, discover features and patterns, or even predict future behavior. However, in the field of aerodynamics, the application of these techniques is still in its initial stages. This paper explores the benefits that machine learning and data mining techniques can offer aerodynamicists for extracting knowledge from CFD data and making quick predictions of aerodynamic coefficients. For this purpose, three aerodynamic databases (NACA0012 airfoil, RAE2822 airfoil and 3D DPW wing) have been used, and the results show that machine learning and data mining techniques also have considerable potential in this field.

1. Introduction

In the field of aerodynamics, complex steady flows are simulated daily in industry using computational fluid dynamics (CFD), since CFD tools have reached an acceptable level of maturity. These simulations are usually performed over full-aircraft configurations or several aircraft components, where meshes of hundreds of millions of points are required to resolve the flow features precisely. In addition, simulations are performed for different parameters to properly explore the design space. This implies a high computational cost that may, in certain situations, still be infeasible today. To overcome this limitation, the CFD solver could be replaced by a surrogate model that produces a fast prediction of the aerodynamic features based on previous simulations or wind-tunnel data. Machine learning techniques commonly used in artificial intelligence (AI) and data mining (DM) can provide valuable support in reducing the computational cost of aerodynamic analysis.
The objective of this paper is to investigate the application of machine learning and data-driven approaches to aerodynamic analysis. While these techniques have been broadly used in other sectors such as finance or risk analysis, their application in the aeronautical sector is still in its infancy. The novelty of this paper lies in assessing the feasibility and potential benefits of applying these techniques to the aerodynamic analysis of aeronautical configurations. Application test cases have been selected from those commonly used in the literature for validation purposes, so that the databases required for testing the methods can be generated quickly and the results are comparable. To this end, this paper covers the aspects required in any machine learning project, such as data analysis, feature scaling, model construction, and accuracy measurement.
The main motivation for this research is to analyze the potential of data mining and machine learning techniques for the fast prediction of aerodynamic features based on previous and existing aerodynamic data. These techniques may have a large impact on the aerodynamic design process for novel aeronautical configurations, since they would allow different promising shapes to be evaluated quickly, thereby reducing the cost of the overall design stage.
The paper is structured as follows: Section 2 presents a review of the state-of-the-art methods in the technical fields involved in this research, focusing on machine learning and data-driven approaches for aerodynamic analysis. Section 3 describes the methodology followed for data analysis, model construction and validation, together with the numerical results. Finally, Section 4 presents the conclusions. Appendix A, at the end of this paper, provides the complete database information to allow other researchers to further exploit the data with other techniques.

2. Brief Review of the State of the Art

This section reviews the state of the art in the technical fields involved in this research, namely machine learning and data-driven approaches for aerodynamic analysis.
In the last five years, there has been increasing interest in the development of techniques to handle aerodynamic data coming from different sources, such as CFD simulations, wind tunnel experiments or even flight test data. The ability to handle this vast amount of heterogeneous data is crucial for enabling machine learning methods to be applied in the aeronautical industry.
Table 1 reviews the most recent state-of-the-art studies in the scientific literature:
In summary, over the last five years machine learning techniques have been used to predict aerodynamic features, to accelerate or improve the precision of turbulence models, to speed up the shape design optimization process and to quantify and manage uncertainties in the flow fields, amongst others.
All the papers found, however, focus on the application and do not provide an overall view of the steps required to properly handle and prepare the data for machine learning techniques. This paper aims to fill this gap and provides, through simple examples, an overall scheme of the steps needed to obtain proper models by machine learning.

3. Methodology and Results

Figure 1 shows the main steps of a typical machine learning process:
In this section, each of the steps in Figure 1 is explained and applied to the selected aerodynamic databases. It is important to mention that all the research performed in this paper used Python 3.8.2 with the Scikit-learn 0.22.1, pandas 1.0.1 and matplotlib 3.1.3 libraries, all included in the Anaconda distribution. CFD computations for the database generation were performed with the DLR TAU code (release 2019.1.0, Spalart–Allmaras turbulence model, convergence criteria based on minimum residuals).
In order to allow other researchers to perform experiments on existing aerodynamic databases and avoid repeating the CFD simulations required to build such databases, Appendix A provides all the required data for free use within the research community.

3.1. Data Polishing and Statistical Analysis

The first step was to quickly explore what the databases looked like. As mentioned previously, the complete databases for all the tests performed in this paper can be found in Appendix A.
An important requirement for machine learning databases is that they contain no empty values. If empty values exist, they should be substituted by the average, by zero or by another value, depending on the specific case, so that they do not affect the surrogate model performance. Therefore, each variable was checked for empty values and, as can be observed in Table 2, none were found in the databases.
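The empty-value check described above can be sketched with pandas as follows. This is a minimal illustration rather than the author's script: the five rows are copied from the NACA0012 columns of Table A1, while the column names and the mean-imputation fallback are assumptions chosen for the example.

```python
import pandas as pd

# First rows of the NACA0012 database in Table A1: Mach, AoA (degrees), Cl, Cd.
rows = [
    (0.1, 0, 0.00034864, 0.01355065),
    (0.1, 1, 0.11786291, 0.01348484),
    (0.1, 2, 0.23511923, 0.0140823),
    (0.1, 3, 0.35163278, 0.01537563),
    (0.1, 4, 0.46693354, 0.01738756),
]
df = pd.DataFrame(rows, columns=["Mach", "AoA", "Cl", "Cd"])

# Count empty (NaN) entries per column; all zeros means nothing has to be imputed.
missing_per_column = df.isnull().sum()
print(missing_per_column)

# If gaps did exist, one of the options mentioned in the text is to fill them
# with the column average (assumed strategy for this sketch).
df_filled = df.fillna(df.mean())
```

With no empty values present, `fillna` leaves the data unchanged, so the same code path can be kept for databases that do contain gaps.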
As can be observed in Table 2, all three databases are composed of 4 columns (corresponding to the Mach number, angle of attack (AoA), lift coefficient and drag coefficient). The data size varies with the configuration: the NACA0012 database has 185 rows (each row being one CFD computation), the RAE2822 database has 122 samples, and the DPW database includes 100 samples.
Next, summary statistics of the aerodynamic data in each database are given in Table 3, Table 4 and Table 5:
In the table above, the rows labeled “count”, “mean”, “min”, and “max” correspond to the number of samples, mean value of the parameter, minimum and maximum values, respectively. The row labeled “std” shows the standard deviation of the values for this particular parameter. Rows labeled as 25%, 50%, and 75% show the percentiles, which reflect the value below which a given percentage of observations in a certain group of observations falls.
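The rows described above match the output of the pandas `describe` method. The sketch below is an assumption about how such a table could be produced, not a reproduction of Tables 3 to 5: it uses only five NACA0012 rows from Table A1, so the numbers differ from those of the full databases.

```python
import pandas as pd

# Five NACA0012 samples from Table A1: Mach, AoA (degrees), Cl, Cd.
rows = [
    (0.1, 0, 0.00034864, 0.01355065),
    (0.1, 1, 0.11786291, 0.01348484),
    (0.1, 2, 0.23511923, 0.0140823),
    (0.1, 3, 0.35163278, 0.01537563),
    (0.1, 4, 0.46693354, 0.01738756),
]
df = pd.DataFrame(rows, columns=["Mach", "AoA", "Cl", "Cd"])

# describe() returns count, mean, std, min, 25%, 50%, 75% and max for each column.
stats = df.describe()
print(stats)
```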
From these statistics, there is one important aspect to consider. The AoA values have a high standard deviation, which has to be taken into account when deciding how to scale the training data so that model performance is not degraded.
It is also possible to plot the histograms of each of the considered parameters, to better understand the type of data to deal with. The following Figure 2, Figure 3 and Figure 4 show the histograms of each variable in the database for the three cases considered:
As can be deduced from the figures above, all three databases were generated with a Latin Hypercube Sampling (LHS) method in the AoA and Mach parameters. This is why the histograms show the same bar heights, except where the CFD solver did not converge: those samples (the ones with the highest angles of attack, which did not meet the minimum residual convergence criterion) were eliminated from the database.
In the case of Cd, the histogram shows a strong concentration of values near 0, while the Cl histogram shows the main concentration for values between 0.7 and 1.25, especially in the two airfoil databases.
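The effect of removed samples on the bar heights can be checked numerically. The sketch below counts the AoA values of the RAE2822 samples listed in Table A1 for Mach 0.1, 0.2 and 0.3; the bin edges are an arbitrary choice for this example, and attributing the absent high-AoA samples to non-convergence follows the explanation given in the text.

```python
import numpy as np

# AoA values (degrees) present in Table A1 for the RAE2822 airfoil.
aoa_by_mach = {
    0.1: list(range(0, 14)),             # AoA 0..13
    0.2: list(range(0, 15)),             # AoA 0..14
    0.3: list(range(0, 12)) + [14, 15],  # AoA 12 and 13 are absent
}
aoa = np.concatenate([np.asarray(values) for values in aoa_by_mach.values()])

# Four bins of equal width; only the last one contains removed samples.
counts, edges = np.histogram(aoa, bins=[0, 4, 8, 12, 16])
for left, right, n in zip(edges[:-1], edges[1:], counts):
    print(f"AoA from {left} to {right}: {n} samples")
```

The first three bins hold 12 samples each, while the last one holds only 7, which is the kind of shortened bar visible at high angles of attack.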

3.2. Splitting Training and Test Sets

The next step was to split each database into training and test sets. In this example, the split was done with a purely random sampling method, using 80% of the initial samples for the training set and the remaining 20% for the test set. This step could be improved by using other splitting methods (such as k-fold cross-validation, as in the work performed in [21]), but since the purpose of this paper is to give a general overview of the machine learning process for aerodynamic analysis, this random sampling technique was considered sufficient.
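A random 80/20 split of this kind can be sketched with Scikit-learn's `train_test_split`. The ten rows are NACA0012 samples from Table A1; the `random_state` value is an arbitrary assumption added only to make the example reproducible.

```python
import pandas as pd
from sklearn.model_selection import train_test_split

# Ten NACA0012 samples from Table A1: Mach, AoA (degrees), Cl, Cd.
rows = [
    (0.1, 0, 0.00034864, 0.01355065),
    (0.1, 1, 0.11786291, 0.01348484),
    (0.1, 2, 0.23511923, 0.0140823),
    (0.1, 3, 0.35163278, 0.01537563),
    (0.1, 4, 0.46693354, 0.01738756),
    (0.5, 0, 0.0013092, 0.00431248),
    (0.5, 1, 0.13699071, 0.00452484),
    (0.5, 2, 0.27220791, 0.00527327),
    (0.5, 3, 0.40529214, 0.00680007),
    (0.5, 4, 0.53365078, 0.0094989),
]
df = pd.DataFrame(rows, columns=["Mach", "AoA", "Cl", "Cd"])

# Purely random sampling: 80% of the rows for training, 20% for testing.
train_set, test_set = train_test_split(df, test_size=0.2, random_state=42)
print(len(train_set), "training samples,", len(test_set), "test samples")
```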

3.3. Exploring the Training Set

Now, the data that will be used to build the model are explored in more detail, as can be observed in Figure 5:
In addition, since the datasets are of a manageable size, it is also possible to compute Pearson's correlation coefficient r for every pair of variables, as displayed in Table 6, Table 7 and Table 8:
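The pairwise Pearson coefficients can be obtained with the pandas `corr` method, as sketched below. The ten rows are NACA0012 samples from Table A1, so the values are illustrative only and are not those of Tables 6 to 8.

```python
import pandas as pd

# Ten NACA0012 samples from Table A1: Mach, AoA (degrees), Cl, Cd.
rows = [
    (0.1, 0, 0.00034864, 0.01355065),
    (0.1, 1, 0.11786291, 0.01348484),
    (0.1, 2, 0.23511923, 0.0140823),
    (0.1, 3, 0.35163278, 0.01537563),
    (0.1, 4, 0.46693354, 0.01738756),
    (0.5, 0, 0.0013092, 0.00431248),
    (0.5, 1, 0.13699071, 0.00452484),
    (0.5, 2, 0.27220791, 0.00527327),
    (0.5, 3, 0.40529214, 0.00680007),
    (0.5, 4, 0.53365078, 0.0094989),
]
df = pd.DataFrame(rows, columns=["Mach", "AoA", "Cl", "Cd"])

# 4 x 4 symmetric matrix of Pearson coefficients, with ones on the diagonal.
corr = df.corr(method="pearson")
print(corr.round(3))
```

In this small subset the AoA and Cl columns are almost perfectly correlated, consistent with the nearly linear lift curve at low angles of attack.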
In addition, one can also plot every numerical variable against every other numerical variable. In this case, since there are 4 attributes, 4² = 16 plots are obtained for each database, as can be observed in Figure 6, Figure 7 and Figure 8:
Since the plots on the main diagonal would only show straight lines (each variable plotted against itself), a histogram of each attribute is displayed there instead. Note that these histograms look different from the ones shown previously, since now only the training dataset is considered.
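A grid of this kind, with histograms on the diagonal, can be produced with `pandas.plotting.scatter_matrix`. The sketch below is an assumption about the tooling (the paper only states that pandas and matplotlib were used) and draws the grid for ten NACA0012 rows from Table A1 using an off-screen backend.

```python
import matplotlib
matplotlib.use("Agg")  # off-screen backend so the sketch runs without a display

import pandas as pd
from pandas.plotting import scatter_matrix

# Ten NACA0012 samples from Table A1: Mach, AoA (degrees), Cl, Cd.
rows = [
    (0.1, 0, 0.00034864, 0.01355065),
    (0.1, 1, 0.11786291, 0.01348484),
    (0.1, 2, 0.23511923, 0.0140823),
    (0.1, 3, 0.35163278, 0.01537563),
    (0.1, 4, 0.46693354, 0.01738756),
    (0.5, 0, 0.0013092, 0.00431248),
    (0.5, 1, 0.13699071, 0.00452484),
    (0.5, 2, 0.27220791, 0.00527327),
    (0.5, 3, 0.40529214, 0.00680007),
    (0.5, 4, 0.53365078, 0.0094989),
]
df = pd.DataFrame(rows, columns=["Mach", "AoA", "Cl", "Cd"])

# One subplot per pair of attributes; the diagonal shows a histogram of each attribute.
axes = scatter_matrix(df, diagonal="hist", figsize=(8, 8))
print(axes.shape)
```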
From these pictures, it can be observed that, for predicting the lift coefficient, the AoA has a very strong importance, while the Mach number is less important. However, for the prediction of the Cd, both parameters have almost the same importance, especially in the 2D cases. This aspect is less clear in the DPW test case, where the Mach number seems to be less important than the AoA for Cd prediction.

3.4. Preparing the Data for Machine Learning Algorithms

The first step here would be to handle the missing values in the database but, as mentioned before, the databases do not have missing values, since the cases without solver convergence were not incorporated in the database.
Now, it is necessary to apply one of the most important transformations to the data, which is feature scaling. Machine learning algorithms do not behave well when the parameters have different ranges, and this is the case here since the scales of the AoA and the Cd, for instance, are very different, as mentioned previously. In this research, each column in the training database was standardized to zero mean and unit variance [22]. The selected scaling method does not have a relevant impact on the results; what is really crucial is that all features are scaled in order to help machine learning methods to provide efficient predictions.
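Standardization to zero mean and unit variance can be sketched with Scikit-learn's `StandardScaler`. Using this class is an assumption (the paper does not name the function), and the data are NACA0012 rows from Table A1. The key point is that the mean and standard deviation are computed on the training data only and then reused for any new sample.

```python
import numpy as np
import pandas as pd
from sklearn.preprocessing import StandardScaler

columns = ["Mach", "AoA", "Cl", "Cd"]

# Training samples: NACA0012 rows from Table A1.
train = pd.DataFrame([
    (0.1, 0, 0.00034864, 0.01355065),
    (0.1, 1, 0.11786291, 0.01348484),
    (0.1, 2, 0.23511923, 0.0140823),
    (0.1, 3, 0.35163278, 0.01537563),
    (0.1, 4, 0.46693354, 0.01738756),
    (0.5, 0, 0.0013092, 0.00431248),
    (0.5, 1, 0.13699071, 0.00452484),
    (0.5, 2, 0.27220791, 0.00527327),
    (0.5, 3, 0.40529214, 0.00680007),
    (0.5, 4, 0.53365078, 0.0094989),
], columns=columns)

# A sample not used for fitting (NACA0012, Mach 0.3, AoA 2 in Table A1).
unseen = pd.DataFrame([(0.3, 2, 0.24590258, 0.00628682)], columns=columns)

scaler = StandardScaler()
train_scaled = scaler.fit_transform(train)  # learns mean/std per column, then scales
unseen_scaled = scaler.transform(unseen)    # reuses the training mean/std

print(np.round(train_scaled.mean(axis=0), 6))  # approximately 0 for every column
print(np.round(train_scaled.std(axis=0), 6))   # 1 for every column
```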

3.5. Model Construction

So far, the problem has been established, the data have been obtained and examined, a training set and a test set have been sampled, and feature scaling has been performed to adapt the data for machine learning algorithms. In this subsection, a machine learning model is selected and trained.
Since this paper aims to provide a global overview of the machine learning process, a deep evaluation of different models and the tuning of the model parameters are out of scope. Instead, only two regression models are selected and used with the Scikit-learn default parameters, apart from the epsilon value used for Cd prediction:
  • Linear regression [23].
  • Support vector regression (gamma = “scale”, C = 1.0, epsilon = 0.1 for Cl prediction, 0.01 for Cd prediction) [24].
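The two models listed above can be set up in Scikit-learn as sketched below. Several details are assumptions made for this example: the kernel is left at the library default (RBF) because the paper does not state it, only the input features are scaled (inside a pipeline), and the models are fitted on ten NACA0012 rows from Table A1 rather than on the full training sets.

```python
import pandas as pd
from sklearn.linear_model import LinearRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVR

# Ten NACA0012 samples from Table A1: Mach, AoA (degrees), Cl, Cd.
rows = [
    (0.1, 0, 0.00034864, 0.01355065),
    (0.1, 1, 0.11786291, 0.01348484),
    (0.1, 2, 0.23511923, 0.0140823),
    (0.1, 3, 0.35163278, 0.01537563),
    (0.1, 4, 0.46693354, 0.01738756),
    (0.5, 0, 0.0013092, 0.00431248),
    (0.5, 1, 0.13699071, 0.00452484),
    (0.5, 2, 0.27220791, 0.00527327),
    (0.5, 3, 0.40529214, 0.00680007),
    (0.5, 4, 0.53365078, 0.0094989),
]
df = pd.DataFrame(rows, columns=["Mach", "AoA", "Cl", "Cd"])
X = df[["Mach", "AoA"]]

# One model per target coefficient; epsilon differs between Cl and Cd as in the text.
models = {
    "linear_Cl": make_pipeline(StandardScaler(), LinearRegression()),
    "linear_Cd": make_pipeline(StandardScaler(), LinearRegression()),
    "svr_Cl": make_pipeline(StandardScaler(), SVR(gamma="scale", C=1.0, epsilon=0.1)),
    "svr_Cd": make_pipeline(StandardScaler(), SVR(gamma="scale", C=1.0, epsilon=0.01)),
}

predictions = {}
for name, model in models.items():
    target = name.split("_")[1]          # "Cl" or "Cd"
    model.fit(X, df[target])
    predictions[name] = model.predict(X)  # predictions on the training inputs
```

Wrapping the scaler and the regressor in one pipeline keeps the scaling statistics tied to the data the model was fitted on, which avoids leaking test-set information into the scaler.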

3.6. Model Validation

Once the model is trained, the final step is to use it for predicting new values and validate its performance.
First, the linear regression model was tested. Table 9 shows the typical regression metrics for model comparison. Figure 9 compares the true and predicted coefficient values for the linear regression method.
Then, the support vector regression model was tested. Table 10 shows the typical regression metrics for model comparison. Figure 10 compares the true and predicted coefficient values for the SVR method.
As can be observed in the figures above, the SVR model performs better than the linear regression model, as expected. The R² metrics show reasonable accuracy for the constructed models. Of course, a more robust cross-validation strategy, as well as optimization of the SVR hyperparameters, could have been performed, but the objective of this paper was only to give an overall view of the whole machine learning process for the analysis of aerodynamic databases and the prediction of aerodynamic coefficients.
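A validation step of this kind can be sketched as follows. R², mean absolute error and root mean squared error are common regression metrics, but which ones were tabulated is an assumption here. For reproducibility the sketch holds out the Mach 0.3 rows of the NACA0012 database (Table A1) instead of using the random split described earlier, and it evaluates only a linear regression for Cl.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score

columns = ["Mach", "AoA", "Cl"]

# Training samples: NACA0012 rows at Mach 0.1 and 0.5 (Table A1).
train = pd.DataFrame([
    (0.1, 0, 0.00034864), (0.1, 1, 0.11786291), (0.1, 2, 0.23511923),
    (0.1, 3, 0.35163278), (0.1, 4, 0.46693354),
    (0.5, 0, 0.0013092), (0.5, 1, 0.13699071), (0.5, 2, 0.27220791),
    (0.5, 3, 0.40529214), (0.5, 4, 0.53365078),
], columns=columns)

# Held-out samples: NACA0012 rows at Mach 0.3 (Table A1).
test = pd.DataFrame([
    (0.3, 0, 0.00166224), (0.3, 1, 0.12394694), (0.3, 2, 0.24590258),
    (0.3, 3, 0.36697165), (0.3, 4, 0.48580126),
], columns=columns)

model = LinearRegression().fit(train[["Mach", "AoA"]], train["Cl"])
predicted = model.predict(test[["Mach", "AoA"]])

# Compare true and predicted lift coefficients on data the model has not seen.
metrics = {
    "R2": r2_score(test["Cl"], predicted),
    "MAE": mean_absolute_error(test["Cl"], predicted),
    "RMSE": float(np.sqrt(mean_squared_error(test["Cl"], predicted))),
}
for name, value in metrics.items():
    print(f"{name}: {value:.4f}")
```

The same three calls apply unchanged to SVR predictions and to the drag coefficient, which makes it straightforward to compare models on equal terms.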

4. Conclusions

This paper focuses on exploring the benefits that machine learning and data mining techniques can offer to aerodynamicists in order to extract knowledge from the CFD data and to make quick predictions of aerodynamic coefficients. The main objective of this paper has been to introduce all the steps in a typical machine learning process and apply these steps to aerodynamic databases. For this purpose, three aerodynamic databases (NACA0012 airfoil, RAE2822 airfoil and 3D DPW wing) have been used and results have demonstrated the feasibility and potential benefits of applying machine learning and data-driven techniques for aerodynamic analysis of aeronautical configurations.
As future work, there is further potential to be exploited: a smarter generation of the samples in the initial dataset (rather than LHS), the use of more robust model validation strategies such as k-fold cross-validation, the combination of multi-fidelity data within the aerodynamic database (e.g., CFD, wind tunnel and flight test data), and the comparison of different regression models together with the tuning of their parameters. In addition, the use of these models for uncertainty quantification is another topic for future research.
Finally, it is important to mention that all databases used in this paper are freely available for the scientific community.

Funding

This research received no external funding.

Conflicts of Interest

The author declares no conflict of interest.

Appendix A. Databases

In order to allow other researchers to perform experiments on existing aerodynamic databases and avoid the repetition of the CFD simulations required to build such databases, this appendix provides all the required data for free use within the research community (databases will be provided on request by email).
Table A1. Aerodynamic databases for naca0012, rae2822 and dpw test cases.
NACA0012 | RAE2822 | DPW Wing
Mach AoA Cl Cd | Mach AoA Cl Cd | Mach AoA Cl Cd
0.1 0 0.00034864 0.01355065 | 0.1 0 0.20478228 0.01150837 | 0.1 0 0.34490435 0.01532178
0.1 1 0.11786291 0.01348484 | 0.1 1 0.31598134 0.01169231 | 0.1 1 0.43094063 0.01811268
0.1 2 0.23511923 0.0140823 | 0.1 2 0.42641901 0.0120037 | 0.1 2 0.51661595 0.02160981
0.1 3 0.35163278 0.01537563 | 0.1 3 0.5356642 0.01246302 | 0.1 3 0.60181206 0.02580588
0.1 4 0.46693354 0.01738756 | 0.1 4 0.64335269 0.01310139 | 0.1 4 0.68640261 0.03069294
0.1 5 0.58047163 0.02021214 | 0.1 5 0.74901551 0.01395388 | 0.1 5 0.77025437 0.03626244
0.1 6 0.69169693 0.02392192 | 0.1 6 0.85205249 0.01506802 | 0.1 6 0.85323022 0.04250429
0.1 7 0.79989887 0.02866703 | 0.1 7 0.95153298 0.01652279 | 0.1 7 0.9351778 0.04940593
0.1 8 0.90439914 0.03453483 | 0.1 8 1.04615466 0.01843026 | 0.1 8 1.01593228 0.05695184
0.1 9 1.00444087 0.04161181 | 0.1 9 1.13396707 0.02096989 | 0.1 9 1.09530372 0.06512261
0.1 10 1.09908405 0.05002468 | 0.1 10 1.2116608 0.02446154 | 0.1 10 1.17308781 0.07389458
0.1 11 1.18733623 0.05986405 | 0.1 11 1.27319621 0.0295291 | 0.1 11 1.24904355 0.0832391
0.1 12 1.26794121 0.07127074 | 0.1 12 1.30757569 0.0374886 | 0.1 12 1.32291412 0.09312363
0.1 13 1.33977189 0.08428013 | 0.1 13 1.29496163 0.05107973 | 0.1 13 1.39439736 0.10351022
0.1 14 1.40154664 0.09897053 | 0.2 0 0.21152056 0.01002395 | 0.1 14 1.46316246 0.11435581
0.1 15 1.45193494 0.11539774 | 0.2 1 0.32545539 0.01018193 | 0.1 15 1.52884459 0.1256121
0.1 16 1.48977686 0.13355583 | 0.2 2 0.43869458 0.01044382 | 0.2 0 0.35784439 0.01123022
0.1 17 1.514051 0.15344376 | 0.2 3 0.55085001 0.01082799 | 0.2 1 0.44564584 0.01414844
0.1 18 1.52400049 0.17501892 | 0.2 4 0.66160695 0.01136067 | 0.2 2 0.53315257 0.01776694
0.1 19 1.51933408 0.19821272 | 0.2 5 0.77065115 0.01206581 | 0.2 3 0.62025489 0.02208108
0.1 20 1.50055195 0.22290925 | 0.2 6 0.8774021 0.0129811 | 0.2 4 0.70683211 0.0270855
0.2 0 0.00147839 0.00764385 | 0.2 7 0.98111069 0.01416774 | 0.2 5 0.79274664 0.03277378
0.2 1 0.12058562 0.00766263 | 0.2 8 1.08062166 0.01571391 | 0.2 6 0.8778417 0.03913748
0.2 2 0.23940088 0.0081628 | 0.2 9 1.17425782 0.01775307 | 0.2 7 0.96193191 0.04616506
0.2 3 0.35741302 0.00917954 | 0.2 10 1.25932828 0.02051788 | 0.2 8 1.04480182 0.05384133
0.2 4 0.47400585 0.01080112 | 0.2 11 1.33095237 0.02444987 | 0.2 9 1.12620348 0.06214616
0.2 5 0.58835445 0.01320961 | 0.2 12 1.38026534 0.03047922 | 0.2 10 1.20584063 0.07105415
0.2 6 0.69976032 0.01649634 | 0.2 13 1.39096829 0.04067699 | 0.2 11 1.28337244 0.08053246
0.2 7 0.80734698 0.02077916 | 0.2 14 0.77538343 0.19726277 | 0.2 12 1.35838641 0.09053913
0.2 8 0.91003064 0.02619192 | 0.3 0 0.21838947 0.00934052 | 0.2 13 1.43040181 0.10102278
0.2 9 1.00649057 0.03288686 | 0.3 1 0.33591982 0.00949857 | 0.2 14 1.49883792 0.11192032
0.2 10 1.09515088 0.04101942 | 0.3 2 0.45274708 0.00975815 | 0.2 15 1.5629855 0.12315464
0.2 11 1.17424983 0.05073368 | 0.3 3 0.56849657 0.01013881 | 0.3 0 0.36874228 0.00990418
0.2 12 1.24183166 0.06216799 | 0.3 4 0.68283083 0.0106665 | 0.3 1 0.45872605 0.01298114
0.2 13 1.29599939 0.07537635 | 0.3 5 0.79542628 0.01136966 | 0.3 2 0.54843083 0.01678315
0.2 14 1.33476649 0.0904007 | 0.3 6 0.90561674 0.01229446 | 0.3 3 0.63774344 0.02130707
0.2 15 1.35644989 0.10722269 | 0.3 7 1.01227207 0.01352695 | 0.3 4 0.72652433 0.02654955
0.2 16 1.35983594 0.12575123 | 0.3 8 1.11348966 0.01523225 | 0.3 5 0.81461785 0.03250621
0.2 17 1.34467883 0.14581214 | 0.3 9 1.20455267 0.01781221 | 0.3 6 0.90182815 0.03917107
0.2 18 1.31234511 0.16717247 | 0.3 10 1.26909232 0.02285614 | 0.3 7 0.98789493 0.04653312
0.2 19 1.26688364 0.18955789 | 0.3 11 1.26600957 0.03513813 | 0.3 8 1.07251145 0.05457734
0.2 20 1.21601526 0.21263805 | 0.3 14 0.80931463 0.19002002 | 0.3 9 1.15527777 0.06328186
0.3 0 0.00166224 0.00567437 | 0.3 15 0.79469274 0.22054701 | 0.3 10 1.23566974 0.07261592
0.3 1 0.12394694 0.00575906 | 0.4 0 0.22697387 0.00893865 | 0.3 11 1.31298013 0.08253557
0.3 2 0.24590258 0.00628682 | 0.4 1 0.34981432 0.00910732 | 0.3 12 1.38623968 0.09297645
0.3 3 0.36697165 0.00729793 | 0.4 2 0.47193074 0.00938471 | 0.3 13 1.45408922 0.10384448
0.3 4 0.48580126 0.0090936 | 0.4 3 0.59295454 0.00979211 | 0.3 14 1.51463925 0.1150027
0.3 5 0.60172345 0.01173898 | 0.4 4 0.71248721 0.01036197 | 0.3 15 1.56524945 0.12625974
0.3 6 0.7135568 0.01540221 | 0.4 5 0.83001058 0.01114133 | 0.4 0 0.38199054 0.00953887
0.3 7 0.81978422 0.02026493 | 0.4 6 0.94398001 0.01225255 | 0.4 1 0.47511661 0.01284081
0.3 8 0.91833746 0.02658771 | 0.4 7 1.04697298 0.01430937 | 0.4 2 0.56797754 0.01691617
0.3 9 1.00674362 0.03461385 | 0.4 8 1.10537962 0.02061118 | 0.4 3 0.66045687 0.02176407
0.3 10 1.08132847 0.0447432 | 0.4 13 0.85517366 0.15764748 | 0.4 4 0.75238654 0.02738474
0.3 11 1.1392022 0.05709218 | 0.5 0 0.23870972 0.00871551 | 0.4 5 0.8435564 0.03377798
0.3 12 1.17834998 0.0714785 | 0.5 1 0.36960458 0.00890705 | 0.4 6 0.9336623 0.04094201
0.3 13 1.19689698 0.08773689 | 0.5 2 0.49980719 0.00922257 | 0.4 7 1.0222786 0.04887145
0.3 14 1.194207 0.10557819 | 0.5 3 0.62894137 0.00968923 | 0.4 8 1.10869795 0.05755978
0.3 15 1.1710816 0.12463177 | 0.5 4 0.75646706 0.01036055 | 0.4 9 1.1916454 0.06699707
0.3 16 1.1313085 0.14443285 | 0.5 5 0.87993837 0.01144269 | 0.4 10 1.26828659 0.07718435
0.3 17 1.08254594 0.16463531 | 0.5 6 0.97111817 0.01540752 | 0.4 11 1.33268237 0.0881008
0.3 18 1.03479689 0.18515759 | 0.5 7 1.00968497 0.02454478 | 0.4 12 1.37858719 0.09946716
0.3 19 0.99460818 0.20597369 | 0.5 8 0.98585454 0.04116198 | 0.4 13 1.40149934 0.11080493
0.3 20 0.96193387 0.22729269 | 0.6 0 0.25598625 0.00864509 | 0.5 0 0.40001585 0.00975765
0.4 0 0.00156924 0.00475901 | 0.6 1 0.39996208 0.00888127 | 0.5 1 0.49793815 0.01340333
0.4 1 0.12902259 0.00490616 | 0.6 2 0.5435357 0.00927495 | 0.5 2 0.59569143 0.0179093
0.4 2 0.25614837 0.00550179 | 0.6 3 0.68639769 0.0098805 | 0.5 3 0.69311377 0.02328386
0.4 3 0.38183434 0.00672044 | 0.6 4 0.82031306 0.01163186 | 0.5 4 0.789923 0.02954492
0.4 4 0.5044046 0.00885155 | 0.6 5 0.94888813 0.01607704 | 0.5 5 0.88528512 0.03675215
0.4 5 0.62258142 0.01207326 | 0.6 6 1.05501693 0.02528943 | 0.5 6 0.9752414 0.04521391
0.4 6 0.73415927 0.01667064 | 0.6 7 1.1061645 0.03897731 | 0.5 7 1.05347332 0.05529114
0.4 7 0.83619501 0.02298766 | 0.6 8 1.02831852 0.05725626 | 0.5 8 1.11687337 0.0668394
0.4 8 0.92427571 0.03152067 | 0.6 9 0.84746188 0.09203749 | 0.5 9 1.16481554 0.07950771
0.4 9 0.99410253 0.04254277 | 0.6 10 0.80219426 0.1243578 | 0.5 10 1.19615115 0.09280839
0.4 10 1.04336401 0.0558185 | 0.6 11 0.79361849 0.15172702 | 0.5 11 1.20795335 0.10614303
0.4 11 1.06901774 0.07115748 | 0.6 12 0.79538765 0.17612524 | 0.6 0 0.42662636 0.01053514
0.4 12 1.07033681 0.08818781 | 0.6 13 0.81117798 0.20116541 | 0.6 1 0.53215968 0.01475246
0.4 13 1.04875427 0.10636047 | 0.6 14 0.82352243 0.22422017 | 0.6 2 0.63769192 0.02003221
0.4 14 1.00953571 0.12504717 | 0.6 15 0.84541693 0.24918325 | 0.6 3 0.74031446 0.02685975
0.4 15 0.96219326 0.1437808 | 0.7 0 0.28552023 0.0088277 | 0.6 4 0.83611646 0.03596358
0.4 16 0.91714303 0.1626428 | 0.7 1 0.4552778 0.00918366 | 0.6 5 0.92496205 0.04759207
0.4 17 0.87976015 0.18190757 | 0.7 2 0.62759227 0.00981429 | 0.6 6 1.00605016 0.06171506
0.4 18 0.85118012 0.20176305 | 0.7 3 0.80939577 0.01123075 | 0.6 7 1.07851749 0.07813513
0.4 19 0.83142716 0.22208854 | 0.7 4 0.97088706 0.02100958 | 0.6 8 1.14035887 0.09635543
0.4 20 0.81730939 0.24280637 | 0.7 5 1.04109807 0.03806899 | 0.6 9 1.1904048 0.1158151
0.5 0 0.0013092 0.00431248 | 0.7 6 1.01928409 0.05606353 | 0.6 10 1.2265724 0.13570755
0.5 1 0.13699071 0.00452484 | 0.7 7 0.96864203 0.07417348 | 0.6 11 1.24690584 0.15516688
0.5 2 0.27220791 0.00527327 | 0.7 8 0.909723 0.09332397 | 0.7 0 0.47478709 0.01242688
0.5 3 0.40529214 0.00680007 | 0.7 9 0.85474593 0.11439034 | 0.7 1 0.59416051 0.01841556
0.5 4 0.53365078 0.0094989 | 0.7 10 0.8136909 0.1390286 | 0.7 2 0.71263573 0.02739059
0.5 5 0.65500918 0.0136707 | 0.7 11 0.81763773 0.16551528 | 0.7 3 0.82867544 0.04034131
0.5 6 0.76592039 0.01976437 | 0.7 12 0.83099549 0.1904208 | 0.7 4 0.94051449 0.05765871
0.5 7 0.85968697 0.02859707 | 0.7 13 0.85202205 0.21635245 | 0.7 5 1.04452939 0.07898422
0.5 8 0.93110545 0.04048238 | 0.7 14 0.87184003 0.24104815 | 0.7 6 1.13844811 0.10354212
0.5 9 0.97859639 0.05500783 | 0.8 0 0.25379077 0.02261738 | 0.7 7 1.22049898 0.13032914
0.5 10 0.99966192 0.07142576 | 0.8 1 0.37291229 0.02757891 | 0.8 0 0.59842495 0.03070903
0.5 11 0.99461587 0.08896353 | 0.8 2 0.47440306 0.03524511 | 0.8 1 0.73936297 0.0501211
0.5 12 0.96501081 0.10691509 | 0.8 3 0.56312584 0.04564638 | 0.8 2 0.86822118 0.07409706
0.5 13 0.91613681 0.12481628 | 0.8 4 0.6460511 0.05832932 | 0.8 3 0.98333082 0.10115296
0.5 14 0.86503783 0.14251852 | 0.8 5 0.71874703 0.07377648 | 0.8 4 1.08530449 0.13027077
0.5 15 0.81917669 0.1606226 | 0.8 6 0.77900244 0.09125873 | 0.8 5 1.17487716 0.16068126
0.5 16 0.78575895 0.17937346 | 0.8 7 0.83005988 0.11019433
0.5 17 0.76234041 0.19901087 | 0.8 8 0.87349327 0.12991298
0.6 0 0.0009119 0.00416125 | 0.8 9 0.91336673 0.15089809
0.6 1 0.15034584 0.00446971 | 0.8 10 0.94395678 0.17216155
0.6 2 0.29936634 0.005434 | 0.8 11 0.96521429 0.19428368
0.6 3 0.44558056 0.00743859 | 0.8 12 0.97446734 0.21664782
0.6 4 0.58566612 0.01106919 | 0.8 13 0.97990796 0.24185741
0.6 5 0.71641357 0.01769764 | 0.8 14 0.99465858 0.26980803
0.6 6 0.83164647 0.02927649 | 0.8 15 1.02677325 0.29960937
0.6 7 0.92507028 0.0454219 | 0.9 0 -0.03959858 0.12059924
0.6 8 0.99671461 0.06461165 | 0.9 1 0.06822316 0.11970889
0.6 9 1.05085356 0.08561563 | 0.9 2 0.18051904 0.12310501
0.6 10 1.08157905 0.10677951 | 0.9 3 0.29714686 0.13081809
0.6 11 1.08386342 0.12648953 | 0.9 4 0.41716614 0.14252143
0.6 12 1.06537418 0.14454872 | 0.9 5 0.54251249 0.15831269
0.6 13 1.01799052 0.15965823 | 0.9 6 0.66775734 0.17707051
0.6 14 0.9276694 0.17099549 | 0.9 7 0.80270165 0.19788185
0.6 15 0.82413283 0.18302397 | 0.9 8 0.97275077 0.2228966
0.6 16 0.75017574 0.20017763 | 0.9 9 1.09793814 0.2519245
0.6 17 0.73720931 0.21962102 | 0.9 11 1.28626327 0.31933105
0.6 18 0.72817532 0.24000652 | 0.9 12 1.36381377 0.35620579
0.6 19 0.72512852 0.26104437 | 0.9 13 1.43461078 0.39464213
0.7 0 0.00042779 0.00437586
0.7 1 0.17780983 0.00485724
0.7 2 0.35841207 0.00684577
0.7 3 0.53344121 0.01439492
0.7 4 0.68918301 0.02921294
0.7 5 0.82633986 0.04998559
0.7 6 0.94556538 0.07494196
0.7 7 1.04722633 0.10254214
0.7 8 1.13100158 0.13150703
0.7 9 1.19559795 0.16047414
0.7 10 1.24248574 0.18862919
0.7 11 1.27330107 0.21537239
0.7 12 1.29015172 0.24036068
0.7 13 1.29612152 0.26357329
0.7 14 1.28629569 0.28347853
0.7 15 1.23736911 0.2941774
0.7 16 1.21579887 0.30991999
0.7 17 1.06062252 0.29430817
0.7 18 0.95511579 0.29199597
0.7 19 0.77880954 0.29346789
0.7 20 0.78938387 0.31550949
0.8 0 -0.00039134 0.01274311
0.8 1 0.26636627 0.02143281
0.8 2 0.52301696 0.0437993
0.8 3 0.74242164 0.07303372
0.8 4 0.92167763 0.10571765
0.8 5 1.06355444 0.13960275
0.8 6 1.17559163 0.17372668
0.8 7 1.26644313 0.20800433
0.8 8 1.34263925 0.2425298
0.8 9 1.40730755 0.27724884
0.8 10 1.46190791 0.31196262
0.8 11 1.50691821 0.34627601
0.8 12 1.54324379 0.37997316
0.8 13 1.5717274 0.4128407
0.8 14 1.59277154 0.44462681
0.8 15 1.60857777 0.47559143
0.8 16 1.61758798 0.50501055
0.8 17 1.61780059 0.53199262
0.8 18 1.61357461 0.55722501
0.8 19 1.59712192 0.57775063
0.8 20 1.55842117 0.58879609
0.9 0 0.00199354 0.1153863
0.9 1 0.10756093 0.11727129
0.9 2 0.2127823 0.12281292
0.9 3 0.31806577 0.13199071
0.9 4 0.42396218 0.14466949
0.9 5 0.53078849 0.16072812
0.9 6 0.63790546 0.17989125
0.9 7 0.74522719 0.20218415
0.9 8 0.85139603 0.22794191
0.9 9 0.95385464 0.25707932
0.9 10 1.04954578 0.28910502
0.9 11 1.13714946 0.32341146
0.9 12 1.21713735 0.35959466
0.9 13 1.29028096 0.39740223
0.9 14 1.35744052 0.43663143
0.9 15 1.41910384 0.47707385
0.9 16 1.47571004 0.51856196
0.9 17 1.52760376 0.56095865
0.9 18 1.57508078 0.60414505
0.9 19 1.61839437 0.64799499
0.9 20 1.6577327 0.69238254

References

  1. Hanna, B.N.; Dinh, N.T.; Youngblood, R.W.; Bolotnov, I.A. Machine-learning based error prediction approach for coarse-grid Computational Fluid Dynamics (CG-CFD). Prog. Nucl. Energy 2020, 118, 103140.
  2. Ti, Z.; Deng, X.; Yang, H. Wake modeling of wind turbines using machine learning. Appl. Energy 2020, 257, 114025.
  3. Zhao, Y.; Akolekar, H.D.; Weatheritt, J.; Michelassi, V.; Sandberg, R.D. RANS turbulence model development using CFD-driven machine learning. J. Comput. Phys. 2020, 411, 109413.
  4. Zhu, L.; Zhang, W.; Kou, J.; Liu, Y. Machine learning methods for turbulence modeling in subsonic flows around airfoils. Phys. Fluids 2019, 31, 015105.
  5. Holland, J.R.; Baeder, J.D.; Duraisamy, K. Towards integrated field inversion and machine learning with embedded neural networks for RANS modeling. In Proceedings of the AIAA Scitech 2019 Forum, San Diego, CA, USA, 7–11 January 2019.
  6. Paulete-Periáñez, C.; Esther, A.-P.; Carlos, L. Surrogate modelling for aerodynamic coefficients prediction in aeronautical configurations. In Proceedings of the 8th European Conference for Aeronautics and Space Sciences (EUCASS), Madrid, Spain, 1–4 July 2019.
  7. Venturi, S.; Sharma Priyadarshini, M.; Panesi, M. A Machine Learning Framework for the Quantification of the Uncertainties Associated with Ab-Initio Based Modeling of Non-Equilibrium Flows. In Proceedings of the AIAA Scitech 2019 Forum, San Diego, CA, USA, 7–11 January 2019.
  8. Buisson, B.; Lakehal, D. Towards an integrated machine-learning framework for model evaluation and uncertainty quantification. Nucl. Eng. Des. 2019, 354, 110197.
  9. Yan, X.; Zhu, J.; Kuang, M.; Wang, X. Aerodynamic shape optimization using a novel optimizer based on machine learning techniques. Aerosp. Sci. Technol. 2019, 86, 826–835.
  10. Wu, J.; Xiao, H.; Paterson, E. Physics-informed machine learning approach for augmenting turbulence models: A comprehensive framework. Phys. Rev. Fluids 2018, 3, 074602.
  11. Dupuis, R.; Jean-Christophe, J.; Pierre, S. Aerodynamic Data Predictions for Transonic Flows via a Machine-Learning-based Surrogate Model. In Proceedings of the 2018 AIAA/ASCE/AHS/ASC Structures, Structural Dynamics, and Materials Conference, Orlando, FL, USA, 8–12 January 2018.
  12. Zhang, Y.; Woong, J.S.; Dimitri, N.M. Application of convolutional neural network to predict airfoil lift coefficient. In Proceedings of the 2018 AIAA/ASCE/AHS/ASC Structures, Structural Dynamics, and Materials Conference, Orlando, FL, USA, 8–12 January 2018.
  13. Yan, C. Wind Turbine Wakes: From Numerical Modeling to Machine Learning. Ph.D. Thesis, University of Delaware, Newark, DE, USA, 2018.
  14. Secco, N.R.; De Mattos, B.S. Artificial neural networks to predict aerodynamic coefficients of transport airplanes. Aircr. Eng. Aerosp. Technol. 2017, 89, 211–230.
  15. Kou, J.; Zhang, W. Multi-kernel neural networks for nonlinear unsteady aerodynamic reduced-order modeling. Aerosp. Sci. Technol. 2017, 67, 309–326.
  16. Morshedizadeh, M. Condition Monitoring of Wind Turbines Using Intelligent Machine Learning Techniques. Ph.D. Thesis, University of Windsor, Windsor, ON, Canada, 2017.
  17. Andrés-Pérez, E.; Carro-Calvo, L.; Salcedo-Sanz, S.; Martin-Burgos, M.J. Aerodynamic Shape Design by Evolutionary Optimization and Support Vector Machines. In Application of Surrogate-based Global Optimization to Aerodynamic Design; Springer: Cham, Switzerland, 2016; pp. 1–24.
  18. Viúdez-Moreiras, D.; Andrés-Pérez, E.; González-Juárez, D.; Burgos, M.J.M. Performance Comparison of Kriging and SVR Surrogate Models Applied to the Objective Function Prediction within Aerodynamic Shape Optimization. In Proceedings of the VII European Congress on Computational Methods in Applied Sciences and Engineering (ECCOMAS Congress 2016), Crete, Greece, 5–10 June 2016.
  19. Wang, Q.; Qian, W.; He, K. Unsteady aerodynamic modeling at high angles of attack using support vector machines. Chin. J. Aeronaut. 2015, 28, 659–668.
  20. Fossati, M. Evaluation of Aerodynamic Loads via Reduced-Order Methodology. AIAA J. 2015, 53, 2389–2405.
  21. Chen, S.; Chen, K.; Xu, C.; Lan, L. Flexible ranking extreme learning machine based on matrix-centering transformation. In Proceedings of the 2018 International Joint Conference on Neural Networks (IJCNN); Institute of Electrical and Electronics Engineers (IEEE): New York, NY, USA, 2018; pp. 1–8.
  22. Yao, J.; Moawad, A. Vehicle energy consumption estimation using large scale simulations and machine learning methods. Transp. Res. Part C: Emerg. Technol. 2019, 101, 276–296.
  23. Dube, P.; Hiravennavar, S. Machine Learning Approach to Predict Aerodynamic Performance of Underhood and Underbody Drag Enablers; SAE Technical Paper 2020-01-0684; SAE Technical Paper: Warrendale, PA, USA, 2020.
  24. Chen, S.; Gao, Z.; Zhu, X.; Du, Y.; Pang, C. Unstable unsteady aerodynamic modeling based on least squares support vector machines with general excitation. Chin. J. Aeronaut. 2020.
Figure 1. Main steps of a machine learning process (preprocessing–model construction–model validation).
Figure 2. Histograms of the variables in the NACA0012 database (X-axis shows the value of the parameter while Y-axis shows the number of occurrences of that value in the database).
Figure 3. Histograms of the variables in the RAE2822 database (X-axis shows the value of the parameter while Y-axis shows the number of occurrences of that value in the database).
Figure 4. Histograms of the variables in the DPW database (X-axis shows the value of the parameter while Y-axis shows the number of occurrences of that value in the database).
Figure 5. Training set in the NACA0012 (a), RAE2822 (b) and DPW (c) databases (left: Mach versus AoA distribution of the database samples, middle: lift coefficient curves of the database samples, right: drag coefficient curves of the database samples).
Figure 6. Exploring training set in the NACA0012 database (correlation matrix plot).
Figure 7. Exploring training set in the RAE2822 database (correlation matrix plot).
Figure 8. Exploring training set in the DPW database (correlation matrix plot).
Figure 9. True vs. predicted coefficient values with the linear regression method: NACA0012 (a), RAE2822 (b) and DPW (c) databases (left: lift coefficient, right: drag coefficient).
Figure 10. True vs. predicted coefficient values with the SVR method: NACA0012 (a), RAE2822 (b) and DPW (c) databases (left: lift coefficient, right: drag coefficient).
Table 1. Summary of the recent state-of-the-art studies in the scientific literature.
| Ref. | Year of Publication | Main Use of Machine Learning | Summary of the Main Advance Proposed |
|---|---|---|---|
| [1] | 2020 | To predict the distribution of a coarse-grid CFD local error and correct the fluid-flow variables | The authors propose a surrogate model trained to predict the distribution of a coarse-grid CFD local error. The surrogate model is built using ML regression algorithms. Artificial neural networks (ANN) and random forests (RF) were tested, and the selected test case was a three-dimensional turbulent flow inside a lid-driven cavity. |
| [2] | 2020 | To perform wake modeling of wind turbines | A combined framework with CFD and machine learning techniques is presented to improve turbine wake predictions. In particular, an ANN model in combination with a reduced-order turbine model, the actuator disk model with rotation (ADM-R), is proposed and shown to be capable of handling large amounts of data with complex relations between the parameters involved. The selected test case was a standalone Vestas V80 2 MW wind turbine. |
| [3] | 2020 | To develop a machine learning framework for Reynolds-averaged Navier–Stokes (RANS) models | The authors propose a new data-driven machine learning method to model the RANS equations. The proposed CFD-driven machine learning approach was applied to model development for wake mixing in turbomachines. |
| [4] | 2019 | To develop surrogate models to augment the capability of the current turbulence models | The authors propose a mapping approach between the turbulent eddy viscosity and the mean flow variables by ANNs. The study includes several test cases using well-known airfoils, such as the NACA0012. The data-driven turbulence model is applied to predict eddy viscosity, lift/drag coefficients, and skin friction distributions. |
| [5] | 2019 | To develop surrogate models to augment the capability of the current turbulence models | A method based on neural networks is applied to reduce the error of RANS simulations by using the data-augmented turbulence model approach in an integrated way. In addition, a new layered approach for the NN is proposed to reduce the required training times. |
| [6] | 2019 | To develop surrogate models for fast prediction of aerodynamic coefficients of aeronautical configurations | The authors propose using support vector machines as surrogate models for quick prediction of aerodynamic coefficients. The research also included tests on different aeronautical configurations such as the NACA0012, RAE2822 and DPW wing. |
| [7,8] | 2019 | To develop machine learning techniques to predict uncertainties | In these papers, data-driven models are built for uncertainty quantification in aerodynamics. |
| [9] | 2019 | To develop machine learning methods for aerodynamic shape optimization | The authors propose a new optimizer based on machine learning techniques, in particular reinforcement learning, transfer learning and deep neural networks. The proposed approach is tested on a typical aerodynamic shape optimization of missile control surfaces with computational fluid dynamics (CFD). |
| [10] | 2018 | To accelerate RANS simulations using a data-driven method | The authors demonstrated that machine learning can be used to improve the RANS-modeled Reynolds stresses by leveraging data from high-fidelity simulations. |
| [11] | 2018 | To predict aerodynamic data in transonic flows | The authors propose a local decomposition method to improve the accuracy of aerodynamic fields in transonic conditions. Tests were performed on the AS28G aircraft configuration. |
| [12] | 2018 | To develop machine learning methods for prediction of airfoil lift coefficient | A convolutional neural network is developed to learn the lift coefficients of several airfoils of different shapes and parameters such as Mach, Reynolds and AoA. |
| [13] | 2018 | To predict wind turbine wakes | The authors propose using a deep neural network with transfer learning ability for efficient prediction of wind turbine wakes and efficiency. |
| [14] | 2017 | To predict aerodynamic coefficients of transport airplanes | An artificial neural network model is proposed to predict aerodynamic coefficients of transport airplanes. The proposed model is able to efficiently predict both lift and drag coefficients in wing-fuselage configurations. |
| [15] | 2017 | To develop machine learning methods for nonlinear unsteady aerodynamic reduced-order modeling | The authors propose a multi-kernel neural network approach that improves accuracy and generalization capability by linearly combining Gaussian and wavelet basis functions as the hidden basis functions. |
| [16] | 2017 | To develop machine learning methods for condition monitoring of wind turbines | A surrogate model is proposed to monitor wind turbine conditions and to detect possible anomalies in turbine performance that could result in unexpected failure. |
| [17] | 2016 | To reduce the computational cost of the aerodynamic shape design process | To reduce the computational cost of the aerodynamic shape design process of aeronautical configurations, the authors propose using evolutionary optimization methods in combination with support vector machines to speed up the design stage while preserving a certain level of accuracy. |
| [18] | 2016 | To predict the objective function within aerodynamic shape optimization | The authors present a comparison of Kriging and Support Vector Machines for Regression (SVR) surrogate models applied to the objective function prediction within an aerodynamic shape optimization framework. |
| [19] | 2015 | To develop machine learning methods for unsteady aerodynamics modeling | The authors propose using support vector machines for unsteady aerodynamic modeling at high angles of attack. |
| [20] | 2015 | To develop machine learning methods to model aerodynamic loads | The author proposes integrating centroidal Voronoi tessellation, leave-one-out cross validation, proper orthogonal decomposition, and multidimensional interpolation for the evaluation of steady aerodynamic loads. |
Table 2. Initial information of the aerodynamic databases.
| NACA0012 | RAE2822 | DPW |
|---|---|---|
| Range Index: 185 entries, 0 to 184 | Range Index: 122 entries, 0 to 121 | Range Index: 100 entries, 0 to 99 |
| Data columns (total 4 columns): | Data columns (total 4 columns): | Data columns (total 4 columns): |
| Mach: 185 non-null float64 | Mach: 122 non-null float64 | Mach: 100 non-null float64 |
| AoA: 185 non-null int64 | AoA: 122 non-null int64 | AoA: 100 non-null int64 |
| Cl: 185 non-null float64 | Cl: 122 non-null float64 | Cl: 100 non-null float64 |
| Cd: 185 non-null float64 | Cd: 122 non-null float64 | Cd: 100 non-null float64 |
| dtypes: float64 (3), int64 (1) | dtypes: float64 (3), int64 (1) | dtypes: float64 (3), int64 (1) |
Table 3. Statistics of the aerodynamic database (NACA0012).
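| Variable | Mach | AoA | Cl | Cd |
|---|---|---|---|---|
| count | 185.00 | 185.00 | 185.00 | 185.00 |
| mean | 0.499459 | 9.800000 | 0.897379 | 0.144013 |
| std | 0.261579 | 5.978949 | 0.438446 | 0.151947 |
| min | 0.100000 | 0.000000 | −0.000391 | 0.004161 |
| 25% | 0.300000 | 5.000000 | 0.622582 | 0.020779 |
| 50% | 0.500000 | 10.00000 | 0.953855 | 0.105718 |
| 75% | 0.700000 | 15.00000 | 1.216015 | 0.202184 |
| max | 0.900000 | 20.00000 | 1.657733 | 0.692383 |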
Table 4. Statistics of the aerodynamic database (RAE2822).
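| Variable | Mach | AoA | Cl | Cd |
|---|---|---|---|---|
| count | 122.00 | 122.00 | 122.00 | 122.00 |
| mean | 0.505738 | 6.573770 | 0.788143 | 0.076774 |
| std | 0.265448 | 4.340694 | 0.323649 | 0.090517 |
| min | 0.100000 | 0.000000 | −0.039599 | 0.008645 |
| 25% | 0.300000 | 3.000000 | 0.553919 | 0.011459 |
| 50% | 0.500000 | 6.000000 | 0.821918 | 0.024456 |
| 75% | 0.700000 | 10.00000 | 0.992458 | 0.130592 |
| max | 0.900000 | 15.00000 | 1.434611 | 0.394642 |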
Table 5. Statistics of the aerodynamic database (DPW).
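| Variable | Mach | AoA | Cl | Cd |
|---|---|---|---|---|
| count | 100.00 | 100.00 | 100.00 | 100.00 |
| mean | 0.388000 | 6.260000 | 0.946160 | 0.058803 |
| std | 0.213333 | 4.296158 | 0.331897 | 0.038622 |
| min | 0.100000 | 0.000000 | 0.344904 | 0.009539 |
| 25% | 0.200000 | 3.000000 | 0.679916 | 0.026364 |
| 50% | 0.400000 | 6.000000 | 0.968587 | 0.049764 |
| 75% | 0.600000 | 10.00000 | 1.198574 | 0.088710 |
| max | 0.800000 | 15.00000 | 1.565249 | 0.160681 |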
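The rows of Tables 3–5 (count, mean, std, min, quartiles, max) form a standard summary-statistics report. As a minimal sketch of how such a report could be computed for one variable, the snippet below uses only the Python standard library. The function name `describe`, the use of the sample standard deviation, and the linear-interpolation quartiles are assumptions for illustration; the sample values are hypothetical and not taken from the databases.

```python
import statistics


def describe(values):
    """Return count, mean, sample std, min, quartiles and max of a sequence."""
    data = sorted(float(v) for v in values)
    # method="inclusive" interpolates linearly between the sorted data points
    q1, q2, q3 = statistics.quantiles(data, n=4, method="inclusive")
    return {
        "count": len(data),
        "mean": statistics.fmean(data),
        "std": statistics.stdev(data),  # sample standard deviation (n - 1)
        "min": data[0],
        "25%": q1,
        "50%": q2,
        "75%": q3,
        "max": data[-1],
    }


# Hypothetical angle-of-attack sweep in degrees (illustrative only)
aoa = [0, 5, 10, 15, 20]
summary = describe(aoa)
for name, value in summary.items():
    print(f"{name:>5}: {value:.6f}")
```

Applying the same function to each column (Mach, AoA, Cl, Cd) would give one column of a table with this layout.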
Table 6. Pearson correlation coefficients in the NACA0012 database.
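| Variable | AoA | Mach | Cl | Cd |
|---|---|---|---|---|
| AoA | 1.0 | 0.013024 | 0.788449 | 0.673338 |
| Mach | 0.013024 | 1.0 | 0.101251 | 0.57034 |
| Cl | 0.788449 | 0.101251 | 1.0 | 0.652496 |
| Cd | 0.673338 | 0.570344 | 0.652496 | 1.0 |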
Table 7. Pearson correlation coefficients in the RAE2822 database.
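| Variable | AoA | Mach | Cl | Cd |
|---|---|---|---|---|
| AoA | 1.0 | 0.018619 | 0.770069 | 0.650465 |
| Mach | 0.018619 | 1.0 | −0.100818 | 0.559170 |
| Cl | 0.770069 | −0.100818 | 1.0 | 0.262417 |
| Cd | 0.650465 | 0.559170 | 0.262417 | 1.0 |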
Table 8. Pearson correlation coefficients in the DPW database.
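| Variable | AoA | Mach | Cl | Cd |
|---|---|---|---|---|
| AoA | 1.0 | −0.339471 | 0.962521 | 0.819541 |
| Mach | −0.339471 | 1.0 | −0.121352 | 0.080954 |
| Cl | 0.962521 | −0.121352 | 1.0 | 0.892737 |
| Cd | 0.892737 | 0.080954 | 0.892737 | 1.0 |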
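The entries of Tables 6–8 are Pearson correlation coefficients between pairs of database variables. The following sketch shows the standard definition of the coefficient and how a matrix of this shape could be assembled. The function names `pearson` and `correlation_matrix` and the sample values are hypothetical, chosen only for illustration, and are not taken from the paper.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    dx = [a - mean_x for a in x]
    dy = [b - mean_y for b in y]
    # Covariance and variances share the same 1/n factor, so it cancels
    cov = sum(a * b for a, b in zip(dx, dy))
    norm = math.sqrt(sum(a * a for a in dx) * sum(b * b for b in dy))
    if norm == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / norm


def correlation_matrix(columns):
    """Build a nested dict of pairwise Pearson coefficients."""
    names = list(columns)
    return {r: {c: pearson(columns[r], columns[c]) for c in names} for r in names}


# Hypothetical samples (illustrative only, not from the databases)
samples = {
    "AoA": [0.0, 2.0, 4.0, 6.0, 8.0],
    "Cl": [0.0, 0.22, 0.44, 0.65, 0.80],
}
matrix = correlation_matrix(samples)
print(matrix["AoA"]["Cl"])
```

By construction, a matrix built this way is symmetric and has ones on the diagonal, which gives a quick consistency check on any tabulated correlation matrix.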
Table 9. Performance metrics for the linear regression model (MAE: mean absolute error, MSE: mean squared error, R2: coefficient of determination).
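| Test case | Variable | MAE | MSE | R2 |
|---|---|---|---|---|
| NACA0012 | Cl | 0.213361 | 0.064331 | 0.599226 |
| NACA0012 | Cd | 0.048828 | 0.003615 | 0.773458 |
| RAE2822 | Cl | 0.170544 | 0.039926 | 0.725196 |
| RAE2822 | Cd | 0.037094 | 0.002124 | 0.753398 |
| DPW | Cl | 0.030881 | 0.002242 | 0.981537 |
| DPW | Cd | 0.011685 | 0.000197 | 0.853932 |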
Table 10. Performance metrics for the SVR model (MAE: mean absolute error, MSE: mean squared error, R2: coefficient of determination).
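| Test case | Variable | MAE | MSE | R2 |
|---|---|---|---|---|
| NACA0012 | Cl | 0.087786 | 0.010171 | 0.936636 |
| NACA0012 | Cd | 0.008265 | 0.000168 | 0.989469 |
| RAE2822 | Cl | 0.080280 | 0.008804 | 0.939404 |
| RAE2822 | Cd | 0.009824 | 0.000133 | 0.984611 |
| DPW | Cl | 0.038026 | 0.002941 | 0.975785 |
| DPW | Cd | 0.006637 | 0.000059 | 0.956424 |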
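Tables 9 and 10 report three metrics per coefficient: mean absolute error (MAE), mean squared error (MSE) and the coefficient of determination (R2). The sketch below illustrates the standard definitions of these metrics. The function name `regression_metrics` and the true/predicted values are hypothetical and serve only as an example; they do not reproduce the paper's data or code.

```python
def regression_metrics(y_true, y_pred):
    """Compute MAE, MSE and R2 for paired true and predicted values."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("need two non-empty sequences of equal length")
    n = len(y_true)
    residuals = [t - p for t, p in zip(y_true, y_pred)]
    mae = sum(abs(r) for r in residuals) / n
    ss_res = sum(r * r for r in residuals)  # residual sum of squares
    mse = ss_res / n
    mean_true = sum(y_true) / n
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)  # total sum of squares
    if ss_tot == 0:
        raise ValueError("R2 is undefined when all true values are equal")
    r2 = 1.0 - ss_res / ss_tot
    return {"MAE": mae, "MSE": mse, "R2": r2}


# Hypothetical lift-coefficient values (illustrative only)
cl_true = [0.2, 0.4, 0.6, 0.8]
cl_pred = [0.25, 0.35, 0.65, 0.75]
metrics = regression_metrics(cl_true, cl_pred)
print(metrics)
```

MAE and MSE are in the units (and squared units) of the predicted coefficient, so their magnitudes are not directly comparable between Cl and Cd; R2 is dimensionless and is the more convenient metric for comparing across variables and test cases.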
Publisher’s Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.
