Machine Learning Prediction of Airfoil Aerodynamic Performance Using Neural Network Ensembles

Sterpu, Diana-Andreea; Măriuța, Daniel; Cican, Grigore; Larco, Ciprian-Marius; Grigorie, Lucian-Teodor

doi:10.3390/app15147720


by Diana-Andreea Sterpu ¹, Daniel Măriuța ², Grigore Cican ¹,³, Ciprian-Marius Larco ² and Lucian-Teodor Grigorie ¹,*

¹

Faculty of Aerospace Engineering, National University of Science and Technology POLITEHNICA Bucharest, 060042 Bucharest, Romania

²

Department of Aircraft Integrated Systems and Aviation, Military Technical Academy “Ferdinand I” Bucharest, 050141 Bucharest, Romania

³

Romanian Research and Development Institute for Gas Turbines—COMOTI, 220D Iuliu Maniu, 061126 Bucharest, Romania

^*

Author to whom correspondence should be addressed.

Appl. Sci. 2025, 15(14), 7720; https://doi.org/10.3390/app15147720

Submission received: 21 June 2025 / Revised: 5 July 2025 / Accepted: 7 July 2025 / Published: 9 July 2025

(This article belongs to the Special Issue 5th Anniversary of Aerospace Science and Engineering Section—Recent Advances in Aerospace)


Abstract

Featured Application

This research introduces a fast and reliable method for predicting aerodynamic performance using a deep learning approach, enabling efficient integration into preliminary design and optimization workflows. In addition, it presents a novel investigation into the influence of random seed selection, a factor that is virtually never addressed in aerodynamic machine learning studies, demonstrating its significant effect on model accuracy and reproducibility.

Abstract

Reliable aerodynamic performance estimation is essential for both preliminary design and optimization in various aeronautical applications. In this study, a hybrid deep learning model is proposed, combining convolutional neural networks (CNNs) operating directly on raw airfoil geometry with parallel branches of fully connected deep neural networks (DNNs) that process operational parameters and engineered features. The model is trained on an extensive database of NACA four-digit airfoils, covering angles of attack ranging from −5° to 14° and ten Reynolds numbers increasing in steps of 500,000 from 500,000 up to 5,000,000. As a novel contribution, this work investigates the impact of random seed initialization on model accuracy and reproducibility and introduces a seed-based ensemble strategy to enhance generalization. The best-performing single-seed model tested (seed 0) achieves a mean absolute percentage error (MAPE) of 1.1% with an $R^{2}$ of 0.9998 for the lift coefficient prediction and 0.57% with an $R^{2}$ of 0.9954 for the drag coefficient prediction. In comparison, the best ensemble model tested (seeds 610, 987, and 75025) achieves a lift coefficient MAPE of 1.43%, corresponding to $R^{2}$ = 0.9999, and a drag coefficient MAPE of 1.19%, corresponding to $R^{2}$ = 0.9968. All the seed configurations tested in this paper (ten single seeds and five ensembles) achieve an overall $R^{2}$ greater than 0.97, which reflects the model architecture’s strong foundation. The novelty of this study lies in the demonstration that the same machine learning model, trained on identical data and architecture, can exhibit up to 250% variation in prediction error solely due to differences in random seed selection. This finding highlights the often-overlooked impact of seed initialization on model performance and underscores the necessity of treating seed choice as an active design parameter in ML aerodynamic predictions.

Keywords:

aerodynamic prediction; NACA airfoils; hybrid deep learning; ensemble learning; random seed sensitivity; convolutional neural network; deep neural network

1. Introduction

Accurately predicting aerodynamic performance is fundamental to the design and optimization of airfoils employed in aircraft, wind turbines, unmanned aerial vehicles (UAVs), and drones. Traditionally, this process is heavily dependent on high-fidelity computational fluid dynamics (CFD). While effective, CFD remains one of the most resource-intensive tools available in current aerodynamic design workflows. For example, Galeazzo et al. [1] evaluated finite-volume operations per second across hardware architectures (AMD, Intel, Arm) using OpenFOAM benchmarks, demonstrating the considerable computational costs involved in CFD workflows. Regarding ANSYS Fluent applications, a recent study by Cooper-Baldock et al. [2] evaluated GPU-accelerated CFD performance, comparing compute speed, power consumption, and service-unit cost across CPU and GPU architectures. While less reliable than CFD, XFOIL simulations are proven to be much faster and serve as a practical alternative for establishing the initial aerodynamic performance of airfoils. As shown in Günel et al. [3], XFOIL completed solutions in seconds to minutes, whereas CFD runs required significantly more time due to mesh generation and solver iterations. In response to this challenge, a new direction is created by leveraging machine learning (ML) as a data-driven alternative for aerodynamic prediction. When provided with enough high-quality data, ML models can deliver fast and reliable performance estimates that are useful in early-stage design or low-cost applications such as drones and small UAVs.

Recent studies have explored various artificial intelligence (AI) approaches, specifically ML architectures, to predict aerodynamic characteristics. Review studies cover ML applications across aerospace engineering, including aerodynamic prediction and turbulence modeling [4], wind farms [5], UAV operations [6], and drone technology [7]. For instance, Malecha and Sobczyk [8] utilized AI to predict the lift coefficient of wind turbine profiles, employing PARSEC parametrization and random forest regression. Similarly, Bakar et al. [9] developed a convolutional neural network (CNN) framework for the multi-objective optimization of low Reynolds number airfoils, demonstrating the efficacy of CNNs in aerodynamic coefficient prediction, while Zuo et al. [10] proposed a hybrid model combining CNNs and multilayer perceptrons (MLPs) to predict flow fields around airfoils. Taking a different approach, a NASA report [11] explores the differences between graph neural networks (GNNs) and DNNs in predicting aerodynamic performance from XFOIL-generated data. These studies, among others [12,13,14,15,16], are laying the groundwork for bridging classical aerodynamic theory with contemporary computational principles.

To advance the current state of the research in machine learning applications for aerodynamics, this study presents a unified and methodologically transparent framework for aerodynamic performance prediction using ML. It leverages multi-modal input integration, combining high-resolution airfoil geometries, engineered geometric descriptors, and scalar flow parameters such as Reynolds number and angle of attack (AoA). These three input sources are processed through dedicated neural network branches: a CNN for geometric input and two fully connected deep neural networks (DNNs) for the scalar and engineered features. Each branch is independently tailored to extract relevant features from its respective data type, allowing for simultaneous and specialized processing before fusion. Additionally, the dataset used in this work is fully self-generated using XFOIL simulations, ensuring transparency about how the input data were produced.

One of the distinctive aspects of this work is its detailed treatment of pseudo-random number generator (PRNG) seeds. In the broader context of machine learning, the importance of setting random seeds for reproducibility is well recognized, although not specifically for aerodynamic applications; examples include the fine-tuning of large language models [17], the general usage of seeding in coding ML projects [18], and the study of black swans, which are seeds that produce radically different results [19]. While these studies provide valuable insights into the impact of random seeds across various ML applications, there remains a gap in the literature specifically addressing their role in aerodynamic performance prediction. A key novelty of the present study lies in the implementation of ensemble modeling structured around three distinct pseudo-random seed values, a strategy that was not previously applied in aerodynamic ML contexts. Given the inherent sensitivity of aerodynamic predictions to model initialization and stochastic training dynamics, addressing seed-induced variability is critical.

In addition to machine learning ensemble models designed for performance prediction, ensemble methods have also been applied in the field of aerodynamic state estimation, as shown by Pérez and Periáñe [20] in a state-of-the-art review on AI and machine learning in aerodynamics. For instance, da Silva and Colonius [21] proposed an ensemble-based Kalman filter to estimate aerodynamic flow states around airfoils and cylinders. Their method demonstrates how ensemble averaging can be effectively utilized to capture system uncertainty and improve robustness in flow reconstruction using limited pressure or velocity measurements. Similarly to this paper’s approach, Kumar and Ghosh [22] applied a bagging deep ensemble approach by training multiple neural networks with varied initializations in order to predict aerodynamic forces from simulated and experimental data. Furthermore, Zhang et al. [23] explored stacking-based ensembles for multi-objective aerodynamic shape optimization, stacking diverse base learners to boost both predictive accuracy and optimization efficiency. In addition to these, a growing number of studies have adopted ensemble strategies in aerodynamic machine learning tasks to enhance predictive performance and robustness; e.g., Refs. [24,25,26,27,28].

While all these studies explore the performance of ensembles composed of different models, none explicitly investigate the impact of random seed initialization on a fixed machine learning architecture. The novelty of this present study lies in shifting the ensemble modeling focus toward internal calibration, demonstrating that prediction performance can vary by as much as 250% solely due to seed selection. This highlights the substantial influence of initialization on model reliability, even when architecture and data remain unchanged. By systematically analyzing and averaging across multiple seeds, this approach mitigates the influence of unfavorable initializations and enhances the stability and reproducibility of predictions.

2. Methodology

2.1. General Structure

The architecture developed in this study is designed to predict aerodynamic coefficients, specifically lift coefficients ($C_{L}$) and drag coefficients ($C_{D}$), by integrating geometric, scalar, and engineered input modalities through a hybrid deep learning framework. The model is structured as a multi-branch neural network; each subnetwork specializes in processing a distinct form of data input.

For training the ML model, 91 NACA four-digit airfoils are used, generated in-house based on the NACA four-digit equation (detailed in Appendix A.1, NACA Four-digit Generation). For each airfoil, XFOIL (detailed in Appendix A.2, XFOIL Command Sequence) is run to obtain aerodynamic performance values for 10 Re numbers, ranging from 500,000 up to 5,000,000 in steps of $5 \times 10^{5}$, and 20 AoA, ranging from −5° to 14°. Each of the 91 airfoils therefore contributes 200 samples, for a total of 18,200 samples.
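The sample count can be checked by enumerating the flow conditions. The sketch below is illustrative only: the 1° spacing of the AoA grid is an assumption inferred from 20 values spanning −5° to 14°, and all variable names are hypothetical.

```python
from itertools import product

N_AIRFOILS = 91                                   # training airfoils (from the text)
reynolds = [500_000 * k for k in range(1, 11)]    # 500,000 ... 5,000,000
aoa_deg = list(range(-5, 15))                     # -5 ... 14 deg, ASSUMING 1-degree spacing

conditions = list(product(reynolds, aoa_deg))     # every (Re, AoA) pair for one airfoil
samples_per_airfoil = len(conditions)
total_samples = N_AIRFOILS * samples_per_airfoil

print(len(reynolds), len(aoa_deg), samples_per_airfoil, total_samples)
# 10 20 200 18200
```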

Each training sample is a tuple:

(x_{g}, x_{s}, x_{e}; y) \in R^{135 \times 2} \times R^{2} \times R^{2} \times R^{2}

(1)

where $x_{g}$ represents the geometry of the airfoil profile by 135 x–y points; $x_{s}$ represents the scalar inputs, formed by the AoA ($\alpha$) and the Reynolds number (Re); $x_{e}$ represents the engineered features of the airfoil, namely the airfoil thickness ($t_{max}$), which is computed as the vertical distance between the highest and lowest y-coordinates across the airfoil, and the camber, which is estimated as the mean of the maximum and minimum y-values; and y represents the targets, i.e., the lift ($C_{L}$) and drag ($C_{D}$) coefficients.
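The two engineered features follow directly from the contour coordinates. A minimal sketch, assuming the contour is available as a list of (x, y) pairs; the function name and the toy contour are hypothetical and are not taken from the paper's code or data.

```python
def engineered_features(points):
    """Return (t_max, camber) from a list of (x, y) contour points."""
    ys = [y for _, y in points]
    y_max, y_min = max(ys), min(ys)
    t_max = y_max - y_min            # vertical distance between highest and lowest y
    camber = 0.5 * (y_max + y_min)   # mean of the maximum and minimum y-values
    return t_max, camber

# Hypothetical, coarse contour (not one of the 135-point profiles).
toy_contour = [(1.0, 0.0), (0.5, 0.08), (0.0, 0.0), (0.5, -0.04), (1.0, 0.0)]
t_max, camber = engineered_features(toy_contour)
print(t_max, camber)
```

For this toy contour the thickness is about 0.12 and the camber estimate about 0.02, both in chord units.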

Each training sample is processed simultaneously by three branches:

Geometric branch: A CNN for geometry ($x_{g}$).

The airfoil contour, represented as a sequence of 135 (x,y) coordinate pairs, is processed in the geometric branch. This branch begins with Conv1D #1, a 1D convolutional layer with a kernel size of 3 and 32 filters, which extracts local shape descriptors. The output shape of the Conv1D #1 layer is 133 positions × 32 filters.

These features are then refined by Conv1D #2, which keeps the same kernel size and number of filters, resulting in 131 positions × 32 filters. The output is then flattened into a 4192-value vector ($z_{flat} \in R^{4192}$) and passed to a Dense layer with 64 neurons, which is a fully connected layer with seed-dependent weights and biases, $W_{ij}^{(s)} \in R^{4192 \times 64}$ and $b_{i}^{(s)} \in R^{64}$; this results in a final output of

z = \mathrm{ReLU}(W \cdot z_{flat} + b) \in R^{64}

(2)

where ReLU (Rectified Linear Unit) represents a non-linear activation function commonly used in machine learning models.

Scalar Input Branch: A DNN for scalar inputs ($x_{s}$).

The AoA and the Reynolds number are processed through Dense Scalar, a fully connected layer with 16 neurons. This component is meant to capture interactions between flight conditions and aerodynamic response. The output of the Scalar Input Branch, s, is produced by a set of 16 neurons with seed-dependent weights and biases, $W_{ij}^{(s)} \in R^{2 \times 16}$ and $b_{i}^{(s)} \in R^{16}$:

s = \mathrm{ReLU}(W \cdot x_{s} + b) \in R^{16}

(3)

Engineered Feature Branch: A second DNN for engineered features ($x_{e}$).

Engineered features such as maximum thickness and camber are processed by Dense Engineered, a 16-unit dense layer. This branch encodes the structural characteristics of the airfoil. The output, e, is similar to that of the Scalar Input Branch, consisting of 16 neurons with seed-dependent weights and biases of the following dimensions, $W_{ij}^{(s)} \in R^{2 \times 16}$ and $b_{i}^{(s)} \in R^{16}$:

e = \mathrm{ReLU}(W \cdot x_{e} + b) \in R^{16}

(4)

Once all three of the branches described above have produced their outputs, these are concatenated into a single 96-feature vector, $h = [z, s, e] \in R^{96}$. The vector is then fed into a 64-neuron layer, entitled Fusion Dense, which learns cross-domain interactions:

H = \mathrm{ReLU}(W \cdot h + b) \in R^{64}

(5)
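The tensor sizes quoted in this section can be reproduced with a short shape walk-through. This sketch assumes unpadded convolutions with stride 1, which is consistent with the stated 133 and 131 positions; the helper name is hypothetical.

```python
def conv1d_out_len(n_in, kernel_size, stride=1):
    """Output length of an unpadded ('valid') 1D convolution."""
    return (n_in - kernel_size) // stride + 1

n_points, n_filters, kernel = 135, 32, 3

len_conv1 = conv1d_out_len(n_points, kernel)    # positions after Conv1D #1
len_conv2 = conv1d_out_len(len_conv1, kernel)   # positions after Conv1D #2
flat_size = len_conv2 * n_filters               # length of the flattened CNN output

geom_out, scalar_out, eng_out = 64, 16, 16      # widths of the three branch outputs
fused_in = geom_out + scalar_out + eng_out      # length of the concatenated vector h

print(len_conv1, len_conv2, flat_size, fused_in)
# 133 131 4192 96
```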

The fused representation is then branched again for final outputs into the following:

The CL Output Head, which includes CL Dense #1 and CL Dense #2, is responsible for computing the lift coefficient.

CL Dense #1 is a 32-neuron layer that refines the fusion vector for lift-specific features.

\bar{C}_{L} = \mathrm{ReLU}(W \cdot H + b) \in R^{32}

(6)

The second layer, CL Dense #2, reduces the output to a single neuron, representing the lift coefficient $\bar{\bar{C}}_{L}$:

\bar{\bar{C}}_{L} = \mathrm{ReLU}(W \cdot \bar{C}_{L} + b) \in R

(7)

The CD Output Head, which follows a parallel architecture with CD Dense #1 and CD Dense #2, predicts the drag coefficient $\bar{\bar{C}}_{D}$:

\bar{\bar{C}}_{D} = \mathrm{ReLU}(W \cdot \bar{C}_{D} + b) \in R

(8)

After the predictions are made, the algorithm compares them to the true values of $C_{L}$ and $C_{D}$ using a mean squared error (MSE) loss function averaged over the input samples in a mini-batch:

L = \frac{1}{N} \sum_{i = 1}^{N} \left[ \left( c_{L}^{(i)} - \bar{\bar{c}}_{L}^{(i)} \right)^{2} + \left( c_{D}^{(i)} - \bar{\bar{c}}_{D}^{(i)} \right)^{2} \right]

(9)

where N represents the number of training samples in a batch, $\bar{\bar{c}}_{L}^{(i)}$ and $\bar{\bar{c}}_{D}^{(i)}$ are the predictions for the i-th sample, and $c_{L}^{(i)}$ and $c_{D}^{(i)}$ are the correct aerodynamic performance values obtained from XFOIL.
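As a worked instance of Equation (9), the following sketch evaluates the combined loss for a two-sample batch. The numerical values are hypothetical and serve only to show how the lift and drag terms are summed before averaging.

```python
def combined_mse(true_pairs, pred_pairs):
    """Batch mean of (squared C_L error + squared C_D error), as in Equation (9)."""
    if not true_pairs or len(true_pairs) != len(pred_pairs):
        raise ValueError("batches must be non-empty and of equal length")
    total = 0.0
    for (cl, cd), (cl_hat, cd_hat) in zip(true_pairs, pred_pairs):
        total += (cl - cl_hat) ** 2 + (cd - cd_hat) ** 2
    return total / len(true_pairs)

# Hypothetical (C_L, C_D) pairs, for illustration only.
y_true = [(0.50, 0.010), (1.00, 0.020)]
y_pred = [(0.40, 0.010), (1.00, 0.030)]
loss = combined_mse(y_true, y_pred)
print(loss)
```

Here the loss is (0.1² + 0.01²)/2 ≈ 0.00505. Because both coefficients enter the same sum unscaled, the lift term dominates whenever the errors are of comparable relative size.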

Once the error is computed, a backpropagation calibration begins to update all weights and biases to minimize the prediction error. The training is performed in mini-batches, and for each mini-batch, the network goes through a forward pass, then a backward pass, adjusting each time.

2.2. General Architecture of a Model

The following diagram presents the architectural flow of the deep hybrid model used to predict aerodynamic coefficients (Figure 1).

In terms of information flow, the gray solid arrows indicate the feedforward path, showing how data progress through the network. The dotted arrows represent the backpropagation process, where the loss gradients are propagated backward from the outputs through each layer, updating the weights and biases.

Regarding configuration, there are 455 mini-batches per epoch, and the model runs 100 epochs per seed; therefore, there are:

A total of 45,500 feedforward passes and 45,500 backpropagation steps for a single seed configuration.
A total of 136,500 feedforward passes and 136,500 backpropagation steps for a model ensemble of three different seeds.
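The step counts above follow from simple arithmetic. In the sketch below, the mini-batch size is not stated in the text; it is inferred here from the 80% training share described in Section 2.3.2 and the 455 mini-batches per epoch, so it should be read as an assumption.

```python
total_samples = 18_200
batches_per_epoch = 455
epochs = 100
n_ensemble_members = 3

train_samples = total_samples * 80 // 100                  # 80% training share
implied_batch_size = train_samples // batches_per_epoch    # inferred, not stated in the text

steps_single_seed = batches_per_epoch * epochs
steps_ensemble = n_ensemble_members * steps_single_seed

print(train_samples, implied_batch_size, steps_single_seed, steps_ensemble)
# 14560 32 45500 136500
```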

2.3. Seed Randomness

When it is not controlled, the training process is influenced by a random seed $s \in N$, which controls multiple stochastic processes, such as:

Weights and bias initialization: Each layer weight tensor W is initialized using a pseudorandom number generator that is seeded with s.
Data shuffling and splitting into train/test: The seed determines the permutation $\pi_{s}$, used to shuffle and split the dataset into training and validation sets.
Mini-batch shuffling: The order of mini-batch construction per epoch depends on s.

2.3.1. Seed Influence on Weights and Biases

The Glorot Uniform Initialization method [29] is a widely used weight initializer that stabilizes the training of deep networks by keeping the variance of activations approximately constant across layers. Weights are drawn from a uniform distribution:

W_{ij} \sim \mathcal{U} \left[ - \frac{\sqrt{6}}{\sqrt{n_{in} + n_{out}}}, \frac{\sqrt{6}}{\sqrt{n_{in} + n_{out}}} \right]

(10)

where $n_{in}$ represents the number of inputs, $n_{out}$ represents the number of outputs, and $\mathcal{U}$ (continuous uniform distribution) represents a probability distribution in which every number in the interval has an equal chance of being picked. Equation (10) can be rewritten in terms of a single variable, $a = \sqrt{6} / \sqrt{n_{in} + n_{out}}$, which defines the range from which each weight $W_{ij}$ is sampled:

W_{ij} \sim \mathcal{U} (- a, + a)

(11)

After the Glorot limit a in Equation (11) is computed, the PRNG is initialized to a specific state given by the allocated seed (Group A/B). A PRNG is a deterministic function that takes a seed to initialize an internal state and then produces a sequence of outputs:

\mathrm{PRNG}_{s} : N \longrightarrow [0,1)

(12)

The function produces a list of values generated deterministically from seed s:

\mathrm{PRNG}_{s} = [\psi_{0}^{(s)}, \psi_{1}^{(s)}, \psi_{2}^{(s)}, \dots]

(13)

where each $\psi_{k}^{(s)} \in [0,1)$ is the k-th number in the sequence generated under seed s. These values are then linearly transformed into the Glorot range:

W_{ij}^{(s)} = 2 a \cdot \psi_{k}^{(s)} - a \in (- a, + a)

(14)

Just like the weights, the biases in each layer are initialized using the PRNG seeded with a fixed value s:

b_{i}^{(s)} = 2 a \cdot \psi_{i}^{(s)} - a \in (- a, + a)

(15)

PRNG outputs form long deterministic sequences, and their values are consumed in order throughout the algorithm to train the model. Each PRNG call yields one float, which is used exactly once, for a single weight or bias.

When the same seed is used for model training, the entire stochastic process, including weight initialization, data shuffling, and batch ordering, becomes deterministic, leading to identical results. However, changing the seed alters this pseudo-random trajectory, producing different outcomes.
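The mapping in Equations (10) to (14) and the resulting determinism can be sketched in a few lines. This is an illustration, not the training code: Python's built-in generator (Mersenne Twister) stands in for the PRNG used in this work, and the function names are hypothetical.

```python
import math
import random

def glorot_limit(n_in, n_out):
    """Glorot limit a = sqrt(6) / sqrt(n_in + n_out)."""
    return math.sqrt(6.0 / (n_in + n_out))

def init_weights(seed, n_in, n_out):
    """Draw an n_in x n_out matrix as W = 2*a*psi - a, with psi in [0, 1)."""
    rng = random.Random(seed)        # stand-in PRNG, NOT the generator used in the paper
    a = glorot_limit(n_in, n_out)
    return [[2.0 * a * rng.random() - a for _ in range(n_out)] for _ in range(n_in)]

a = glorot_limit(2, 16)              # Dense Scalar layer: 2 inputs, 16 outputs
w_42_first = init_weights(42, 2, 16)
w_42_again = init_weights(42, 2, 16)
w_43 = init_weights(43, 2, 16)

print(w_42_first == w_42_again)      # same seed reproduces the same weights
print(w_42_first == w_43)            # a neighbouring seed gives different weights
```

For the Dense Scalar layer the limit is a = sqrt(6/18) ≈ 0.577, so every initial weight of that layer lies within roughly ±0.577 regardless of the seed; the seed only decides where in that interval each weight falls.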

In the current work, the Philox PRNG is used, which supports up to $2^{64}$ distinct outputs per seed. The random seed itself is a 32-bit unsigned integer, allowing for values from 0 to $2^{32} - 1$, providing over four billion unique initialization states. Each time a new weight or bias is needed, Philox uses the defined seed s and an incrementing internal counter to obtain new random values (the same principle as in Equation (13)).

Table 1 illustrates how the chosen seed influences the initialization of weights and biases across the entire machine learning architecture, as previously described in Section 2.1. In total, 282,210 PRNG values are used to initialize the weights and biases of a single-seed model.
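The total of 282,210 can be cross-checked from the layer sizes given in Section 2.1. The sketch below is a reconstruction from that description rather than a copy of Table 1; it assumes two input channels (x and y) for Conv1D #1 and a CD head that mirrors the CL head, and the layer labels are informal.

```python
def conv1d_params(in_channels, kernel_size, filters):
    """Trainable values in a 1D convolution: kernel weights plus one bias per filter."""
    return in_channels * kernel_size * filters + filters

def dense_params(n_in, n_out):
    """Trainable values in a dense layer: weight matrix plus one bias per neuron."""
    return n_in * n_out + n_out

layers = {
    "Conv1D #1": conv1d_params(2, 3, 32),        # ASSUMES 2 input channels (x, y)
    "Conv1D #2": conv1d_params(32, 3, 32),
    "Dense (geometry)": dense_params(4192, 64),
    "Dense Scalar": dense_params(2, 16),
    "Dense Engineered": dense_params(2, 16),
    "Fusion Dense": dense_params(96, 64),
    "CL Dense #1": dense_params(64, 32),
    "CL Dense #2": dense_params(32, 1),
    "CD Dense #1": dense_params(64, 32),         # ASSUMES the CD head mirrors the CL head
    "CD Dense #2": dense_params(32, 1),
}
total = sum(layers.values())

for name, count in layers.items():
    print(f"{name:18s} {count:>8,d}")
print(f"{'Total':18s} {total:>8,d}")             # Total 282,210
```

The dense layer that follows the flattened CNN output accounts for 268,352 of these values, about 95% of the total.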

2.3.2. Seed Influence on Data Shuffling

As mentioned earlier, there are 18,200 samples available in total. Randomly shuffling 18,200 elements amounts to choosing one of the 18,200! possible permutations of the samples. Entropy is used to quantify how many bits are needed to represent all the possible outcomes of a random process:

H = \log_{2} (18,200!) = 257,530

(16)

Therefore, in order to choose one random shuffle of 18,200 elements, 257,530 bits of entropy are needed. Each PRNG value provides 32 bits; therefore, 8048 PRNG values are required for a single seed configuration.

This data shuffling is performed once, in order to randomly permute all 18,200 samples and select 80% for training, while 20% remain available for validation. To reduce computational cost and focus on seed sensitivity, the model uses a single train/test split for each full run, which randomly divides the dataset into training and validation sets.

2.3.3. Seed Influence on Mini-Batch Shuffling

Unlike the weight initialization (presented in Section 2.3.1), shuffling affects the path the model takes through the optimization during backpropagation.

Before each epoch, the training data are shuffled again to form the mini-batches. Each epoch therefore reshuffles the 14,560 training samples (80% of the 18,200 samples) into batches. Similarly to Equation (16), $\log_{2} (14,560!) = 201,421$ bits of entropy are needed. In total, 6294 32-bit PRNG values are necessary for one epoch.

Since the model uses 100 epochs, 629,400 PRNG values are necessary for a single seed configuration.

In total, considering the influence of weight and bias initialization, initial data shuffling, train/test splitting, and mini-batch generation and shuffling, the model needs approximately 919,658 PRNG values, all of which are deterministically dependent on the chosen seed.
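The overall budget is the sum of the three contributions reported in Sections 2.3.1 to 2.3.3. The sketch below takes those reported figures as given and only reproduces the tally; the variable names are hypothetical.

```python
init_values = 282_210          # weights and biases (Section 2.3.1)
split_values = 8_048           # one-off dataset shuffle and split (Section 2.3.2)
per_epoch_values = 6_294       # mini-batch shuffle for one epoch (Section 2.3.3)
epochs = 100

shuffle_values = per_epoch_values * epochs
total_values = init_values + split_values + shuffle_values

print(shuffle_values, total_values)
# 629400 919658
```

On these figures, mini-batch shuffling consumes roughly two thirds of the seed-dependent values, more than weight and bias initialization itself.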

2.4. Model Configuration

In this work, two single-seed groups and an ensemble seed group are used to study variance and generalization:

Group A: {42, 43, 44, 45, 46}, single seeds chosen to show performance consistency, linearity, and stability across nearby seeds.
Group B: {0, 1, 123, 777, 999}, single seeds intentionally diverse so as to stress-test sensitivity and non-linearity under different initialization states.
Group C: {(17, 89, 257), (610, 987, 75025), ($2^{8}$, $2^{16}$, $2^{32} - 1$)}, together with the best- and worst-performing sets computed from Groups A and B; these ensemble models, each constructed on three seeds, are chosen to evaluate the performance and stability of ensemble methods under significant seed variation.

The model is trained using the Adam (Adaptive Moment Estimation) optimizer over 100 epochs. The default learning rate of the Adam optimizer is used (0.001). In each epoch, the training data are shuffled with PRNG calls determined by the set seed. Validation loss is computed after each epoch.

For Group C, the model is trained as an ensemble of three models. All models share the same architecture and training data but differ in initial weights due to different seed inputs. If each model is defined as a function, $f_{s_{in}} (x)$, which maps an input $x$ to an output prediction based on the parameters learned from seed $s_{in}$, then the ensemble average function becomes

\bar{f} (x) = \frac{1}{3} \sum_{i = 1}^{3} f_{s_{in}} (x)

(17)
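Equation (17) amounts to averaging the three member predictions for each input. A minimal sketch, in which three constant functions stand in for trained networks; the stand-in models and their output values are hypothetical.

```python
def ensemble_predict(models, x):
    """Arithmetic mean of the (C_L, C_D) predictions of all member models for input x."""
    preds = [model(x) for model in models]
    n = len(preds)
    return tuple(sum(p[k] for p in preds) / n for k in range(2))

# Hypothetical stand-ins for three networks trained from three different seeds.
models = [
    lambda x: (0.90, 0.011),
    lambda x: (1.00, 0.010),
    lambda x: (1.10, 0.012),
]
cl_mean, cd_mean = ensemble_predict(models, x=None)
print(cl_mean, cd_mean)
```

In this toy case the averaged prediction is approximately (1.0, 0.011); the two members that over- and under-predict the lift coefficient cancel each other out.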

3. Results

To rigorously assess the sensitivity, stability, and reproducibility of the aerodynamic predictions, the results are structured according to three seed-based experimental groups, as detailed in Section 2.4. This analysis covers both single-seed (Groups A and B) and ensemble-based (Group C) setups designed to probe the deterministic effects of pseudo-random initialization in an aerodynamic prediction application.

The performance metrics reported in this section are obtained by evaluating the trained models on a dedicated test set of 21 NACA four-digit airfoils that were completely excluded from the training process. This separation ensures that the results reflect the model’s true generalization capability across unseen geometries and flow scenarios. For each airfoil, predictions include both lift and drag coefficients across 10 distinct Reynolds numbers and 20 AoA, enabling a comprehensive assessment of generalization under varied flow conditions. In the following Table 2, the airfoil sets used for training and testing the model are presented:

3.1. Group A—Consecutive Single Seeds

Group A is designed to assess the consistency and sensitivity of the model to minimal variations in random initializations by using five consecutive seed values: 42, 43, 44, 45, and 46. The purpose is to evaluate whether the model exhibits stable predictive behavior when initialized with numerically adjacent seeds.

The general results, presented in Table 3, indicate that four of the five seeds reach MAPE values below 2% for both aerodynamic coefficient predictions. Seed 46 achieves the best outcome, with 1.27% MAPE for $C_{L}$ and 1.19% for $C_{D}$. Seed 43 slightly exceeds the 2% threshold for $C_{D}$ (2.02%) but remains within the expected range for $C_{L}$ (1.41%). Overall, the model exhibits consistent performance in predicting lift coefficients, while drag predictions remain more sensitive to variations in initialization.

From Table 4, a distinct pattern emerges among the worst-performing cases in drag prediction: NACA 6614 (3.8% MAPE), NACA 5615 (3.61%), and NACA 5312 (2.54%). Although these airfoils share a common geometric trait, high camber, this alone does not justify the poor drag prediction performance. The variability in the results across seeds reveals the strong influence of random initialization, highlighting the role of seed selection as a non-negligible factor in ML workflow.

The $R^{2}$ values for $C_{L}$ remain consistently high across all seeds (≥0.99), confirming stable model performance for lift prediction. However, $R^{2}$ for $C_{D}$ varies more (from 0.97 to 0.99), indicating higher sensitivity to seed initialization.

Overall, the best-performing seed of Group A is seed 42. The results suggest partial consistency in the model’s performance across consecutive seeds. However, those consecutive seeds expose asymmetries in predictions, where:

Lift coefficient predictions are relatively stable across seeds, with MAPE values ranging from 1.27% to 1.86% (band of $\pm 0.3\%$).
Drag coefficient predictions fluctuate and are sensitive to the selected seed, with MAPE values ranging from 0.84% to 2.02%.

3.2. Group B—Diverse Single Seeds

Group B is designed to stress-test the model’s sensitivity to random initialization by breaking away from uniform sampling, unlike Group A’s structure. By introducing nonlinear, uncorrelated seed values, this group probes how much variability exists in performance when the seed space is sampled with no continuity, thereby evaluating the stability and generalization consistency of the model.

As presented in Table 5, the results show a good degree of stability in both lift and drag predictions across the group. In terms of lift, the model maintains a MAPE between 1.08% and 1.97%, with seed 123 achieving the lowest error (1.08%) and seed 1 the highest (1.97%). For drag, the prediction error remains largely controlled, with seed 0 and seed 999 achieving strong results (0.57% and 0.58%, respectively).

These findings, presented in more detail in Table 6, suggest that the model exhibits a high level of resilience to initialization-induced variance. While some fluctuation is present, especially in seeds 1 and 777, the network consistently converges to low-error predictions. The three airfoils that showed notable drag prediction instability in Group A (NACA 6614, NACA 5615, and NACA 5312) are predicted more accurately under the Group B seeds, which demonstrates that these cases do not intrinsically lead to high error.

The improvement across seeds in Group B supports the hypothesis that seed choice is a key factor in model variance, and that some seeds, like 123 or 777, may lead to higher errors for certain airfoils while others, such as 0 or 999, are more stable.

Seed 0 is the most consistent performer in Group B, with the lowest total combined MAPE across both aerodynamic coefficients.

In Group B (Table 5), $R^{2}$ values improve slightly over Group A (Table 3), with better consistency for both $C_{L}$ and, especially, $C_{D}$ (up to 0.9974). This suggests that Group B seeds lead to more stable learning, particularly for drag, which is typically harder to predict due to higher noise.

3.3. Group C—Ensemble Models

In Group C, the focus is on evaluating the behavior and effectiveness of ensemble learning strategies, unlike Groups A and B, which assess individual seed performance and model sensitivity to random initialization. Group C tests the hypothesis that ensemble models trained on diverse/strategically selected seeds can yield better generalized predictions compared to single-seed models. The ensemble strategy used in this study follows a bagging approach, where multiple models are trained independently. Each model in the ensemble is initialized, similarly to the single seed configuration, with a different random seed. The final prediction is obtained by computing the arithmetic mean of the individual model outputs for each sample.

The ensemble models were constructed to explore the influence of seed selection strategies grounded in mathematical fundamentals, specifically:

Ensemble C1 is composed of seeds (17, 89, 257), selected as prime numbers, under the hypothesis that their inherent numerical irregularity may induce diverse initialization.
Ensemble C2 includes seeds (610, 987, 75025), which are Fibonacci numbers, specifically at positions 15, 16, and 25 in the Fibonacci sequence (taking $F_{1} = F_{2} = 1$). These seeds were selected to investigate whether a growth-based sequence influences model convergence in an ensemble.
Ensemble C3, defined as ($2^{8}$, $2^{16}$, $2^{32} - 1$), is formed by powers of two up to the 32-bit unsigned integer limit. This group was constructed to test edge behavior within the seed space and examine potential instabilities.

Based on the rationale for seed selection in Groups A and B and their evaluated performance, the results can be consolidated by separating the tested seeds into two subsets, the three best-performing and the three worst-performing, according to their generalization errors in both $C_{L}$ and $C_{D}$; therefore:

Ensemble C4 is defined as the ensemble constructed from the best-performing seeds identified in Groups A and B: (0, 123, 999).
Ensemble C5 is formed from the worst-performing seeds, (43, 45, 777), in contrast with C4.

The overall model performance in the ensemble setting is presented in Table 7. All ensembles achieve high $R^{2}$ values (>0.98), confirming robust overall performance.

The observed generalization error for $C_{L}$ in ensemble C3, reaching 2.73%, prompted an investigation into the presence of possible black swan seeds. For the purpose of this study, a black swan is defined as any initialization that yields a generalization MAPE for $C_{L}$ or $C_{D}$ exceeding 2%, given that the majority of tested seeds, whether standalone (Groups A and B) or ensemble-based (Group C), consistently remained below this threshold.

Ensemble C3 was constructed using seed values derived from powers of two, extending to the upper limit of a 32-bit unsigned integer: $2^{8}$, $2^{16}$, and $2^{32} - 1$. When analyzed individually, the performance of each seed revealed the following:

Seed $2^{8}$, with a run-time of 712.02 s (11.87 min), yields a $C_{L}$ MAPE of 2.78% and a $C_{D}$ MAPE of 1.60%, confirming it as a black swan seed.
Seed $2^{16}$, with a run-time of 506.60 s (8.44 min), yields a $C_{L}$ MAPE of 1.24% and a $C_{D}$ MAPE of 1.00%, placing this seed within expected error margins.
Seed $2^{32} - 1$, with a run-time of 433.43 s (7.22 min), yields a $C_{L}$ MAPE of 2.78% and a $C_{D}$ MAPE of 1.60%, and is a black swan seed with a convergence path similar to that of seed $2^{8}$.
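The black swan criterion defined above can be expressed as a simple threshold check. The sketch below is only an illustration: the function name `is_black_swan` is hypothetical, and the input values are the per-seed MAPEs listed above for the C3 members.

```python
def is_black_swan(mape_cl, mape_cd, threshold=2.0):
    """Flag a seed whose generalization MAPE (in %) exceeds the threshold for C_L or C_D."""
    return mape_cl > threshold or mape_cd > threshold

# Per-seed generalization MAPEs (%) reported for the members of ensemble C3.
c3_members = {
    "2^8": (2.78, 1.60),
    "2^16": (1.24, 1.00),
    "2^32 - 1": (2.78, 1.60),
}
flags = {seed: is_black_swan(cl, cd) for seed, (cl, cd) in c3_members.items()}
print(flags)
```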

The matching performance of seeds $2^{8}$ and $2^{32} - 1$ is of particular interest. Despite their numeric disparity, they led to the same final error metrics, suggesting that these two seeds initialized the model into effectively the same region of the solution space. However, a closer inspection of the per-airfoil results reveals substantial differences in their error distribution across the 21 excluded airfoils; in this case, the global mean errors align while the local prediction behaviors diverge.

Furthermore, the detailed results from C3, C4, and C5, as presented in Table 8, illustrate a key principle in ensemble modeling: the MAPE of an ensemble is not the arithmetic mean of the individual MAPEs. Instead, as in Equation (17), the ensemble performance is governed by the interaction of the individual predictions. Group C’s best-performing ensemble is represented by C2, which achieves a MAPE $C_{L}$ of 1.43% and a MAPE $C_{D}$ of 1.19%.
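A small numerical illustration may help show why the MAPE of an averaged prediction differs from the average of the members' MAPEs. The values below are made up for the purpose of the example and are not results from the study; `mape` is a plain implementation of the standard definition.

```python
import numpy as np

def mape(y_true, y_pred):
    """Mean absolute percentage error, in percent."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return 100.0 * np.mean(np.abs((y_pred - y_true) / y_true))

y_true = np.array([1.0, 2.0])
model_a = np.array([1.1, 2.2])   # hypothetical member that overpredicts by 10%
model_b = np.array([0.9, 1.8])   # hypothetical member that underpredicts by 10%

mean_of_mapes = np.mean([mape(y_true, model_a), mape(y_true, model_b)])
ensemble_mape = mape(y_true, (model_a + model_b) / 2.0)
print(mean_of_mapes, ensemble_mape)
```

In this constructed case the members' errors cancel, so the two quantities differ sharply; in general, the size of the gap depends on how strongly the individual errors offset each other sample by sample.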

4. Discussion

This study investigates the influence of random seed initialization and ensemble configuration on the reliability and predictive accuracy of deep learning models for aerodynamic performance, while also introducing a custom-built deep neural network architecture, which is described in detail. A dataset of 112 NACA four-digit airfoils, each defined by 135 coordinate points, was generated to support the training and evaluation process. For each airfoil, aerodynamic data were produced using XFOIL (the input sequence is provided in Appendix A.2, XFOIL Command Sequence) for 10 Reynolds numbers ranging from 500,000 to 5,000,000 in 500,000 increments, corresponding to a target speed range of approximately 7 m/s to 78 m/s in standard air conditions. Furthermore, each configuration was evaluated over 20 AoA uniformly distributed between $-5°$ and $14°$. While XFOIL was selected for its computational efficiency, it is known to exhibit reduced accuracy under certain conditions, particularly near stall or in cases of complex boundary layer behavior. As such, the entire dataset inherits the limitations of the underlying simulation tool. Nevertheless, the focus of the present work lies in the machine learning framework, not in the fidelity of the solver itself, and the methodology remains extensible to higher-fidelity datasets such as CFD or experimental measurements in future stages.
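The operating grid described above (10 Reynolds numbers and 20 angles of attack per airfoil) can be written out as a short sketch. The variable names are illustrative only, and the snippet does not reproduce the authors' data pipeline.

```python
import numpy as np

# Reynolds numbers: 500,000 to 5,000,000 in steps of 500,000 (10 values).
reynolds = np.arange(500_000, 5_000_001, 500_000)

# Angles of attack: 20 values uniformly spaced between -5 and 14 degrees.
aoa_deg = np.linspace(-5.0, 14.0, 20)

# Every (Re, AoA) pair evaluated for a single airfoil.
re_grid, aoa_grid = np.meshgrid(reynolds, aoa_deg, indexing="ij")
conditions = np.column_stack([re_grid.ravel(), aoa_grid.ravel()])
print(conditions.shape)
```

With 20 uniformly spaced values over this range, the resulting angle-of-attack increment is one degree.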

The novelty of this work does not lie in the aerodynamic prediction itself, which is a field already well established, but rather, in the in-depth investigation of how random seed initialization impacts model reproducibility and prediction stability. This study is among the first to share and compare multiple results from the same neural network architecture, showing the range of predictive errors that can emerge solely due to the choice of seed.

For identical datasets and code, $C_{L}$ prediction errors (MAPE) range from as low as 1.08% (Group B, seed 123) to as high as 2.72% (Group C, ensemble C3). For $C_{D}$ predictions, the model performance spans an even wider range, with errors from as low as 0.57% (Group B, seed 0) up to 2.62% (Group C, ensemble C1). On average, this indicates a variation of over 250% between the best and worst outcomes, underscoring how heavily reproducibility and accuracy are influenced by seed selection.
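The arithmetic behind the quoted variation can be checked with a few lines. The sketch assumes that the variation is the relative increase from the best to the worst MAPE and that the overall figure is the mean of the two branches; this reading is an inference from the reported numbers, and the helper name `relative_increase` is hypothetical.

```python
def relative_increase(best, worst):
    """Percentage by which the worst error exceeds the best error."""
    return 100.0 * (worst - best) / best

cl_spread = relative_increase(1.08, 2.72)   # C_L MAPE: seed 123 vs. ensemble C3
cd_spread = relative_increase(0.57, 2.62)   # C_D MAPE: seed 0 vs. ensemble C1
mean_spread = (cl_spread + cd_spread) / 2.0
print(round(cl_spread, 2), round(cd_spread, 2), round(mean_spread, 2))
```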

Among all individually tested seeds across Groups A and B, seed 0 demonstrates the best numerical accuracy, achieving a MAPE $C_{L}$ of 1.1% and a MAPE $C_{D}$ of 0.57%, with $R^{2}$ values above 0.99. Ensemble C2 does not surpass seed 0 in MAPE, reaching an overall MAPE $C_{L}$ of 1.43% and MAPE $C_{D}$ of 1.19% (per-airfoil detail is given in Figure 2 for $C_{L}$ and Figure 3 for $C_{D}$), but it slightly outperforms seed 0 in $R^{2}$, with values that are also above 0.99. The $R^{2}$ comparison is shown in Figure 4 for $C_{L}$ and Figure 5 for $C_{D}$.

When analyzing both $C_{L}$ (Figure 2) and $C_{D}$ (Figure 3) MAPE distribution across all tested airfoils (where each error value represents the aggregated prediction error across 10 Reynolds numbers and 20 AoA per Reynolds number), it becomes evident that seed 0 consistently delivers lower and more uniform errors. The single seed-based model not only outperforms ensemble C2 in terms of average accuracy but also avoids the pronounced local spikes that the ensemble occasionally exhibits. However, ensembles have the potential to outperform even the best-performing single seeds if the individual seeds within the ensemble can capture exceptionally well the complementary characteristics of the dataset (e.g., one seed may predict more accurately at a low AoA, while another excels at a high AoA). A potential limitation of single-seed models could be evident when the dataset is expanded to include more complex geometries, such as NACA five-digit airfoils with more challenging camber lines. In such scenarios, ensembling can help mitigate localized weaknesses by averaging out inconsistent predictions. Thus, ensemble learning introduces a stabilizing effect that enhances generalization in ML models.

Regarding the coefficient of determination for the best-performing single seed (seed 0) and ensemble (C2), the $C_{L}$ evaluation is shown in Figure 4, while Figure 5 presents the $C_{D}$ evaluation. For lift coefficient prediction, both models achieve very high $R^{2}$ values across all airfoils, but seed 0 shows a slight drop for the last few airfoils while C2 maintains more consistent performance. For drag coefficient predictions, the difference between the two models is more visible, as C2 outperforms seed 0 on most airfoils, with higher and smoother $R^{2}$ values. Overall, across all single seeds and ensembles tested in the current work, despite the variability in $R^{2}$ for $C_{D}$, all values remain above 0.978, indicating that the model still captures drag trends well. In fact, for the ML architecture tested in this paper, $R^{2}$ values are consistently above 0.99 for most cases, confirming that the model performs with high reliability, even for the more sensitive drag predictions.
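For reference, the coefficient of determination can be computed as in the sketch below. It assumes the standard definition $R^{2} = 1 - SS_{res} / SS_{tot}$; the exact implementation used in the study is not shown in this section, and the sample values are made up.

```python
import numpy as np

def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - residual sum of squares / total sum of squares."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    ss_res = np.sum((y_true - y_pred) ** 2)
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)
    return 1.0 - ss_res / ss_tot

# Illustrative lift coefficients and predictions (not data from the study).
cl_true = [0.2, 0.6, 1.0, 1.4]
cl_pred = [0.21, 0.59, 1.02, 1.38]
r2 = r_squared(cl_true, cl_pred)
print(round(r2, 5))
```

Because $R^{2}$ is normalized by the variance of the target, a quantity with a narrow range of values, such as drag at low angles of attack, can show a lower $R^{2}$ even when its absolute errors are small.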

Exploring all possible seed values is computationally infeasible, as a 32-bit seed space comprises more than four billion options. Given the performance spread observed, it is plausible that better-performing seeds exist, but finding them would require significant computational effort. As such, the selected seeds tested in this study are meant to offer a practical glimpse into the variability introduced by PRNGs. Moreover, this study has also revealed the presence of black swan seeds, those that result in significantly worse predictions despite identical training settings. Their influence is visible in the ensemble C3 $C_{L}$ prediction, where the inclusion of seeds $2^{8}$ and $2^{32} - 1$ led to the worst performance observed across all models tested. It is likely that more black swans exist in the unexplored seed space, highlighting how unpredictable deep learning models can be when the seed is left uncontrolled.
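A rough order-of-magnitude estimate makes the infeasibility concrete. The sketch assumes sequential training and a round figure of 10 minutes per model, chosen from within the 6 to 15 minute range reported in Appendix A.3; both are simplifying assumptions.

```python
seed_space = 2 ** 32            # number of distinct 32-bit seeds
minutes_per_model = 10          # assumed round figure within the 6-15 min range
minutes_per_year = 60 * 24 * 365

total_years = seed_space * minutes_per_model / minutes_per_year
print(f"{seed_space:,} seeds -> about {total_years:,.0f} years of sequential training")
```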

5. Conclusions

In addition to absolute accuracy assessed via MAPE and presented in Table 4, Table 6 and Table 8, $R^{2}$ values were computed for both the best- and worst-performing individual seeds and ensemble configurations, as shown in Table 9. The best-performing single-seed model (seed 0) achieves the lowest overall MAPE (0.83%) with strong $R^{2}$ values, for an overall value of 0.9976. In contrast, the worst-performing single seed (seed 45) yields a higher overall MAPE of 1.89% and a reduced $R^{2}$ for drag coefficient prediction (0.9784).

Among ensemble models, C2 (seeds 610, 987, 75025) offers the most favorable performance, combining a low overall MAPE of 1.31% with the highest $R^{2}$ values observed (0.9999 for $C_{L}$ and 0.9968 for $C_{D}$), reflecting better generalization than seed 0. The weakest ensemble, C1 (seeds 17, 89, and 257), exhibits the highest error (2.2% overall MAPE) and a comparatively lower overall $R^{2}$ of 0.9942.
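The overall values quoted in this section are numerically consistent with the arithmetic mean of the $C_{L}$ and $C_{D}$ metrics listed in Tables 3, 5, and 7. That reading is an inference from the numbers rather than a definition stated in the text, and the helper `overall` below is hypothetical.

```python
def overall(cl_value, cd_value):
    """Arithmetic mean of the lift and drag metrics (assumed aggregation)."""
    return (cl_value + cd_value) / 2.0

# (MAPE C_L, MAPE C_D) in % and (R2 C_L, R2 C_D), copied from Tables 3, 5, and 7.
reported = {
    "seed 0":  {"mape": (1.10, 0.57), "r2": (0.9998, 0.9954)},
    "seed 45": {"mape": (1.86, 1.92), "r2": (0.9998, 0.9784)},
    "C2":      {"mape": (1.43, 1.19), "r2": (0.9999, 0.9968)},
    "C1":      {"mape": (1.78, 2.62), "r2": (0.9998, 0.9887)},
}
summary = {name: (overall(*v["mape"]), overall(*v["r2"])) for name, v in reported.items()}
for name, (m, r) in summary.items():
    print(f"{name}: overall MAPE {m:.3f}%, overall R2 {r:.5f}")
```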

In this study, a total of 91 NACA four-digit airfoils were used for training (details regarding the train/test manual split can be found in Table 2), where for each airfoil, the aerodynamic coefficients were obtained using XFOIL across 10 Reynolds numbers and 20 AoA, resulting in 200 aerodynamic data samples per airfoil. Altogether, 18,299 samples were generated and used as input for training and calibrating the machine learning architecture. In addition to the training set, an independent set of 21 NACA four-digit airfoils, not seen during training, was used exclusively for testing the model performance. Throughout this study, the model was tested with 10 different single seeds and five ensembles in order to investigate the hypothesis that random seed selection plays a significant role in the calibration and predictive behavior of machine learning architectures. Although this study uses a custom-generated dataset of NACA four-digit airfoils with standardized geometry inputs, the model architecture is fully adaptable to benchmark datasets, such as those from CFD repositories; for example, NASA PALMO [30] or RANS alternatives [31]. This adaptability stems from the generic nature of the model inputs, which rely on resampled coordinate data and standard aerodynamic features.

While both MAPE and $R^{2}$ are used to evaluate model performance, MAPE is particularly relevant in the context of aerodynamic prediction, where small numerical deviations in lift and drag coefficients can have significant implications. Therefore, for the scope of this study, lower MAPE values reflect higher predictive precision. Nevertheless, all tested models, regardless of seed or ensemble configuration, maintain $R^{2}$ values exceeding 0.97. This consistently high coefficient of determination demonstrates that the presented architecture is reliable and well-suited to the described dataset, effectively capturing the global variance in the data, which suggests strong generalization across the wide range of aerodynamic conditions considered. The combination of low MAPE and consistently high $R^{2}$ indicates both numerical accuracy and strong generalization, which suggests that the dataset presented is representative, the features are informative, and the model architecture overall is not overfitting. From a practical engineering standpoint, the proposed framework can be readily integrated into early-stage aerodynamic design workflows. Potential applications include rapid airfoil screening for UAV conceptual design, optimization loops for low-speed aircraft, and preliminary performance estimation in parametric studies.

This study presents a novel and highly relevant investigation within the field of machine learning for aerodynamic prediction, where both sensitivity and numerical precision are critical for evaluating model performance. Particular emphasis is placed on MAPE as a primary metric of predictive accuracy, which is especially important in aerodynamics due to the tight performance tolerances usually required in aerospace engineering applications. Nevertheless, the limitations of this present study stem primarily from the aerodynamic side, as the entire dataset is generated using XFOIL, a tool known to be less reliable in flow regimes involving separation or high angles of attack. This reliance on simulation-based data limits the real-world applicability of the results. From a machine learning perspective, although the proposed architecture has proven effective, it has not yet been extended to more complex three-dimensional cases. Additionally, ensemble modeling, while beneficial for accuracy, introduces increased computational costs, which may pose constraints in large-scale applications.

The analysis in this paper highlights the critical role of random seed selection in ML-based aerodynamic prediction. Despite using an identical architecture and dataset, prediction errors vary substantially from overall MAPEs of 0.83% to 2.2%, depending solely on the choice of seed, whether single or ensemble. As shown in the values in Table 9, this results in an approximate 163% difference considering only the best/worst performing sets. However, as highlighted in Section 4, seeds can impact the performance of specific branches (lift or drag ML branches); therefore, this study identified cases in which fluctuation reaches 151.8% for $C_{L}$ prediction and 359.65% for $C_{D}$ prediction, summing up an overall MAPE fluctuation of 250%.

Moreover, seeds are generally responsible for ensuring reproducibility in machine learning architectures; otherwise, each run of the same model will default to a randomly selected seed from the available 32-bit integer space $[0, 2^{32} - 1]$, leading to different results every time.
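The effect of fixing a seed can be illustrated with a small, framework-independent sketch. NumPy's generator is used here as a stand-in for the framework PRNG, and Glorot-uniform initialization is assumed purely for illustration; the layer shape (96, 64) is borrowed from the fusion layer in Table 1, and `glorot_uniform` is a hypothetical helper, not the authors' code.

```python
import numpy as np

def glorot_uniform(shape, seed):
    """Draw a Glorot-uniform weight matrix from a generator seeded with `seed`."""
    fan_in, fan_out = shape
    limit = np.sqrt(6.0 / (fan_in + fan_out))
    rng = np.random.default_rng(seed)
    return rng.uniform(-limit, limit, size=shape)

w_a = glorot_uniform((96, 64), seed=0)
w_b = glorot_uniform((96, 64), seed=0)     # same seed: identical starting weights
w_c = glorot_uniform((96, 64), seed=123)   # different seed: different starting weights
print(np.array_equal(w_a, w_b), np.array_equal(w_a, w_c))
```

In TensorFlow, a comparable effect is typically obtained by fixing the global seed before the model is built, for example with `tf.keras.utils.set_random_seed`.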

In conclusion, controlling seed values is a necessary step in achieving the best possible outcomes given a balanced ML architecture and an adequately sized database. As such, evaluating multiple seeds, or using ensemble strategies, should be considered best practice in ML-based aerodynamic prediction workflows.

Author Contributions

Conceptualization, methodology, software, validation, formal analysis, investigation, data curation, writing—original draft preparation, writing—review and editing, D.-A.S.; visualization, supervision, writing—review and editing, L.-T.G., G.C., D.M. and C.-M.L. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

The original contributions presented in this study are included in the article. Further inquiries can be directed to the corresponding author.

Conflicts of Interest

The authors declare no conflict of interest.

Abbreviations

The following abbreviations are used in this manuscript:

1D	One-Dimensional
ADAM	Adaptive Moment Estimation
AI	Artificial Intelligence
AoA	Angle of Attack
CFD	Computational Fluid Dynamics
CNN	Convolutional Neural Network
DNN	Deep Neural Network
GNN	Graph Neural Network
MAPE	Mean Absolute Percentage Error
ML	Machine Learning
MLP	Multilayer Perceptron
NACA	National Advisory Committee for Aeronautics
NASA	National Aeronautics and Space Administration
PRNG	Pseudo-Random Number Generator
$R^{2}$	Coefficient of Determination
Re	Reynolds Number
ReLU	Rectified Linear Unit
TE	Trailing Edge
UAV	Unmanned Aerial Vehicle

Appendix A

Appendix A.1. NACA Four-Digit Generation

The NACA four-digit airfoil series, developed by the National Advisory Committee for Aeronautics (now NASA), defines airfoil shapes based on a four-digit code in the format NACA XYZZ, where $X \in [0,9]$ is the maximum camber (% of chord); $Y \in [0,9]$ is the position of maximum camber (tenths of chord); and $ZZ \in [1,40]$ is the maximum thickness (% of chord).

To generate the airfoil, the x-coordinate distribution can be set linearly by $x \in [0,1]$ or by using cosine spacing, where $x = (1 - \cos \beta) / 2$, $\beta \in [0, \pi]$.

The thickness distribution is computed as follows:

y_{t} = \frac{ZZ}{0.2} \left(a_{0} x^{0.5} + a_{1} x + a_{2} x^{2} + a_{3} x^{3} + a_{4} x^{4}\right)

(A1)

where the constants are the following:

$a_{0} = 0.2969$, $a_{1} = -0.126$, $a_{2} = -0.3516$, $a_{3} = 0.2843$, and $a_{4} = -0.1015$ for an opened TE or $a_{4} = -0.1036$ for a closed TE.

The equations for the camber line are constructed with respect to Y, the position of the maximum camber, as follows:

\begin{cases} \dfrac{d y_{c}}{d x} = \dfrac{2 X}{Y^{2}} (Y - x), & x \in [0, Y) \\ \dfrac{d y_{c}}{d x} = \dfrac{2 X}{(1 - Y)^{2}} (Y - x), & x \in [Y, 1] \end{cases}

(A2)

The airfoil geometry is calculated for the upper surface by $y_{u} = y_{c} + y_{t} \cos \theta$, and for the lower surface by $y_{l} = y_{c} - y_{t} \cos \theta$, where $\theta = \arctan(d y_{c} / d x)$ is the local camber-line angle.
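The generation procedure in this appendix can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions and not the authors' implementation: the function name `naca4` is hypothetical, the camber line $y_{c}$ is taken from the standard four-digit formulation whose derivative matches Equation (A2), and the x-coordinates of the surfaces are offset by $\mp y_{t} \sin \theta$ as in that formulation. X, Y, and ZZ are converted to fractions of the chord before use.

```python
import numpy as np

def naca4(code, n_points=101, closed_te=True):
    """Return (x_u, y_u, x_l, y_l) for a NACA four-digit airfoil with unit chord."""
    m = int(code[0]) / 100.0    # X: maximum camber as a fraction of chord
    p = int(code[1]) / 10.0     # Y: position of maximum camber as a fraction of chord
    t = int(code[2:]) / 100.0   # ZZ: maximum thickness as a fraction of chord

    beta = np.linspace(0.0, np.pi, n_points)
    x = (1.0 - np.cos(beta)) / 2.0  # cosine spacing

    # Thickness distribution, Equation (A1).
    a4 = -0.1036 if closed_te else -0.1015
    y_t = (t / 0.2) * (0.2969 * np.sqrt(x) - 0.126 * x - 0.3516 * x**2
                       + 0.2843 * x**3 + a4 * x**4)

    # Camber line and its slope, Equation (A2); zero for symmetric sections.
    y_c = np.zeros_like(x)
    dyc_dx = np.zeros_like(x)
    if m > 0.0 and p > 0.0:
        front = x < p
        back = ~front
        y_c[front] = m / p**2 * (2.0 * p * x[front] - x[front]**2)
        dyc_dx[front] = 2.0 * m / p**2 * (p - x[front])
        y_c[back] = m / (1.0 - p)**2 * ((1.0 - 2.0 * p) + 2.0 * p * x[back] - x[back]**2)
        dyc_dx[back] = 2.0 * m / (1.0 - p)**2 * (p - x[back])

    theta = np.arctan(dyc_dx)
    x_u, y_u = x - y_t * np.sin(theta), y_c + y_t * np.cos(theta)
    x_l, y_l = x + y_t * np.sin(theta), y_c - y_t * np.cos(theta)
    return x_u, y_u, x_l, y_l

x_u, y_u, x_l, y_l = naca4("2412")
print(x_u.shape, float(y_u.max()))
```

Cosine spacing clusters points near the leading and trailing edges, where the surface curvature changes fastest.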

Appendix A.2. XFOIL Command Sequence

The following command sequence was used to generate aerodynamic data for each airfoil using XFOIL, alternating the Reynolds numbers as described in the main text:

LOAD <filename>
PPAR → N → <number of panel nodes>
OPER
VISC <Reynolds number>
ITER <maximum number of iterations per AoA>
VPAR → N <number of Newton iterations for tuning>
PACC → <save files>
ASEQ <minimum AoA, maximum AoA, increment>

Appendix A.3. Computational Resources

Each model (single-seed or ensemble) was trained independently. Average training durations ranged from approximately 6 to 15 min per model, depending on seed and ensemble complexity.

Hardware configuration: Processor—Intel64 Family 6 Model 170 Stepping 4 (Genuine Intel; Intel Corporation, Santa Clara, CA, USA), Machine Type—AMD64 Architecture, CPU Cores—22 (AMD, Santa Clara, CA, USA).
Software configuration: Operating System—Microsoft Windows 11, Python 3.11.9, TensorFlow Library 2.17.0.

References

1. Galeazzo, F.C.C.; Garcia-Gasulla, M.; Boella, E.; Pocurull, J.; Lesnik, S.; Rusche, H.; Bnà, S.; Cerminara, M.; Brogi, F.; Marchetti, F.; et al. Performance Comparison of CFD Microbenchmarks on Diverse HPC Architectures. Computers 2024, 13, 115.
2. Cooper-Baldock, Z.; Vara Almirall, B.; Inthavong, K. Speed, Power and Cost Implications for GPU Acceleration of Computational Fluid Dynamics on HPC Systems. arXiv 2024.
3. Günel, M.; Koç, Z.; Yavuz, M. Comparison of CFD and XFOIL Airfoil Analyses for Low Reynolds Number. In Proceedings of the 3rd International Symposium on Innovative Technologies in Engineering and Science (ISITES2016), Pecs, Hungary, 1–3 September 2016; pp. 857–867.
4. Le Clainche, S.; Ferrer, E.; Gibson, S.; Cross, E.; Parente, A.; Vinuesa, R. Improving aircraft performance using machine learning: A review. Aerosp. Sci. Technol. 2023, 138, 108354.
5. Zehtabiyan-Rezaie, N.; Iosifidis, A.; Abkar, M. Data-driven fluid mechanics of wind farms: A review. J. Renew. Sustain. Energy 2022, 14, 032703.
6. Kurunathan, H.; Huang, H.; Li, K.; Ni, W.; Hossain, E. Machine learning-aided operations and communications of unmanned aerial vehicles: A contemporary survey. IEEE Commun. Surv. Tutor. 2023, 26, 496–533.
7. Haque, A.; Chowdhury, M.N.U.R.; Hassanalian, M. A Review of Classification and Application of Machine Learning in Drone Technology. AI Comput. Sci. Robot. Technol. 2025, 4, 1–32.
8. Malecha, Z.; Sobczwyk, A. Using Artificial Intelligence to Predict the Aerodynamic Properties of Wind Turbine Profiles. Computers 2024, 13, 167.
9. Bakar, A.; Li, K.; Liu, H.; Xu, Z.; Alessandrini, M.; Wen, D. Multi-Objective Optimization of Low Reynolds Number Airfoil Using Convolutional Neural Network and Non-Dominated Sorting Genetic Algorithm. Aerospace 2022, 9, 35.
10. Zuo, K.; Bu, S.; Zhang, W.; Hu, J.; Ye, Z.; Yuan, X. Fast sparse flow field prediction around airfoils via multi-head perceptron based deep learning architecture. Aerosp. Sci. Technol. 2022, 130, 107942.
11. Nelson, A.D.; Godfrey, A. Predicting Two-Dimensional Airfoil Performance Using Graph Neural Networks 2023 (NASA Technical Report No. NASA/TM–20220006290). NASA. Available online: https://ntrs.nasa.gov/api/citations/20220006290/downloads/TM-20220006290.pdf (accessed on 20 June 2025).
12. Negoita, M.-F.; Hothazaie, M.-V. A Machine Learning-Based Approach for Predicting Aerodynamic Coefficients Using Deep Neural Networks and CFD Data. INCAS 2024, 16, 91–104.
13. Ahmed, S.; Kamal, K.; Ratlamwala, T.A.H.; Mathavan, S.; Hussain, G.; Alkahtani, M.; Alsultan, M.B.M. Aerodynamic Analyses of Airfoils Using Machine Learning as an Alternative to RANS Simulation. Appl. Sci. 2022, 12, 5194.
14. Du, B.; Shen, E.; Wu, J.; Guo, T.; Lu, Z.; Zhou, D. Aerodynamic Prediction and Design Optimization Using Multi-Fidelity Deep Neural Network. Aerospace 2025, 12, 292.
15. Murata, T.; Fukami, K.; Fukagata, K. Nonlinear mode decomposition with convolutional neural networks for fluid dynamics. J. Fluid Mech. 2020, 882, A13.
16. Zhang, X.-L.; Xiao, H.; Luo, X.; He, G. Ensemble Kalman method for learning turbulence models from indirect observation data. J. Fluid Mech. 2022, 949, A26.
17. Zhou, H.; Savova, G.; Wang, L. Assessing the Macro and Micro Effects of Random Seeds on Fine-Tuning Large Language Models. arXiv 2025.
18. Dutta, S.; Arunachalam, A.; Misailovic, S. To Seed or Not to Seed? An Empirical Analysis of Usage of Seeds for Testing in Machine Learning Projects. University of Illinois at Urbana-Champaign. In Proceedings of the 2022 IEEE Conference on Software Testing, Verification and Validation (ICST), Valencia, Spain, 4–14 April 2022.
19. Picard, D. torch.manual_seed (3407) Is All You Need: On the Influence of Random Seeds in Deep Learning Architectures for Computer Vision. arXiv 2023.
20. Andrés-Pérez, E.; Paulete-Periáñez, C. On the application of surrogate regression models for aerodynamic coefficient prediction. Complex Intell. Syst. 2021, 7, 1991–2021.
21. da Silva, A.F.C.; Colonius, T. Ensemble-based State Estimator for Aerodynamic Flows. AIAA J. 2018, 56, 2568–2578.
22. Kumar, A.; Ghosh, A.K. Ensemble Machine Learning Methods for Unsteady Aerodynamics Modeling from Flight Data. In Proceedings of the 2024 10th International Conference on Control, Automation and Robotics (ICCAR), Orchard District, Singapore, 27–29 April 2024; pp. 219–225.
23. Zhang, L.; Yu, M.; Chen, H.; Li, Y. A stacking-based ensemble prediction method for multiobjective aerodynamic optimization of high-speed train nose shape. Adv. Eng. Softw. 2024, 228, 105580.
24. Li, J.; Du, X.; Martins, J.R.R.A. Machine learning in aerodynamic shape optimization. Prog. Aerosp. Sci. 2022, 134, 100849.
25. Saetta, E.; Tognaccini, R.; Iaccarino, G. Machine Learning to Predict Aerodynamic Stall. Int. J. Comput. Fluid Dyn. 2022, 36, 641–654.
26. Sabater, C.; Sturmer, P.; Bekemeyer, P. Fast Predictions of Aircraft Aerodynamics Using Deep-Learning Techniques. AIAA 2022, 60, 5249–5261.
27. Zhang, Z.-Q.; Li, P.-J.; Li, Q.-L.; Dong, X.; Lu, X.-G.; Zhang, Y.-F. Dynamic Machine Learning Global Optimization Algorithm and Its Application to Aerodynamics. AIAA 2023, 39, 524–539.
28. Petrov, D.; Golev, A.; Moskovtsev, A. The Application of Ensemble Machine Learning Methods for Construction of Surrogate Models in Problems of Preliminary Design of an Aircraft Wing Airfoil. In Proceedings of the 2024 17th International Conference on Management of Large-Scale System Development (MLSD), Moscow, Russia, 24–26 September 2024; pp. 1–4.
29. Glorot, X.; Bengio, Y. Understanding the Difficulty of Training Deep Feedforward Neural Networks. In Proceedings of the 13th International Conference on Artificial Intelligence and Statistics (AISTATS), Sardinia, Italy, 13–15 May 2010; Volume 9, pp. 249–256.
30. Cornelius, J.K.; Peters, N.; Ågren, T.; Nieves Lugo, D. PALMO: An OVERFLOW Machine Learning Airfoil Performance Database. In Proceedings of the AIAA SCITECH 2025 Forum, Orlando, FL, USA, 6–10 January 2025.
31. Bonnet, F.; Mazari, J.A.; Cinnella, P.; Gallinari, P. AirfRANS: High Fidelity Computational Fluid Dynamics Dataset for Approximating Reynolds-Averaged Navier–Stokes Solutions. NeurIPS Track Datasets Benchmarks 2025, 35, 23463–23478.

Figure 1. Model architecture in single seed configuration.

Figure 2. Comparison of MAPE CL for seed 0 and ensemble C2.

Figure 3. Comparison of MAPE CD for seed 0 and ensemble C2.

Figure 4. Comparison of $R^{2}$ CL for seed 0 and ensemble C2.

Figure 5. Comparison of $R^{2}$ CD for seed 0 and ensemble C2.

Table 1. ${PRNG}_{s}$ usage per layer.

Layer	Weights Shape	Bias Shape	${PRNG}_{s}$ Values Used
Conv 1D #1	(3, 2, 32) Kernel width (3), input channels (2 x-y), output filters (32)	(32) One bias per output filter	224
Conv 1D #2	(3, 32, 32) Kernel width (3), input channels (32 from previous layer), output filters (32)	(32) One bias per output filter	3104
Dense	(4192, 64) Flattened CNN features (131, 32) mapped to 64 neurons	(64) One bias per neuron	268,352
Dense Scalar	(2, 16) AoA and Re mapped to 16 neurons	(16) One bias per neuron	48
Dense Engineered	(2, 16) $t_{m a x}$ and camber mapped to 16 neurons	(16) One bias per neuron	48
Fusion Dense	(96, 64) Concatenation of 64 CNN, 16 scalar, and 16 engineered mapped to 64 neurons	(64) One bias per neuron	6208
CL Dense #1	(64, 32) Learning intermediate non-linear representation	(32) One bias per neuron	2080
CL Dense #2	(32, 1) Producing a scalar, $C_{L}$	(1) Single bias for output	33
CD Dense #1	(64, 32) Learning intermediate non-linear representation	(32) One bias per neuron	2080
CD Dense #2	(32, 1) Producing a scalar, $C_{D}$	(1) Single bias for output	33
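
The figures in the last column of Table 1 match the number of weights (the product of the weight-shape dimensions) plus the number of biases for each layer. The short check below reproduces that arithmetic; the dictionary simply restates the shapes from the table, and the total is a derived value not quoted in the text.

```python
import math

# (weights shape, number of biases) per layer, copied from Table 1.
layers = {
    "Conv 1D #1": ((3, 2, 32), 32),
    "Conv 1D #2": ((3, 32, 32), 32),
    "Dense": ((4192, 64), 64),
    "Dense Scalar": ((2, 16), 16),
    "Dense Engineered": ((2, 16), 16),
    "Fusion Dense": ((96, 64), 64),
    "CL Dense #1": ((64, 32), 32),
    "CL Dense #2": ((32, 1), 1),
    "CD Dense #1": ((64, 32), 32),
    "CD Dense #2": ((32, 1), 1),
}

# Values drawn per layer = number of weights + number of biases.
counts = {name: math.prod(shape) + bias for name, (shape, bias) in layers.items()}
total = sum(counts.values())
for name, n in counts.items():
    print(f"{name}: {n:,}")
print(f"Total: {total:,}")
```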

Table 2. Airfoil sets used for training and testing, categorized by maximum camber.

Airfoil Set	NACA Codes	Number of Airfoils
Testing mixed dataset	0314, 0415, 0412, 0612, 1313, 1512, 1615, 2315, 2412, 2613, 3512, 3613, 3412, 4314, 4512, 4615, 5312, 5412, 5615, 6614, 6415	21
Training with 0% max camber	0312, 0313, 0315, 0413, 0414, 0512, 0513, 0514, 0515, 0613, 0614, 0615	12
Training with 1% max camber	1312, 1314, 1315, 1412, 1413, 1414, 1415, 1513, 1514, 1515, 1612, 1613, 1614	13
Training with 2% max camber	2312, 2313, 2314, 2413, 2414, 2415, 2512, 2513, 2514, 2515, 2612, 2614, 2615	13
Training with 3% max camber	3312, 3313, 3314, 3315, 3413, 3414, 3415, 3513, 3514, 3515, 3612, 3614, 3615	13
Training with 4% max camber	4312, 4313, 4315, 4412, 4413, 4414, 4415, 4513, 4514, 4515, 4612, 4613, 4614	13
Training with 5% max camber	5313, 5314, 5315, 5413, 5414, 5415, 5512, 5513, 5514, 5515, 5612, 5613, 5614	13
Training with 6% max camber	6312, 6313, 6314, 6315, 6412, 6413, 6414, 6512, 6513, 6514, 6515, 6612, 6613, 6615	14

Table 3. Group A—Overall model performance for consecutive single seeds.

Single Seeds	Training Time	$MAPE$ $C_{L}$	$MAPE$ $C_{D}$	$R^{2}$ $C_{L}$	$R^{2}$ $C_{D}$
42	528.52 s (8.81 min)	1.33%	1.07%	0.9998	0.9933
43	503.31 s (8.39 min)	1.41%	2.02%	0.9996	0.98
44	492.29 s (8.20 min)	1.67%	0.84%	0.9998	0.9909
45	471.30 s (7.86 min)	1.86%	1.92%	0.9998	0.9784
46	920.17 s (15.34 min)	1.27%	1.19%	0.9998	0.99

Table 4. Group A—Individual MAPE prediction errors per airfoil and seed for $C_{L}$ and $C_{D}$.

Nr.crt.	Airfoil	Seed	$MAPE$ $C_{L}$	$MAPE$ $C_{D}$	Nr.crt.	Airfoil	Seed	$MAPE$ $C_{L}$	$MAPE$ $C_{D}$	Nr.crt.	Airfoil	Seed	$MAPE$ $C_{L}$	$MAPE$ $C_{D}$
1	NACA 0314	42	0.86	0.69	8	NACA 2315	42	1.67	0.92	15	NACA 4512	42	0.92	1.04
		43	0.82	0.98			43	1.90	1.57			43	1.49	2.74
		44	1.33	0.82			44	0.84	0.80			44	2.82	0.64
		45	1.17	2.14			45	1.81	1.33			45	1.56	1.54
		46	1.64	1.18			46	1.03	1.26			46	1.61	1.16
2	NACA 0412	42	1.18	0.88	9	NACA 2412	42	1.17	1.01	16	NACA 4615	42	1.47	0.80
		43	0.77	1.10			43	1.25	1.91			43	1.06	3.53
		44	1.19	0.53			44	2.54	0.91			44	1.46	0.59
		45	1.30	2.59			45	2.21	2.03			45	1.64	1.67
		46	1.51	0.99			46	1.31	1.23			46	0.90	1.14
3	NACA 0415	42	0.98	0.79	10	NACA 2613	42	2.13	1.12	17	NACA 5312	42	2.62	2.06
		43	0.86	0.82			43	1.45	1.65			43	1.50	2.54
		44	1.38	1.11			44	1.55	0.83			44	3.67	1.14
		45	1.28	1.77			45	3.13	2.43			45	0.83	1.71
		46	1.58	1.34			46	1.31	1.42			46	1.21	1.45
4	NACA 0612	42	1.18	0.88	11	NACA 3412	42	1.05	1.09	18	NACA 5412	42	1.54	1.38
		43	0.77	1.10			43	1.57	2.28			43	1.49	2.12
		44	1.19	0.53			44	2.58	0.92			44	2.26	1.17
		45	1.30	2.59			45	1.08	1.59			45	0.98	1.55
		46	1.51	0.99			46	0.84	1.13			46	0.82	0.99
5	NACA 1313	42	1.01	0.82	12	NACA 3512	42	1.01	1.03	19	NACA 5615	42	0.97	1.14
		43	1.25	1.41			43	1.39	2.30			43	0.98	3.61
		44	0.68	0.62			44	2.21	0.93			44	1.31	0.61
		45	2.27	1.93			45	1.90	1.87			45	1.14	1.85
		46	1.32	1.13			46	0.92	1.29			46	0.92	1.18
6	NACA 1512	42	1.19	1.03	13	NACA 3613	42	2.37	1.14	20	NACA 6415	42	1.01	1.31
		43	0.90	1.53			43	1.04	2.33			43	1.71	1.89
		44	1.04	0.61			44	1.91	0.72			44	0.75	1.58
		45	3.03	2.25			45	2.84	2.22			45	0.81	2.08
		46	1.70	1.28			46	0.77	1.30			46	0.76	1.22
7	NACA 1615	42	1.81	0.90	14	NACA 4314	42	1.07	1.21	21	NACA 6614	42	0.66	1.17
		43	2.53	1.39			43	3.34	1.82			43	1.43	3.80
		44	1.28	0.76			44	1.93	1.08			44	1.02	0.73
		45	5.09	1.69			45	2.78	1.25			45	0.85	2.17
		46	2.81	1.01			46	1.59	1.31			46	0.52	0.83

Table 5. Group B—Overall model performance for diverse single seeds.

Single Seeds	Training Time	$MAPE$ $C_{L}$	$MAPE$ $C_{D}$	$R^{2}$ $C_{L}$	$R^{2}$ $C_{D}$
0	564.94 s (9.42 min)	1.10%	0.57%	0.9998	0.9954
1	710.68 s (11.84 min)	1.97%	0.99%	0.9998	0.9938
123	567.39 s (9.46 min)	1.08%	1.54%	0.9999	0.9855
777	361.83 s (6.03 min)	1.78%	1.70%	0.9998	0.9861
999	681.81 s (11.36 min)	1.65%	0.58%	0.9998	0.9974

Table 6. Group B—Individual MAPE prediction errors per airfoil and seed for $C_{L}$ and $C_{D}$.

Nr.crt.	Airfoil	Seed	$MAPE$ $C_{L}$	$MAPE$ $C_{D}$	Nr.crt.	Airfoil	Seed	$MAPE$ $C_{L}$	$MAPE$ $C_{D}$	Nr.crt.	Airfoil	Seed	$MAPE$ $C_{L}$	$MAPE$ $C_{D}$
1	NACA 0314	0	0.58	0.48	8	NACA 2315	0	1.23	0.29	15	NACA 4512	0	0.99	0.53
		1	1.62	0.53			1	1.73	0.64			1	1.97	0.67
		123	1.09	0.49			123	0.96	1.26			123	1.00	2.15
		777	1.28	1.03			777	1.65	1.57			777	1.22	1.88
		999	1.57	0.42			999	1.54	0.31			999	1.57	0.70
2	NACA 0412	0	0.65	0.39	9	NACA 2412	0	0.83	0.54	16	NACA 4615	0	1.22	0.86
		1	1.97	0.84			1	2.94	1.16			1	0.80	1.32
		123	1.04	1.14			123	1.07	1.24			123	0.94	1.87
		777	1.18	1.27			777	2.88	1.54			777	1.39	2.61
		999	1.41	0.33			999	2.01	0.37			999	1.54	0.55
3	NACA 0415	0	0.72	0.27	10	NACA 2613	0	1.79	0.50	17	NACA 5312	0	1.29	0.84
		1	1.73	0.64			1	1.85	0.89			1	3.58	1.24
		123	1.09	0.36			123	1.15	1.20			123	2.00	2.08
		777	1.02	1.12			777	1.40	2.00			777	4.69	2.03
		999	1.53	0.47			999	1.93	0.82			999	2.30	1.05
4	NACA 0612	0	0.65	0.39	11	NACA 3412	0	1.11	0.43	18	NACA 5412	0	0.93	0.74
		1	1.97	0.84			1	3.10	0.86			1	2.36	1.25
		123	1.04	1.14			123	0.94	1.62			123	1.06	1.96
		777	1.18	1.27			777	3.21	1.72			777	2.58	1.80
		999	1.41	0.33			999	1.91	0.55			999	1.70	0.87
5	NACA 1313	0	0.99	0.36	12	NACA 3512	0	1.03	0.46	19	NACA 5615	0	1.11	0.96
		1	1.67	0.53			1	2.21	0.93			1	0.72	1.28
		123	0.69	0.99			123	1.18	1.71			123	0.82	2.09
		777	1.06	1.45			777	1.43	1.86			777	0.90	2.06
		999	1.34	0.29			999	1.67	0.67			999	1.13	0.70
6	NACA 1512	0	0.87	0.43	13	NACA 3613	0	1.69	0.62	20	NACA 6415	0	1.17	0.86
		1	3.49	0.85			1	1.60	1.00			1	0.87	2.07
		123	0.90	1.26			123	1.91	1.89			123	0.81	2.51
		777	2.14	1.32			777	1.26	2.31			777	1.25	1.35
		999	1.97	0.34			999	1.64	0.80			999	0.98	0.78
7	NACA 1615	0	1.84	0.36	14	NACA 4314	0	1.27	0.70	21	NACA 6614	0	1.01	0.86
		1	2.03	0.71			1	2.54	0.89			1	0.56	1.51
		123	1.00	0.88			123	1.17	1.65			123	0.63	2.71
		777	1.75	1.37			777	3.19	1.74			777	0.66	2.32
		999	2.27	0.54			999	2.44	0.50			999	0.73	0.67
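To read the seed sensitivity off Table 6 more directly, the short sketch below summarizes the per-seed $C_{L}$ MAPE values of two airfoils copied from the table. The helper name `seed_spread` and the choice of airfoils are illustrative assumptions only; they are not part of the authors' pipeline.

```python
from statistics import mean

# Seeds of Group B, in the row order used in Table 6.
SEEDS = (0, 1, 123, 777, 999)

# C_L MAPE values (%) copied from Table 6 for two example airfoils.
CL_MAPE = {
    "NACA 0314": (0.58, 1.62, 1.09, 1.28, 1.57),
    "NACA 5312": (1.29, 3.58, 2.00, 4.69, 2.30),
}


def seed_spread(seeds, errors):
    """Summarize how one airfoil's MAPE varies across seeds (hypothetical helper)."""
    best_index = min(range(len(errors)), key=errors.__getitem__)
    return {
        "mean": mean(errors),
        "range": max(errors) - min(errors),
        "best_seed": seeds[best_index],
    }


summary = {name: seed_spread(SEEDS, errs) for name, errs in CL_MAPE.items()}
for name, stats in summary.items():
    print(f"{name}: mean={stats['mean']:.2f}%, range={stats['range']:.2f}%, "
          f"best seed={stats['best_seed']}")
```

For these two rows, seed 0 gives the lowest $C_{L}$ error, and the spread across seeds is noticeably wider for NACA 5312 than for NACA 0314.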

Table 7. Group C—Overall model performance for the ensemble setting.

Ensembles	Training Time	$MAPE$ $C_{L}$	$MAPE$ $C_{D}$	$R^{2}$ $C_{L}$	$R^{2}$ $C_{D}$
C1 = (17, 89, 257)	366.41 s (6.11 min)	1.78%	2.62%	0.9998	0.9887
C2 = (610, 987, 75025)	811.98 s (13.53 min)	1.43%	1.19%	0.9999	0.9968
C3 = $(2^{8}, 2^{16}, 2^{32} - 1)$	440.43 s (7.34 min)	2.72%	1.08%	0.9998	0.9970
C4 = (0, 123, 999)	934.12 s (15.57 min)	1.82%	1.46%	0.9999	0.9936
C5 = (43, 45, 777)	1386.45 s (23.11 min)	1.55%	2.01%	0.9999	0.9893
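As a rough illustration of what a three-seed ensemble could look like, the sketch below averages the predictions of three models point by point and scores the result with MAPE. Plain averaging is an assumption made here for illustration; this section does not state how the authors combine the ensemble members. The function names and the numbers are hypothetical, not data from the paper.

```python
def mape(y_true, y_pred):
    """Mean absolute percentage error in percent; assumes no true value is zero."""
    return 100.0 * sum(abs((t - p) / t) for t, p in zip(y_true, y_pred)) / len(y_true)


def ensemble_mean(member_predictions):
    """Average the predictions of several models point by point."""
    return [sum(column) / len(column) for column in zip(*member_predictions)]


# Made-up lift coefficients, not data from the paper.
y_true = [0.50, 0.80, 1.10]
member_predictions = [
    [0.52, 0.78, 1.12],  # hypothetical model trained with seed a
    [0.49, 0.83, 1.07],  # hypothetical model trained with seed b
    [0.50, 0.81, 1.13],  # hypothetical model trained with seed c
]

member_errors = [mape(y_true, pred) for pred in member_predictions]
ensemble_error = mape(y_true, ensemble_mean(member_predictions))
print("member MAPE (%):", [round(e, 2) for e in member_errors])
print("ensemble MAPE (%):", round(ensemble_error, 2))
```

With plain averaging, the ensemble MAPE can never exceed the average of the member MAPEs (a consequence of the triangle inequality), although it can still be worse than the best single member.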

Table 8. Group C—Ensemble MAPE prediction errors per airfoil and seed for $C_{L}$ and $C_{D}$.

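No.	Airfoil	Ensemble	$MAPE$ $C_{L}$ (%)	$MAPE$ $C_{D}$ (%)	No.	Airfoil	Ensemble	$MAPE$ $C_{L}$ (%)	$MAPE$ $C_{D}$ (%)	No.	Airfoil	Ensemble	$MAPE$ $C_{L}$ (%)	$MAPE$ $C_{D}$ (%)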
1	NACA 0314	C1	1.62	2.40	8	NACA 2315	C1	2.26	2.16	15	NACA 4512	C1	1.82	2.16
		C2	1.77	1.12			C2	2.65	0.95			C2	0.88	0.98
		C3	4.81	0.93			C3	4.56	0.96			C3	0.73	1.12
		C4	2.97	0.81			C4	4.23	1.19			C4	0.73	1.34
		C5	1.57	1.52			C5	2.26	1.61			C5	0.71	2.17
2	NACA 0412	C1	0.96	2.78	9	NACA 2412	C1	1.83	2.18	16	NACA 4615	C1	1.63	2.90
		C2	0.85	1.61			C2	1.63	1.24			C2	2.32	1.06
		C3	5.67	1.33			C3	1.21	1.15			C3	2.67	0.70
		C4	2.42	1.17			C4	1.52	1.27			C4	2.06	1.72
		C5	0.91	2.76			C5	1.34	2.78			C5	3.07	1.59
3	NACA 0415	C1	2.12	2.85	10	NACA 2613	C1	1.01	2.32	17	NACA 5312	C1	3.37	2.65
		C2	1.47	1.31			C2	1.98	1.24			C2	0.68	1.09
		C3	4.26	0.94			C3	3.11	1.09			C3	1.73	1.36
		C4	2.72	1.19			C4	1.84	1.02			C4	1.00	2.14
		C5	2.09	1.83			C5	1.21	1.37			C5	1.41	2.25
4	NACA 0612	C1	0.96	2.78	11	NACA 3412	C1	2.72	2.24	18	NACA 5412	C1	2.42	2.47
		C2	0.85	1.61			C2	0.99	1.07			C2	0.46	0.87
		C3	5.67	1.33			C3	1.18	1.01			C3	1.57	1.38
		C4	2.42	1.17			C4	1.25	1.18			C4	0.86	2.25
		C5	0.91	2.76			C5	0.77	2.14			C5	1.12	1.96
5	NACA 1313	C1	2.62	2.47	12	NACA 3512	C1	2.21	2.13	19	NACA 5615	C1	0.91	3.13
		C2	3.36	0.94			C2	0.90	1.18			C2	1.04	1.36
		C3	3.90	0.93			C3	0.96	1.12			C3	1.01	0.86
		C4	3.08	0.88			C4	0.69	1.16			C4	1.00	1.85
		C5	2.80	1.77			C5	1.03	1.99			C5	1.03	2.00
6	NACA 1512	C1	1.58	2.42	13	NACA 3613	C1	1.05	2.77	20	NACA 6415	C1	1.45	2.64
		C2	1.64	1.35			C2	2.45	1.19			C2	0.71	1.12
		C3	1.95	1.22			C3	2.17	0.98			C3	1.74	1.53
		C4	1.83	1.18			C4	2.10	1.27			C4	1.27	3.40
		C5	1.79	2.50			C5	2.43	1.37			C5	0.72	1.58
7	NACA 1615	C1	0.78	2.74	14	NACA 4314	C1	3.25	1.92	21	NACA 6614	C1	0.62	4.79
		C2	1.19	1.36			C2	1.69	1.24			C2	0.51	1.06
		C3	3.70	0.68			C3	3.41	1.13			C3	1.01	0.90
		C4	2.32	0.91			C4	1.35	1.84			C4	0.53	1.63
		C5	3.41	2.37			C5	1.40	2.06			C5	0.55	1.79

Table 9. Overall performance of the best/worst tested seeds/ensembles.

Category	Seed/Ensemble	$MAPE$ $C_{L}$	$MAPE$ $C_{D}$	$R^{2}$ $C_{L}$	$R^{2}$ $C_{D}$	Overall $MAPE$	Overall $R^{2}$
Best performing single seed	Seed 0	1.1%	0.57%	0.9998	0.9954	0.83%	0.9976
Worst performing single seed	Seed 45	1.86%	1.92%	0.9998	0.9784	1.89%	0.9891
Best performing ensemble	C2 (610, 987, 75025)	1.43%	1.19%	0.9999	0.9968	1.31%	0.9983
Worst performing ensemble	C1 (17, 89, 257)	1.78%	2.62%	0.9998	0.9887	2.2%	0.9942
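The "Overall" columns of Table 9 appear consistent with a simple arithmetic mean of the $C_{L}$ and $C_{D}$ values. This is an inference from the reported numbers rather than something stated in this section, and the short check below only verifies it up to rounding. The name `overall` is a hypothetical helper.

```python
# Each row: (MAPE C_L %, MAPE C_D %, R2 C_L, R2 C_D,
#            reported overall MAPE %, reported overall R2), copied from Table 9.
TABLE_9 = {
    "Seed 0": (1.10, 0.57, 0.9998, 0.9954, 0.83, 0.9976),
    "Seed 45": (1.86, 1.92, 0.9998, 0.9784, 1.89, 0.9891),
    "C2": (1.43, 1.19, 0.9999, 0.9968, 1.31, 0.9983),
    "C1": (1.78, 2.62, 0.9998, 0.9887, 2.20, 0.9942),
}


def overall(cl_value, cd_value):
    """Assumed aggregation: unweighted mean of the C_L and C_D metrics."""
    return (cl_value + cd_value) / 2.0


for label, (m_cl, m_cd, r_cl, r_cd, rep_mape, rep_r2) in TABLE_9.items():
    print(f"{label}: mean MAPE={overall(m_cl, m_cd):.3f}% (reported {rep_mape}%), "
          f"mean R2={overall(r_cl, r_cd):.5f} (reported {rep_r2})")
```

Every row reproduces the reported value to within rounding of the last printed digit, which supports the averaging assumption without confirming it.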

© 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
