1. Introduction
Over the past decades, surrogate-assisted evolutionary algorithms (SAEAs), a prominent class of evolutionary computation (EC) algorithms, have demonstrated considerable success in solving complex optimization problems through the use of surrogate models [1,2,3,4]. Traditional evolutionary algorithms (EAs) rely on fitness evaluations (FEs) to identify good individuals for evolutionary search. However, when FEs are limited or expensive to obtain, the performance of EAs degrades significantly [5]. This limitation is common in many real-world scenarios, often known as expensive optimization problems (EOPs), where each FE incurs high computational or financial cost [6,7,8]. To address this, SAEAs leverage surrogate models trained on previously evaluated solutions to approximate FEs, enabling efficient search under limited evaluation budgets. In many practical settings, particularly those constrained by time, computational resources, or data access, conducting new FEs during optimization is impractical or impossible [9,10,11]. In such contexts, offline SAEAs are especially valuable, as they construct surrogates solely from existing data and rely entirely on surrogate-assisted evaluation throughout the optimization process. As a result, SAEAs provide a cost-effective and practical alternative to conventional EAs for EOPs.
Although SAEAs have shown progress in solving EOPs, the accuracy of the constructed surrogates has a great influence on the optimization results. Therefore, how to build accurate surrogates for solving EOPs is a crucial question in designing SAEAs. To date, research on improving SAEAs has primarily followed two major directions. The first focuses on enhancing the quality and quantity of available data, as these factors critically influence the accuracy and reliability of surrogate models. For example, the presence of noise in the data can significantly impair the surrogate’s predictive performance, making data preprocessing an essential step [5]. When handling complex data structures, advanced learning techniques can be employed to uncover latent patterns and improve model robustness. One of the most persistent challenges in SAEAs is the limited size of the evaluated dataset. In general, larger datasets lead to more accurate and generalizable surrogate models [12,13]. Accordingly, extensive research has been dedicated to increasing data availability and making better use of existing data. Representative approaches include local smoothing techniques and synthetic data generation strategies [14,15], which aim to augment the dataset and enrich the learning process.
The second major research direction aims to construct more accurate surrogate models and/or develop effective surrogate model management (SMM) strategies to govern their use. Several surrogate modeling techniques have been explored, including Kriging [16], random forests [17,18], and radial basis function neural networks (RBFNNs) [19]. Moreover, ensemble learning has become a powerful tool to improve surrogate prediction by aggregating multiple base models [20,21]. In parallel, SMM plays a crucial role in surrogate-assisted evolutionary optimization (SAEO), particularly in scenarios with limited data. SMMs are designed to dynamically manage model usage across the optimization process, including strategies for sample selection [22,23] and knowledge transfer between related tasks [24,25,26]. Moreover, different learning paradigms have also been studied, such as contrastive learning [27], federated learning [28], symbolic regression [29], dimension reduction [30], and autoencoder embedding [31]. Collectively, these strategies aim to maximize data utility, mitigate overfitting, and enhance the optimization performance of SAEAs under FE-constrained conditions.
Despite numerous advancements, balancing prediction accuracy and generalization ability remains a critical challenge in the design of SAEAs, which restricts their applicability in solving EOPs. To address this, a novel probability selection-based SAEA (PS-SAEA) is proposed in this paper to more effectively tackle EOPs. Specifically, a probabilistic model selection (PMS) strategy is introduced to select promising surrogate models in a stochastic manner, thereby avoiding the overfitting commonly caused by greedy selection mechanisms. In addition, a weighted model ensemble (WME) method is developed to integrate the selected models, with weighting determined by each model’s prediction error, to produce more accurate fitness estimations.
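To make the two components more concrete, the following minimal Python sketch illustrates the idea behind PMS and WME, assuming a pool of trained surrogates with known validation errors. The inverse-error selection probabilities and weighting rule are illustrative assumptions, not the paper's exact formulas.

```python
import numpy as np

def probabilistic_model_selection(errors, n_select, rng=None):
    """Select surrogate models stochastically: lower validation error gives a
    higher selection probability (roulette-wheel style). The inverse-error
    probability used here is an illustrative assumption."""
    rng = np.random.default_rng() if rng is None else rng
    scores = 1.0 / (np.asarray(errors) + 1e-12)   # small error -> large score
    probs = scores / scores.sum()
    return rng.choice(len(errors), size=n_select, replace=False, p=probs)

def weighted_ensemble_predict(models, errors, X):
    """Combine the selected models, weighting each one inversely to its
    prediction error so that more accurate surrogates dominate the estimate."""
    weights = 1.0 / (np.asarray(errors) + 1e-12)
    weights /= weights.sum()
    preds = np.stack([m.predict(X) for m in models])   # shape: (n_models, n_points)
    return weights @ preds
```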
The main contributions of this paper can be summarized as follows:
- (1) A PMS strategy is proposed to select models for ensemble prediction by considering both prediction accuracy and selection probability, thereby achieving a better trade-off between surrogate accuracy and generalization ability.
- (2) A WME mechanism is introduced to combine the selected models into an ensemble, with each model weighted according to its prediction error, so as to enhance the overall reliability and accuracy of fitness approximation.
- (3) By integrating PMS and WME, a new algorithm, PS-SAEA, is developed for solving EOPs more efficiently, which offers a promising approach for complex optimization tasks.
Comprehensive experiments are conducted on well-established benchmark problems with varying dimensionalities. The results show that the PS-SAEA significantly outperforms cutting-edge SAEAs across different scenarios. This confirms the effectiveness and robustness of PS-SAEA in handling EOPs.
The remainder of this paper is organized as follows: Section 2 provides a brief overview of SAEAs and reviews related work. Section 3 details the proposed PS-SAEA, including its core components and algorithmic framework. Section 4 presents the experimental setup, benchmark problems, and evaluation. Finally, Section 5 concludes the paper.
4. Experimental Studies
4.1. Experiment Setup
In this study, five widely recognized problems are employed to comprehensively assess the performance of the proposed PS-SAEA. The selected test functions, as shown in Table 1 and denoted as T1 through T5, are frequently used in the SAEA literature due to their diverse landscape characteristics and varying degrees of complexity. Specifically, T1 (Ellipsoid) is a unimodal function designed to assess an algorithm’s ability to converge to the global optimum efficiently. In contrast, T2 (Rosenbrock), T3 (Ackley), T4 (Griewank), and T5 (Rastrigin) are multimodal functions that present significant challenges for SAEAs due to their numerous local optima. Together, these functions span a broad range of search landscapes, enabling a robust evaluation of the PS-SAEA’s exploration and exploitation capabilities.
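For reference, the standard formulations of these five benchmarks are sketched below in Python; the exact variable ranges and any shifts used in Table 1 are not reproduced here, so this should be read as an illustrative definition rather than the precise experimental configuration.

```python
import numpy as np

def ellipsoid(x):   # T1: unimodal, weighted sphere
    i = np.arange(1, x.size + 1)
    return np.sum(i * x**2)

def rosenbrock(x):  # T2: narrow curved valley, optimum value 0 at x = 1
    return np.sum(100.0 * (x[1:] - x[:-1]**2)**2 + (x[:-1] - 1.0)**2)

def ackley(x):      # T3: many shallow local optima
    d = x.size
    return (-20.0 * np.exp(-0.2 * np.sqrt(np.sum(x**2) / d))
            - np.exp(np.sum(np.cos(2 * np.pi * x)) / d) + 20.0 + np.e)

def griewank(x):    # T4: product term couples the variables
    i = np.arange(1, x.size + 1)
    return 1.0 + np.sum(x**2) / 4000.0 - np.prod(np.cos(x / np.sqrt(i)))

def rastrigin(x):   # T5: highly multimodal
    return 10.0 * x.size + np.sum(x**2 - 10.0 * np.cos(2 * np.pi * x))
```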
In line with standard practices, each test function is evaluated under four different dimensional settings: D = {10, 30, 50, 100}, where D denotes the problem dimension. All test functions are configured such that the global optimum is known and fixed at zero, facilitating a fair and consistent performance comparison.
To ensure equitable and reproducible comparisons, the experimental setup is as follows. First, Latin hypercube sampling [36] is employed to generate 11 × D data points across the entire search space, thereby forming the initial real-evaluation dataset for each SAEA. Each data point is a numerical vector that represents the corresponding candidate solution. Based on this dataset, each SAEA constructs surrogate models to guide the evolutionary search toward optimal solutions. Importantly, under this offline data-driven setting, no algorithm is allowed to perform real fitness evaluations beyond the initial 11 × D samples. This constraint reflects realistic conditions where only limited evaluation resources are available. All experiments are conducted using MATLAB (R2023a). The experimental environment is a computer server with two Intel(R) Xeon(R) W5-3423 CPUs (Dell, Beijing, China) and 256 GB of RAM.
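As an illustration of this offline setup (the authors' implementation is in MATLAB), the following Python sketch builds the 11 × D offline dataset with Latin hypercube sampling; the objective function and box bounds are placeholders.

```python
import numpy as np
from scipy.stats import qmc

def build_offline_dataset(obj_func, dim, lower, upper, seed=0):
    """Draw 11*D Latin hypercube samples in [lower, upper]^D and evaluate them
    once; no further real evaluations are performed afterwards."""
    sampler = qmc.LatinHypercube(d=dim, seed=seed)
    unit = sampler.random(n=11 * dim)                    # samples in [0, 1]^D
    X = qmc.scale(unit, [lower] * dim, [upper] * dim)    # map to the search space
    y = np.array([obj_func(x) for x in X])               # the only real FEs
    return X, y
```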
To mitigate statistical variability, each algorithm is run independently 25 times on each problem instance, and the average results are reported. For significance testing, the Wilcoxon rank-sum test is conducted with a significance level of α = 0.05. For clarity in result interpretation, three symbols are used to represent comparative outcomes: “+” indicates that PS-SAEA performs significantly better than the compared algorithm; “≈” denotes statistically equivalent performance; and “−” indicates significantly inferior performance of PS-SAEA. These symbols provide an intuitive summary of the experimental results, as presented in the following sections.
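A minimal sketch of this significance test is given below, assuming the 25 per-run final errors of two algorithms are available as arrays; deciding the direction of the difference by comparing means is an illustrative choice rather than the authors' exact procedure.

```python
import numpy as np
from scipy.stats import ranksums

def compare_runs(ps_saea_errors, baseline_errors, alpha=0.05):
    """Two-sided Wilcoxon rank-sum test over the 25 independent runs.
    Returns '+' if PS-SAEA is significantly better (lower error),
    '-' if significantly worse, and '≈' if the difference is not significant."""
    _, p_value = ranksums(ps_saea_errors, baseline_errors)
    if p_value >= alpha:
        return "≈"
    return "+" if np.mean(ps_saea_errors) < np.mean(baseline_errors) else "-"
```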
4.2. Compared Advanced Algorithms
To thoroughly evaluate the effectiveness of the proposed PS-SAEA, four state-of-the-art SAEAs are selected for comparison: SAEA-SE [21], BSAEA [15], SAEA-PES [14], and CL-SAEA [27]. These algorithms are all representative SAEAs that employ multiple surrogate models for ensemble prediction and have obtained promising results, each adopting a distinct strategy for surrogate selection. As such, they provide an ideal baseline for assessing the performance of PS-SAEA, which introduces the novel PMS and WME for ensemble model selection.
To ensure fairness and consistency in the comparative study, all competing algorithms are implemented using their official or publicly available codebases, thereby eliminating discrepancies that may arise from implementation differences. Furthermore, the proposed PS-SAEA adopts the same evolutionary operators as those utilized in the compared algorithms. That is, both the PS-SAEA and the compared SAEAs use the simulated binary crossover operator [37] and the polynomial mutation operator [38]. The parameter settings are kept consistent with those used in the compared SAEAs. Specifically, the crossover probability is set to 100%, while the mutation probability is defined as 1/D, where D denotes the problem dimensionality. The evolutionary process terminates after 500 generations, which serves as the termination criterion. This design choice ensures that any observed performance improvements can be explicitly attributed to the proposed PMS and WME, rather than to differences in the underlying evolutionary mechanisms. In addition, the number of available models in these algorithms is set to 2000.
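For readers unfamiliar with these operators, a compact Python sketch of simulated binary crossover and polynomial mutation is given below. The distribution indices eta_c and eta_m are common default assumptions (their exact values are not stated here), and the mutation uses the per-gene probability 1/D from the setup above.

```python
import numpy as np

def sbx_crossover(p1, p2, eta_c=15, rng=None):
    """Simulated binary crossover; eta_c is an assumed distribution index."""
    rng = np.random.default_rng() if rng is None else rng
    u = rng.random(p1.size)
    beta = np.where(u <= 0.5,
                    (2.0 * u) ** (1.0 / (eta_c + 1)),
                    (1.0 / (2.0 * (1.0 - u))) ** (1.0 / (eta_c + 1)))
    c1 = 0.5 * ((1 + beta) * p1 + (1 - beta) * p2)
    c2 = 0.5 * ((1 - beta) * p1 + (1 + beta) * p2)
    return c1, c2

def polynomial_mutation(x, lower, upper, eta_m=20, rng=None):
    """Polynomial mutation with per-gene probability 1/D, clipped to the bounds."""
    rng = np.random.default_rng() if rng is None else rng
    y = x.copy()
    mask = rng.random(x.size) < 1.0 / x.size
    u = rng.random(x.size)
    delta = np.where(u < 0.5,
                     (2.0 * u) ** (1.0 / (eta_m + 1)) - 1.0,
                     1.0 - (2.0 * (1.0 - u)) ** (1.0 / (eta_m + 1)))
    y[mask] = np.clip(y[mask] + delta[mask] * (upper - lower), lower, upper)
    return y
```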
4.3. Comparison Study with SAEAs
The results presented in Table 2 compare the proposed PS-SAEA with four representative SAEA variants: SAEA-SE, BSAEA, SAEA-PES, and CL-SAEA, across four scenarios (D = 10, 30, 50, 100) and five test problems (T1–T5) for each. To provide a better illustration, Figure 2 plots the number of better, similar, and worse results obtained by the PS-SAEA when compared with different SAEAs.
In the low-dimensional setting (D = 10), PS-SAEA demonstrates competitive performance. It outperforms SAEA-SE on T2 and performs equivalently on the remaining tasks. Compared to BSAEA and SAEA-PES, PS-SAEA achieves better performance on two tasks and comparable results on the rest, indicating that while the performance gap is not large, PS-SAEA maintains robustness and consistency. Compared to CL-SAEA, the proposed PS-SAEA obtains significantly better results on two problems and similar results on one problem. Notably, all compared methods exhibit similar levels of variance, which suggests that the differences lie primarily in convergence behavior rather than in instability.
When D = 30 (medium-dimensional cases), the differences become more pronounced. PS-SAEA outperforms BSAEA and SAEA-PES in all five problems and achieves two wins against SAEA-SE and CL-SAEA. The advantage is particularly evident in T1 and T5, where the gaps in mean performance are substantial. This highlights PS-SAEA’s strong search ability. The consistent superiority over BSAEA and SAEA-PES demonstrates better adaptability to landscape changes and more efficient surrogate model usage.
When D = 50, as problem complexity increases, the performance gap continues to widen. PS-SAEA significantly outperforms the compared SAEAs in almost all test problems. Especially for T1, T3, T4, and T5, PS-SAEA shows much lower mean errors and tighter standard deviations. In this setting, BSAEA and SAEA-PES often suffer from higher variance and degraded accuracy, while PS-SAEA maintains stable convergence. These results validate the effectiveness of the PMS and WME used in PS-SAEA, which help balance accuracy and generalization ability even in higher dimensions.
When D = 100 (high-dimensional cases), PS-SAEA exhibits superior overall performance, with five significantly better results over SAEA-SE and SAEA-PES, three significantly better and two similar performances against CL-SAEA, and two significantly better and three similar performances against BSAEA. The most notable gap occurs in T1 and T2, where SAEA-SE and SAEA-PES produce very large mean errors and variances, likely due to model misguidance or premature convergence, especially in higher-dimensional problems with multiple local optima. In contrast, PS-SAEA maintains the prediction-driven search focus and avoids local optima, indicating the excellent scalability and robustness of the method.
To better evaluate the effectiveness of the surrogate, Table 3 gives the comparisons between the PS-SAEA and other SAEAs in terms of mean absolute error (MAE). Overall, PS-SAEA achieves the lowest or near-lowest MAE in most cases, demonstrating its superior surrogate accuracy and generalization ability. In the low-dimensional scenario (D = 10), PS-SAEA exhibits the best prediction accuracy in T2 and T3, and remains highly competitive in other tasks, implying that its model ensemble can effectively capture local landscape features. When the dimension increases to D = 30, PS-SAEA consistently yields lower MAE values than SAEA-SE, BSAEA, and SAEA-PES, particularly in T1 and T5, showing that the proposed prediction-guided mechanism enhances model fidelity even with limited evaluations. Notably, CL-SAEA occasionally achieves low MAEs (e.g., T5), but its performance is unstable across tasks, indicating overfitting or excessive reliance on specific samples. At D = 50, PS-SAEA continues to deliver stable and accurate surrogate predictions, maintaining a clear advantage over other SAEAs in most test problems. The reduced errors on T1–T3 demonstrate that the PMS and WME modules contribute to the construction of diverse yet reliable models. Finally, in the high-dimensional setting (D = 100), PS-SAEA markedly outperforms all baselines on T1–T3 and T5, achieving up to 80% lower MAE than SAEA-SE and SAEA-PES. These results confirm that PS-SAEA not only maintains search robustness but also significantly improves model accuracy, which directly contributes to its superior optimization performance in high-dimensional spaces.
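For reference, the MAE reported in Table 3 follows the standard definition (the notation below, with N test points, is introduced here only for clarity):

$$\mathrm{MAE} = \frac{1}{N}\sum_{i=1}^{N}\left|\hat{f}(\mathbf{x}_i) - f(\mathbf{x}_i)\right|,$$

where $\hat{f}(\mathbf{x}_i)$ denotes the surrogate (ensemble) prediction and $f(\mathbf{x}_i)$ the true objective value of the i-th test point.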
From the above, PS-SAEA achieves no losses across all test scenarios, consistently delivering equal or superior performance, and especially outperforms the other SAEAs in mid-to-high dimensional problems, validating the effectiveness of the PMS and WME. These results demonstrate that PS-SAEA not only improves optimization efficiency but also enhances generalization across diverse problem scales, making it a robust and promising approach for EOPs.
4.4. Component Analysis of PS-SAEA
This part analyzes the PMS and WME of PS-SAEA. To do this, the PS-SAEA is compared with its variants that do not use PMS or WME, denoted as PS-SAEA-P and PS-SAEA-W, respectively. In PS-SAEA-P, the models are selected greedily based on their prediction error on the evaluated data. In PS-SAEA-W, the WME is removed, and the selected models are combined by simple averaging.
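As a hedged sketch of how these two ablation variants differ from the full PS-SAEA (complementing the PMS/WME sketch in Section 1), the greedy selection and plain-average ensemble could look as follows; the details are illustrative assumptions.

```python
import numpy as np

def greedy_model_selection(errors, n_select):
    """PS-SAEA-P variant: deterministically keep the n_select models with the
    lowest validation error, with no stochasticity in the selection."""
    return np.argsort(errors)[:n_select]

def average_ensemble_predict(models, X):
    """PS-SAEA-W variant: combine the selected models by simple averaging
    instead of error-based weighting."""
    return np.mean(np.stack([m.predict(X) for m in models]), axis=0)
```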
Table 4 presents the performance comparisons among the original PS-SAEA and its two variants, PS-SAEA-P and PS-SAEA-W. To provide a better visualization, Figure 3 plots the average optimization results of different PS-SAEA variants. From the results, several important observations can be made regarding the contributions of the PMS and WME components in the PS-SAEA.
Across all problem dimensions, the original PS-SAEA consistently performs better than PS-SAEA-P. Specifically, PS-SAEA outperforms PS-SAEA-P in 15 out of 20 test cases, and shows similar performance in the remaining 5 cases. These results highlight the effectiveness of the PMS method, which introduces stochasticity into the model selection process. This helps avoid overfitting caused by deterministic greedy selection strategies, especially when training data are limited or noisy.
When compared with PS-SAEA-W, which removes the WME and instead averages the selected models for prediction, the original PS-SAEA still achieves better or comparable performance in all cases. Notably, PS-SAEA outperforms PS-SAEA-W in 2 cases, performs similarly in 18 cases, and never performs significantly worse. This suggests that while the WME component offers modest but consistent gains, it contributes to the robustness and overall predictive accuracy of the ensemble, particularly under complex or high-dimensional scenarios. This may be because the WME assigns higher weights to the more accurate surrogate models and lower weights to the less accurate ones, thereby yielding a final prediction with reduced error.
In summary, the results demonstrate that both the PMS and WME contribute positively to the effectiveness of PS-SAEA. Their integration enables the PS-SAEA to strike a better balance between accuracy and generalization, thereby enhancing its ability to solve EOPs more reliably.
4.5. Parameter Study of PS-SAEA
To study the impact and sensitivity of the parameter r in PS-SAEA, this paper conducts experimental comparisons among PS-SAEA variants using different values of r. Specifically, the original PS-SAEA using r = 7.5% is compared with variants using r = 2.5%, r = 5%, r = 10%, and r = 15%. For simplicity, these variants are referred to as PS-SAEA (r = 7.5%), PS-SAEA (r = 2.5%), PS-SAEA (r = 5%), PS-SAEA (r = 10%), and PS-SAEA (r = 15%), respectively.
The results presented in Table 5 analyze the sensitivity of the PS-SAEA with respect to different values of the selection ratio r, which determines the proportion of elite models retained for the ensemble. To provide a better visualization, Figure 4 plots the average optimization results of PS-SAEA variants with different r.
For D = 10, PS-SAEA (r = 7.5%) generally shows competitive or superior performance, achieving a significantly better result on T2 than the r = 2.5% variant and performing comparably in the remaining four cases. Interestingly, the r = 5% and r = 10% variants perform comparably in most cases and even achieve better results on T5, although each of them also shows one statistically inferior case compared with the default. This suggests that PS-SAEA is relatively robust in this low-dimensional environment, though r = 5% to 10% may offer slight performance advantages on some problems.
For D = 30, the comparisons show more divergence. PS-SAEA (r = 7.5%) performs slightly worse than its variants in a few cases: r = 5% and r = 10% outperform it on T2 and T3, while r = 2.5% and r = 15% lead to significantly worse outcomes on T1 and T4, respectively. Nevertheless, PS-SAEA (r = 7.5%) remains consistently within the top-performing group across all scenarios. These results imply that moderate values of r (5–10%) are better suited for balancing exploration and surrogate accuracy in different scenarios.
For D = 50, the differences among variants are subtler. All PS-SAEAs achieve statistically equivalent results in most cases, with only a single statistically worse result for r = 15% (T4). This indicates that the algorithm becomes less sensitive to the value of r as the problem dimension increases. Nevertheless, PS-SAEA (r = 7.5%) remains stable and effective across all test cases.
For D = 100, with high-dimensional characteristics, the results become more varied again. The default r = 7.5% achieves the best or equivalent results in all tasks, and in particular it significantly outperforms r = 5% and r = 10% on T2 and T5, indicating possible overfitting or model bias when the selection ratio is not well matched to such complex landscapes. Conversely, r = 2.5% and r = 15% remain statistically similar to the baseline but do not exhibit clear advantages.
The parameter study reveals that r = 7.5% consistently achieves stable and superior or comparable performance across a diverse set of benchmark problems and different dimensional settings. Specifically, in low to high-dimensional scenarios (e.g., D = 10, 30, 50, 100), this ratio balances the trade-off between exploration and exploitation effectively. The results indicate that smaller ratios (e.g., 2.5%) may lead to insufficient model diversity, hampering the surrogate ensemble’s robustness, while larger ratios (e.g., 10–15%) tend to cause overfitting or excessive focus on a limited subset of models, reducing generalization.
As analyzed above, PS-SAEA is not highly sensitive to the value of r, although excessively large or small values may degrade the performance of both the ensemble model and the PS-SAEA. The setting r = 7.5% achieves a good balance between accuracy and generalization ability and is therefore recommended in this paper.
4.6. Computational Cost of PS-SAEA
To investigate the computational cost of the PS-SAEA, the average running time (in seconds) is recorded and compared with three state-of-the-art SAEA variants: SAEA-SE, BSAEA, and SAEA-PES. The experiments are conducted on two representative test functions, T1 (Ellipsoid, unimodal) and T5 (Rastrigin, multimodal), with four different dimensions: 10, 30, 50, and 100. These test instances are selected to evaluate the time efficiency of PS-SAEA in handling problems with varying landscapes and dimensionalities. The detailed comparisons are provided in Table 6.
From the results, it can be observed that the computational cost of PS-SAEA is generally comparable to or slightly higher than that of other SAEAs in low-dimensional cases (e.g., D = 10 or 30). For instance, on T1 with D = 10, PS-SAEA takes 0.172 s on average per generation, which is close to SAEA-SE (0.187 s), BSAEA (0.169 s), and SAEA-PES (0.177 s). Similar trends can be found for T5 at low dimensions, indicating that the additional procedures introduced by PS-SAEA (i.e., probabilistic model selection and weighted ensemble) incur only a modest overhead in simpler scenarios.
However, as the problem dimension increases, the time cost of PS-SAEA rises more significantly compared to BSAEA. For example, on T1 with D = 100, PS-SAEA requires 11.654 s, while BSAEA takes only 4.826 s. This is primarily due to the extra computational burden associated with maintaining and selecting among multiple surrogate models, as well as performing weighted model ensemble operations. Nevertheless, when compared with SAEA-SE and SAEA-PES, both of which also incorporate complex model management strategies, the time cost of PS-SAEA remains on par or even lower in certain cases (e.g., T5 at D = 50).
Despite the increased time cost at higher dimensions, it is important to emphasize that PS-SAEA achieves a significantly better optimization performance, as demonstrated in previous sections. Thus, the trade-off between computational cost and solution quality is justifiable, particularly for EOPs where function evaluations are dominant over surrogate model operations.
In summary, although PS-SAEA introduces slight additional overhead due to model selection and ensemble procedures, its time complexity remains acceptable and comparable to existing advanced SAEAs, especially considering its substantial performance improvements. Therefore, PS-SAEA can be considered an effective and practical choice for solving both unimodal and multimodal expensive optimization problems across a wide range of dimensionalities.
4.7. Scalability with Parallel Processing
This part studies the scalability of PS-SAEA with parallel processing. As the main time costs come from building the models and making model predictions, we parallelized the model generation and prediction processes to test the time costs of PS-SAEA. To analyze its behavior with different numbers of threads (denoted as TR), the time costs of using different TRs are reported. The results on T1 and T5 are provided in Table 7.
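As a rough illustration of this parallelization scheme (the original experiments are implemented in MATLAB), the following Python sketch trains the surrogate pool and queries it in parallel; `train_one_model` and `datasets` are hypothetical placeholders for the RBFNN trainer and its training subsets.

```python
from concurrent.futures import ThreadPoolExecutor

def parallel_build_and_predict(train_one_model, datasets, X_query, n_threads=12):
    """Build the surrogate pool and query it in parallel, mirroring the
    parallelized model generation and prediction steps described above."""
    with ThreadPoolExecutor(max_workers=n_threads) as pool:
        models = list(pool.map(train_one_model, datasets))                 # parallel model generation
        preds = list(pool.map(lambda m: m.predict(X_query), models))       # parallel prediction
    return models, preds
```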
From Table 7, it is evident that PS-SAEA benefits significantly from parallelization. When TR increases from 1 to 12, the time cost decreases almost linearly across both test cases (T1 and T5). For example, on T1 with D = 100, the execution time drops from 1398.481 s with a single thread to 125.131 s with 12 threads, yielding an approximate 11.2× speedup. A similar trend is observed on T5, where the time reduces from 1437.782 s to 128.445 s under the same configuration, corresponding to an 11.2× acceleration. These results confirm that the proposed PS-SAEA framework scales effectively with the number of computing threads.
Another important observation is that the scalability is more pronounced for higher-dimensional problems. While parallelization reduces computation time consistently across all settings, the absolute savings are far greater for large-scale dimensions. For instance, in T1 with D = 10, parallelization to 12 threads saves about 18.8 s, whereas for D = 100, it reduces the runtime by over 1270 s. This highlights the suitability of PS-SAEA for large-scale optimization tasks, where model construction and prediction become the dominant computational bottlenecks.
In summary, the results verify that parallelizing surrogate training and prediction enables PS-SAEA to maintain both efficiency and scalability. The near-linear speedup with increasing TR demonstrates that the framework can effectively exploit modern multi-core computing environments, thus making it practical for real-world high-dimensional optimization problems.
4.8. Comparisons with Basic EAs Without Surrogate-Assisted Method
To further validate the effectiveness of the proposed PS-SAEA, we compared it with Basic EAs (BEAs) that do not incorporate surrogates. Both PS-SAEA and BEA employ identical evolutionary operators; the only difference lies in the evaluation process: PS-SAEA replaces expensive real FEs with surrogate predictions, whereas BEA evaluates individuals through real FEs. To provide a fair comparison, BEAs with three different amounts of evaluated data were tested: 11 × D, 55 × D, and 110 × D FEs, where D is the problem dimension.
The results in Table 8 show that when the BEA has the same number of real FEs as the PS-SAEA, the PS-SAEA consistently and significantly outperforms the BEA across all dimensions, achieving 5/0/0 significantly better/similar/significantly worse results in all cases. Moreover, the results plotted in Figure 5 also show that the PS-SAEA extensively outperforms the BEA across all dimensions. This demonstrates the substantial advantage of surrogate-assisted evaluation under severely limited evaluation budgets.
When BEA is granted more real evaluations (55 × D and 110 × D), its performance improves, occasionally surpassing PS-SAEA on certain tasks (e.g., T2–T5 in 10D, T5 in higher dimensions). However, PS-SAEA still matches or exceeds BEA in most scenarios, despite using no additional real FEs. That is, PS-SAEA can use only about 10% of the evaluation budget to obtain similar or even better performance than the BEA. This suggests that the proposed surrogate-assisted approach can achieve competitive or superior results with significantly fewer expensive evaluations, thereby offering substantial cost savings.
Overall, these results confirm that PS-SAEA is not only highly effective under limited evaluation conditions but also remains competitive even when BEAs are allowed up to ten times more real evaluations.
4.9. Discussions
Results and Evidence: The experimental results demonstrate that the proposed PS-SAEA consistently outperforms or matches the performance of baseline SAEAs across multiple benchmark problems, especially under limited evaluation budgets. For instance, Table 2 shows the strong performance of PS-SAEA compared to existing SAEAs. Table 8 shows that PS-SAEA achieves significantly better results than BEAs with similar evaluation resources, and maintains competitive performance even when BEAs are allotted several times more evaluations. This empirical evidence supports the effectiveness of the surrogate-assisted approach and underscores its potential for cost-efficient optimization. Furthermore, the component analysis (Section 4.4) indicates that the PMS and WME components notably influence performance, with statistically significant differences confirmed through the applied tests. These findings provide concrete support for the claim that the integrated surrogate management strategies enhance robustness and accuracy.
Theorizing: From a theoretical perspective, the results corroborate the notion that managing model overfitting is crucial in surrogate-assisted optimization, especially with limited data. The success of the PS-SAEA’s PMS and ensemble strategies can be interpreted as mechanisms that balance exploration and exploitation, thereby improving generalization. The stochastic nature of the PMS method prevents the algorithm from prematurely converging due to over-reliance on a single surrogate model, aligning with ensemble learning theories that advocate for diversity and robustness.
Generalization ability: Although the paper primarily utilizes RBFNNs due to their efficiency and suitability for high-dimensional problems, the core architecture, particularly the model management strategy involving PMS and WME, can be extended to other surrogate types. For example, Gaussian processes (GPs) are highly popular due to their probabilistic nature, which provides both mean predictions and uncertainty estimates, making them valuable for balancing exploration and exploitation. Integrating GPs into the PS-SAEA framework would involve replacing the RBFNN predictor with a GP model. The PMS scheme could leverage the uncertainty estimates from GPs to assess model accuracy and diversity, potentially enhancing the ensemble’s reliability. Furthermore, since GPs inherently quantify prediction variance, the framework could incorporate uncertainty-based weighting strategies within the WME. Moreover, random forests (RFs) are robust, scalable ensemble models capable of handling noisy and high-dimensional data. Their ensemble structure aligns well with the WME approach. Adapting PS-SAEA to RFs would involve managing multiple RF models or variants within the ensemble, using predictive accuracy and diversity metrics to guide probabilistic selection. In addition, other surrogates (e.g., support vector regression, polynomial regression) can also be integrated with modifications to the model management protocols. For instance, support vector regression models could be incorporated with appropriate diversity metrics, whereas polynomial models can serve as simpler, faster approximants for certain problem landscapes.
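As one hedged illustration of the GP extension discussed above, the sketch below weights each GP in the ensemble inversely to its average predictive variance on the query points; this particular weighting rule is an assumption made here for illustration and is not part of the original PS-SAEA.

```python
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF

def gp_uncertainty_weighted_predict(gps, X):
    """Uncertainty-based WME variant: each fitted GP contributes with a weight
    inversely proportional to its average predictive variance on X."""
    means, variances = [], []
    for gp in gps:
        mu, std = gp.predict(X, return_std=True)
        means.append(mu)
        variances.append(np.mean(std**2))
    weights = 1.0 / (np.asarray(variances) + 1e-12)
    weights /= weights.sum()
    return weights @ np.stack(means)

# Example usage (subsets is a hypothetical list of training subsets):
# gps = [GaussianProcessRegressor(kernel=RBF()).fit(X_sub, y_sub) for X_sub, y_sub in subsets]
# y_hat = gp_uncertainty_weighted_predict(gps, X_query)
```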
Possible extension: While the current study focuses on continuous problems, adaptations can be made to accommodate the discrete or hybrid nature of many real-world applications [7]. For surrogate model building, surrogates such as classification models (e.g., decision trees, random forests) or specialized kernel methods can be employed to model combinatorial solutions. For mixed-variable domains, hybrid surrogates combining continuous and discrete modeling techniques can be designed, enabling the surrogate to effectively approximate objective functions with mixed types. For the evolutionary operator, the evolutionary components can be tailored to incorporate discrete mutation, crossover, and neighborhood search strategies suitable for combinatorial variables. Incorporating problem-specific operators can enhance search efficiency within the variable space.
4.10. Limitations
Although PS-SAEA has shown promising results in the preceding experimental studies, two limitations should be noted. First, the performance of PS-SAEA heavily depends on the quality of the evaluated data used for model training. As a result, noisy or incomplete data may significantly degrade the algorithm’s effectiveness. To enhance robustness in practice, one should incorporate data preprocessing techniques such as noise filtering, outlier detection, normalization, or data augmentation to improve data quality prior to modeling. Additionally, employing robust surrogate modeling approaches, such as ensemble models designed for noisy data, models that integrate uncertainty quantification, or regularization techniques, could further mitigate the impact of data imperfections. Adaptive strategies that dynamically detect and correct for noisy data, as well as active learning frameworks that selectively query the most informative samples, may also bolster the resilience of PS-SAEA in real-world applications. Second, the surrogate models and evolutionary operators used in PS-SAEA are specifically designed for continuous optimization problems. To extend its applicability to combinatorial EOPs, more suitable surrogate models and discrete evolutionary operators should be considered.
5. Conclusions
This paper aims to address the overfitting issue of surrogate models trained on limited evaluated data, a fundamental limitation in SAEAs. To enhance the generalization capability of surrogate-assisted optimization, this paper proposes a novel PS-SAEA that integrates PMS and WME mechanisms into the surrogate model management process. By employing PMS to probabilistically select models based on both accuracy and diversity, and WME to aggregate the selected models into an ensemble, the proposed PS-SAEA effectively mitigates overfitting and enhances the reliability of fitness approximation.
Extensive experimental evaluations on a suite of EOPs demonstrate that PS-SAEA consistently outperforms or performs comparably to several state-of-the-art SAEAs. Moreover, the component analysis validates the essential roles of the PMS and WME strategies in enhancing surrogate generalization and guiding the evolutionary search toward more effective solutions.
In summary, PS-SAEA offers a simple yet effective improvement to surrogate-based optimization frameworks, providing a promising foundation for further advancements in SAEA research. The PS-SAEA offers promising potential for deployment in various practical fields such as engineering design and energy system optimization. In engineering design, e.g., car or airplane design, it can efficiently navigate complex, high-dimensional design spaces where function evaluations, such as physical testing or detailed simulations, are expensive and time-consuming. Similarly, in energy optimization, such as optimizing power plant operations and renewable energy scheduling, the algorithm can deliver high-quality solutions while significantly reducing the number of expensive evaluations. However, for practical use in real-world applications, it is recommended to preprocess data carefully, tune model parameters based on domain knowledge, and leverage parallel processing when needed. These strategies can help reliably achieve high-quality solutions with minimal computational cost.
Future work may explore different evaluation metrics (e.g., mean absolute error), adaptive ensemble updating, multitask learning, dimensionality reduction or sparse modeling techniques, multi-objective optimization, dynamic optimization, and the incorporation of uncertainty quantification to further enhance optimization robustness in scenarios with limited or noisy data. Moreover, further application research on real-world problems across different areas is also needed.