*Article* **Influence of Binomial Crossover on Approximation Error of Evolutionary Algorithms**

**Cong Wang 1, Jun He 2, Yu Chen 1,\* and Xiufen Zou 3,4**


**Abstract:** Although differential evolution (DE) algorithms perform well on a large variety of complicated optimization problems, only a few theoretical studies have focused on the working principle of DE algorithms. As a first attempt to reveal the function of binomial crossover, this paper asks whether it can reduce the approximation error of evolutionary algorithms. By investigating the expected approximation error and the probability of not finding the optimum, we conduct a case study comparing two evolutionary algorithms, with and without binomial crossover, on two classical benchmark problems: OneMax and Deceptive. It is proven that using binomial crossover leads to the dominance of transition matrices. As a result, the algorithm with binomial crossover asymptotically outperforms the one without crossover on both OneMax and Deceptive; moreover, it outperforms the latter on OneMax, but not on Deceptive. Furthermore, an adaptive parameter strategy is proposed that can strengthen the superiority of binomial crossover on Deceptive.

**Keywords:** binomial crossover; differential evolution; fixed-budget analysis; evolutionary computation; approximation error

**MSC:** 90C15

#### **1. Introduction**

Evolutionary algorithms (EAs) are a family of randomized search heuristics inspired by biological evolution, and many empirical studies demonstrate that crossover, which combines genes of two parents to generate new offspring, can be helpful to the convergence of EAs [1–3]. Meanwhile, theoretical results on runtime analysis validate the promising function of crossover in EAs [4–15], although there are also cases in which crossover is not helpful [16,17].

By exchanging components of target vectors with donor vectors, differential evolution (DE) algorithms implement crossover operations in a different way. Numerical results show that continuous DE algorithms can achieve competitive performance on a large variety of complicated problems [18–21], and their competitiveness is to a great extent attributed to the employed crossover operations [22]. However, the binary differential evolution (BDE) algorithm [23], which simulates the working mechanism of continuous DE, is not as competitive as its continuous counterpart. Analysis of its working principle indicates that the mutation and update strategies result in the poor convergence of BDE [24], but no theoretical results have been reported on how crossover influences the performance of discrete-coded DE algorithms.

This paper is dedicated to investigating the influence of binomial crossover by introducing it into the (1 + 1)EA, excluding the impacts of the population and mutation strategies of DE. Although the expected hitting time/runtime is popularly investigated in the theoretical study of randomized search heuristics (RSHs), there is a gap between runtime analysis and practice because the optimization time to reach an optimum is uncertain and could even be infinite in continuous optimization [25]. For this reason, optimization time is seldom used in computer simulation for evaluating the performance of EAs; instead, their performance is evaluated after running finitely many generations by solution quality, such as the mean and median of the fitness value or approximation error [26]. In theory, solution quality can be measured for a given iteration budget by the expected fitness value [27] or approximation error [28,29], which contributes to the analysis framework named fixed-budget analysis (FBA). An FBA on immune-inspired hypermutations led to theoretical results that are very different from those of runtime analysis but consistent with the empirical results, which demonstrates that the perspective of fixed-budget computations provides valuable information and additional insights into the performance of randomized search heuristics [30].

**Citation:** Wang, C.; He, J.; Chen, Y.; Zou, X. Influence of Binomial Crossover on Approximation Error of Evolutionary Algorithms. *Mathematics* **2022**, *10*, 2850. https://doi.org/10.3390/math10162850

Academic Editors: Alexandru Agapie, Denis Enachescu, Vlad Stefan Barbu and Bogdan Iftimie

Received: 24 July 2022; Accepted: 9 August 2022; Published: 10 August 2022

**Copyright:** © 2022 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

Accordingly, we evaluate the solution quality of an EA after running finitely many generations by the expected approximation error and the error tail probability. The former measures the fitness gap between a solution and the optimum; the latter is the probability distribution of the error over error levels, which measures the probability of not finding the optimum. An EA is said to outperform another if its error and tail probability are both smaller. Furthermore, an EA is said to asymptotically outperform another if its error and tail probability are smaller after a sufficiently large number of generations.

The research question of this paper is whether the binomial crossover operator can help reduce the approximation error of an EA. As a pioneering work on this topic, we investigate a (1 + 1)*EAC* that performs binomial crossover on an individual and an offspring generated by mutation, and compare the (1 + 1)*EA* without crossover and its variant (1 + 1)*EAC* on two classical problems, OneMax and Deceptive. By splitting the objective space into error levels, the analysis is performed based on Markov chain models [31,32]. Given the two EAs, the comparison of their performance is drawn from a comparison of their transition probabilities, which are estimated by investigating the bits preferred by evolutionary operations. Under some conditions, the (1 + 1)*EAC* with binomial crossover outperforms the (1 + 1)*EA* on OneMax, but not on Deceptive; however, by adding an adaptive parameter mechanism arising from the theoretical results, the (1 + 1)*EAC* with binomial crossover outperforms the (1 + 1)*EA* on Deceptive, too.

This work presents the first study on how binomial crossover influences the expected approximation error and tail probability of randomized search heuristics. Meanwhile, we also propose a feasible routine to derive adaptive parameter settings of EAs from theoretical results. The rest of this paper is organized as follows. Section 2 reviews related theoretical work. Preliminary contents for our theoretical analysis are presented in Section 3. Then, the influence of binomial crossover on transition probabilities is investigated in Section 4. Section 5 conducts an analysis of the asymptotic performance of EAs. To reveal how binomial crossover affects the performance of EAs over consecutive iterations, the OneMax problem and the Deceptive problem are investigated in Sections 6 and 7, respectively. Finally, Section 8 presents the conclusions and discussions.

#### **2. Related Work**

#### *2.1. Theoretical Analysis of Crossover in Evolutionary Algorithms*

To understand how crossover influences the performance of EAs, Jansen et al. [4] proved that an EA using crossover can reduce the expected optimization time from super-polynomial to a polynomial of small degree on the function Jump. Kötzing et al. [5] investigated crossover-based EAs on the functions OneMax and Jump and showed the potential speedup by crossover, in terms of optimization time, when combined with a fitness-invariant bit shuffling operator. For a simple GA without shuffling, they found that the crossover probability has a drastic impact on the performance on Jump. Corus and Oliveto [6] obtained an upper bound on the runtime of standard steady-state GAs hillclimbing the OneMax function and proved that the steady-state EAs are 25% faster than their mutation-only counterparts. Their analysis also suggests that larger populations may be faster than populations of size 2. Dang et al. [7] revealed that the interplay between crossover and mutation may result in a sudden burst of diversity on the Jump test function and reduce the expected optimization time compared to mutation-only algorithms such as the (1 + 1) EA. For royal road functions and OneMax, Sudholt [8] analyzed uniform crossover and k-point crossover and proved that crossover makes every (*μ* + *λ*) EA at least twice as fast as the fastest EA using only standard bit mutation. Pinto and Doerr [9] provided a simple proof of a crossover-based genetic algorithm (GA) outperforming any mutation-based black-box heuristic on the classic benchmark OneMax. Oliveto et al. [10] obtained a tight lower bound on the expected runtime of the (2 + 1) GA on OneMax. Lengler and Meier [11] studied the positive effect of using larger population sizes and crossover on Dynamic BinVal.

For non-artificial problems, Lehre and Yao [12] proved that the use of crossover in the (*μ* + 1) steady-state genetic algorithm may reduce the runtime from exponential to polynomial for some instance classes of the problem of computing unique input–output (UIO) sequences. Doerr et al. [13,14] analyzed EAs on the all-pairs shortest path problem; their results confirmed that the EA with a crossover operator is significantly faster in terms of the expected optimization time. Sutton [15] investigated the closest string problem and proved that a multi-start (*μ* + 1) GA requires less randomized fixed-parameter tractable (FPT) time than the same algorithm with crossover disabled.

However, there is some evidence that crossover is not always helpful. Richter et al. [16] constructed Ignoble Trail functions and proved that mutation-based EAs optimize them more efficiently than GAs with crossover, which need exponential optimization time. Antipov and Naumov [17] compared crossover-based algorithms on RealJump functions with a slightly shifted optimum, which increases the runtime of all considered algorithms; moreover, the hybrid GA fails to find the shifted optimum with high probability.

#### *2.2. Theoretical Analysis of Differential Evolution Algorithms*

Most existing theoretical studies on DE focus on continuous variants [33]. By estimating the probability density function of generated individuals, Zhou et al. [34] demonstrated that the selection mechanism of DE, which chooses mutually different parents for the generation of donor vectors, sometimes does not work positively on the performance of DE. Zaharie and Micota [35–37] investigated the influence of the crossover rate on both the distribution of the number of mutated components and the probability of a component being taken from the mutant vector, as well as the influence of mutation and crossover on the diversity of the intermediate population. Wang and Huang [38] reduced DE to a one-dimensional stochastic model and investigated how the probability distribution of the population is connected to the mutation, selection, and crossover operations of DE. Opara and Arabas [39] compared several variants of the differential mutation using characteristics of their expected mutants' distributions, which demonstrated that the classic mutation operators yield similar search directions and differ primarily in the mutation range. Furthermore, they formalized the contour fitting notion and derived an analytical model that links the differential mutation operator with the adaptation of the range and direction of search [40].

By investigating the expected runtime of BDE, Doerr and Zhang [24] performed a first fundamental analysis of the working principles of discrete-coded DE. It was shown that BDE optimizes the important decision variables, but struggles to find the optima of decision variables with a small influence on the objective function. Since BDE generates trial vectors by implementing a binary variant of binomial crossover accompanied by the mutation operation, it has characteristics significantly different from those of classic EAs or estimation-of-distribution algorithms.

#### *2.3. Fixed-Budget Analysis and Approximation Error*

To bridge the wide gap between theory and application, Jansen and Zarges [27] proposed an FBA framework for RSHs, by which the fitness values of random local search and the (1 + 1) EA were investigated for given iteration budgets. Under the framework of FBA, Jansen and Zarges [41] analyzed the any-time performance of EAs and artificial immune systems on a proposed dynamic benchmark problem. Nallaperuma et al. [42] considered the well-known traveling salesperson problem (TSP) and derived lower bounds on the expected fitness gain for a specified number of generations. Based on the Markov chain model of RSHs, Wang et al. [29] constructed a general framework of FBA, by which they found the analytic expression of the approximation error instead of asymptotic results for expected fitness values. Doerr et al. [43] built a bridge between runtime analysis and FBA, by which a huge body of work and a large collection of tools for the analysis of the expected optimization time can meet the new challenges introduced by the fixed-budget perspective.

Noting that hypermutations tend to be inferior on typical example functions in terms of runtime, Jansen and Zarges [30] conducted an FBA to explain why artificial immune systems are popular in spite of these proven drawbacks. It was shown that the inversely fitness-proportional mutation (IFPM) and the somatic contiguous hypermutation (CHM) can perform better than single-point mutation on OneMax when FBA is performed with different starting points and varied iteration budgets. This indicates that the traditional perspective of expected optimization time may be unable to explain the observed good performance, which stems from the limited length of runs. Therefore, the perspective of fixed-budget computations provides valuable information and additional insights.

#### **3. Preliminaries**

*3.1. Problems*

Consider a maximization problem

$$\max f(\mathbf{x}), \quad \mathbf{x} = (x_1, \dots, x_n) \in \{0, 1\}^n,$$

denote its optimal solution by **x**∗ and its optimal objective value by *f* ∗. The quality of a solution **x** is evaluated by its approximation error $e(\mathbf{x}) := |f(\mathbf{x}) - f^*|$. The error *e*(**x**) takes finitely many values, called error levels:

$$e(\mathbf{x}) \in \{e_0, e_1, \dots, e_L\}, \quad 0 = e_0 \le e_1 \le \dots \le e_L,$$

where *L* is a non-negative integer. **x** is called *at the level i* if *e*(**x**) = *ei*, *i* ∈ {0, 1, ... , *L*}. The collection of solutions at level *i* is denoted by X*i*.

We investigate the optimization problem in the form

$$\max f(|\mathbf{x}|), \tag{1}$$

where $|\mathbf{x}| := \sum_{i=1}^{n} x_i$. The error levels of (1) take only *n* + 1 values. Two instances, the unimodal OneMax problem and the multi-modal Deceptive problem, are considered in this paper.

**Problem 1** (OneMax)**.**

$$\max f(\mathbf{x}) = \sum_{i=1}^{n} x_i, \quad \mathbf{x} = (x_1, \dots, x_n) \in \{0, 1\}^n.$$

**Problem 2** (Deceptive)**.**

$$\max f(\mathbf{x}) = \begin{cases} \sum_{i=1}^{n} x_i, & \text{if } \sum_{i=1}^{n} x_i > n - 1, \\ n - 1 - \sum_{i=1}^{n} x_i, & \text{otherwise,} \end{cases} \quad \mathbf{x} = (x_1, \dots, x_n) \in \{0, 1\}^n.$$

For the OneMax problem, both exploration and exploitation are helpful to the convergence of EAs to the optimum, because exploration accelerates the convergence process and exploitation refines the precision of approximate solutions. However, for the Deceptive problem, local exploitation leads to convergence to the local optimum, which in turn increases the difficulty of jumping to the global optimum. That is, exploitation hinders convergence to the global optimum of the Deceptive problem; thus, the performance of EAs is dominantly influenced by their exploration ability.
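The two benchmark functions and the approximation error can be coded directly from the definitions above; the following is a minimal Python sketch (the function names are ours, not from the paper):

```python
def onemax(x):
    """OneMax (Problem 1): number of '1'-bits; the optimum is the all-ones string."""
    return sum(x)

def deceptive(x):
    """Deceptive (Problem 2): rewards the all-ones string, but otherwise rewards '0'-bits."""
    n = len(x)
    s = sum(x)
    return s if s > n - 1 else n - 1 - s

def error(f, x, f_star):
    """Approximation error e(x) = |f(x) - f*|."""
    return abs(f(x) - f_star)
```

For *n* = 4, for instance, the all-zeros string is the local optimum of Deceptive with error 1, matching the map from |**x**| to *e*(**x**) given in Section 3.3.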

#### *3.2. Evolutionary Algorithms*

For the sake of analyzing binomial crossover while excluding the influence of the population and mutation strategies of DE, the (1 + 1)*EA* presented in Algorithm 1 is taken as the baseline algorithm in our study. Its candidate solutions are generated by bitwise mutation with probability *pm*. The binomial crossover is appended to the (1 + 1)*EA*, yielding the (1 + 1)*EAC* illustrated in Algorithm 2. The (1 + 1)*EAC* first performs bitwise mutation with probability *qm* and then applies binomial crossover with rate *CR* to generate a candidate solution for selection.

The EAs investigated in this paper can be modeled as homogeneous Markov chains [31,32]. Given the error vector

$$\tilde{\mathbf{e}} = (e\_0, e\_1, \dots, e\_L)',\tag{2}$$

and the initial distribution

$$\mathfrak{q}^{[0]} = (q\_0^{[0]}, q\_1^{[0]}, \dots, q\_L^{[0]})' \tag{3}$$

the transition matrix of (1 + 1)*EA* and (1 + 1)*EAC* for the optimization problem (1) can be written in the form

$$\tilde{\mathbf{R}} = (r_{i,j})_{(L+1) \times (L+1)}, \tag{4}$$

where

$$r\_{i,j} = \Pr\{\mathbf{x}\_{t+1} \in \mathcal{X}\_i \mid \mathbf{x}\_t \in \mathcal{X}\_j\}, \quad i, j = 0, \dots, L.$$

#### **Algorithm 1** (1 + 1)*EA*

1: counter *t* = 0;
2: randomly generate a solution **x**0 = (*x*1, ... , *xn*);
3: **while** the stopping criterion is not satisfied **do**
4: generate the mutant **y***t* = (*y*1, ... , *yn*) by bitwise mutation:

$$\text{for } i = 1, \dots, n, \quad y_i = \begin{cases} 1 - x_i, & \text{if } rnd_i < p_m, \\ x_i, & \text{otherwise,} \end{cases} \quad rnd_i \sim \mathcal{U}[0, 1]; \tag{5}$$

5: **if** *f*(**y***t*) ≥ *f*(**x***t*) **then**
6: **x***t*+1 = **y***t*;
7: **else**
8: **x***t*+1 = **x***t*;
9: **end if**
10: *t* = *t* + 1;
11: **end while**
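Algorithm 1 can be sketched in a few lines of Python. This is our illustrative implementation, not the authors' code; the OneMax fitness, problem size, iteration budget, and seeded generator are assumptions chosen for reproducibility:

```python
import random

def one_plus_one_ea(f, n, p_m, budget, rng):
    """Sketch of Algorithm 1: bitwise mutation with rate p_m plus elitist selection,
    maximizing f over {0, 1}^n for a fixed iteration budget."""
    x = [rng.randint(0, 1) for _ in range(n)]                    # step 2: random init
    for _ in range(budget):                                      # step 3: budget loop
        y = [1 - xi if rng.random() < p_m else xi for xi in x]   # step 4: mutation (5)
        if f(y) >= f(x):                                         # steps 5-9: elitism
            x = y
    return x

rng = random.Random(0)          # fixed seed, an assumption for reproducibility
fitness = sum                   # OneMax fitness
best = one_plus_one_ea(fitness, 20, 1 / 20, 2000, rng)
```

With *pm* = 1/*n* and a budget well above the expected optimization time of order *n* log *n*, the returned solution is the all-ones string with overwhelming probability.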

#### **Algorithm 2** (1 + 1)*EAC*

1: counter *t* = 0;
2: randomly generate a solution **x**0 = (*x*1, ... , *xn*);
3: **while** the stopping criterion is not satisfied **do**
4: generate the mutant **v***t* = (*v*1, ... , *vn*) by bitwise mutation:

$$\text{for } i = 1, \dots, n, \quad v_i = \begin{cases} 1 - x_i, & \text{if } rnd1_i < q_m, \\ x_i, & \text{otherwise,} \end{cases} \quad rnd1_i \sim \mathcal{U}[0, 1]; \tag{6}$$

5: randomly select an index *i*rnd from {1, ... , *n*};
6: generate the candidate **y***t* = (*y*1, ... , *yn*) by binomial crossover:

$$\text{for } i = 1, \dots, n, \quad y_i = \begin{cases} v_i, & \text{if } i = i_{rnd} \text{ or } rnd2_i < \mathbb{C}_R, \\ x_i, & \text{otherwise,} \end{cases} \quad rnd2_i \sim \mathcal{U}[0, 1]; \tag{7}$$

7: **if** *f*(**y***t*) ≥ *f*(**x***t*) **then**
8: **x***t*+1 = **y***t*;
9: **else**
10: **x***t*+1 = **x***t*;
11: **end if**
12: *t* = *t* + 1;
13: **end while**
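The trial-vector generation of the (1 + 1)*EAC*, i.e., mutation (6) followed by binomial crossover (7) with a forced index, can be sketched as follows (our Python illustration; the helper name `binomial_trial` is ours):

```python
import random

def binomial_trial(x, q_m, CR, rng):
    """Sketch of the trial generation of Algorithm 2: bitwise mutation with
    rate q_m (Equation (6)), then binomial crossover with rate CR where one
    uniformly chosen index is always taken from the mutant (Equation (7))."""
    n = len(x)
    v = [1 - xi if rng.random() < q_m else xi for xi in x]   # mutation (6)
    i_rnd = rng.randrange(n)                                 # forced index
    return [v[i] if i == i_rnd or rng.random() < CR else x[i]
            for i in range(n)]                               # crossover (7)
```

Note that with *CR* = 1 every bit is taken from the mutant, so the trial reduces to pure bitwise mutation, matching the degeneration to the (1 + 1)*EA* discussed in Section 4.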

Recalling that the solutions are updated by elitist selection, we know that **R˜** is an upper triangular matrix that can be partitioned as

$$\tilde{\mathbf{R}} = \begin{pmatrix} 1 & \mathbf{r}_0 \\ \mathbf{0} & \mathbf{R} \end{pmatrix},$$

where **r**0 represents the probabilities of transferring from non-optimal statuses to the optimal status, and **R** is the transition submatrix depicting the transitions between non-optimal statuses.

#### *3.3. Transition Probabilities*

Transition probabilities can be determined by considering the generation of a candidate **y** with *f*(**y**) ≥ *f*(**x**), which is achieved if "*l* preferred bits" of **x** are flipped. If there are multiple solutions better than **x**, there can be multiple choices for both the number *l* and the locations of the "*l* preferred bits".

**Example 1.** *For the OneMax problem, e*(**x**) *equals the number of '0'-bits in* **x***. Denoting e*(**x**) = *j and e*(**y**) = *i, we know that* **y** *replaces* **x** *if and only if j* ≥ *i. Then, to generate a candidate* **y** *replacing* **x***, the "l preferred bits" can be identified as follows: flipping j* − *i* + *k of the j '0'-bits and k of the n* − *j '1'-bits, where* 0 ≤ *k* ≤ min{*n* − *j*, *i*}*, yields e*(**y**) = *i, so the l* = 2*k* + *j* − *i flipped bits constitute the "l preferred bits".*

If an EA flips each bit with an identical probability, the probability of flipping *l* given bits depends only on *l* and is independent of their locations. Denoting the probability of flipping "*l* preferred bits" by *P*(*l*), we can establish the connection between the transition probability $r_{i,j}$ and *P*(*l*).

As presented in Example 1, a transition from level *j* to level *i* (*i* < *j*) results from flips of *j* − *i* + *k* '0'-bits and *k* '1'-bits. Then, the transition probabilities for OneMax are given by

$$r\_{i,j} = \sum\_{k=0}^{M} \mathbb{C}\_{n-j}^{k} \mathbb{C}\_{j}^{k + (j - i)} P(2k + j - i), \tag{8}$$

where *M* = min{*n* − *j*, *i*}, 0 ≤ *i* < *j* ≤ *n*.

According to the definition of the Deceptive problem, we get the following map from |**x**| to *e*(**x**):

$$\begin{array}{cccccccc}\hline |\mathbf{x}| & : & 0 & 1 & \cdots & n-1 & n \\ e(\mathbf{x}) & : & 1 & 2 & \cdots & n & 0 \\ \hline \end{array} \tag{9}$$

A transition from level *j* to level *i* (0 ≤ *i* < *j* ≤ *n*) is attributed to one of the following cases:

1. if *i* ≥ 1, flip *k* of the *n* − *j* + 1 '0'-bits and *k* + (*j* − *i*) of the *j* − 1 '1'-bits, where 0 ≤ *k* ≤ min{*n* − *j* + 1, *i* − 1};
2. if *i* = 0, flip all *n* − *j* + 1 '0'-bits and no '1'-bits.

Accordingly, transition probabilities for Deceptive are confirmed as

$$r\_{i,j} = \begin{cases} \sum\_{k=0}^{M} \mathbb{C}\_{n-j+1}^{k} \mathbb{C}\_{j-1}^{k+(j-i)} P(2k+j-i), & i \ge 1, \\ P(n-j+1), & i=0, \end{cases} \tag{10}$$

where *M* = min{*n* − *j* + 1, *i* − 1}.
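Formulas such as (8) can be checked numerically. The sketch below (ours, not from the paper) computes the OneMax transition probabilities of (8), taking *P*(*l*) to be the bitwise-mutation probability *P*1(*l*, *p*) of (13), and verifies them by brute-force enumeration of all mutation masks for a small *n*:

```python
from itertools import product
from math import comb

def P1(l, p, n):
    """Probability of flipping exactly a fixed set of l bits (Equation (13))."""
    return p**l * (1 - p)**(n - l)

def onemax_transition(i, j, p, n):
    """r_{i,j} of Equation (8) for OneMax, with P(l) taken as P1(l, p)."""
    M = min(n - j, i)
    return sum(comb(n - j, k) * comb(j, k + (j - i)) * P1(2*k + j - i, p, n)
               for k in range(M + 1))

# Brute-force verification on a small instance: start from x at level j
# (j '0'-bits), enumerate all 2^n mutation masks, and accumulate the
# probability of landing at level i < j.
n, j, p = 5, 3, 0.2
x = [0] * j + [1] * (n - j)
for i in range(j):
    exact = 0.0
    for mask in product([0, 1], repeat=n):
        y = [xi ^ m for xi, m in zip(x, mask)]
        if y.count(0) == i:
            exact += P1(sum(mask), p, n)
    assert abs(exact - onemax_transition(i, j, p, n)) < 1e-12
```

The same scheme, with the level-to-|**x**| map of (9), applies to the Deceptive transition probabilities in (10).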

#### *3.4. Performance Metrics*

To evaluate the performance of EAs for a given iteration budget, we propose two metrics: the expected approximation error (EAE) and the tail probability (TP) of EAs after *t* consecutive iterations.

**Definition 1.** *Let* {**x***t*, *t* = 1, 2 . . . } *be the individual sequence of an individual-based EA.*

*(1) The expected approximation error (EAE) after t consecutive iterations is*

$$e^{[t]} = \mathbb{E}[e(\mathbf{x}_t)] = \sum_{i=0}^{L} e_i \Pr\{e(\mathbf{x}_t) = e_i\}. \tag{11}$$

*(2) Given i* > 0*, the tail probability (TP) of the approximation error that e*(**x***t*) *is greater than or equal to ei is defined as*

$$p^{[t]}(e_i) = \Pr\{e(\mathbf{x}_t) \ge e_i\}. \tag{12}$$

The EAE is the expected fitness gap between a solution and the optimum; it measures the solution quality after running *t* generations. The TP describes the probability distribution of the error over the non-optimal levels (*i* > 0); in particular, *p*[*t*] (*e*1) is the probability of not finding the optimum.
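Both metrics follow mechanically from the Markov chain model (2)–(4) by iterating the level distribution and reading off (11) and (12). A minimal sketch with an illustrative three-level toy matrix (ours, not from the paper):

```python
import numpy as np

# Column-wise transition matrix: R_tilde[i, j] = Pr{level j -> level i}.
# Upper triangular, as guaranteed by elitist selection; the entries here
# are an illustrative toy example, not derived from the paper.
R_tilde = np.array([[1.0, 0.3, 0.2],
                    [0.0, 0.7, 0.3],
                    [0.0, 0.0, 0.5]])
e_tilde = np.array([0.0, 1.0, 2.0])    # error vector (e_0, e_1, e_2)'
q = np.array([0.0, 0.5, 0.5])          # initial distribution q^[0]

for _ in range(50):
    q = R_tilde @ q                    # q^[t] = R_tilde q^[t-1]
eae = float(e_tilde @ q)               # expected approximation error (11)
tp1 = float(q[1:].sum())               # tail probability Pr{e(x_t) >= e_1} (12)
```

For this toy chain both metrics decay geometrically, governed by the largest diagonal entry of the non-optimal submatrix.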

Given two EAs A and B, if both EAE and TP of Algorithm A are smaller than those of Algorithm B for any iteration budget, we say Algorithm A outperforms Algorithm B on problem (1).

**Definition 2.** *Let* A *and* B *be two EAs applied to problem (1). Algorithm* A *is said to outperform algorithm* B *if*

- $e_{\mathcal{A}}^{[t]} - e_{\mathcal{B}}^{[t]} \le 0, \quad \forall\, t > 0;$
- $p_{\mathcal{A}}^{[t]}(e_i) - p_{\mathcal{B}}^{[t]}(e_i) \le 0, \quad \forall\, t > 0, \ 0 < i \le L.$

*Algorithm* A *is said to asymptotically outperform algorithm* B *if*

- $\lim_{t \to +\infty} e_{\mathcal{A}}^{[t]} - e_{\mathcal{B}}^{[t]} \le 0;$
- $\lim_{t \to +\infty} p_{\mathcal{A}}^{[t]}(e_i) - p_{\mathcal{B}}^{[t]}(e_i) \le 0.$

The asymptotic outperformance is weaker than the outperformance.

#### **4. Comparison of Transition Probabilities of Two EAs**

In this section, we compare transition probabilities of (1 + 1)*EA* and (1 + 1)*EAC*. According to the connection between *ri*,*<sup>j</sup>* and *P*(*l*), a comparison of transition probabilities can be conducted by considering the probabilities of flipping "*l* preferred bits".

#### *4.1. Probabilities of Flipping Preferred Bits*

Denote the probabilities of the (1 + 1)*EA* and the (1 + 1)*EAC* of flipping "*l* preferred bits" by *P*1(*l*, *pm*) and *P*2(*l*, *CR*, *qm*), respectively. By (5), we know

$$P\_1(l, p\_m) = (p\_m)^l (1 - p\_m)^{n-l}.\tag{13}$$

Since the mutation and the binomial crossover in Algorithm 2 are mutually independent, we can obtain the probability by considering the crossover first. When "*l* preferred bits" are flipped by the (1 + 1)*EAC*, there are *l* + *k* (0 ≤ *k* ≤ *n* − *l*) bits of **y** set to *vi* by (7), the probability of which is

$$P\_{\mathbb{C}}(l+k,\mathbb{C}\_{\mathbb{R}}) = \frac{l+k}{n} (\mathbb{C}\_{\mathbb{R}})^{l+k-1} (1-\mathbb{C}\_{\mathbb{R}})^{n-l-k}.$$

If only the "*l* preferred bits" are flipped, we have

$$\begin{split} P\_2(l, \mathbb{C}\_R, q\_m) &= \sum\_{k=0}^{n-l} \mathbb{C}\_{n-l}^k P\_\mathbb{C}(l+k, \mathbb{C}\_R) (q\_m)^l (1 - q\_m)^k \\ &= \frac{1}{n} [l + (n-l)\mathbb{C}\_R - nq\_m \mathbb{C}\_R] (\mathbb{C}\_R)^{l-1} (q\_m)^l (1 - q\_m \mathbb{C}\_R)^{n-l-1} . \end{split} \tag{14}$$

Note that the (1 + 1)*EAC* degrades to the (1 + 1)*EA* when *CR* = 1, and the (1 + 1)*EA* becomes random search when *pm* = 1. Thus, we assume that *pm*, *CR*, and *qm* all lie in (0, 1). A fair comparison of transition probabilities is obtained by considering the identical parameter setting

$$p_m = \mathbb{C}_R q_m = p, \quad 0 < p < 1. \tag{15}$$

Then, we know *qm* = *p*/*CR*, and Equation (14) implies

$$P\_2(l, \mathbb{C}\_R, p/\mathbb{C}\_R) = \frac{1}{n} \left[ (n - l) + \frac{l - np}{\mathbb{C}\_R} \right] p^l (1 - p)^{n - l - 1}. \tag{16}$$

Subtracting (13) from (16), we have

$$P\_2(l, \mathbb{C}\_{\mathcal{R}}, p/\mathcal{C}\_{\mathcal{R}}) - P\_1(l, p) = \left\{ \frac{1}{n} \left[ (n - l) + \frac{l - np}{\mathbb{C}\_{\mathcal{R}}} \right] - (1 - p) \right\} p^l (1 - p)^{n - l - 1}$$

$$= \left( \frac{1}{\mathbb{C}\_{\mathcal{R}}} - 1 \right) \left( \frac{l}{n} - p \right) p^l (1 - p)^{n - l - 1}. \tag{17}$$

Since 0 < *CR* < 1, we conclude that *P*2(*l*, *CR*, *p*/*CR*) is greater than *P*1(*l*, *p*) if and only if *l* > *np*. That is, the introduction of binomial crossover into the (1 + 1)*EA* enhances the exploration ability of the resulting (1 + 1)*EAC*. We get the following theorem for the case $p \le \frac{1}{n}$.
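Equation (17) and the sign condition *l* > *np* can be verified numerically. The sketch below (ours, not from the paper) implements the closed forms (13) and (16) and checks the factorized difference at the boundary case *p* = 1/*n*:

```python
def P1(l, p, n):
    """Equation (13): probability of flipping exactly a given set of l bits."""
    return p**l * (1 - p)**(n - l)

def P2(l, CR, p, n):
    """Equation (16): P2(l, CR, p/CR) under the coupling p_m = CR * q_m = p."""
    return ((n - l) + (l - n * p) / CR) / n * p**l * (1 - p)**(n - l - 1)

n, p, CR = 10, 1 / 10, 0.5
for l in range(1, n + 1):
    diff = P2(l, CR, p, n) - P1(l, p, n)
    rhs = (1 / CR - 1) * (l / n - p) * p**l * (1 - p)**(n - l - 1)
    assert abs(diff - rhs) < 1e-12   # the factorization in Equation (17)
    assert diff > -1e-12             # Theorem 1: p <= 1/n implies P1 <= P2
```

At *l* = *np* = 1 the difference vanishes, and for every *l* > 1 the (1 + 1)*EAC* flips the preferred bits with strictly larger probability, as (17) predicts.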

**Theorem 1.** *While* $0 < p \le \frac{1}{n}$*, it holds for all* 1 ≤ *l* ≤ *n that* $P_1(l, p) \le P_2(l, \mathbb{C}_R, p/\mathbb{C}_R)$*.*

**Proof.** The result follows directly from Equation (17) by setting $p \le \frac{1}{n}$, since then $\frac{l}{n} \ge \frac{1}{n} \ge p$ for all $l \ge 1$.

For the popular setting where the mutation probability of the (1 + 1)EA is set to 1/*n*, the introduction of binomial crossover does increase the probability of generating new candidate solutions. Next, we investigate how this improvement changes the transition probabilities.

#### *4.2. Comparison of Transition Probabilities*

To establish that algorithm A is more efficient than algorithm B, we require that the probability of A transferring to promising statuses be not smaller than that of B.

**Definition 3.** *Let* A *and* B *be two EAs with an identical initialization mechanism, and let* $\tilde{\mathbf{A}} = (a_{i,j})$ *and* $\tilde{\mathbf{B}} = (b_{i,j})$ *be their transition matrices. It is said that* $\tilde{\mathbf{A}}$ *dominates* $\tilde{\mathbf{B}}$*, denoted by* $\tilde{\mathbf{A}} \succ \tilde{\mathbf{B}}$*, if it holds that*

*1.* $a_{i,j} \ge b_{i,j}, \ \forall\, 0 \le i < j \le L$;
*2.* $a_{i,j} > b_{i,j}$ *for some* $0 \le i < j \le L$.

Denote the transition probabilities of the (1 + 1)*EA* and the (1 + 1)*EAC* by $p_{i,j}$ and $s_{i,j}$, respectively. For the OneMax problem and the Deceptive problem, we get the relation of transition dominance on the premise that $p_m = \mathbb{C}_R q_m = p \le \frac{1}{n}$.

**Theorem 2.** *For the* (1 + 1)*EA and the* (1 + 1)*EAC, denote their transition matrices by* $\tilde{\mathbf{P}}$ *and* $\tilde{\mathbf{S}}$*, respectively. On the condition that* $p_m = \mathbb{C}_R q_m = p \le \frac{1}{n}$*, it holds for problem (1) that* $\tilde{\mathbf{S}} \succ \tilde{\mathbf{P}}$*.*

**Proof.** Denote the collection of all solutions at level *k* by S(*k*), *k* = 0, 1, ... , *n*. We prove the result by considering the transition probability

$$r\_{i,j} = \Pr\{\mathbf{y} \in \mathcal{S}(i) \mid \mathbf{x} \in \mathcal{S}(j)\}, \quad (i < j).$$

Since the function values of solutions are related merely to the number of '1'-bits, the probability of generating a solution **y** ∈ S(*i*) by performing mutation on **x** ∈ S(*j*) depends on the Hamming distance *l* = *H*(**x**, **y**). Given **x** ∈ S(*j*), S(*i*) is partitioned as $\mathcal{S}(i) = \bigcup_{l=1}^{L} \mathcal{S}_l(i)$, where $\mathcal{S}_l(i) = \{\mathbf{y} \in \mathcal{S}(i) \mid H(\mathbf{x}, \mathbf{y}) = l\}$ and *L* is a positive integer smaller than or equal to *n*.

Accordingly, the probability to transfer from level *j* to *i* is confirmed as

$$r_{i,j} = \sum_{l=1}^{L} \Pr\{\mathbf{y} \in \mathcal{S}_l(i) \mid \mathbf{x} \in \mathcal{S}(j)\} = \sum_{l=1}^{L} |\mathcal{S}_l(i)| \, P(l),$$

where $|\mathcal{S}_l(i)|$ is the size of $\mathcal{S}_l(i)$ and *P*(*l*) is the probability of flipping "*l* preferred bits". Then,

$$p_{i,j} = \sum_{l=1}^{L} \Pr\{\mathbf{y} \in \mathcal{S}_l(i) \mid \mathbf{x} \in \mathcal{S}(j)\} = \sum_{l=1}^{L} |\mathcal{S}_l(i)| \, P_1(l, p), \tag{18}$$

$$s_{i,j} = \sum_{l=1}^{L} \Pr\{\mathbf{y} \in \mathcal{S}_l(i) \mid \mathbf{x} \in \mathcal{S}(j)\} = \sum_{l=1}^{L} |\mathcal{S}_l(i)| \, P_2(l, \mathbb{C}_R, p/\mathbb{C}_R). \tag{19}$$

Since *p* ≤ 1/*n*, Theorem 1 implies that

$$P_1(l, p) \le P_2(l, \mathbb{C}_R, p/\mathbb{C}_R), \quad \forall\, 1 \le l \le n.$$

Combining it with (18) and (19) we know

$$p\_{i,j} \le s\_{i,j}, \quad \forall \ 0 \le i < j \le n. \tag{20}$$

Then, the result follows from Definition 3.

**Example 2** (**Comparison of transition probabilities for the OneMax problem**)**.** *Let* $p_m = \mathbb{C}_R q_m = p \le \frac{1}{n}$*. By (8), we have*

$$p\_{i,j} = \sum\_{k=0}^{M} \mathbb{C}\_{n-j}^{k} \mathbb{C}\_{j}^{k + (j - i)} P\_1(2k + j - i, p), \tag{21}$$

$$s\_{i,j} = \sum\_{k=0}^{M} \mathbb{C}\_{n-j}^{k} \mathbb{C}\_{j}^{k + (j-i)} P\_{2}(2k + j - i, \mathbb{C}\_{R}, p/\mathbb{C}\_{R}). \tag{22}$$

*where M* = min{*n* − *j*, *i*}*. Since p* ≤ 1/*n, Theorem 1 implies that*

$$P_1(2k+j-i, p) \le P_2(2k+j-i, \mathbb{C}_R, p/\mathbb{C}_R),$$

*and by (21) and (22), we have* $p_{i,j} \le s_{i,j}, \ \forall\, 0 \le i < j \le n$*.*

**Example 3** (**Comparison of transition probabilities for the Deceptive problem**)**.** *Let* $p_m = \mathbb{C}_R q_m = p \le \frac{1}{n}$*. Equation (10) implies that*

$$p\_{i,j} = \begin{cases} \sum\_{k=0}^{M} \mathbb{C}\_{n-j+1}^{k} \mathbb{C}\_{j-1}^{k+(j-i)} P\_1(2k+j-i, p), & i > 0, \\ P\_1(n-j+1, p), & i = 0, \end{cases} \tag{23}$$

$$s\_{i,j} = \begin{cases} \sum\_{k=0}^{M} \mathbb{C}\_{n-j+1}^{k} \mathbb{C}\_{j-1}^{k+(j-i)} P\_2(2k+j-i, \mathbb{C}\_{R'} \frac{p}{\mathbb{C}\_R}), & i > 0, \\ P\_2(n-j+1, \mathbb{C}\_{R'} p/\mathbb{C}\_R), & i = 0, \end{cases} \tag{24}$$

*where M* = min{*n* − *j* + 1, *i* − 1}*. Similar to the analysis of Example 2, we get the conclusion that* $p_{i,j} \le s_{i,j}, \ \forall\, 0 \le i < j \le n$*.*

The results demonstrate that when *p* ≤ 1/*n*, the introduction of binomial crossover leads to the transition dominance of the (1 + 1)*EAC* over the (1 + 1)*EA*. In the following section, we investigate whether transition dominance leads to the outperformance of the (1 + 1)*EAC* over the (1 + 1)*EA*.

#### **5. Analysis of Asymptotic Performance**

In this section, we will prove that (1 + 1)*EAC* asymptotically outperforms (1 + 1)*EA* using the average convergence rate [25,32].

**Definition 4.** *The average convergence rate (ACR) of an EA for t generations is*

$$R\_{EA}(t) = 1 - \left(e^{\left[t\right]} / e^{\left[0\right]}\right)^{1/t}.\tag{25}$$

**Lemma 1** ([32], Theorem 1)**.** *Let* **R** *be the transition submatrix associated with a convergent EA. Under random initialization (i.e., the EA may start at any initial state with a positive probability), it holds*

$$\lim_{t \to +\infty} R_{EA}(t) = 1 - \rho(\mathbf{R}), \tag{26}$$

*where ρ*(**R**) *is the spectral radius of* **R***.*

Lemma 1 presents the asymptotic characteristics of the ACR, by which we get the result on the asymptotic performance of EAs.
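Lemma 1 can be checked numerically on a toy chain. The sketch below propagates $e^{[t]} = \tilde{\mathbf{e}}'\mathbf{R}^t\tilde{\mathbf{q}}^{[0]}$ for a hypothetical $2 \times 2$ upper-triangular submatrix (the entries are made up for illustration) and compares the ACR of Definition 4 with $1 - \rho(\mathbf{R})$.

```python
# Numerical check of Lemma 1 on a hypothetical 2-level chain.
# R is the upper-triangular transition submatrix, so rho(R) = max diagonal entry.
R = [[0.5, 0.3],
     [0.0, 0.8]]
errors = [1.0, 2.0]   # approximation error of the two non-optimal levels
q = [0.5, 0.5]        # initial distribution over the non-optimal levels
rho = max(R[0][0], R[1][1])

def step(mat, vec):
    """One transition: vec <- mat * vec."""
    return [sum(mat[i][j] * vec[j] for j in range(len(vec)))
            for i in range(len(vec))]

T = 50
e0 = sum(e * p for e, p in zip(errors, q))
qt = q
for _ in range(T):
    qt = step(R, qt)
et = sum(e * p for e, p in zip(errors, qt))
acr = 1 - (et / e0) ** (1 / T)   # Definition 4 at t = T
# acr is close to 1 - rho(R) = 0.2, in line with Equation (26)
```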

**Proposition 1.** *If $\tilde{\mathbf{A}} \succ \tilde{\mathbf{B}}$, there exists $T > 0$ such that*

*1. $e_A^{[t]} \le e_B^{[t]}$, $\forall\ t > T$;*

*2. $p_A^{[t]}(e_i) \le p_B^{[t]}(e_i)$, $\forall\ t > T$, $1 \le i \le L$.*

**Proof.** By Lemma 1, we know that for all $\epsilon > 0$ there exists $T > 0$ such that

$$e^{[0]} (\rho(\mathbf{R}) - \epsilon)^t < e^{[t]} < e^{[0]} (\rho(\mathbf{R}) + \epsilon)^t, \quad t > T. \tag{27}$$

From the fact that the transition submatrix **R** of an RSH is upper triangular, we conclude

$$\rho(\mathbf{R}) = \max\{r\_{1,1}, \dots, r\_{L,L}\}. \tag{28}$$

Denote

$$\tilde{\mathbf{A}} = (a_{i,j}) = \begin{pmatrix} 1 & \mathbf{a}_0 \\ \mathbf{0} & \mathbf{A} \end{pmatrix}, \quad \tilde{\mathbf{B}} = (b_{i,j}) = \begin{pmatrix} 1 & \mathbf{b}_0 \\ \mathbf{0} & \mathbf{B} \end{pmatrix}.$$

Since $\tilde{\mathbf{A}} \succ \tilde{\mathbf{B}}$, it holds

$$a_{j,j} = 1 - \sum_{i=0}^{j-1} a_{i,j} < 1 - \sum_{i=0}^{j-1} b_{i,j} = b_{j,j}, \quad 1 \le j \le L.$$

Then, Equation (28) implies that

$$
\rho(\mathbf{A}) < \rho(\mathbf{B})\,.
$$

Applying it to (27) with $\epsilon < \frac{1}{2}(\rho(\mathbf{B}) - \rho(\mathbf{A}))$, we have

$$e_A^{[t]} < e^{[0]}(\rho(\mathbf{A}) + \epsilon)^{t} < e^{[0]}(\rho(\mathbf{B}) - \epsilon)^{t} < e_B^{[t]}, \tag{29}$$

which proves the first conclusion.

Noting that the tail probability $p^{[t]}(e_i)$ can be taken as the expected approximation error of an optimization problem with the error vector

$$\mathbf{e} = (\underbrace{0, \dots, 0}_{i}, 1, \dots, 1)',$$

by (29) we have

$$p_A^{[t]}(e_i) \le p_B^{[t]}(e_i), \quad \forall\ t > T,\ 1 \le i \le L.$$

The second conclusion is proven.

By Definition 2 and Proposition 1, we get the following theorem for comparing the asymptotic performance of (1 + 1)*EA* and (1 + 1)*EAC*.

**Theorem 3.** *If $p_m = C_R q_m = p \le \frac{1}{n}$, the $(1+1)EA_C$ asymptotically outperforms $(1+1)EA$ on problem (1).*

**Proof.** The proof can be completed by applying Theorem 2 and Proposition 1.

On condition that $p_m = C_R q_m = p \le \frac{1}{n}$, Theorem 3 indicates that after sufficiently many iterations, $(1+1)EA_C$ performs better on problem (1) than $(1+1)EA$. A further question is whether $(1+1)EA_C$ outperforms $(1+1)EA$ for finite $t < +\infty$. We answer this question in the next sections.

#### **6. Comparison of the Two EAs on OneMax**

In this section, we show that the outperformance introduced by binomial crossover can be obtained for the uni-modal OneMax problem based on the following lemma [29].

**Lemma 2** ([29], Theorem 3)**.** *Let*

$$
\tilde{\mathbf{e}} = (e_0, e_1, \dots, e_L)', \quad \tilde{\mathbf{v}} = (v_0, v_1, \dots, v_L)',
$$

*where $0 \le e_{i-1} \le e_i$, $i = 1, \dots, L$, and $v_i > 0$, $i = 0, 1, \dots, L$. If the transition matrices $\tilde{\mathbf{R}}$ and $\tilde{\mathbf{S}}$ satisfy*

$$s_{j,j} \ge r_{j,j}, \qquad \forall\ 1 \le j \le L, \tag{30}$$

$$\sum_{l=0}^{i-1} (r_{l,j} - s_{l,j}) \ge 0, \qquad \forall\ 0 \le i < j \le L, \tag{31}$$

$$\sum_{l=0}^{i} (s_{l,j-1} - s_{l,j}) \ge 0, \qquad \forall\ 0 \le i < j - 1 < L, \tag{32}$$

*it holds*

$$\tilde{\mathbf{e}}'\tilde{\mathbf{R}}^t\tilde{\mathbf{v}} \le \tilde{\mathbf{e}}'\tilde{\mathbf{S}}^t\tilde{\mathbf{v}}.$$

For the EAs investigated in this study, conditions (30)–(32) are satisfied thanks to the monotonicity of transition probabilities.

**Lemma 3.** *When p* ≤ 1/*n (n* ≥ 3*), P*1(*l*, *p*) *and P*2(*l*, *CR*, *p*/*CR*) *are monotonously decreasing in l.*

**Proof.** When *p* ≤ 1/*n*, Equations (13) and (14) imply that

$$\frac{P\_1(l+1,p)}{P\_1(l,p)} = \frac{p}{1-p} \le \frac{1}{n-1},\tag{33}$$

$$\frac{P_2(l+1, C_R, p/C_R)}{P_2(l, C_R, p/C_R)} = \frac{(l+1)(1-C_R) + nC_R(1-p/C_R)}{l(1-C_R) + nC_R(1-p/C_R)} \cdot \frac{p}{1-p} \le \frac{l+1}{l} \cdot \frac{p}{1-p} \le \frac{l+1}{l} \cdot \frac{1}{n-1}, \tag{34}$$

and the right-hand sides of (33) and (34) are not greater than 1 when $n \ge 3$. Thus, $P_1(l, p)$ and $P_2(l, C_R, p/C_R)$ are monotonously decreasing in $l$.
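The monotonicity of $P_1$ can also be checked directly; the sketch below assumes $P_1(l, p) = p^l(1-p)^{n-l}$ (the probability of flipping $l$ specified bits while keeping the remaining $n - l$), which is consistent with (33) and (45).

```python
# Check that P1(l, p) decreases in l when p <= 1/n (here n = 10, p = 0.08).
n, p = 10, 0.08

def P1(l, p, n=n):
    # Flip l specified bits, keep the other n - l unchanged.
    return p ** l * (1 - p) ** (n - l)

values = [P1(l, p) for l in range(n + 1)]
ratios = [values[l + 1] / values[l] for l in range(n)]
# Every ratio equals p / (1 - p) <= 1/(n - 1), matching Equation (33).
```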

**Lemma 4.** *For the OneMax problem, $p_{i,j}$ and $s_{i,j}$ are decreasing in $j$.*

**Proof.** We validate the monotonicity of $p_{i,j}$ for $(1+1)EA$; that of $s_{i,j}$ can be confirmed in a similar way.

Let 0 ≤ *i* < *j* < *n*. By (21) we know

$$p_{i,j+1} = \sum_{k=0}^{M} C_{n-j-1}^{k}\, C_{j+1}^{i-k}\, P_1(2k+j+1-i, p), \tag{35}$$

$$p_{i,j} = \sum_{k=0}^{M} C_{n-j}^{k}\, C_{j}^{i-k}\, P_1(2k+j-i, p), \tag{36}$$

where $M = \min\{n-j-1, i\}$; note that $C_j^{i-k} = C_j^{k+(j-i)}$ by the symmetry of binomial coefficients, so (36) agrees with (21). Moreover, (33) implies that

$$\frac{C\_{j+1}^{i-k}P\_1(2k+j+1-i,p)}{C\_j^{i-k}P\_1(2k+j-i,p)} = \frac{j+1}{(j+1)-(i-k)}\frac{p}{1-p} \le \frac{j+1}{2}\frac{1}{n-1} < 1,$$

and we know

$$C_{j+1}^{i-k}\, P_1(2k+j+1-i, p) < C_j^{i-k}\, P_1(2k+j-i, p). \tag{37}$$

Note that

$$\min\{n-j-1, i\} \le \min\{n-j, i\}, \quad C_{n-j-1}^{k} < C_{n-j}^{k}. \tag{38}$$

From (35)–(38) we conclude that

$$p\_{i,j+1} < p\_{i,j}, \quad 0 \le i < j < n.$$

Similarly, we can validate that

$$s\_{i,j+1} < s\_{i,j}, \quad 0 \le i < j < n.$$

In conclusion, *pi*,*<sup>j</sup>* and *si*,*<sup>j</sup>* are monotonously decreasing in *j*.

**Theorem 4.** *On condition that $p_m = C_R q_m = p \le \frac{1}{n}$, it holds for the OneMax problem that*

$$(1+1)EA_C \succ (1+1)EA.$$

**Proof.** Given the initial distribution $\tilde{\mathbf{q}}^{[0]}$ and the transition matrix $\tilde{\mathbf{R}}$, the level distribution at iteration $t$ is given by

$$
\tilde{\mathbf{q}}^{[t]} = \tilde{\mathbf{R}}^t \tilde{\mathbf{q}}^{[0]}.\tag{39}
$$

Denote

$$\tilde{\mathbf{e}} = (e_0, e_1, \dots, e_L)', \quad \tilde{\mathbf{o}}_i = (\underbrace{0, \dots, 0}_{i}, 1, \dots, 1)'.$$

By premultiplying (39) with $\tilde{\mathbf{e}}'$ and $\tilde{\mathbf{o}}_i'$, respectively, we get

$$e^{[t]} = \tilde{\mathbf{e}}'\tilde{\mathbf{R}}^t\tilde{\mathbf{q}}^{[0]}, \tag{40}$$

$$p^{[t]}(e_i) = \Pr\{e(\mathbf{x}_t) \ge e_i\} = \tilde{\mathbf{o}}_i'\tilde{\mathbf{R}}^t\tilde{\mathbf{q}}^{[0]}. \tag{41}$$

Meanwhile, by Theorem 2 we have

$$q_{j,j} \le s_{j,j} \le p_{j,j}, \tag{42}$$

$$\sum\_{l=0}^{i-1} (q\_{l,j} - s\_{l,j}) \ge 0, \quad \sum\_{l=0}^{i-1} (s\_{l,j} - p\_{l,j}) \ge 0, \quad \forall \ i < j. \tag{43}$$

and Lemma 4 implies

$$\sum_{l=0}^{i} (s_{l,j-1} - s_{l,j}) \ge 0, \quad \sum_{l=0}^{i} (p_{l,j-1} - p_{l,j}) \ge 0, \quad \forall\ i < j - 1. \tag{44}$$

Then, (42)–(44) validate satisfaction of conditions (30)–(32), and by Lemma 2 we know

$$\begin{aligned} \tilde{\mathbf{e}}'\tilde{\mathbf{S}}^t\tilde{\mathbf{q}}^{[0]} &\le \tilde{\mathbf{e}}'\tilde{\mathbf{P}}^t\tilde{\mathbf{q}}^{[0]}, & \forall\ t > 0; \\ \tilde{\mathbf{o}}_i'\tilde{\mathbf{S}}^t\tilde{\mathbf{q}}^{[0]} &\le \tilde{\mathbf{o}}_i'\tilde{\mathbf{P}}^t\tilde{\mathbf{q}}^{[0]}, & \forall\ t > 0,\ 1 \le i < n. \end{aligned}$$

Then, we get the conclusion by Definition 2.

The above theorem demonstrates that the dominance of transition matrices introduced by the binomial crossover operator leads to the outperformance of (1 + 1)*EAC* on the uni-modal problem OneMax.
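As an illustrative sketch (not the paper's experimental code), the following simulates an elitist $(1+1)EA$ on OneMax with and without a DE-style binomial crossover; the operator details and parameter names are our assumptions, with $p = C_R q_m = 1/n$ as in Theorem 4.

```python
import random

def onemax_error(x):
    """Approximation error on OneMax: the number of zero bits."""
    return len(x) - sum(x)

def mutate(x, rate, rng):
    """Flip each bit independently with the given rate."""
    return [b ^ (rng.random() < rate) for b in x]

def crossover(target, donor, CR, rng):
    """DE-style binomial crossover with one guaranteed donor index."""
    j_rand = rng.randrange(len(target))
    return [donor[i] if (rng.random() < CR or i == j_rand) else target[i]
            for i in range(len(target))]

def run(n, steps, pm=None, qm=None, CR=None, seed=1):
    rng = random.Random(seed)
    x = [rng.randrange(2) for _ in range(n)]
    trace = [onemax_error(x)]
    for _ in range(steps):
        if CR is None:                      # (1+1)EA: mutation only
            y = mutate(x, pm, rng)
        else:                               # (1+1)EA_C: mutation, then crossover
            y = crossover(x, mutate(x, qm, rng), CR, rng)
        if onemax_error(y) <= onemax_error(x):  # elitist replacement
            x = y
        trace.append(onemax_error(x))
    return trace

n = 20
plain = run(n, 300, pm=1.0 / n)             # mutation rate p = 1/n
crossed = run(n, 300, qm=2.0 / n, CR=0.5)   # C_R * q_m = 1/n
```

Elitist replacement makes both error traces non-increasing; averaging such runs over many seeds reproduces the kind of EAE curves compared in this section.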

#### **7. Comparison of the Two EAs on Deceptive**

In this section, we show that the outperformance of (1 + 1)*EAC* over (1 + 1)*EA* may not always hold on Deceptive. Then, we propose an adaptive strategy of parameter setting arising from the theoretical analysis, with which (1 + 1)*EAC* performs better in terms of tail probability.

#### *7.1. Numerical Demonstration for Inconsistency between the Transition Dominance and the Algorithm Outperformance*

For the Deceptive problem, we first present a counterexample to show that even if the transition matrix of one EA dominates that of another, we cannot conclude that the former EA outperforms the latter.

**Example 4.** *We construct two artificial Markov chains as the models of two EAs. Let EA*<sup>R</sup> *and EA*<sup>S</sup> *be two EAs starting with an identical initial distribution*

$$\mathbf{p}^{[0]} = \left(\frac{1}{n}, \frac{1}{n}, \dots, \frac{1}{n}\right)'.$$

*and the respective transition matrices are*

$$
\tilde{\mathbf{R}} = \begin{pmatrix}
1 & \frac{1}{n^3} & \frac{2}{n^3} & \cdots & \frac{n}{n^3} \\
& 1 - \frac{1}{n^3} & \frac{1}{n^2} \\
& & 1 - \frac{1}{n^2} - \frac{2}{n^3} & \ddots \\
& & & \ddots & \frac{n-1}{n^2} \\
& & & & 1 - \frac{1}{n}
\end{pmatrix}
$$

*and*

$$\tilde{\mathbf{S}} = \begin{pmatrix} 1 & \frac{2}{n^3} & \frac{4}{n^3} & \cdots & \frac{2n}{n^3} \\ & 1 - \frac{2}{n^3} & \frac{1}{n^2} + \frac{1}{2n} \\ & & 1 - \frac{n^2 + 2n + 8}{2n^3} & \ddots \\ & & & \ddots & \frac{n-1}{n^2} + \frac{n-1}{2n} \\ & & & & 1 - \frac{n^2 + n + 2}{2n^2} \end{pmatrix}.$$

*Obviously, it holds that $\tilde{\mathbf{S}} \succ \tilde{\mathbf{R}}$. Through computer simulation, we get the curve of the EAE difference of the two EAs in Figure 1a and the curve of the TP difference in Figure 1b. From Figure 1b, it is clear that EA*<sup>R</sup> *does not always outperform EA*<sup>S</sup>*, because the difference of TPs is negative at the early stage of the iteration process but positive later.*

**Figure 1.** Simulation results on the difference of EAEs and TPs for the counterexample. (**a**) Difference of expected approximation errors (EAEs). (**b**) Difference of tail probabilities (TPs).
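Curves like those in Figure 1 need not be sampled: with the transition matrix in hand, EAE and TP can be computed exactly via (40) and (41). The sketch below does this for a small hypothetical 4-level absorbing chain whose column-stochastic entries are ours, chosen only to illustrate the computation.

```python
# Exact EAE/TP curves for a hypothetical 4-level absorbing chain.
# Column j holds the probabilities of moving from level j to levels 0..3;
# level 0 is the optimum, and the matrix is upper triangular.
Rt = [[1.0, 0.10, 0.05, 0.02],
      [0.0, 0.90, 0.15, 0.08],
      [0.0, 0.00, 0.80, 0.20],
      [0.0, 0.00, 0.00, 0.70]]
errors = [0.0, 1.0, 2.0, 3.0]
q = [0.25, 0.25, 0.25, 0.25]       # uniform random initialization

def step(mat, vec):
    return [sum(mat[i][j] * vec[j] for j in range(4)) for i in range(4)]

eae, tp = [], []                    # e^[t] and the tail probability p^[t](e_1)
for _ in range(100):
    eae.append(sum(e * p for e, p in zip(errors, q)))
    tp.append(sum(q[1:]))           # probability the optimum is not found yet
    q = step(Rt, q)
```

For this upper-triangular chain the level can only decrease, so both curves are monotone; the interesting behavior in Example 4 comes from comparing two such chains.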

Now we turn to discuss (1 + 1)*EA* and (1 + 1)*EAC* on Deceptive. We demonstrate (1 + 1)*EAC* may not outperform (1 + 1)*EA* over all generations although the transition matrix of (1 + 1)*EAC* dominates that of (1 + 1)*EA*.

**Example 5.** *In $(1+1)EA$ and $(1+1)EA_C$, set $p_m = C_R q_m = 1/n$; for $(1+1)EA_C$, let $q_m = \frac{1}{2}$, $C_R = \frac{2}{n}$. The numerical simulation results of EAEs and TPs for 5000 independent runs are depicted in Figure 2. It is shown that when $n \ge 9$, both the EAE and TP of $(1+1)EA$ could be smaller than those of $(1+1)EA_C$. This indicates that the dominance of the transition matrix does not always guarantee the outperformance of the corresponding algorithm.*

**Figure 2.** Numerical comparison for (1 + 1)*EA* and (1 + 1)*EAC* applied to the Deceptive problem, where *n* refers to the problem dimension. (**a**) Numerical comparison of expected approximation errors (EAEs). (**b**) Numerical comparison of tail probabilities (TPs).

With *pm* = *CRqm* = *p* ≤ 1/*n*, although the binomial crossover leads to transition dominance of (1 + 1)*EAC* over (1 + 1)*EA*, the enhancement of exploitation plays a governing role in the iteration process. Thus, the imbalance of exploration and exploitation leads to poor performance of (1 + 1)*EAC* at some stage of the iteration process. As shown in the previous two examples, the outperformance of (1 + 1)*EAC* cannot be drawn from the dominance of transition matrices.

The fitness landscape of Deceptive confirms that global convergence of EAs on Deceptive is principally attributed to the direct transition from level $j$ to level 0, quantified by the transition probability $r_{0,j}$. By investigating the impact of binomial crossover on $r_{0,j}$, we arrive at an adaptive strategy for regulating the mutation rate and the crossover rate, by which the performance of both $(1+1)EA$ and $(1+1)EA_C$ is enhanced.

#### *7.2. Comparisons on the Probabilities to Transfer from Non-Optimal Statuses to the Optimal Status*

A comparison between *p*0,*<sup>j</sup>* and *s*0,*<sup>j</sup>* is performed by investigating their monotonicity. Substituting (13) and (14) into (23) and (24), respectively, we have

$$p\_{0,j} = P\_1(n-j+1, p\_m) = (p\_m)^{n-j+1}(1-p\_m)^{j-1},\tag{45}$$

$$\begin{split} s_{0,j} &= P_3(n-j+1, C_R, q_m) \\ &= \frac{1}{n}\left[(j-1)(1-C_R) + nC_R(1-q_m)\right] C_R^{n-j}\, q_m^{n-j+1}\, (1 - q_m C_R)^{j-2}. \end{split} \tag{46}$$

We first investigate the maximum values of *p*0,*<sup>j</sup>* to get the ideal performance of (1 + 1)*EA* on the Deceptive problem.

**Theorem 5.** *While*

$$p_m^* = \frac{n-j+1}{n}, \tag{47}$$

*$p_{0,j}$ gets its maximum value $p_{0,j}^{\max} = \left(\frac{n-j+1}{n}\right)^{n-j+1}\left(\frac{j-1}{n}\right)^{j-1}$.*

**Proof.** By (45), we know

$$\frac{\partial}{\partial p\_m} p\_{0,j} = (n - j + 1 - np\_m) p\_m^{n-j} (1 - p\_m)^{j-2}.$$

While $p_m = \frac{n-j+1}{n}$, $p_{0,j}$ gets its maximum value

$$p\_{0,j}^{\max} = P\_1(n-j+1, \frac{n-j+1}{n}) = \left(\frac{n-j+1}{n}\right)^{n-j+1} \left(\frac{j-1}{n}\right)^{j-1}.$$

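Theorem 5 is easy to sanity-check numerically: a grid search over the mutation rate (with hypothetical values $n = 10$, $j = 4$) recovers the maximizer (47) of $p_{0,j} = p_m^{\,n-j+1}(1-p_m)^{j-1}$ from (45).

```python
# Grid search for the mutation rate maximizing p_{0,j} (Theorem 5).
n, j = 10, 4   # illustrative problem size and level

def p0j(pm):
    # Equation (45): jump from level j directly to the optimum.
    return pm ** (n - j + 1) * (1 - pm) ** (j - 1)

grid = [k / 10000 for k in range(1, 10000)]
best = max(grid, key=p0j)
# Theorem 5 predicts the maximizer (n - j + 1) / n = 0.7
```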

Influence of the binomial crossover on $s_{0,j}$ is investigated on condition that $p_m = q_m$. By regulating $C_R$, we compare $p_{0,j}$ with the maximum value $s_{0,j}^{\max}$ of $s_{0,j}$.

**Theorem 6.** *On condition that pm* = *qm, the following results hold.*


**Proof.** Note that $(1+1)EA_C$ degenerates to $(1+1)EA$ when $C_R = 1$. Then, if the maximum value $s_{0,j}^{\max}$ of $s_{0,j}$ is obtained by setting $C_R = 1$, we have $s_{0,j}^{\max} = p_{0,j}$; otherwise, it holds that $s_{0,j}^{\max} > p_{0,j}$.

1. For the case that *j* **= 1**, Equation (46) implies

$$s_{0,1} = q_m^n\, C_R^{n-1}.$$

Obviously, $s_{0,1}$ is monotonously increasing in $C_R$ and gets its maximum value when $C_R = 1$. Then, by (45) we get $s_{0,1}^{\max} = p_{0,1}$.

2. While *j* **= 2**, by (46) we have

$$\frac{\partial s_{0,2}}{\partial C_R} = \frac{n-1}{n}\, q_m^{n-1}\, C_R^{n-3}\, \left(n - 2 + (1 - nq_m)C_R\right).$$


It follows that $s_{0,2}$ gets its maximum value at

$$C_R^\star = \frac{n-2}{nq_m - 1}. \tag{48}$$

Then, we have $s_{0,2}^{\max} > p_{0,2}$.

3. For the case that **<sup>3</sup>** *<sup>≤</sup> <sup>j</sup> <sup>≤</sup> <sup>n</sup> <sup>−</sup>* **<sup>1</sup>**, we denote

$$s_{0,j} = \frac{j-1}{n}\, q_m^{n-j+1} I_1 + \frac{(n-j+1)(1-q_m)}{n}\, q_m^{n-j+1} I_2,$$

where

$$\begin{aligned} I_1 &= C_R^{n-j} (1 - q_m C_R)^{j-1}, \\ I_2 &= C_R^{n-j+1} (1 - q_m C_R)^{j-2}. \end{aligned}$$

Then,

$$\begin{aligned} \frac{\partial I_1}{\partial C_R} &= C_R^{n-j-1} (1 - q_m C_R)^{j-2} \left(n - j - (n-1) q_m C_R\right), \\ \frac{\partial I_2}{\partial C_R} &= C_R^{n-j} (1 - q_m C_R)^{j-3} \left(n - j + 1 - (n-1) q_m C_R\right). \end{aligned}$$


• If $q_m \ge \frac{n-j+1}{n-1}$, both $I_1$ and $I_2$ attain their maxima inside $(0, 1]$, so $s_{0,j}$ gets its maximum value $s_{0,j}^{\max}$ at some

$$C_R^\star \in \left( \frac{n-j}{(n-1)q_m}, \frac{n-j+1}{(n-1)q_m} \right). \tag{49}$$

Accordingly, we know $s_{0,j}^{\max} > p_{0,j}$.

• If $\frac{n-j}{n-1} < q_m < \frac{n-j+1}{n-1}$, $I_1$ gets its maximum value when $C_R = \frac{n-j}{(n-1)q_m}$, and $I_2$ is monotonously increasing in $C_R$. Then, $s_{0,j}$ gets its maximum value $s_{0,j}^{\max}$ at some

$$C\_R^\star \in \left(\frac{n-j}{(n-1)q\_m}, 1\right],\tag{50}$$

and we know $s_{0,j}^{\max} > p_{0,j}$.

4. While *j* **=** *n*, Equation (46) implies that

$$\frac{\partial s_{0,n}}{\partial C_R} = (n-1)(1 - q_m C_R)^{n-3}\left(1 - 2q_m - (n - 1 - nq_m) q_m C_R\right).$$

Denoting

$$g(q_m, C_R) = 1 - 2q_m - (n - 1 - nq_m) q_m C_R,$$

we can confirm the sign of *∂s*0,*n*/*∂CR* by considering

$$\frac{\partial}{\partial C_R} g(q_m, C_R) = -(n - 1 - nq_m) q_m.$$

• While $0 < q_m \le \frac{n-1}{n}$, $g(q_m, C_R)$ is monotonously decreasing in $C_R$, and its minimum value is

$$g(q_m, 1) = (nq_m - 1)(q_m - 1).$$

The maximum value of *g*(*qm*, *CR*) is

$$g(q_m, 0) = 1 - 2q_m.$$

(a) If $0 < q_m \le \frac{1}{n}$, we have

$$g(q_m, C_R) \ge g(q_m, 1) \ge 0.$$

Thus, $\frac{\partial s_{0,n}}{\partial C_R} \ge 0$, and $s_{0,n}$ is increasing in $C_R$. For this case, $s_{0,n}$ gets its maximum value when $C_R = 1$, and we have $s_{0,n}^{\max} = p_{0,n}$.

(b) If $\frac{1}{n} < q_m \le \frac{1}{2}$, $s_{0,n}$ gets the maximum value $s_{0,n}^{\max}$ when

$$C_R^\star = \frac{1 - 2q_m}{q_m(n - 1 - nq_m)}.$$

Thus, $s_{0,n}^{\max} > p_{0,n}$.


• While $\frac{n-1}{n} < q_m < 1$, $g(q_m, C_R)$ is monotonously increasing in $C_R$, and its maximum value is

$$g(q_m, 1) = (nq_m - 1)(q_m - 1) < 0.$$

Then, $s_{0,n}$ is monotonously decreasing in $C_R$, and its maximum value is obtained by setting $C_R = 0$. Accordingly, we know $s_{0,n}^{\max} > p_{0,n}$.

In summary, $s_{0,n}^{\max} > p_{0,n}$ while $q_m > \frac{1}{n}$; otherwise, $s_{0,n}^{\max} = p_{0,n}$.

Theorems 5 and 6 present the "best" settings to maximize the transition probabilities from non-optimal statuses to the optimal level, by which we get a parameter adaptive strategy that greatly enhances the exploration of compared EAs.

#### *7.3. Parameter Adaptive Strategy to Enhance Exploration of EAs*

Since the level index $j$ is equal to the Hamming distance between $\mathbf{x}$ and $\mathbf{x}^*$, the improvement of the level index is bounded by the reduction of the Hamming distance obtained by replacing $\mathbf{x}$ with $\mathbf{y}$. Then, when local exploitation leads to a transition from level $j$ to a non-optimal level $i$, a practical adaptive strategy for the parameters can be obtained according to the Hamming distance between $\mathbf{x}$ and $\mathbf{y}$.

When $(1+1)EA$ is located at the solution $\mathbf{x}$ at status $j$, Equation (47) implies that the "best" setting of the mutation rate is $p_m^*(j) = \frac{n-j+1}{n}$. Once it transfers to a solution $\mathbf{y}$ at status $i$ ($i < j$), the "best" setting changes to $p_m^*(i) = \frac{n-i+1}{n}$. Then, the difference of the "best" settings is $\frac{j-i}{n}$, bounded from above by $\frac{H(\mathbf{x},\mathbf{y})}{n}$. Accordingly, the mutation rate of $(1+1)EA$ can be updated to

$$p'\_m = p\_m + \frac{H(\mathbf{x}, \mathbf{y})}{n}.\tag{51}$$

For $(1+1)EA_C$, the parameter $q_m$ is adapted using a strategy consistent with that of $p_m$, in order to focus on the influence of $C_R$. That is,

$$q\_m' = q\_m + \frac{H(\mathbf{x}, \mathbf{y})}{n}.\tag{52}$$

Since $s_{0,j}$ demonstrates different monotonicity for varied levels, one cannot get an identical strategy for the adaptive setting of $C_R$. As a compromise, we consider the case that $3 \le j \le n-1$, which is obtained by random initialization with overwhelming probability.

According to the proof of Theorem 6, we know $C_R$ should be set as great as possible for the case $q_m \in (0, \frac{n-j}{n-1}]$; while $q_m \in (\frac{n-j}{n-1}, 1]$, $C_R^\star$ is located in intervals whose boundary values are $\frac{n-j}{(n-1)q_m}$ and $\frac{n-j+1}{(n-1)q_m}$, given by (49) and (50), respectively. Then, while $q_m$ is updated by (52), the update strategy of $C_R$ can be confirmed to satisfy

$$C_R' q_m' = C_R q_m + \frac{H(\mathbf{x}, \mathbf{y})}{n-1}.$$

Accordingly, the adaptive setting of *CR* could be

$$C_R' = \left(C_R q_m + \frac{H(\mathbf{x}, \mathbf{y})}{n-1}\right) \Big/ q_m', \tag{53}$$

where $q_m'$ is updated by (52).
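The three update rules can be collected into one routine; a minimal sketch, with function and variable names of our own choosing:

```python
def adapt(pm, qm, CR, h, n):
    """Adaptive updates after a move from x to y with Hamming distance h.

    pm is updated by (51), qm by (52), and CR by (53), so that the
    product CR * qm increases by h / (n - 1).
    """
    pm_new = pm + h / n                        # Equation (51)
    qm_new = qm + h / n                        # Equation (52)
    CR_new = (CR * qm + h / (n - 1)) / qm_new  # Equation (53)
    return pm_new, qm_new, CR_new

n = 20
pm, qm, CR = 1 / n, 1 / n ** 0.5, 0.5
pm2, qm2, CR2 = adapt(pm, qm, CR, h=3, n=n)
```

By construction, the updated pair satisfies $C_R' q_m' = C_R q_m + H(\mathbf{x},\mathbf{y})/(n-1)$ exactly.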

By incorporating the adaptive strategy (51) into $(1+1)EA$, we compare the performance of its adaptive variant with the adaptive $(1+1)EA_C$ that regulates its mutation rate and crossover rate by (52) and (53), respectively. For 13–20 dimensional Deceptive problems, numerical simulation of the tail probability is implemented by 10,000 independent runs. The initial value of $p_m$ is set as $\frac{1}{n}$. To investigate the sensitivity of the adaptive strategy to initial values of $q_m$, the mutation rate $q_m$ in $(1+1)EA_C$ is initialized with the values $\frac{1}{\sqrt{n}}$, $\frac{3}{2\sqrt{n}}$, and $\frac{2}{\sqrt{n}}$, and the corresponding variants are denoted by $(1+1)EA_C^1$, $(1+1)EA_C^2$, and $(1+1)EA_C^3$, respectively.

The converging curves of averaged TPs are illustrated in Figure 3. Compared to the EAs with parameters fixed during the evolution process, the performance of the adaptive EAs on Deceptive is significantly improved. Furthermore, we note that the converging curves of the adaptive $(1+1)EA_C$ are not sensitive to the initial mutation rate. Although transition dominance does not necessarily lead to outperformance of $(1+1)EA_C$ over $(1+1)EA$, the proposed adaptive strategy greatly enhances the global exploration of $(1+1)EA_C$, yielding an improved adaptive variant whose performance is robust to the initial mutation rate.


**Figure 3.** Numerical comparison on tail probabilities (TPs) of the adaptive $(1+1)EA$ and $(1+1)EA_C$ applied to the Deceptive problem, where $n$ is the problem dimension. $(1+1)EA_C^1$, $(1+1)EA_C^2$, and $(1+1)EA_C^3$ are three variants of $(1+1)EA_C$ with $q_m$ initialized as $\frac{1}{\sqrt{n}}$, $\frac{3}{2\sqrt{n}}$, and $\frac{2}{\sqrt{n}}$, respectively.

#### **8. Conclusions and Discussions**

Under the framework of fixed-budget analysis, we conduct a pioneering analysis of the influence of binomial crossover on the approximation error of EAs. The performance of EAs after running finitely many generations is measured by two metrics, the expected approximation error and the tail probability of the error, and a case study is performed comparing the performance of $(1+1)EA$ and $(1+1)EA_C$ with binomial crossover.

Starting from the comparison of the probability of flipping "*l preferred bits*", it is proven that under proper conditions, incorporation of binomial crossover leads to the dominance of transition probabilities, that is, the probability of transferring to any promising status is improved. Accordingly, the asymptotic performance of (1 + 1)*EAC* is superior to that of (1 + 1)*EA*.

It is found that the dominance of transition probabilities guarantees that $(1+1)EA_C$ outperforms $(1+1)EA$ on OneMax in terms of both the expected approximation error and the tail probability. However, this dominance does not necessarily lead to outperformance on Deceptive. This means that using binomial crossover may improve performance on some problems but not on others.

For Deceptive, an adaptive strategy of parameter setting is proposed based on the monotonicity analysis of transition probabilities. Numerical simulations demonstrate that it can significantly improve the exploration ability of both $(1+1)EA_C$ and $(1+1)EA$, and the superiority of binomial crossover is further strengthened by the adaptive strategy. Thus, a problem-specific adaptive strategy is helpful for improving the performance of EAs.

Our future work will focus on a further study for the adaptive setting of crossover rate in population-based EAs on more complex problems, as well as the development of adaptive EAs improved by the introduction of binomial crossover.

**Author Contributions:** Conceptualization, J.H. and X.Z.; formal analysis, C.W.; writing—original draft preparation, C.W.; writing—review and editing, Y.C. and J.H.; funding acquisition, Y.C. All authors have read and agreed to the published version of the manuscript.

**Funding:** This research was funded by the Fundamental Research Funds for the Central Universities grant number WUT:2020IB006.

**Institutional Review Board Statement:** Not applicable.

**Informed Consent Statement:** Not applicable.

**Data Availability Statement:** Not applicable.

**Conflicts of Interest:** The authors declare no conflict of interest.

#### **References**

