*Generalization*

An algorithm that effectively optimizes a certain function should optimize, equally effectively, other functions characterized by the same computational properties. An interesting study on this issue is the investigation of "algorithm footprints" [3].

Some EA configurations, usually including the "standard" settings, can reach similar results on many problems, while others may exhibit a larger performance variability. While it is obviously important to find a good parameter set for a specific EA dealing with a specific problem, it is even more important to understand how much changing it can affect the EA's performance.

## *Constraints and Quality Indices*

Comparing algorithms (or different instances of the same algorithm) requires a precise definition of the conditions under which the comparison is made. As will be shown later in the plots *Q10K* and *Q100K* in Figure 7 (top left), convergence to a good solution can follow very different patterns: some parameter settings may lead to fast convergence to a sub-optimal solution, while others may need many more fitness evaluations to converge but lead to better solutions. In several real-world applications, it is sufficient to reach a point that is "close enough" to the global optimum; in such cases, an EA that consistently reaches good sub-optimal results in a timely manner is preferable to slower, although more precise, algorithms. Conversely, in problems with weaker time constraints, an EA that keeps refining the solution over time, even very slowly, is usually preferable.

The previous considerations indicate that comparing different algorithms is very difficult because, for the comparison to be fair, each algorithm should be used "at its best" for the given problem. In fact, there are many examples in the literature where the effort spent by the authors on tuning and optimizing the method they propose is much larger than the effort spent on tuning the ones to which it is compared. This may easily lead to biased interpretations of the results and to wrong conclusions.

The importance of methods (usually termed Meta-EAs) that tune EAs' parameters to optimize their performance has been highlighted since 1978 [4]. However, mainly due to the considerable computational effort they require, Meta-EAs and other parameter tuning techniques have become a mainstream research topic only recently.

We are aware that using, as a Meta-EA, an algorithm whose behavior also depends on its setup would imply that the Meta-EA itself should undergo parameter tuning. There are obvious practical reasons, related to the method's computational burden, for not doing so. Moreover, it can be argued that, if applying a Meta-EA can effectively lead to solutions that are closer to the global optimum for the problem at hand than those found by a standard setting of the algorithm being tuned, then, even if one used several optimization meta-levels, the improvement margins for each higher-level Meta-EA would become smaller and smaller as the level increases. This intuitively implies that the variability of the results due to the parameter settings of the higher-level Meta-EAs also becomes smaller and smaller with the level. Therefore, even if better settings of the Meta-EA could most probably further improve the optimization performance, we consider a "standard" setting of the Meta-EA to be generally enough to achieve a relevant performance improvement with respect to a random setting.

In [5], we proposed SEPaT (Simple Evolutionary Parameter Tuning), a single-objective Meta-EA in which GPU-based versions of Differential Evolution (DE, [6]) and Particle Swarm Optimization (PSO, [7]) were used to tune PSO on some benchmark functions, obtaining parameter sets that yielded results comparable with the state of the art and better than "standard" or manual settings.

Even if the results were good, the approach was mainly practical: it aimed at providing one set of good parameters, but gave no hints about their generality or about the reasons why they had been selected. One of its main limitations was that it performed single-objective optimization, which prevented it from considering other critical goals, such as generalization, besides the obvious one of optimizing an EA's performance on a given problem.

In this paper, we go well beyond such results, investigating what additional hints a multi-objective approach can provide. To do so, we use a very general framework, called EMOPaT (Evolutionary Multi-Objective Parameter Tuning), which was described in [8]. EMOPaT uses the well-known Multi-Objective Evolutionary Algorithm (MOEA) Non-dominated Sorting Genetic Algorithm II (NSGA-II, [9]) to automatically find good parameter sets for EAs.

The goal of this paper is not to propose EMOPaT as a reference environment. Instead, we use it as virtually the simplest possible multi-objective derivation of SEPaT, to focus on some of the many additional hints that a multi-objective approach to EA tuning can provide with respect to a single-objective one. We are well aware that more sophisticated and possibly better performing environments aimed at the same goal can be designed. SEPaT and EMOPaT were developed with no intent to advance the state of the art of meta-optimization algorithms, but as generic frameworks, with as few specific features as possible, aimed at studying EA meta-optimization. Consistently with this principle, within EMOPaT we use NSGA-II as the multi-objective tuner, since it is possibly the most widely available, generally well-performing, and easy-to-implement multi-objective stochastic optimization algorithm. Indeed, NSGA-II can be considered a natural extension of a single-objective genetic algorithm (GA) to multi-objective optimization. Likewise, we chose to test EMOPaT in tuning PSO and DE for no other reason than the easy availability and good computational efficiency of these algorithms. EMOPaT is a general environment and can be used to tune virtually any other EA or metaheuristic.

EMOPaT is not only aimed at finding parameter sets that achieve good results given the nature of the problems, the quality indices and, more generally, the conditions under which the EA is tuned. It also allows one to extract information about the parameters' semantics and the way they affect the algorithm, by analyzing the Pareto fronts approximated by the solutions obtained by NSGA-II. A similar strategy has been presented by [10] under the name of *innovization* (innovation through optimization).

We also show that EMOPaT can evolve parameter sets that let an algorithm perform well not only on the problem(s) on which it has been tuned, but also on others. Section 2 briefly introduces the three EAs used in our experiments, Section 3 reviews the methods that inspired our work, and Section 4 describes EMOPaT. In Section 5, we first use EMOPaT to find good parameter sets for optimizing the same function under different conditions, showing that the analysis of EMOPaT's results can clarify the role of the EAs' parameters; we then study EMOPaT's generalization abilities and, finally, use EMOPaT to optimize seven benchmark functions and generalize its results to previously unseen functions. Section 6 summarizes all results and suggests possible future extensions of this work.

Additionally, in a separate appendix, we demonstrate that EMOPaT can be considered an extension of SEPaT, with equivalent performance on single-objective problems, and we assess its correct behavior in some controlled situations, showing that it performs tuning as expected.

## **2. Background**

## *2.1. Differential Evolution*

In every generation of DE, each individual in the population acts as a parent vector, for which a donor vector $\vec{D\_i}$ is created. A donor vector is generated by combining three random and distinct individuals $\vec{X\_{r1}}$, $\vec{X\_{r2}}$ and $\vec{X\_{r3}}$ according to this simple mutation equation:

$$
\vec{D\_i} = \vec{X\_{r1}} + F \cdot (\vec{X\_{r2}} - \vec{X\_{r3}}) \tag{1}
$$

where *F* (scale factor) is usually in the interval [0.4, 1]. Several different mutation strategies have been applied to DE; in our work, along with the *random* mutation reported above, we consider *best* and *target-to-best* (or *TTB*) mutation strategies, whose definitions are, respectively:

$$
\vec{D\_i} = \vec{X\_{best}} + F \cdot (\vec{X\_{r1}} - \vec{X\_{r2}}) \tag{2}
$$

$$
\vec{D\_i} = \vec{X\_i} + F \cdot (\vec{X\_{best}} - \vec{X\_i}) + F \cdot (\vec{X\_{r1}} - \vec{X\_{r2}}) \tag{3}
$$
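To make Equations (1) to (3) concrete, the following NumPy sketch computes the donor vector for each of the three mutation strategies. It assumes minimization, a population stored with one individual per row, and, as in classic DE, random indices that also differ from the parent index *i*; the function name `de_mutation` and the strategy labels are illustrative, not taken from the authors' code.

```python
import numpy as np

def de_mutation(pop, fitness, i, F, strategy, rng):
    """Return the donor vector D_i for parent i (minimization assumed).

    pop:      array of shape (n, dim), one individual per row
    fitness:  array of shape (n,), fitness of each individual
    strategy: "random" (Eq. 1), "best" (Eq. 2) or "ttb" (Eq. 3)
    """
    n = len(pop)
    # r1, r2, r3: distinct random indices, all different from the parent i.
    candidates = [k for k in range(n) if k != i]
    r1, r2, r3 = rng.choice(candidates, size=3, replace=False)
    best = pop[np.argmin(fitness)]  # X_best: current best individual
    if strategy == "random":
        return pop[r1] + F * (pop[r2] - pop[r3])
    if strategy == "best":
        return best + F * (pop[r1] - pop[r2])
    if strategy == "ttb":
        return pop[i] + F * (best - pop[i]) + F * (pop[r1] - pop[r2])
    raise ValueError(f"unknown strategy: {strategy}")

# Example: a target-to-best donor for a random population on the sphere function.
rng = np.random.default_rng(0)
pop = rng.uniform(-5, 5, size=(10, 3))
fitness = np.sum(pop ** 2, axis=1)
donor = de_mutation(pop, fitness, 0, 0.5, "ttb", rng)
```

With *F* = 0, the *best* strategy returns the current best individual and *TTB* returns the parent itself, which is a quick sanity check of the two formulas.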

After mutation, every parent-donor pair generates a child ($\vec{T\_i}$), called trial vector, by means of a crossover operation. Two kinds of crossover are usually employed in DE: *binomial* and *exponential* (see [11] for more details). Both crossover strategies depend on the crossover rate *CR*. The newly generated individual $\vec{T\_i}$ is evaluated and its fitness is compared to its parent's: the better of the two survives and becomes part of the next generation.
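Since the crossover details are only referenced here, the sketch below shows the usual textbook formulations of binomial and exponential crossover, followed by DE's one-to-one survivor selection. It assumes minimization and that ties favor the trial vector (a common, but not universal, choice); all function names are hypothetical.

```python
import numpy as np

def binomial_crossover(parent, donor, CR, rng):
    d = len(parent)
    # Each component is taken from the donor with probability CR ...
    mask = rng.random(d) < CR
    # ... and one randomly chosen component always comes from the donor.
    mask[rng.integers(d)] = True
    return np.where(mask, donor, parent)

def exponential_crossover(parent, donor, CR, rng):
    d = len(parent)
    trial = parent.copy()
    j = rng.integers(d)  # random starting component
    copied = 0
    # Copy a run of consecutive components (wrapping around) while rand() < CR.
    while True:
        trial[j] = donor[j]
        copied += 1
        j = (j + 1) % d
        if copied == d or rng.random() >= CR:
            break
    return trial

def select(parent, trial, f):
    # Greedy one-to-one selection (minimization): the better individual survives.
    return trial if f(trial) <= f(parent) else parent

# Example usage on a 5-dimensional sphere function.
rng = np.random.default_rng(1)
f = lambda x: float(np.sum(x ** 2))
parent, donor = rng.uniform(-1, 1, 5), rng.uniform(-1, 1, 5)
child = select(parent, binomial_crossover(parent, donor, 0.9, rng), f)
```

Binomial crossover scatters the inherited components across the vector, whereas exponential crossover copies a contiguous block, so *CR* controls the expected number of donor components in both cases but with a different spatial pattern.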

## *2.2. Particle Swarm Optimization*

In PSO ([7]), a set of particles moves within the search space according to the following equations, which describe the velocity and position of particle *i*:

$$\vec{v\_i}(t) = w \cdot \vec{v\_i}(t-1) + c\_1 \cdot rand() \cdot (\vec{BP\_i} - \vec{P\_i}(t-1)) + c\_2 \cdot rand() \cdot (\vec{BGP\_i} - \vec{P\_i}(t-1)) \tag{4}$$

$$\vec{P\_i}(t) = \vec{P\_i}(t-1) + \vec{v\_i}(t) \tag{5}$$

where $c\_1$, $c\_2$, and $w$ (inertia factor) are real-valued constants, $rand()$ returns random values uniformly distributed in [0, 1], $\vec{BP\_i}$ is the best-fitness position visited so far by the particle, and $\vec{BGP\_i}$ is the best-fitness position visited so far by any particle in its neighborhood, which can comprise the entire swarm or only a subset of it. In this work, we consider three of the most commonly used neighborhood topologies (see Figure 1).

**Figure 1.** The three PSO topologies used in this work: global, ring, and star.
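A minimal sketch of one PSO iteration following Equations (4) and (5), with a neighborhood function for the three topologies. Several details are assumptions rather than statements about the authors' implementation: $rand()$ is drawn independently for each component, personal bests are updated asynchronously as each particle moves, and the *star* topology is interpreted as hub-and-spoke (particle 0 connected to all others, every other particle connected only to the hub), since the topologies are defined only graphically. The names `neighbors` and `pso_step` are hypothetical.

```python
import numpy as np

def neighbors(i, n, topology):
    """Indices of the particles whose best positions particle i can see."""
    if topology == "global":   # fully connected swarm
        return list(range(n))
    if topology == "ring":     # each particle sees its two adjacent particles
        return [(i - 1) % n, i, (i + 1) % n]
    if topology == "star":     # ASSUMED hub-and-spoke: particle 0 is the hub
        return list(range(n)) if i == 0 else [0, i]
    raise ValueError(f"unknown topology: {topology}")

def pso_step(P, V, BP, BP_fit, f, w, c1, c2, topology, rng):
    """One iteration of Eqs. (4)-(5) for all particles (minimization)."""
    n, d = P.shape
    for i in range(n):
        nb = neighbors(i, n, topology)
        BGP = BP[nb[int(np.argmin(BP_fit[nb]))]]  # best position in i's neighborhood
        # Eq. (4): inertia, cognitive and social terms; rand() drawn per component.
        V[i] = (w * V[i]
                + c1 * rng.random(d) * (BP[i] - P[i])
                + c2 * rng.random(d) * (BGP - P[i]))
        P[i] = P[i] + V[i]  # Eq. (5)
        fi = f(P[i])
        if fi < BP_fit[i]:  # asynchronous personal-best update
            BP[i], BP_fit[i] = P[i].copy(), fi
    return P, V, BP, BP_fit

# Example: 8 particles on a 2-D sphere function (illustrative, untuned parameters).
def sphere(x):
    return float(np.sum(x ** 2))

rng = np.random.default_rng(0)
P = rng.uniform(-5, 5, size=(8, 2))
V = np.zeros_like(P)
BP, BP_fit = P.copy(), np.array([sphere(p) for p in P])
for _ in range(100):
    P, V, BP, BP_fit = pso_step(P, V, BP, BP_fit, sphere, 0.7, 1.5, 1.5, "ring", rng)
```

The topology only changes which personal bests enter the social term, which is why it can strongly affect how fast information spreads through the swarm.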

## *2.3. NSGA-II*

The NSGA-II algorithm is basically a classical GA in which selection is based on the so-called non-dominated sorting. In case two individuals have the same rank, the one with the greater crowding distance is selected. This distance can take into consideration the fitness values or the encoding of the individuals, to increase the diversity of the results or of the population, respectively. In this work, NSGA-II crossover and mutation rates have been set as suggested in [9], while we have set the population size and the number of generations "manually", based on the complexity of the problem at hand.
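The two ranking ingredients mentioned above can be sketched as follows: the fast non-dominated sorting procedure described in [9], the crowding distance computed in objective space, and the crowded-comparison rule used to break rank ties. All objectives are assumed to be minimized; this is an illustrative sketch, not a complete NSGA-II (variation operators and elitist replacement are omitted).

```python
def dominates(a, b):
    """True if objective vector a Pareto-dominates b (all objectives minimized)."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))

def non_dominated_sort(objs):
    """Return the fronts as lists of indices; fronts[0] holds the rank-1 solutions."""
    n = len(objs)
    dominated = [[] for _ in range(n)]  # dominated[p]: solutions dominated by p
    count = [0] * n                     # count[p]: number of solutions dominating p
    fronts = [[]]
    for p in range(n):
        for q in range(n):
            if dominates(objs[p], objs[q]):
                dominated[p].append(q)
            elif dominates(objs[q], objs[p]):
                count[p] += 1
        if count[p] == 0:
            fronts[0].append(p)
    # Peel off successive fronts by removing the current one.
    while fronts[-1]:
        nxt = []
        for p in fronts[-1]:
            for q in dominated[p]:
                count[q] -= 1
                if count[q] == 0:
                    nxt.append(q)
        fronts.append(nxt)
    return fronts[:-1]  # drop the trailing empty front

def crowding_distance(objs, front):
    """Crowding distance (in objective space) of each index in one front."""
    dist = {i: 0.0 for i in front}
    for k in range(len(objs[front[0]])):
        ordered = sorted(front, key=lambda i: objs[i][k])
        lo, hi = objs[ordered[0]][k], objs[ordered[-1]][k]
        dist[ordered[0]] = dist[ordered[-1]] = float("inf")  # always keep extremes
        if hi == lo:
            continue
        for a, b, c in zip(ordered, ordered[1:], ordered[2:]):
            dist[b] += (objs[c][k] - objs[a][k]) / (hi - lo)
    return dist

def crowded_better(i, j, rank, dist):
    """Crowded comparison: lower rank wins; equal ranks -> larger crowding distance."""
    return rank[i] < rank[j] or (rank[i] == rank[j] and dist[i] > dist[j])

objs = [(1, 5), (2, 3), (4, 1), (3, 4), (5, 5)]
fronts = non_dominated_sort(objs)  # [[0, 1, 2], [3], [4]]
```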

## **3. Related Work**

The importance of parameter tuning has been frequently addressed in recent years, not only in theoretical or review papers such as [12], but also in papers with extensive experimental evidence that provide a critical assessment of such methods. In [13], while recognizing the importance of finding a good set of parameters, the authors even suggest that computationally demanding approaches to algorithm tuning may be almost useless, since a relatively limited random search in the algorithm parameter space can often offer good results.

Meta-optimization algorithms can be grouped into two main classes:


Along with Meta-EAs, several methods that do not strictly belong to that class but use similar paradigms have been proposed. One of the most successful is Relevance Estimation and Value Calibration (REVAC) by [15], a method inspired by Estimation of Distribution Algorithms (EDA, [16]), which was shown in [17] to find parameter sets that improved the performance of the winner of the competition on the CEC 2005 test suite [18]. In [19], PSO tuned itself to optimize neural network training; in [20], a simple metaheuristic called Local Unimodal Sampling was used to tune DE and PSO, obtaining good performance while discovering unexpectedly good parameter settings. Reference [21] proposed ParamILS, whose local search starts from a default parameter configuration, which is then iteratively improved by modifying one parameter at a time. Reference [22] used a Meta-EA as an optimization method in a massively parallel system to generate on-the-fly optimizers that directly solved the problem under consideration. In [23], the authors propose a self-adaptive DE for feature selection.
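To illustrate the "one parameter at a time" idea, the sketch below implements a plain first-improvement local search over the one-exchange neighborhood of discrete parameter domains. It is only the local-search core: the actual ParamILS [21] also uses perturbations, random restarts, and a specific strategy for comparing configurations across problem instances, all omitted here. The `cost` function and the toy parameter domains are hypothetical stand-ins for the measured performance of the algorithm being tuned.

```python
import random

def one_exchange_local_search(default, domains, cost, rng):
    """First-improvement local search, changing one parameter per move.

    default: dict parameter -> starting value (the default configuration)
    domains: dict parameter -> list of allowed values
    cost:    hypothetical function mapping a configuration to a score to minimize
    """
    current, current_cost = dict(default), cost(default)
    improved = True
    while improved:
        improved = False
        # Neighbors differ from the current configuration in exactly one parameter.
        moves = [(p, v) for p in domains for v in domains[p] if v != current[p]]
        rng.shuffle(moves)  # visit neighbors in random order
        for p, v in moves:
            candidate = {**current, p: v}
            c = cost(candidate)
            if c < current_cost:  # accept the first improving move
                current, current_cost = candidate, c
                improved = True
                break
    return current, current_cost

# Toy usage: the cost counts how far each parameter is from a hidden "good" value.
domains = {"pop_size": [10, 20, 50, 100], "F": [0.3, 0.5, 0.7, 0.9], "CR": [0.1, 0.5, 0.9]}
target = {"pop_size": 50, "F": 0.7, "CR": 0.9}

def toy_cost(conf):
    return sum(abs(domains[p].index(conf[p]) - domains[p].index(target[p])) for p in domains)

best, best_cost = one_exchange_local_search(
    {"pop_size": 10, "F": 0.3, "CR": 0.1}, domains, toy_cost, random.Random(0))
```

Because the toy cost is separable, the search always reaches the hidden target; real tuning landscapes exhibit parameter interactions, which is what ParamILS's perturbation and restart steps are meant to handle.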

Other approaches to parameter tuning include model-based methods, such as Sequential Parameter Optimization (SPO) proposed by [24], and racing algorithms [25,26]. The latter generate a population of possible configurations based on a particular distribution; members of this population are then tested and discarded as soon as a statistical test shows that at least one other configuration outclasses them. These operations are repeated until a set of good configurations is obtained. A recent trend approaches parameter tuning as a two-level optimization problem [27,28].
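The racing idea can be illustrated with a simplified Hoeffding race, swapped in here for the rank-based statistical tests (e.g., the Friedman test) used by racing methods such as *irace*. Assumptions: every evaluation returns a cost in [0, 1], configurations are hashable identifiers, and no correction for multiple comparisons is applied; `evaluate` is a hypothetical function that runs a configuration on the *t*-th instance or random seed.

```python
import math
import random

def hoeffding_race(configs, evaluate, max_rounds, delta=0.05, min_survivors=1):
    """Simplified Hoeffding race (minimization).

    configs:  hashable configuration identifiers
    evaluate: evaluate(config, t) -> cost in [0, 1] on the t-th instance/seed
    Returns the configurations still alive at the end of the race.
    """
    alive = list(configs)
    totals = {c: 0.0 for c in alive}
    for t in range(1, max_rounds + 1):
        for c in alive:
            totals[c] += evaluate(c, t)  # one more evaluation per surviving configuration
        # Hoeffding confidence radius for a mean of t samples bounded in [0, 1].
        eps = math.sqrt(math.log(2 / delta) / (2 * t))
        means = {c: totals[c] / t for c in alive}
        best_upper = min(means[c] + eps for c in alive)
        # Drop configurations whose lower bound is worse than the best upper bound.
        alive = [c for c in alive if means[c] - eps <= best_upper]
        if len(alive) <= min_survivors:
            break
    return alive

# Usage with three hypothetical configurations and noisy costs.
true_cost = {"a": 0.2, "b": 0.5, "c": 0.8}
rng = random.Random(0)

def evaluate(config, t):
    return min(1.0, max(0.0, true_cost[config] + rng.uniform(-0.1, 0.1)))

winners = hoeffding_race(list(true_cost), evaluate, max_rounds=500)
```

Clearly inferior configurations are eliminated after a few rounds, so most of the evaluation budget is spent on the close competitors, which is the main appeal of racing.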

The first multi-objective Meta-EA was proposed in [29], where NSGA-II was used to optimize the speed and precision of four different algorithms. However, that work considered only one parameter at a time, so the approach described therein cannot be considered a full parameter-set optimization algorithm. A similar method was proposed by [30]: the authors describe a variant of a MOEA called Multi-Function Evolutionary Tuning Algorithm (M-FETA), in which the performance of a GA on two different functions represents the different goals that the MOEA must optimize. The final goal is to discriminate algorithms that perform well on a single function from those that perform well on more than one, respectively called "specialists" and "generalists", following the terminology introduced by [31].

In [32], the authors propose an interesting technique, aimed at identifying the best parameter settings for different possible computational budgets (i.e., number of fitness evaluations) up to a maximum. This is obtained using a MOEA in which the fitness of an individual is a vector whose components are the fitness values obtained in every generation. In this way, it is possible to find a family of parameter sets which obtain the best results with different computational budgets.
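A minimal sketch of this idea, assuming that each configuration is summarized by its best-so-far fitness after every generation (minimization): treating each curve as an objective vector, the non-dominated curves form the family of configurations worth keeping, and a configuration that is strictly best at some budget always belongs to that family. The function names are hypothetical and the curves are made-up numbers for illustration only.

```python
def dominates(a, b):
    """True if vector a Pareto-dominates b (lower is better in every component)."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))

def anytime_front(curves):
    """Names of the configurations whose convergence curves are non-dominated.

    curves: dict name -> best-so-far fitness after each generation, i.e. one
    objective per computational budget.
    """
    return [n for n in curves
            if not any(dominates(curves[m], curves[n]) for m in curves if m != n)]

def best_per_budget(curves):
    """For every budget (generation index), the configuration with lowest fitness."""
    n_gen = len(next(iter(curves.values())))
    return [min(curves, key=lambda n: curves[n][g]) for g in range(n_gen)]

# Made-up convergence curves (minimization), for illustration only.
curves = {
    "fast": [5.0, 2.0, 1.5, 1.4],   # converges quickly, then stalls
    "slow": [9.0, 6.0, 2.0, 0.5],   # needs more evaluations, ends better
    "poor": [9.5, 6.5, 3.0, 2.0],   # dominated at every budget
}
print(anytime_front(curves))    # ['fast', 'slow']
print(best_per_budget(curves))  # ['fast', 'fast', 'fast', 'slow']
```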

A comprehensive review of Meta-EAs can be found in [33].

More recently, MO-ParamILS has been proposed as a multi-objective extension of the state-of-the-art single-objective algorithm configuration framework ParamILS [34]. This automatic algorithm produces good results on several challenging bi-objective algorithm configuration scenarios. In [35], MO-ParamILS is used to automatically configure a multi-objective optimization algorithm in a multi-objective fashion.

## **4. EMOPaT, a General Framework for Multi-Objective Meta-Optimization**

This section describes the main structure and operation of EMOPaT, a straightforward multi-objective extension of SEPaT, the single-objective general framework introduced in [5].

SEPaT and EMOPaT share the same very general scheme, presented in Figure 2.

**Figure 2.** Scheme of SEPaT/EMOPaT. The lower part represents a classical EA. In the meta-optimization process, each individual of the Tuner EA represents a set of parameters. For each set, the corresponding instance of the lower-level EA (LL-EA) is run *N* times to optimize the objective function(s). Quality indices (one for SEPaT, more than one for EMOPaT) are values that provide a global evaluation of the results obtained by the LL-EA in these runs.

The block in the lower part of the figure represents a traditional optimization problem, in which an EA, referred to as Lower-Level EA (LL-EA), optimizes one or more objective functions. The Tuner EA operates within the search space of the LL-EA's parameters, i.e., it evolves a population of possible LL-EA parameter sets. Each parameter set corresponds to an instance of the LL-EA that is tested *N* times on the LL-EA's objective function(s) (from now on, we will consider "configuration" and "parameter set" as equivalent terms). The *N* results are synthesized into one or more "Quality Indices" that represent the objective function(s) of the tuner.
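The evaluation loop of Figure 2 can be sketched as follows. Everything in this example is a hypothetical stand-in rather than the authors' setup: the LL-EA is a toy (1+λ) evolution strategy with two parameters (`lam`, `sigma`), the objective is the sphere function, and the quality indices are the mean best-so-far fitness over *N* runs at two evaluation budgets, loosely mirroring the trade-off between fast and precise convergence discussed earlier.

```python
import numpy as np

def sphere(x):
    return float(np.sum(x ** 2))

def ll_ea(params, f, dim, budget, rng):
    """Hypothetical LL-EA: a (1+lambda) evolution strategy with fixed step size.

    Returns the best-so-far fitness after each of the `budget` evaluations.
    """
    lam, sigma = params["lam"], params["sigma"]
    x = rng.uniform(-5, 5, dim)
    fx = f(x)
    history = []
    while len(history) < budget:
        n = min(lam, budget - len(history))  # never exceed the budget
        kids = x + sigma * rng.standard_normal((n, dim))
        fk = [f(k) for k in kids]
        best_so_far = fx
        for v in fk:
            best_so_far = min(best_so_far, v)
            history.append(best_so_far)
        j = int(np.argmin(fk))
        if fk[j] < fx:  # plus-selection: keep the better of parent and best child
            x, fx = kids[j], fk[j]
    return history

def quality_indices(params, f, dim=10, N=10, budgets=(1_000, 10_000), seed=0):
    """Run the LL-EA N times; return the mean best fitness at each budget."""
    rng = np.random.default_rng(seed)
    runs = [ll_ea(params, f, dim, max(budgets), rng) for _ in range(N)]
    return tuple(float(np.mean([r[b - 1] for r in runs])) for b in budgets)

# One tuner "fitness evaluation": the objective vector for a single parameter set.
q = quality_indices({"lam": 10, "sigma": 0.3}, sphere, dim=5, N=3, budgets=(100, 1000))
```

A single-objective tuner, as in SEPaT, would minimize just one of these indices, whereas a multi-objective tuner such as NSGA-II would receive the whole tuple.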

The difference between SEPaT and EMOPaT therefore lies in the number of quality indices. In SEPaT, any single-objective EA can be used as Tuner EA, while EMOPaT requires a multi-objective EA; in the case described in this paper, we used NSGA-II.

It should be noted that, as shown in the figure, the tuning of the (usually, but not necessarily, single-objective) LL-EA may be aimed at finding the best "generalist" setting for optimizing any number of functions. For instance, in [5], PSO and DE were used as tuners in SEPaT to optimize the behavior of PSO over 8 objective functions. In that case, an EA configuration was considered better than another if it obtained better results on the majority of the functions; the quality index was therefore a score computed through a tournament-like comparison among the individuals.
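The exact scoring used in [5] is not detailed here, so the following is only a plausible sketch of a tournament-like quality index: every pair of configurations plays a "match", a configuration wins if it obtains a better (lower) result on a strict majority of the functions, and its score is the number of matches it wins. Since only per-function comparisons are made, raw results never need to be normalized across functions. The function name and the example values are hypothetical.

```python
from itertools import combinations

def tournament_scores(results):
    """results: dict config -> list of results (lower is better), one per function.

    Each pair of configurations plays one match; a configuration wins if it is
    better on a strict majority of the functions. Returns config -> matches won.
    """
    scores = {c: 0 for c in results}
    for a, b in combinations(results, 2):
        n_funcs = len(results[a])
        wins_a = sum(x < y for x, y in zip(results[a], results[b]))
        wins_b = sum(y < x for x, y in zip(results[a], results[b]))
        if 2 * wins_a > n_funcs:
            scores[a] += 1
        elif 2 * wins_b > n_funcs:
            scores[b] += 1
    return scores

# Hypothetical mean results of three configurations on three functions.
results = {
    "A": [1.0, 1.0, 1e6],   # best on two functions, very poor on the third
    "B": [2.0, 2.0, 1.0],
    "C": [3.0, 3.0, 2.0],
}
print(tournament_scores(results))  # {'A': 2, 'B': 1, 'C': 0}
```

The example values also show how a configuration can top the ranking while performing very poorly on one of the functions.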

In [5], the parameter set found by SEPaT was compared to the set found using *irace* [25,36] and to "standard" parameters, with results similar to *irace* and better than the "standard" settings.

On the one hand, besides synthesizing the results into a single score, this approach has the advantage that the functions for which the LL-EAs are tuned do not need to take values within comparable ranges, which avoids the need for normalization. On the other hand, being based on a comparison can sometimes limit its effectiveness: a configuration may win even if it cannot obtain good results on some of the functions, since it only needs to perform better than the others on the majority of them. Therefore, the resulting parameter sets, despite being good on average, may not be as good on all functions. This is one of the limitations that EMOPaT tries to overcome (see Section 5.2).

The multiple objectives considered by EMOPaT may differ depending on the function under consideration, the quality index, or the constraints applied, such as the number of evaluations, time limits, or others. The output of the tuning process is not a single solution as in SEPaT, but an entire set of non-dominated EA configurations, i.e., ideally, a sampling of the Pareto front for the objectives under consideration (see Figure 6 for two examples of Pareto fronts, highlighted in yellow). This allows a developer to analyze the parameter selection strategy in more depth. We think that this approach can be particularly relevant in light of the conclusions drawn in [37]: even if the Meta-EAs considered in those experiments performed better than SPO and REVAC, the authors pointed out that they were unable to provide insights about EA parameters.
