**Appendix A**

In this appendix, we first show that EMOPaT can be considered a generalization of its single-objective version, SEPaT. This implies that, for single-objective optimization problems, the conclusions drawn for SEPaT in [5,41] extend to EMOPaT. Then, to assess its general soundness, we demonstrate EMOPaT's ability to provide insights into the algorithm parameters and their influence on the optimization process, by showing that EMOPaT correctly deals with some peculiar situations, such as the presence of useless or harmful parameters.

## *Appendix A.1. Comparison with SEPaT*

The equivalence between SEPaT and EMOPaT in the single-objective case has been tested on seven functions (see Table A1) from the CEC 2013 benchmark [38], with the only difference that the function minima were set to 0.

**Table A1.** Comparison between EMOPaT and SEPaT. Experimental settings.


First, we performed tuning with a single run of EMOPaT, treating these functions as seven different objectives (optimizing all the functions together); then, we ran SEPaT seven times, once for each function. More details about these experiments are summarized in Table A1.
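
As a sketch of the two set-ups (all function and variable names here are illustrative, not taken from the EMOPaT code base), the only structural difference between the tuners is the dimensionality of the meta-fitness each candidate EA configuration receives:

```python
# Minimal sketch, assuming run_ea(config, f) returns the fitness reached
# by the EA under `config` on benchmark function `f`.
def emopat_meta_fitness(config, functions, run_ea):
    """One EMOPaT run: one objective per benchmark function (here, seven)."""
    return [run_ea(config, f) for f in functions]

def sepat_meta_fitness(config, function, run_ea):
    """One SEPaT run: a single objective for a single benchmark function."""
    return run_ea(config, function)
```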

We checked whether the best solutions for each objective that EMOPaT evolved in a single run (also called "top solutions" or "top configurations" in the following) were actually indistinguishable from those obtained by SEPaT when applied to the same objective. To do so, we ran ten independent experiments with EMOPaT and, for each function, ten with SEPaT. The best EA configuration for each function found in each run was then tested 100 times on the optimization of the corresponding function. We computed the median of each set of 100 tests and, based on it, selected the overall best configuration for each function.
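
This selection protocol can be summarized by a short sketch (hypothetical names; `statistics` is from the Python standard library): each tuning run contributes one candidate configuration per function, each candidate is re-tested 100 times, and the candidate with the best median wins.

```python
import statistics

def select_overall_best(candidates, run_test, n_tests=100):
    """candidates: the best configurations from the ten tuning runs.
    run_test: maps a configuration to one fitness value (minimization).
    Returns the configuration with the lowest median over n_tests tests."""
    best_cfg, best_median = None, float("inf")
    for cfg in candidates:
        med = statistics.median(run_test(cfg) for _ in range(n_tests))
        if med < best_median:
            best_cfg, best_median = cfg, med
    return best_cfg, best_median
```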

Table A2 compares the best PSO and DE configurations obtained by SEPaT in ten independent runs to the best configurations obtained, for each corresponding function, in ten independent runs of EMOPaT; the parameters obtained by the two methods are remarkably similar. In particular, the nominal parameters chosen for both DE and PSO are almost always the same, except for the PSO topology for Composition Function 3. This is the only case in which the parameters chosen by the two methods are clearly different (one population is three times as large as the other, *c*1 is four times larger, and the topology is different): nevertheless, the results obtained by the two configurations are virtually equivalent (see Table A3), so the two settings correspond to two equivalent minima of the meta-fitness landscape.

Table A3 shows the median fitness obtained on each function by the best-performing EA configurations found by the tuners and by a standard configuration, along with the *p*-values of Wilcoxon's signed-rank test, under the null hypothesis "there are no differences between the two configurations' performance", comparing EMOPaT's best configuration to SEPaT's best and to a standard configuration. While, in general, EMOPaT's configurations perform better than the standard parameters (last column), there is no statistical evidence that the best configurations found by the two methods perform differently, except for two cases (Rotated Cigar and Rotated Ackley using DE) in which EMOPaT performs slightly better than SEPaT. These results show that EMOPaT can be considered generally equivalent to SEPaT in finding the minima of single-objective problems. However, as shown in the paper, one can extract even more information from EMOPaT's results, thanks to its multi-objective nature.
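
The statistical comparison can be reproduced along these lines (a sketch assuming SciPy's implementation of the test; the paper does not specify the software used):

```python
from scipy.stats import wilcoxon

def compare_configurations(fitness_a, fitness_b, alpha=0.05):
    """fitness_a, fitness_b: paired samples of final fitness values (e.g.,
    the 100 test runs of two configurations on the same function).
    Returns the p-value and whether the null hypothesis of equal
    performance is rejected at level alpha."""
    _, p_value = wilcoxon(fitness_a, fitness_b)
    return p_value, p_value < alpha
```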


**Table A2.** Best-performing parameters obtained over 10 runs of EMOPaT and SEPaT, and standard settings for PSO ([42]) and DE ([11]).

**Table A3.** Median fitness over 100 independent runs of the best solutions found by EMOPaT, by SEPaT, and by a standard configuration of the optimization algorithm.


## *Appendix A.2. Empirical Validation*

We have artificially created four test cases, characterized by:

- a *useless* parameter, which has no effect on the algorithm (Appendix A.2.1);
- a *harmful* numerical parameter, which worsens fitness proportionally to its value (Appendix A.2.2);
- a *harmful* nominal parameter setting, a "fake" topology that returns a bad fitness (Appendix A.2.3);
- *equivalent* settings, two nominal values with exactly the same behavior (Appendix A.2.4).

A similar approach was proposed in [43], which showed the ability of *irace*, ParamILS, and REVAC to recognize an operator that was detrimental to fitness. The results of these tests increase our confidence in EMOPaT's actual ability to recognize the usefulness or, more generally, the role of a parameter of an EA. We limited our tests to tuning PSO on the Sphere and Rastrigin functions (see Table A4). In these tests, we modified the original encoding of PSO configurations (Figure 3) as shown in Figure A1.

**Figure A1.** Encoding of PSO configurations in the four cases presented in Appendix A.2. From top left, clockwise: useless parameter, harmful numerical parameter, equivalent topology, and harmful topology.

**Table A4.** Empirical validation of EMOPaT. Experimental settings.


## Appendix A.2.1. Useless Parameter

In this experiment, we extended the PSO configuration encoding by adding a parameter *γ* that does not appear in the algorithm and therefore has no effect on it. Our goal was to analyze how EMOPaT dealt with such a parameter compared to the actually effective ones. Table A5 shows the mean and variance of the (normalized) numerical parameters over all NSGA-II individuals at the end of ten independent runs. As can be observed, the useless parameter *γ* has a mean value close to 0.5 and a variance of 0.078, which is very close to 1/12 ≈ 0.083, the variance expected for a uniform distribution in [0, 1]: this does not happen with the other parameters. Figure A2 plots the values of the Sphere function against the values of the PSO parameters (after recovering their actual values). While the values of the real parameters show a clear trend, the values of *γ* are scattered uniformly all over the graph. Likewise, the correlation of *γ* with the other numerical parameters is very low (last row of Table A5). This suggests that a useless parameter can be easily identified by a (quasi-)uniform distribution of its values.
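
This identification criterion can be expressed as a short check (a sketch with illustrative names and tolerances; it simply tests mean ≈ 0.5, variance ≈ 1/12, and negligible correlation with fitness):

```python
import numpy as np

def looks_useless(values, fitness, var_tol=0.02, corr_tol=0.1):
    """values: final normalized values of one parameter across individuals.
    fitness: the corresponding fitness values.
    Returns True when the parameter's distribution is uniform-like and
    uncorrelated with fitness, i.e., it appears to have no effect."""
    mean, var = np.mean(values), np.var(values)
    corr = np.corrcoef(values, fitness)[0, 1]
    uniform_like = abs(mean - 0.5) < 0.1 and abs(var - 1 / 12) < var_tol
    return uniform_like and abs(corr) < corr_tol
```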

**Table A5.** Mean and variance values for PSO's numerical parameters and correlation with a useless one (*γ*). Parameter values are normalized between 0 and 1.


**Figure A2.** Values of fitness (Sphere function) versus PSO parameters at the end of the tuning procedure. The last graph refers to the useless parameter *γ* which, unlike the others, spans across all possible values with no correlation with fitness.

## Appendix A.2.2. Harmful Numerical Parameter

In this experiment, we added to the representation of each PSO configuration a parameter *β* ∈ [0, 1] whose only effect is to worsen the actual fitness *f* proportionally to its value, replacing it with a penalized value *f*′ computed as follows:

$$f' = (f + \beta) \cdot (1 + \beta) \tag{A1}$$
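
Equation (A1) corresponds to a one-line wrapper around the true fitness (a minimal sketch, assuming minimization; any *β* > 0 strictly worsens the returned value):

```python
def penalized_fitness(fitness, beta):
    """Artificial penalty of Equation (A1): beta in [0, 1] inflates the
    (minimized) fitness both additively and multiplicatively."""
    return (fitness + beta) * (1.0 + beta)

# Example: a true fitness of 2.0 becomes 3.75 when beta = 0.5.
assert penalized_fitness(2.0, 0.5) == 3.75
assert penalized_fitness(2.0, 0.0) == 2.0
```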

Parameter *β* was consistently assigned values close to 0 (mean 7 × 10<sup>−4</sup>, variance 7 × 10<sup>−6</sup>) by EMOPaT. Figure A3 plots the values of *β* versus the number of generations, averaged over ten EMOPaT runs. *β* starts from an average of 0.5 (due to random initialization) but, after a few iterations, its value quickly approaches 0.

**Figure A3.** Evolution of the "bad parameter" *β*, averaged over all individuals in ten independent runs of EMOPaT, versus generation number.

## Appendix A.2.3. Harmful Nominal Parameter Setting

In this experiment, we added a "fake" fourth topology to the PSO configurations. When it is selected, PSO simply returns a bad fitness value. We wanted to verify whether that choice would always be discarded, and which values the corresponding gene would take. Figure A4 shows that the fake topology is actually discarded and, after only two generations, is never selected again. Moreover, the values of the corresponding gene are always lower than all the others; in particular, they are lower than those representing the *star* topology, which is also never selected despite being a valid choice.
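
A sketch of this set-up follows (our reading, stated as an assumption: the topology is decoded as the argmax of four real-valued genes, as Figure A4 suggests; topology names other than *star* and *global* are placeholders):

```python
TOPOLOGIES = ("global", "star", "ring", "fake")  # "ring" is a placeholder
BAD_FITNESS = 1e30  # arbitrarily bad value for a minimization problem

def evaluate(config, run_pso):
    """Decode the topology gene and evaluate the PSO configuration."""
    genes = config["topology_genes"]  # four real values, one per topology
    topology = TOPOLOGIES[max(range(len(genes)), key=genes.__getitem__)]
    if topology == "fake":
        return BAD_FITNESS  # the fake topology short-circuits evaluation
    return run_pso(config, topology)
```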

**Figure A4.** Average values and selection percentages of the genes representing the four topologies versus number of EMOPaT generations. Results averaged over 64 individuals in 10 runs.

## Appendix A.2.4. Equivalent Settings

In the last experiment of this section, we added to the basic representation of the PSO configuration a fourth topology that, when selected, acts exactly like the *global* topology. Our goal was to see whether EMOPaT would make it possible to recognize that the two topologies were in fact the same. Figure A5 shows the results in the same format as Figure A4. There is no clear correlation between the two "global" versions, but it can be observed that, at the end of the evolution, the sum of their selection percentages converges to the value reached by *global* in the previous experiment. This means that splitting this choice into two distinct values did not affect EMOPaT's performance. Nevertheless, these results were reached more slowly, showing that it takes EMOPaT longer to reach the correct values of a nominal parameter when many choices are available.
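
The quantity compared with the previous experiment can be computed as follows (same argmax decoding assumption as above; the indices of the two duplicated "global" values are illustrative):

```python
def combined_selection_frequency(population, dup_indices=(2, 3)):
    """population: list of four-gene topology vectors (argmax decoding).
    Returns the fraction of individuals selecting either duplicated value,
    i.e., the summed selection percentage of the two 'global' variants."""
    picks = [max(range(len(g)), key=g.__getitem__) for g in population]
    return sum(picks.count(i) for i in dup_indices) / len(picks)
```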

**Figure A5.** Average values of the genes representing the four topologies (including the replicated one) and selection percentages. The *x* axis reports the number of EMOPaT generations.
