*4.2. Performance*

The first set of experiments focused on studying how the algorithms' solutions evolved over the course of the runs. Figures 2 and 3 show the evolution, as a function of the number of evaluations, of the mean fitness values for both objectives, for the KNP and TSP, respectively. For this set of experiments, the stopping criterion was set to a large number of function evaluations in order to analyze the convergence of the different approaches studied. This allowed us to set the stopping criteria for all the approaches and instances in subsequent experiments. Finally, we note that this preliminary experiment was not applied to the complete set of instances; instead, a representative set of instances of different types and sizes was chosen for this preliminary overview.
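Convergence curves such as those in Figures 2 and 3 can be obtained by recording the population's mean fitness at fixed evaluation checkpoints until a large evaluation budget is exhausted. A minimal Python sketch of this bookkeeping (all names are illustrative; this is not the paper's actual experimental framework):

```python
def run_with_budget(step, evaluate, population, budget, checkpoint=1000):
    """Run an EA until a fixed number of function evaluations,
    recording the mean fitness at regular checkpoints (the data
    behind per-evaluation convergence plots).

    `step` advances the population in place and returns how many
    evaluations it spent; `evaluate` maps an individual to a fitness
    value. Both are placeholders for the algorithm under study.
    """
    evals, history = 0, []
    while evals < budget:
        evals += step(population)
        # Record once per `checkpoint` evaluations crossed.
        if evals // checkpoint > len(history):
            mean = sum(map(evaluate, population)) / len(population)
            history.append((evals, mean))
    return history
```

Plotting `history` for each algorithm and instance then makes the stagnation points (and hence a sensible evaluation budget) visible.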

For the KNP (see Figure 2), we solved a total of nine instances: three instances of size 50 and type UNCOR; three instances of different sizes (300, 600, and 900) and type WEAK; and, finally, three of type STRONG with sizes 300, 600, and 900. Each set of three instances of the same type (UNCOR, WEAK, or STRONG) occupies two adjacent rows: the objective-1 graph of each instance appears on top, with the corresponding objective-2 graph below it. Note that, in most cases, the algorithms converge quickly during the initial evaluations; convergence is slower only for the last set of instances, i.e., the strongly correlated ones. In general, all the algorithms yield a sharp increase in solution quality in the early generations of the search. From approximately 50 · 10<sup>3</sup> evaluations onward, the difference in performance remains constant during almost the entire run; as a result, we used this point as the stopping criterion for the next experiment.
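The three KNP instance types differ in how strongly item profits correlate with item weights. A minimal sketch of the standard generation scheme for these types, assuming a data range of R = 1000 and a half-total-weight capacity rule (both are common conventions, not necessarily the paper's generator):

```python
import random

def knp_instance(n, kind, R=1000, seed=0):
    """Generate a knapsack instance of the given correlation type.

    kind: 'UNCOR'  -> profits drawn independently of weights
          'WEAK'   -> each profit within R/10 of its weight
          'STRONG' -> each profit exactly weight + R/10
    Returns (weights, profits, capacity).
    """
    rng = random.Random(seed)
    weights = [rng.randint(1, R) for _ in range(n)]
    if kind == 'UNCOR':
        profits = [rng.randint(1, R) for _ in range(n)]
    elif kind == 'WEAK':
        profits = [max(1, rng.randint(w - R // 10, w + R // 10))
                   for w in weights]
    elif kind == 'STRONG':
        profits = [w + R // 10 for w in weights]
    else:
        raise ValueError(f"unknown instance type: {kind}")
    capacity = sum(weights) // 2  # common convention
    return weights, profits, capacity
```

The tight profit–weight coupling of the STRONG type is what makes those instances harder and explains the slower convergence observed above.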

Comparing the behavior across the different instances, we can state the following. For the uncorrelated instances (first two rows; six graphs in total), the SOEAs dominate for both objectives at all times, although the MOEAs remain very close, with a roughly constant gap. For the weakly correlated instances, there is no apparent difference among the approaches, although the gGA appears to be slightly superior. Finally, for the strongly correlated instances, we see a clear dominance of the MOEAs for objective 1, with a notable difference between eES and the remaining approaches.

**Figure 2.** KNP evolution of the mean fitness for objectives 1 and 2.

**Figure 3.** TSP evolution of the mean fitness for objectives 1 and 2.

For the TSP (see Figure 3), the size of the instances is directly related to the behavior of the algorithms: regardless of the instance type and the objective, we can distinguish three types of behavior. For the small instances (100 cities), the SOEAs predominate at the beginning of the runs, but NSGA-II always achieves better objective values from the middle of the runs until the end. SMS-EMOA is a notable exception: it stagnates from the beginning and fails to converge on all the small instances. For the medium-size instances (300 cities), SMS-EMOA converges rapidly, together with NSGA-II, and surpasses the other algorithms, but only up to approximately 2 · 10<sup>6</sup> evaluations, at which point SMS-EMOA again stagnates and is overtaken by the SOEAs, which eventually outperform all the other algorithms. For the large instances (500 cities), SMS-EMOA once more converges very quickly, in this case accompanied by the other two MOEAs, NSGA-II and MOEA/D, until almost the end of the runs, by which point the SOEAs manage to catch up with the other algorithms. Finally, based on the convergence and stagnation of the approaches, we set the stopping criterion for the subsequent TSP experiments to 10 · 10<sup>6</sup> evaluations.

Since, as noted above, the performance of the MOEAs on the TSP improves with instance size (especially for SMS-EMOA), we decided to run the algorithms on the two largest instances in the TSP data set, with 750 and 1,000 cities. Figure 4 shows that, for both objectives, the MOEAs provided better mean objective values than the SOEAs during the entire run. In particular, SMS-EMOA yields the best results, despite being the algorithm that obtained the worst results on the small instances.

**Figure 4.** TSP (large instances) evolution of the mean fitness for objectives 1 and 2.
