1. Introduction and Literature Review
Vehicle Routing Problems (VRPs) have been widely studied for decades to address a great variety of real-world problems that involve freight distribution to (or collection from) a set of locations ([1,2,3,4,5,6]). In most classical cases, the set of locations or nodes to be visited is determined in advance and fixed, reflecting the traditional mandatory nature of these problems, as shown in Figure 1a.
Nevertheless, in several situations, the nodes to be served must be selected simultaneously with the visit sequences, reflecting the selective nature of the visits in these cases (as shown in Figure 1b). Some examples are the Selective VRP, SVRP ([7,8,9,10]), Orienteering Problems ([11,12,13,14,15,16,17]), and the Generalized VRP, GVRP ([18,19,20,21,22,23,24,25,26,27]). In this regard, other related problems have been proposed in the literature that belong to the family of extensive facility location problems, in which the main goal is to determine the topology of the network for serving a set of customers by means of tours, paths, trees, or other types of networked structures ([28,29,30,31,32,33,34,35,36]).
However, there is a wide variety of applications that involve serving a set of islands or isolated regions (as shown in Figure 2), which cannot be directly addressed by any of the aforementioned VRPs or related works. Some distinctive features of these applications are the selection of the ports, docks, or locations at which to visit each island (similar to selective VRPs), together with the transportation process and costs inside the islands.
Figure 2 shows a simple strategy to transport the freight between demand locations and ports, based on a greedy criterion (i.e., the nearest selected port for each demand location). Note that if the transportation costs inside the islands are insignificant, then the problem becomes the Generalized VRP or the Generalized Traveling Salesman Problem (GTSP). Other advanced transportation processes, such as routing or consolidation inside the islands, can be considered, which may require advanced mathematical models and solution techniques.
Only a few recent studies ([37,38,39]) have proposed some formulations for this type of problem under different settings and assumptions. These studies introduced formulations as variants of the SVRP and the GVRP. The corresponding class of problems is referred to as the Insular Vehicle Routing Problem (InVRP), including the Insular Traveling Salesman Problem (InTSP) studied by [39] and in the present research.
Motivated by a real case study in an archipelago of Chiloe and Palena in southern Chile, the aforementioned studies predominantly focus on designing a household waste collection system, where a set of small rural islands must be served by one barge to collect household waste. Each island has at least one port that should be selected to operate as a waste collection site. In this problem, the node selection and visit sequence decisions must be optimized simultaneously.
The InVRP modelling structure is well suited to a variety of applications. Some examples are food deserts in inland regions ([40,41,42,43,44,45]) and the maintenance of wind farm facilities ([46,47]). Overall, situations that involve serving isolated regions (inland or offshore) for different purposes (e.g., distribution, collection, maintenance, etc.) are suitable applications.
As shown in Figure 2, the ground transportation costs (GTC) and their associated processes inside the islands are relevant drivers for the problem. In this case, each demand location is assigned to the nearest selected port. Based on this ground transportation strategy inside the islands, [39] introduced a first approximation for these costs by explicitly aggregating the actual demand locations into fictitious centroids. Subsequently, based on this approximation, the authors formulated the Bi-Objective Insular Traveling Salesman Problem (BO-InTSP), which consists of minimizing the maritime transportation costs (MTC) incurred by a barge to visit the selected ports, and the GTC incurred inside the islands to transport the freight between users, households, or demand locations, and such selected ports.
It is worth mentioning that the family of clustered VRPs (or TSPs) is also related to our research, since these problems consider the existence of node groups or clusters to be served ([48,49,50]). However, in these problems, all of the nodes must be visited, as opposed to the BO-InTSP, where visiting all of the nodes is not mandatory.
Representing several real-world problems, multi-objective VRPs have been extensively researched in the literature; see [51,52,53] for reviews on multi-objective VRPs and related problems. Rather than exact methods, heuristic methods have been mainly applied to multi-objective routing problems for obtaining an approximation to the Exact Pareto Front ([54,55,56,57]). Naturally, finding the full Exact Pareto Front for large instances of this type of NP-hard Mixed Integer Programming (MIP) problem is a difficult task.
Other problems related to the InVRP and the BO-InTSP are the multi-objective network design problems, in which the topology of the network is designed while optimizing different relevant objectives, such as the total cost, CO2 emissions, and waste generation, among others ([58,59,60]). Other examples of extensive bi-objective facility locations are the median path problem and the median shortest path problem ([61,62]). Both problems minimize the total distance of the tour and the travel distance from the non-selected nodes to the closest stop on the tour. Furthermore, other variants are the bi-objective ring star problem ([63,64]), the bi-objective covering tour problem ([61,65]), and the traveling purchaser problem ([66,67]).
The main assumptions of the approximation in [39] are: (i) the demands are aggregated and located at a certain number of centroids instead of at the real demand locations, (ii) as input parameters, the total island demand is homogeneously fragmented and assigned to the defined centroids, (iii) each centroid is associated with a single port and vice versa; thus, if such a port is visited, then the associated centroid demand will be served through this port, and (iv) if a port is not visited (i.e., a non-selected port), then the demand of its associated centroid will be homogeneously split among the other selected ports of the island.
In some cases, when real information is not completely available or computational issues arise, this aggregated approximation may be a reasonable approach. However, there is a lack of evidence about the quality of this approximation, and, moreover, there is no systematic methodology to assess its quality. Notice that if an inappropriate GTC approximation is employed, then an incorrect port selection would be obtained in terms of the quantity and location, thus yielding solutions with an inadequate trade-off between MTC and GTC. Thus, a thorough analysis of the solutions obtained by the Approximated Model and the Exact Formulation is worth studying.
This paper proposes a novel Exact Formulation for the BO-InTSP based on the actual demand locations inside the islands, assuming that each user or inhabitant would prefer the nearest operating port (i.e., node), instead of aggregating demand locations at a set of fictitious centroids, as in [39]. Subsequently, this research proposes and develops a systematic evaluation approach to compare the sets of non-dominated points obtained with the two bi-objective formulations using the same exact algorithm.
It is worth highlighting that the proposed evaluation approach substantially differs from traditional multi-objective approaches, which usually compare the sets of non-dominated points generated by different approximated algorithms (i.e., heuristic) for a single problem or model formulation. Thus, this research contributes to an enhanced analysis and comparison among models with different accuracy or aggregation levels. The proposed approach might be particularly important when trying to balance the effort needed to solve a problem either through an Exact Formulation or through an Approximated Model, as in this research. Furthermore, the proposed strategy employed to compare different models may be extended to models with more than two objectives (multi-objective problems).
In multi-objective optimization, several performance indicators exist to measure the quality of a given set of non-dominated points ([68,69,70,71]). Some examples of these indicators are the hypervolume index (or dominated area for the 2-dimension case), the uniformity index, the covering index, or simply the number of non-dominated points obtained. In general, finding a good approximation to the set of non-dominated points would be equivalent to: (i) maximizing the number of obtained non-dominated points, (ii) maximizing the associated dominated area, (iii) minimizing the distance between each pair of non-dominated points, and (iv) maximizing the range covered by the set of non-dominated points for each objective function. As can be observed, the problem of finding a good-quality set of non-dominated points is a multi-objective problem itself. Usually, all these quality indicators are employed to compare the performance of different multi-objective heuristic algorithms based on the sets of (approximated) non-dominated points found by each algorithm. Clearly, performance indicators of this kind are not needed if an algorithm is able to provide the actual set of non-dominated points.
Thus, this paper aims at evaluating and comparing the set of non-dominated points obtained by using the Approximated Model with respect to the set of non-dominated points obtained by the novel Exact Formulation. This comparison involves the hypervolume index among other natural performance indicators. Furthermore, note that the actual set of non-dominated points obtained by the Approximated Model is indeed an approximation to the actual set of non-dominated points of the Exact Formulation.
In summary, this paper proposes an Exact Formulation for the BO-InTSP, and compares the results obtained with this formulation to those associated with the aggregation-based mathematical formulation (i.e., the Approximated Model) in [39]. The comparison is focused on the set of non-dominated points obtained when employing each formulation.
Three types of centroids were tested for the aggregated formulation: (i) manually defined centroids, (ii) geometric centroids based on the shape of the islands, and (iii) centre-of-mass centroids obtained by averaging the coordinates of the non-aggregated demand locations. Therefore, the comparison shows the effect of using different procedures to determine the centroid locations. The Pareto Front for each formulation is obtained using the AUGMECON2 method described in [72,73].
The remainder of this paper is organized as follows. Section 2 presents the problem description, the previous Approximated Model, and the proposed novel Exact Formulation. Section 3 introduces the computational application along with a description of the methodological foundations required for its implementation and analysis. Section 4 presents and analyzes the main results from the computational applications. Finally, Section 5 summarizes the main findings of this work.
2. Problem Description and Formulations
2.1. General Problem Description
The BO-InTSP aims at determining a set of efficient, single sequences to visit a collection of islands with a single barge, while minimizing both the MTC and GTC based on a bi-objective approach. In this problem, the barge must collect all of the freight in a single period (e.g., a day or a week), and it is assumed that the barge has a sufficient capacity for collecting all of the freight. The decisions involved in this problem are the port or dock selection at each island along with the respective visit sequence. Finally, a single depot or transfer station is considered as the start and end of the barge route.
Each island has one or more available ports that may potentially be employed as a collection site, as indicated in Figure 3. Accordingly, the model should optimize the collection site selection and the visit sequence of the selected ports, ensuring that each island is visited at one port at least. Note that the selected nodes of an island are not necessarily visited consecutively before other islands are visited. In the BO-InTSP, the total MTC of the barge for visiting all selected ports, and the total GTC incurred by the inhabitants to carry the freight to the selected ports at each island, must be minimized. Figure 3 presents two examples in which a single visit sequence includes only one port per island (Figure 3a) and all island ports (Figure 3b).
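The route structure described above can be illustrated with a minimal Python sketch. The function names, the dictionary-based cost matrix, and the depot label "i0" used as a string key are illustrative assumptions, not part of the original formulation.

```python
from typing import Dict, List, Set, Tuple

Arc = Tuple[str, str]


def maritime_cost(sequence: List[str], mc: Dict[Arc, float], depot: str = "i0") -> float:
    """MTC of a closed barge route: depot -> selected ports in order -> depot."""
    route = [depot, *sequence, depot]
    # Sum the arc cost MC_ij over consecutive pairs of the route.
    return sum(mc[(a, b)] for a, b in zip(route, route[1:]))


def visits_every_island(sequence: List[str], island_ports: Dict[str, Set[str]]) -> bool:
    """True if at least one port of every island appears in the sequence."""
    chosen = set(sequence)
    return all(chosen & ports for ports in island_ports.values())
```

For a hypothetical two-island case, `maritime_cost(["a1", "b1"], mc)` adds the three legs i0 to a1, a1 to b1, and b1 back to i0.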
In the studied waste collection case, the freight inside the islands is generated by a set of rural households, and the inhabitants must transport their freight to the ports by modest means of transportation (e.g., walking, cow, tumbril, horse, etc.). In this scenario, the GTC inside the islands is incurred by the inhabitants, whereas the MTC is incurred by the local authorities or a logistics provider serving the islands. In other words, these costs are incurred by different agents or stakeholders. If only the GTC is minimized, then the problem leads to a solution where all of the ports are visited, as in Figure 3b, since the freight of each centroid is picked up using its nearest port. On the contrary, if only the MTC is minimized, then the problem may lead to a solution where only one port is visited for each island, as in Figure 3a. The GTC and MTC objectives are therefore in conflict, and a bi-objective approach must be adopted. Thus, the set of efficient solutions that jointly minimize both the MTC and the GTC should be found. Each efficient solution comprises both the selection of the ports that must be visited and the visit sequence of such ports. Considering that the Selective Traveling Salesman Problem (STSP) and the GTSP are both NP-hard problems ([7,27]), and that they are particular cases of the InTSP (if all of the nodes or ports form a single big island, then the InTSP becomes an STSP; furthermore, if the ground transportation cost is zero, then the InTSP becomes a GTSP), the InTSP is NP-hard as well.
Section 2.2 and Section 2.3 describe the previous Approximated Model and the proposed Exact Formulation for the studied problem, respectively.
2.2. Approximated Model Based on Demand Aggregation
In [39], the authors proposed an approximated approach for computing the GTC. In this approximation, it is assumed that each island is fictitiously segmented into a certain number of zones. Each zone is associated with the nearest port and vice versa. These fictitious zones are defined so that if all ports of an island are selected, then the freight generated at each zone will be fully collected through its associated port. Furthermore, each zone is represented by a fictitious centroid that concentrates the total freight of the zone.
Accordingly, the location of each centroid and its associated freight must be computed as model parameters. In [39], a manual procedure is employed to locate the centroid for each zone, and the total freight of the island is homogeneously split among the centroids. These assumptions are motivated by a complete absence of information related to freight generation locations within each island.
An additional assumption is employed for splitting the freight of each centroid among the selected ports on the island. This assumption is relevant only if the port associated to this centroid is not selected and its freight should be collected through other selected ports. In this case, the Approximated Model assumes a linear modelling structure, where the freight located at a centroid is homogeneously split among the other selected ports. Otherwise, if the associated port is selected, then the centroid demand will be completely collected through this port.
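The aggregation and splitting rules above can be sketched in Python for a single island and a given set of selected ports. This is only an illustration: the function name, the coordinate dictionaries, and the cost measure (freight multiplied by Euclidean distance) are assumptions that do not come from [39].

```python
from math import dist
from typing import Dict, Set, Tuple

Point = Tuple[float, float]


def approx_gtc(ports: Dict[str, Point], centroids: Dict[str, Point],
               selected: Set[str], total_freight: float) -> float:
    """Aggregated GTC of one island; `centroids` maps each port to its zone centroid."""
    if not selected:
        raise ValueError("at least one port must be selected")
    share = total_freight / len(centroids)  # homogeneous split among centroids
    cost = 0.0
    for port, centroid in centroids.items():
        if port in selected:
            # The associated port is selected: it collects the whole centroid freight.
            cost += share * dist(centroid, ports[port])
        else:
            # Otherwise the centroid freight is split evenly among the selected ports.
            per_port = share / len(selected)
            cost += sum(per_port * dist(centroid, ports[s]) for s in selected)
    return cost
```

Evaluating this function for every port combination of an island would give one cost value per combination, which is the role the parameter GChs plays in the notation below.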
The sets, parameters, and decision variables of the problem are the following:
H: Set of islands to be served by a barge;
N: Set of nodes, including only the ports at the islands;
N0: Set of nodes, including ports and the depot i0;
Ωh: Set of nodes at each island h;
Ψh: Set of all possible combinations of ports for island h;
Khs: Set of nodes belonging to island h that are selected as collection sites under combination s ∈ Ψh;
δh: Number of nodes at each island h;
MCij: MTC from node i to node j;
Qh: Freight volume to be collected from each island h;
GChs: Total GTC at island h, if the port combination s is selected. This cost parameter represents the total GTC incurred by the island inhabitants if port combination s is selected;
Zi: Binary variable indicating if node i is selected to be visited;
Yij: Binary variable indicating if node j is visited immediately after node i;
Xhs: Binary variable indicating if island h is visited using combination s;
Fij: Fictitious flows from node i to node j.
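As an illustration of the set Ψh, the following Python sketch enumerates the port combinations of one island. It assumes that only non-empty combinations are relevant (each island must be visited at one port or more), so an island with δh ports yields 2^δh − 1 combinations; the function name is hypothetical.

```python
from itertools import combinations
from typing import FrozenSet, Iterable, List


def port_combinations(island_ports: Iterable[str]) -> List[FrozenSet[str]]:
    """All non-empty subsets of the ports of one island (one entry per s in Psi_h)."""
    ports = sorted(island_ports)
    return [
        frozenset(combo)
        for size in range(1, len(ports) + 1)
        for combo in combinations(ports, size)
    ]
```

Each returned subset plays the role of Khs for one combination s. The exponential growth in δh is mild here because the instances described later have at most four ports per island.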
Accordingly, the Approximated Model is formulated as (1)–(13). Expressions (1) and (2) correspond to the MTC and GTC objective functions, respectively. Constraints (3) ensure that exactly one port combination s is selected for each island h. Constraints (4) and (5) are logical relationships between the decision variables Z and X, ensuring that if a port combination s is selected for island h (Xhs = 1), then only the selection variables of the associated ports are activated (Zi = 1, for each i ∈ Khs). Constraints (6) and (7) ensure that the barge enters and exits each selected port (i.e., Zi = 1) exactly once, respectively. Constraints (8)–(11) are the sub-tour elimination constraints ([74]). Finally, Constraints (12) and (13) define the domains of the variables.
2.3. An Exact Formulation for the BO-InTSP
The proposed novel exact mathematical formulation for the BO-InTSP considers a disaggregated scheme, relying on the knowledge of the real demand locations (e.g., households, restaurants, or hospitals) and their respective demand values. In this model, island inhabitants must travel to their nearest selected port for transporting their freight. Thus, when capacity constraints at the ports are not relevant, the model will allocate each demand location to its nearest selected port, as shown in Figure 4. It can be shown that any efficient solution is equivalent to a greedy assignment scheme that all inhabitants may follow. Note that this figure presents an example of a part of a complete feasible solution (ground transportation component).
Under this alternative disaggregated scheme, the total GTC for every island is explicitly computed based on the actual allocation of the demands to the selected ports. The proposed model contains additional variables compared to the previous Approximated Model but provides a better problem representation.
The Exact Formulation considers the definition of the following additional parameters and decision variables:
Mh: Set of demand locations belonging to each island h;
dil: Distance between port i and demand location l;
ql: Demand of location l;
ϕil: Binary variable indicating if demand location l is allocated to node i.
Due to the absence of freight generation information for each specific demand location, it is assumed that the total demand of an island is homogeneously split among all demand locations, as ql = Qh/|Mh|, ∀h ∈ H, ∀l ∈ Mh. Notice that this assumption does not affect the structure of the Exact Formulation, since ql is only a parameter of the model that may be computed following any other information or assumption.
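The nearest-selected-port allocation and the homogeneous demand split ql = Qh/|Mh| can be sketched as follows for one island. The function name and the use of demand times Euclidean distance as the cost are illustrative assumptions.

```python
from math import dist
from typing import Dict, List, Set, Tuple

Point = Tuple[float, float]


def exact_gtc(ports: Dict[str, Point], demand_locations: List[Point],
              selected: Set[str], total_freight: float) -> float:
    """Disaggregated GTC of one island under greedy nearest-port assignment."""
    if not selected:
        raise ValueError("at least one port must be selected")
    q = total_freight / len(demand_locations)  # q_l = Q_h / |M_h|
    # Each demand location travels to its nearest selected port.
    return sum(q * min(dist(loc, ports[p]) for p in selected)
               for loc in demand_locations)
```

Replacing the uniform `q` with a per-location demand list would cover the case where actual freight data are available, without changing the assignment logic.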
Consequently, the novel Exact Formulation of the BO-InTSP is given by (14)–(27). Expressions (14) and (15) correspond to the MTC and GTC objective functions, respectively. The MTC is the same as in the Approximated Model, whereas the GTC corresponds to a new exact expression based on demand location assignment decisions. Constraints (16)–(18) are new constraints that allow for computing the GTC in an exact manner. Constraints (16) ensure that one port or more is selected for each island. Constraints (17) guarantee that each demand location is assigned to a single port, and Constraints (18) allow assignment decisions (ϕil) that involve selected ports only (Zi = 1). Expressions (19)–(24) are routing-related constraints, and are equivalent to Constraints (6)–(11). Finally, Constraints (25)–(27) define the domains of the variables.
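The verbal description of the GTC objective (15) and Constraints (16)–(18) can be written compactly as in the following LaTeX sketch. It is a hedged reading of the text, not the typeset formulation itself: the cost is assumed to be demand times distance, and the index sets are inferred from the notation lists above.

```latex
\begin{align*}
\min \; GTC &= \sum_{h \in H} \sum_{l \in M_h} \sum_{i \in \Omega_h} q_l \, d_{il} \, \phi_{il}
  && \text{(cf. (15))} \\
\sum_{i \in \Omega_h} Z_i &\ge 1
  && \forall h \in H \quad \text{(cf. (16))} \\
\sum_{i \in \Omega_h} \phi_{il} &= 1
  && \forall h \in H,\ \forall l \in M_h \quad \text{(cf. (17))} \\
\phi_{il} &\le Z_i
  && \forall h \in H,\ \forall i \in \Omega_h,\ \forall l \in M_h \quad \text{(cf. (18))}
\end{align*}
```

Under this reading, minimizing the GTC with no port capacities drives every demand location to its nearest selected port, which matches the greedy assignment described in the text.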
3. Computational Issues and Comparison Methodology
This section details the computational experiments and the methodological foundations of the proposed comparison approach. The goal is to compare the results obtained with both the Approximated Model ([39]) and the Exact Formulation proposed in this research (Section 2.3).
The two formulations are solved using the AUGMECON2 method described in [72]. This method guarantees obtaining the Exact Pareto Front for multi-objective mixed integer programming problems whose objective functions involve only integer variables (a condition that is satisfied by both formulations). Additionally, the method requires the nadir point, which, in this implementation, is obtained at the beginning of the algorithm.
The AUGMECON2 algorithm works similarly to the well-known epsilon-constraint algorithm, in which a mono-objective reformulation of the original multi-objective problem is solved iteratively by optimizing only one of the objective functions, while the remaining functions are handled as model constraints. The AUGMECON2 algorithm performs an accelerated update of the right-hand side parameters of the model constraints associated with the objective functions. For more details, see [72,73]. The algorithm was implemented in AMPL, with CPLEX 12.8 as the optimization solver, on a computer with an Intel i7-4770 processor @ 3.40 GHz and 8 GB RAM, with the number of threads set to one.
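The iterative logic of the epsilon-constraint idea can be sketched in Python on a finite list of candidate objective points. This is not AUGMECON2 and not the AMPL/CPLEX implementation: the MIP solve is replaced by plain enumeration, ties are broken lexicographically, and integer-valued objectives are assumed so that the right-hand side can be tightened by one unit per iteration.

```python
from typing import List, Tuple

Point = Tuple[int, int]  # (f1, f2), both minimized, integer-valued (assumption)


def epsilon_constraint_front(points: List[Point]) -> List[Point]:
    """Non-dominated points found by minimizing f1 subject to f2 <= eps."""
    front: List[Point] = []
    if not points:
        return front
    eps = max(f2 for _, f2 in points)  # loosest bound on f2
    while True:
        feasible = [p for p in points if p[1] <= eps]
        if not feasible:
            break
        # Stand-in for the solver: minimize f1, break ties with the smallest f2.
        best = min(feasible, key=lambda p: (p[0], p[1]))
        front.append(best)
        eps = best[1] - 1  # tighten the constraint below the value just found
    return front
```

Setting `eps` directly below the f2 value just found skips right-hand sides that would return the same point again, which is loosely in the spirit of the accelerated update mentioned above.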
It is worth highlighting that, depending on the approach employed to obtain the Pareto Front of a bi-objective problem, different results may be produced. For example, if heuristic approaches are employed, only approximated Pareto Fronts would be obtained, and there would be no guarantee of obtaining optimal solutions and the Exact Pareto Fronts. Thus, any comparison would not be completely definitive or conclusive. However, in this research, we aim to compare the Exact Pareto Fronts obtained for both formulations (the exact and the approximated models) in order to gain more conclusive insights. Thus, a state-of-the-art exact solution approach for multi-objective MIP problems is employed; in this case, the AUGMECON2 algorithm. Nonetheless, it is possible and advisable to develop or apply advanced solution approaches, particularly for larger instances, such as Multi-Objective Evolutionary Algorithms based on Decomposition (MOEA/D), the Non-dominated Sorting Genetic Algorithm II (NSGA-II), and Pareto Local Search, including all of its variations and improvements ([75,76,77,78,79,80]).
3.1. Test Instances and Experiment Design
This study considers a variety of instances that are based on two original instances presented in [39]: one real and one fictitious. The original real instance contains 21 islands from southern Chile, of which 11 islands have 1 port, 8 islands have 2 ports, and 2 islands have 3 ports. The original fictitious instance comprises 20 synthetic circular islands, where 8 islands contain 2 ports, 8 islands contain 3 ports, and 4 islands contain 4 ports. A first set of real instances is generated by randomly selecting islands from the original real instance with 21 islands. Similarly, a second set of fictitious instances is generated from the original synthetic instance with 20 circular islands.
Although larger instances may be solved to optimality for the single-objective TSP in a reasonable time, solving its bi-objective version is more difficult from a computational point of view. Therefore, this research considers small and medium-sized instances. Furthermore, given that this study aims at solving the two formulations to optimality, solving larger instances is computationally prohibitive. For such instances, the use of non-exact algorithms or heuristics is recommended, which is beyond the scope of this study.
The two original instances were built considering an aggregated GTC, in which the demands were modelled by employing aggregated centroids at each island. Conversely, the proposed disaggregated Exact Formulation relies on the direct GTC (i.e., Euclidean distances) between the disaggregated demand locations and the island ports. Accordingly, disaggregated demand locations inside the islands are randomly generated in this study, since real demand locations are not available.
For solving the Approximated Model, three alternative centroid generation approaches, namely Manual Centroids, Geometric Centroids, and Centre-of-Mass, are employed for each instance. Only Manual and Centre-of-Mass centroids are employed for the real instances. For solving the Exact Formulation, disaggregated demand locations were created in a random manner independently for the real and the fictitious instances, yielding between 22 and 50 nodes per island. Homogeneous demands are considered for all generated demand locations. Furthermore, the Centre-of-Mass centroids employed for the Approximated Model are computed using the coordinates of these disaggregated demand locations.
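A Centre-of-Mass centroid can be illustrated with the Python sketch below. It assumes, following the zone definition of Section 2.2, that each demand location belongs to the zone of its nearest port and that one centroid is computed per zone by averaging coordinates; the function name and data layout are hypothetical.

```python
from collections import defaultdict
from math import dist
from typing import Dict, List, Tuple

Point = Tuple[float, float]


def centre_of_mass_centroids(ports: Dict[str, Point],
                             demand_locations: List[Point]) -> Dict[str, Point]:
    """One centroid per port zone: the mean of the demand locations nearest to that port."""
    zones: Dict[str, List[Point]] = defaultdict(list)
    for loc in demand_locations:
        nearest = min(ports, key=lambda p: dist(loc, ports[p]))
        zones[nearest].append(loc)
    # Ports with no demand location in their zone receive no centroid here.
    return {
        port: (sum(x for x, _ in locs) / len(locs), sum(y for _, y in locs) / len(locs))
        for port, locs in zones.items()
    }
```

How a zone with no demand locations should be handled is a modelling decision that this sketch leaves open.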
The computational experience is divided into two parts. The first experiment (Part I) aims at showing and analyzing the conceptual and structural differences between the set of non-dominated points obtained with both the Approximated and the Exact Formulations considering only small instances. For this analysis, one real and three fictitious instances are generated based on the two aforementioned original instances. These are defined as Real-0820, Fict-0660, Fict-0064, and Fict-0004, where the digits of each instance name (real or fictitious) indicate the number of islands with 1, 2, 3, or 4 ports, respectively. For example, Real-0820 denotes 0 islands with 1 port, 8 islands with 2 ports, 2 islands with 3 ports, and 0 islands with 4 ports.
The second experiment (Part II) aims at evaluating the aggregated behavior and the computational performance of the two formulations. In this case, only Centre-of-Mass is employed for generating centroids, given the results of the first computational experiment discussed in Section 4.1. This experiment focuses on solving 10 real instances with 18 islands that are randomly selected from the original 21-island real instance, each of which comprises 8 islands with 1 port, 8 islands with 2 ports, and 2 islands with 3 ports. Additionally, 10 fictitious instances are considered, each containing 17 islands that are randomly selected from the original 20-island fictitious instance, where each instance comprises 7 islands with 2 ports, 7 islands with 3 ports, and 3 islands with 4 ports. Following the notation of the first part of the experiment, the large instances are named Real-wxyz-n and Fict-wxyz-n, where w, x, y, and z define the number of islands with 1, 2, 3, and 4 ports, respectively, and the additional index n is a sequential identification number for each instance, ranging from 01 to 10. Table 1 summarizes the instances considered in this study.
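The naming convention can be decoded mechanically, as in the following Python sketch. The function name and the returned dictionary keys are illustrative, and the pattern assumes single-digit island counts, as in all instance names given in the text.

```python
import re
from typing import Dict, Optional, Union


def parse_instance(name: str) -> Dict[str, Union[str, int, Optional[str]]]:
    """Decode names such as 'Real-0820' or 'Fict-0773-05' into island and port counts."""
    match = re.fullmatch(r"(Real|Fict)-(\d)(\d)(\d)(\d)(?:-(\d{2}))?", name)
    if match is None:
        raise ValueError(f"unrecognized instance name: {name}")
    kind, w, x, y, z, index = match.groups()
    counts = [int(w), int(x), int(y), int(z)]  # islands with 1, 2, 3, 4 ports
    return {
        "kind": kind,
        "islands": sum(counts),
        "ports": sum(n * ports for n, ports in zip(counts, (1, 2, 3, 4))),
        "index": index,  # None for the small Part I instances
    }
```

For instance, Real-0820 decodes to 10 islands and 22 ports in total.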
3.2. Multi-Objective Comparison Approach
Following [81,82], the concept of the dominated area, DA (also known as the hypervolume or S metric), is employed to compare the sets of non-dominated points obtained with the two formulations, as illustrated with the examples in Figure 5. This figure presents different Pareto Fronts in the objective space, which comprises the values of the objective functions for feasible solutions. The index DA represents the area, inside the rectangle delimited by the two reference points (the ideal and the anti-ideal points), that is weakly dominated by a set of non-dominated points, as shown with the grey area in Figure 5a. As may be expected, more accurate and complete sets of non-dominated points yield larger values of the DA.
Normally, sets with more non-dominated points that are uniformly distributed along the objective space, and closer to the ideal point, yield higher DA values. In Figure 5b, the DAs of two sets partially overlap, where the Pareto Front with black dots presents a higher DA value than the Pareto Front with white dots. In particular, the true set of non-dominated points provides the maximum feasible value for the DA index. Thus, a goal in our study is to compute the hypervolume for two Pareto Fronts: one Pareto Front is obtained with the Approximated Model and another Pareto Front is obtained with the Exact Formulation.
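For two minimized objectives, the dominated area can be computed with a simple sweep, as in the Python sketch below. It is an illustration under stated assumptions: the area is measured against the anti-ideal point only and is not normalized by the ideal point, and the function name is hypothetical.

```python
from typing import Iterable, Tuple

Point = Tuple[float, float]


def dominated_area(points: Iterable[Point], anti_ideal: Point) -> float:
    """Area weakly dominated by `points` and bounded by the anti-ideal point (minimization)."""
    ref_x, ref_y = anti_ideal
    area = 0.0
    prev_y = ref_y
    # Sweep from left to right; only points that lower the current y level add area.
    for x, y in sorted(set(points)):
        if x >= ref_x or y >= prev_y:
            continue  # outside the reference box or dominated by an earlier point
        area += (ref_x - x) * (prev_y - y)
        prev_y = y
    return area
```

Removing a point from a front can only shrink this value, which is why a more complete set of non-dominated points yields a larger DA.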
In order to perform a proper comparison between the solutions obtained with the two formulations, the actual set of non-dominated points obtained with the Approximated Model must be projected into the objective space of the Exact Formulation. This projection consists of evaluating the solutions obtained with the Approximated Model with the objective functions of the Exact Formulation. Consequently, these projected points represent an approximation to the true Pareto Front of the Exact Formulation. Let this projected set be the Approximated Pareto Front, and the true set of non-dominated points obtained by the Exact Formulation be the Exact Pareto Front. Naturally, it is expected that some points belonging to the Approximated Pareto Front are dominated by some points belonging to the Exact Pareto Front, whereas other points may coincide with points of the Exact Pareto Front. An Approximated Pareto Front can be considered a good approximation to the Exact Pareto Front if it contains points that are close to the Exact Pareto Front. Furthermore, for a very good approximation, a large share of its points coincide with points of the Exact Pareto Front.
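The projection step can be sketched in Python as follows. Each solution of the Approximated Model is represented here as a pair (MTC, port selection), and `exact_gtc` stands for any callable that returns the exact GTC of a selection; both conventions, and the function names, are assumptions made for illustration.

```python
from typing import Callable, Hashable, List, Tuple

Point = Tuple[float, float]  # (MTC, GTC), both minimized


def dominates(a: Point, b: Point) -> bool:
    """True if a is no worse than b in both objectives and differs from b."""
    return a[0] <= b[0] and a[1] <= b[1] and a != b


def non_dominated(points: List[Point]) -> List[Point]:
    """Sorted list of the points that no other point dominates."""
    unique = set(points)
    return sorted(p for p in unique if not any(dominates(q, p) for q in unique))


def project(solutions: List[Tuple[float, Hashable]],
            exact_gtc: Callable[[Hashable], float]) -> List[Point]:
    """Keep each solution's MTC and re-evaluate its GTC with the exact cost function."""
    return [(mtc, exact_gtc(selection)) for mtc, selection in solutions]
```

Applying `non_dominated` to the projected points removes solutions that became dominated once the GTC was recomputed.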
In addition to the DA index, a variety of error measurements are computed to evaluate the quality of the points obtained with the Approximated Model after projecting them into the objective space associated with the Exact Formulation. These error measurements are computed by comparing each point of the Approximated Pareto Front to all points from the Exact Pareto Front that dominate this point, as shown in Figure 6. If a point of the Approximated Pareto Front coincides with a point belonging to the Exact Pareto Front, then its associated error is zero. Naturally, the smaller the error, the better the Approximated Pareto Front, and, therefore, the Approximated Model.
In the example of Figure 6, points S1, S2, and S3 are part of an Approximated Pareto Front, whereas S3, S4, S5, and S6 belong to the Exact Pareto Front. Notice that a point belonging to the Exact Pareto Front may also be obtained through the Approximated Model, such as S3, whose error is zero. On the contrary, some points in the Approximated Pareto Front may be dominated by some points belonging to the Exact Pareto Front. In the example of Figure 6, S2 is dominated by S6, and S1 is dominated by S4 and S5. In this case, when comparing S2 with S6, a relative error e is defined for each objective component, as shown in Equation (28). In this study, x and y refer to the MTC and the GTC, respectively.
Based on Expression (28), and considering that P is the Exact Pareto Front, two types of errors are defined for each point k of the Approximated Pareto Front with respect to each point j ∈ P that dominates point k: the maximum error and the Euclidean error. Both expressions are associated with alternative norms of a vector, where norm ∞ is associated with the maximum error and norm 2 is associated with the Euclidean error. These errors are shown in Expression (29), where j < k indicates that point j dominates point k.
Finally, considering that more than one point from the Exact Pareto Front may dominate a point from the Approximated Pareto Front, a combined error is computed by taking the minimum value for each type of norm (norm ∞ and norm 2), as indicated in Expressions (30) and (31).
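The error measurements described above can be illustrated with a short sketch. Expressions (28) to (31) are not reproduced in this text, so the following code is only one plausible reading of them: it assumes that both objectives are minimized and that each relative error is normalized by the coordinate of the dominating exact point. The function names and the sample points are hypothetical.

```python
import math


def dominates(j, k):
    """Return True if point j dominates point k (both objectives minimized)."""
    return j[0] <= k[0] and j[1] <= k[1] and (j[0] < k[0] or j[1] < k[1])


def relative_errors(k, j):
    """Component-wise relative error of approximated point k w.r.t. exact point j.

    Assumption: each component is normalized by the exact point's coordinate.
    """
    return ((k[0] - j[0]) / j[0], (k[1] - j[1]) / j[1])


def combined_errors(k, exact_front):
    """Return (norm-inf error, norm-2 error) for point k, or None if k is
    neither on the exact front nor dominated by any of its points."""
    if k in exact_front:
        return (0.0, 0.0)  # coinciding points have zero error
    dominators = [j for j in exact_front if dominates(j, k)]
    if not dominators:
        return None
    # Minimum over all dominating points, taken separately for each norm.
    e_inf = min(max(relative_errors(k, j)) for j in dominators)
    e_two = min(math.hypot(*relative_errors(k, j)) for j in dominators)
    return (e_inf, e_two)


# Hypothetical (MTC, GTC) points, for illustration only.
exact_front = [(100.0, 200.0), (120.0, 150.0), (150.0, 120.0)]
e_inf, e_two = combined_errors((126.0, 165.0), exact_front)
```

In this example only the exact point (120, 150) dominates (126, 165), so the component errors are 5% in MTC and 10% in GTC, and the norm-2 error is necessarily at least as large as the norm-∞ error.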
4. Results and Discussion
4.1. Results for Small Instances
In this section, the results associated with both the approximated and the exact formulations are presented and discussed for one real and three synthetic instances, as described in
Section 3.1. As previously mentioned, this comparison is performed by contrasting the set of non-dominated points obtained by the Approximated Model with the set of non-dominated points obtained with the Exact Formulation. In addition, since GTC is not computed in an exact manner within the Approximated Model, this cost must be recomputed for each obtained non-dominated point prior to performing an appropriate comparison. Subsequently, the comparison assumes the following definitions:
NDP*: Exact Pareto Front (set of non-dominated points obtained with the Exact Formulation);
NDP-0: Exact set of non-dominated points obtained with the Approximated Model;
DP-1: Set of points in NDP-0 that are dominated by points in NDP* after GTC is recomputed;
NDP-1: Set of points in NDP-0 that are actually non-dominated after GTC is exactly computed. This set is also referred to as the Approximated Pareto Front;
DP-2: Set of points in NDP-1 that do not belong to NDP*;
NDP-2: Set of points in NDP-1 that belong to NDP*.
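As an illustration of how these sets relate to each other, the following sketch partitions a list of points once the GTC has been recomputed. It assumes that both objectives are minimized, that points are given as (MTC, exact GTC) pairs, and that DP-1 collects the points that become dominated by other points of the same recomputed set, which is how the removal is described for Figure 7. The function names and the data are hypothetical.

```python
def dominates(a, b):
    """Return True if point a dominates point b (both objectives minimized)."""
    return a[0] <= b[0] and a[1] <= b[1] and (a[0] < b[0] or a[1] < b[1])


def non_dominated(points):
    """Keep the points that no other point of the same list dominates."""
    return [p for p in points if not any(dominates(q, p) for q in points)]


def classify(ndp0_recomputed, ndp_star):
    """Split NDP-0 (with exact GTC) into NDP-1, DP-1, NDP-2 and DP-2."""
    ndp1 = non_dominated(ndp0_recomputed)
    dp1 = [p for p in ndp0_recomputed if p not in ndp1]
    ndp2 = [p for p in ndp1 if p in ndp_star]
    dp2 = [p for p in ndp1 if p not in ndp_star]
    return {"NDP-1": ndp1, "DP-1": dp1, "NDP-2": ndp2, "DP-2": dp2}


# Hypothetical (MTC, exact GTC) points, for illustration only.
ndp0 = [(1, 9), (2, 8), (3, 8.5), (4, 4)]
ndp_star = [(1, 9), (2, 7), (3, 5), (4, 4)]
sets = classify(ndp0, ndp_star)
```

Here (3, 8.5) falls into DP-1 because (2, 8) dominates it after recomputation, and (2, 8) falls into DP-2 because it survives within its own set but is dominated by the exact point (2, 7).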
Figure 7 shows the set of non-dominated points obtained with the Approximated Model (NDP-0) for the instance Fict-0660 using Manual Centroids. The set of non-dominated points with the approximated GTC is shown with grey dots, whereas the points associated with the same solutions with an exact GTC computation are denoted by black dots and grey triangles. Notice that the MTC remains constant for each point, whereas the GTC varies vertically. For this instance, the approximated GTC is always lower than the exact GTC for each point, as shown in
Figure 7. As a consequence, once the exact GTC is computed, some points are dominated by others belonging to the same original set, and thus, they must be removed. In other words, the solutions associated with these points are not efficient for the Exact Formulation. The removed points are represented by grey triangles in
Figure 7 (DP-1), whereas the black dots denote those points that remain as non-dominated (NDP-1).
For this instance, 81 points (grey dots) are originally obtained for the approximated GTC using Manual Centroids (NDP-0), whereas only 43 points (black dots) remain as non-dominated points after the exact GTC is computed (NDP-1). As expected, only a part of NDP-1 belongs to the exact set of non-dominated points (NDP*), which may be observed once the two sets are contrasted (see
Figure 8). In this example, NDP* comprises 127 points, and only 17 points from NDP-1 belong to NDP* (i.e., NDP-2). The remaining 26 points (DP-2) are dominated by other points in NDP*.
Figure 9 and
Figure 10 show similar results for the same instance Fict-0660, but using Centre-of-Mass instead of Manual Centroids.
Table 2 summarizes all of the previous results for instance Fict-0660, including those results obtained with the Geometric Centroids. In this case, the approximated GTC is an underestimation of the exact GTC, analogously to the Manual Centroid approach. The terminology in
Table 2 is as follows:
Average GTC-Error: Average of the relative error of the approximated GTC with respect to the exact GTC for all obtained non-dominated points (exact–approximated). It is positive (negative) when the approximated GTC is lower (higher) than the exact GTC;
DA1_X: Relative DA for a set X with respect to the DA for the Exact Pareto Front;
DA2_X: Relative DA for a set X with respect to the area of the square that defines DA for the ideal point, as shown in Figure 5a;
Min. Error with Norm-∞: Minimum value of the maximum relative errors for each dominated solution;
Min. Error with Norm-2: Minimum value of the Euclidean relative errors for each dominated solution;
Avg1: Average values of all dominated points, including zero values;
Avg2: Average values of all dominated points, excluding zero values.
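The difference between Avg1 and Avg2 can be sketched as follows. The code assumes that the averages are taken over a list of per-point combined errors, where points coinciding with the Exact Pareto Front contribute zero values; the function names and the error values are hypothetical.

```python
def avg1(errors):
    """Average over all points, zero errors included."""
    return sum(errors) / len(errors) if errors else 0.0


def avg2(errors):
    """Average over the points with a strictly positive error only."""
    positive = [e for e in errors if e > 0]
    return sum(positive) / len(positive) if positive else 0.0


# Hypothetical relative errors: two coinciding points and two dominated points.
errors = [0.0, 0.0, 0.01, 0.03]
a1 = avg1(errors)
a2 = avg2(errors)
```

With these values, Avg1 is 1% and Avg2 is 2%, which shows that Avg2 can never be smaller than Avg1.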
Notice that the most appropriate outcome when using the Approximated Model is NDP-1 instead of NDP-0, assuming that it is possible to compute the exact GTC for any point of the Approximated Pareto Front, which is not equivalent to solving the Exact Formulation. In concordance with the previous discussion,
Table 2 shows that the initial number of non-dominated points obtained with the Approximated Model (|NDP-0| = 81, 78, and 80 for the Manual, Geometric, and Centre-of-Mass centroids, respectively) includes points that must be removed from the set of non-dominated points after GTC is exactly recomputed (|NDP-0| − |NDP-1| = 38, 36, and 25 for the Manual, Geometric, and Centre-of-Mass centroids, respectively). In other words, when the exact GTC is recomputed for the original set of non-dominated points (NDP-0), there is no guarantee that all points remain non-dominated. This result reveals an important inefficiency of using the Approximated Model. Conversely, the Exact Formulation provides the exact set of non-dominated points without the need for removing any point (i.e., for the Exact Formulation, NDP* = NDP-1 = NDP-2 and DP-1 = DP-2 = ∅).
The results in
Table 2 also reveal that the Centre-of-Mass approach provides a better set of non-dominated points for instance Fict-0660 than the other two methods (Manual and Geometric). This inference follows from the indicators in this table (i.e., Average GTC-Error, |NDP-1|, DA1_X, DA2_X, |NDP-2|, and the error measurements of the dominated points).
All error measurements shown in
Table 2 are relatively small for the three types of centroids, having average values less than or equal to 1.05%. Moreover, according to the errors with norm ∞, the points obtained with the Approximated Model that do not belong to NDP* (DP-2) present an average deviation of at most 0.92% in terms of the MTC or GTC (with respect to the points in NDP* that dominate them). Although norm 2 is an upper bound to norm ∞, both norms yield very similar values. Finally, despite the small error measurements, the number of points provided by the Approximated Model (NDP-1) is considerably smaller than the number of points in the Exact Pareto Front NDP* (approximately 37%).
Similarly,
Figure 11 and
Figure 12 present the results for instance Real-0820 with Manual and Centre-of-Mass Centroids, respectively.
Figure 11 shows that the approximated GTC with Manual Centroids overestimates the exact GTC, as opposed to the above fictitious instance, whereas
Figure 12 illustrates that the Approximated Model with Centre-of-Mass centroids underestimates the exact GTC, similar to instance Fict-0660. In other words, the manual approach for determining the centroids in the Approximated Model may present erratic results, which may be explained by the non-systematic and random nature of the Manual Centroid method.
Finally,
Figure 13 shows the Approximated Pareto Front (NDP-1) considering two types of centroids, Centre-of-Mass (black dots) and Manual Centroids (unfilled squares), and the Exact Pareto Front, NDP* (grey triangles). Although the number of non-dominated points obtained with the two centroid methods is considerably small (on average, 36% of the NDP*), these sets of points are so close to the Exact Pareto Front NDP* that it is difficult to visually distinguish between them.
Table 3 presents a summary of the results associated with instance Real-0820, and
Table 4 and
Table 5 summarize the results for instances Fict-0064 and Fict-0004, respectively. These results illustrate an overestimation of the GTC for the Manual Centroids, and an underestimation of the GTC for Centre-of-Mass Centroids for instance Real-0820. In addition, a GTC underestimation is observed for instance Fict-0064 with Manual and Centre-of-Mass Centroids, as shown in
Table 4 and
Figure 14. This underestimation is not clear for the case of the Geometric Centroids.
In
Table 3,
Table 4 and
Table 5, the column “Avg. Abs. GTC-Error” is added to show the average of the absolute GTC errors. This indicator becomes relevant in cases where some points obtained with the Approximated Model (NDP-0) present a GTC overestimation and other points of the same set present a GTC underestimation. Thus, columns “Average GTC-Error” and “Avg. Abs. GTC-Error” may yield different absolute values, such as for the Geometric Centroids for instance Fict-0064 (see
Table 4), and for all types of centroids for instance Fict-0004 (see
Table 5 and
Figure 15).
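The reason why the two columns may differ can be shown with a minimal sketch. It follows the sign convention stated for the Average GTC-Error (exact minus approximated, relative to the exact GTC); the function name and the GTC values are hypothetical.

```python
def gtc_error_summary(exact_gtc, approx_gtc):
    """Return (average signed error, average absolute error) of the GTC.

    Each relative error is (exact - approximated) / exact, so it is positive
    when the approximated GTC underestimates the exact GTC.
    """
    rel = [(e - a) / e for e, a in zip(exact_gtc, approx_gtc)]
    avg_signed = sum(rel) / len(rel)
    avg_abs = sum(abs(r) for r in rel) / len(rel)
    return avg_signed, avg_abs


# Hypothetical values: one underestimated point and one overestimated point.
avg_signed, avg_abs = gtc_error_summary([100.0, 100.0], [90.0, 110.0])
```

In this example the signed errors cancel out (the average is zero), whereas the average absolute error is 10%, so a small Average GTC-Error alone does not imply an accurate GTC estimation.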
Two main types of errors related to the GTC estimation for the Approximated Model may be distinguished. One type of error, Error Type A, consists of an underestimation of the GTC between the Centre-of-Mass centroids and the ports compared to the actual GTC between the real demand locations and the ports for the Exact Formulation, as shown in
Figure 16. This type of error applies only to the Centre-of-Mass Centroids, since the other two methods may present an erratic error behavior. In other words, these methods may yield either an underestimation or an overestimation of the GTC, whose behavior can be neither anticipated nor systematically studied.
The other type of error, Error Type B, relates to the centroid demand splitting, when its associated port is not selected. In this situation, the Approximated Model assumes that this demand is homogeneously split among the selected ports (for all types of centroids), as indicated in
Figure 17. This assumption does not apply when either one or all ports are selected. If only one port is selected, then the whole island demand is collected through this single port. If all of the ports are selected, then the demand of each centroid is fully collected through its associated port, and thus, this type of error is non-existent.
The aforementioned approximated manner of splitting the demand of a centroid may be acceptable in the case that two out of three available ports are selected, as in
Figure 17a. On the contrary, as
Figure 17b suggests, this assumption does not seem to be reasonable for a four-port island, since the centroid demand should be more concentrated towards the two nearest selected ports (as opposed to the homogeneous splitting assumption). Consequently, this type of error produces a GTC overestimation.
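The overestimation caused by Error Type B can be illustrated with a simplified sketch. It compares the homogeneous split of a centroid demand among the selected ports with a reference case in which the whole demand goes to the nearest selected port. This reference is a simplification introduced here for illustration (it is not the Exact Formulation), the cost is assumed to be demand times distance, and all names and values are hypothetical.

```python
def homogeneous_split_cost(demand, distances_to_selected_ports):
    """GTC when the demand is split evenly among all selected ports."""
    share = demand / len(distances_to_selected_ports)
    return sum(share * d for d in distances_to_selected_ports)


def nearest_port_cost(demand, distances_to_selected_ports):
    """GTC when the whole demand is sent to the nearest selected port."""
    return demand * min(distances_to_selected_ports)


# Hypothetical centroid whose own port is not selected on a four-port island:
# distances from the centroid to the three selected ports.
distances = [2.0, 3.0, 10.0]
split_cost = homogeneous_split_cost(10.0, distances)
nearest_cost = nearest_port_cost(10.0, distances)
```

Since the average distance to the selected ports can never be lower than the minimum distance, the homogeneous split yields a cost that is at least as large as the nearest-port reference, and the gap grows when one of the selected ports is far from the centroid.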
According to the previous discussion, instances containing islands with fewer than four ports mainly present Error Type A, which yields a significant GTC underestimation (only for the Centre-of-Mass Centroid method). This underestimation is clearly observed in
Figure 14 and
Figure 17 and
Table 1 and
Table 2. Conversely, the two types of errors may be significant for instances with four or more ports, producing a combination of GTC underestimation and overestimation, as observed in Figure 14 and Figure 15 and Table 4 and Table 5.
4.2. Computational Performance Evaluation with Larger Instances
This section presents a discussion on the results obtained for the set of 20 instances (10 fictitious and 10 real). The fictitious instances contain seven two-port islands, seven three-port islands, and three four-port islands (instances Fict-0773-01 through Fict-0773-10), and the real instances contain eight islands of one port, eight islands of two ports, and two islands of three ports (instances Real-8820-01 through Real-8820-10).
Table 6 presents a summary of the results when solving the 20 instances with the Approximated Model and the Exact Formulation, assuming the same notation of the previous tables. In this case, the Approximated Model is solved considering only Centre-of-Mass Centroids.
Appendix A graphically shows the set of dominated points obtained with the two formulations for instances Real-8820-01 through Real-8820-10 with the Manual Centroids method.
Table 6 shows that the number of non-dominated points using the Exact Formulation (NDP*) is significantly larger for the fictitious instances than for the real instances, since the former have considerably more nodes (51 vs. 30). Therefore, a larger number of nodes produces significantly more points in the objective space due to the combinatorial nature of the problem.
In addition, the average GTC error is considerably larger for the fictitious instances (7.63% vs. 3.71%), which could be mainly explained by a smaller Error Type A in the real instances. Notice that the real instances may not present a significant Error Type B because these instances do not have four-port islands (as discussed in
Section 4.1). These smaller GTC errors for the real instances yield better results in terms of the quality of the obtained sets of non-dominated points (NDP-1), which is observed in the performance indicators (|NDP-1| and |NDP-2|). The DA_NDP-1 is the only indicator that performs similarly for both sets of fictitious and real instances. For example, on average, 37.64% of the points obtained for the fictitious instances remain as non-dominated after GTC is recomputed (|NDP-1|/|NDP-0|), while these points also represent 33.83% of the Exact Pareto Front (|NDP-1|/|NDP*|). These percentages rise to 50.51% and 57.05% for the real instances, respectively. Furthermore, the algorithm obtains, on average, 9.9% of the points in the Exact Pareto Front for the fictitious instances, and 28.7% for the real instances (|NDP-2|/|NDP*|).
Overall, the performance indicators are quite similar within each instance group. For example, the values of the “Avg. GTC Error” present a coefficient of variation close to 0.1 for the real instances and 0.07 for the fictitious instances, denoting the low dispersion of these values. Additionally, |NDP-1|/|NDP*| presents a coefficient of variation close to 0.09 and 0.1 for the real and fictitious instances, respectively.
Concurring with the results associated with the smaller instances in
Section 4.1, all error measurements in
Table 6 show that all dominated points (DP-2) obtained with the Approximated Model are relatively close to the exact set of non-dominated points (on average at most 1% in GTC or MTC, according to norm ∞).
For the same groups of real and fictitious instances,
Table 7 adds the following notation:
N: Number of executions within the AUGMECON2 algorithm (i.e., a mono-objective optimization);
Time: Total computing time to complete the N executions;
Δ1: Relative variation in the indicator Time/N (Exact vs. Approximated), which relates to the time variation when solving a mono-objective optimization problem;
Δ2: Relative variation in the indicator Time/NDP-0 (Exact vs. Approximated), which relates to the time variation for obtaining non-dominated points prior to the GTC re-computation;
Δ3: Relative variation in the indicator Time/NDP-1 (Exact vs. Approximated), which relates to the time variation for obtaining non-dominated points after the GTC re-computation.
Notice that NDP* = NDP-0 = NDP-1 for the Exact Formulation, since the non-dominated points obtained with the Exact Formulation actually belong to the Exact Pareto Front.
The efficiency of the Approximated Model in obtaining non-dominated points is slightly better for the real instances. On average, 35.8% of the executions provide non-dominated points (NDP-1) for the fictitious instances, whereas 42% of the executions provide non-dominated points for the real instances (|NDP-1|/N). These results may be explained by a better GTC estimation for the real instances than for the fictitious instances, which is observed in
Table 6 through the column “Avg. GTC-Error”.
The average computing times are significantly larger for the fictitious instances than for the real instances (17,742 s vs. 673 s for the Approximated Model, and 22,712 s vs. 984 s for the Exact Formulation). This is explained by the instance sizes (51 ports for the fictitious instances vs. 30 ports for the real instances). Additionally, as may be expected, the Exact Formulation is more time-consuming than the Approximated Model (984 s vs. 673 s for the real instances, and 22,712 s vs. 17,742 s for the fictitious instances). However, the advantage of the Approximated Model with respect to the Exact Formulation in terms of the computing time decreases for the fictitious instances. On average, the Approximated Model consumes 32% less time than the Exact Formulation for the real instances (673 s vs. 984 s), whereas, for the fictitious instances, the Approximated Model consumes 22% less time than the Exact Formulation (17,742 s vs. 22,712 s). These results are due to the fact that the complexity of the Approximated Model is significantly conditioned by the number of ports per island (in addition to the total number of ports), which, in this case, is larger for the fictitious instances (two, three, and four ports per island for the fictitious instances vs. one, two, and three ports per island for the real instances).
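The reported time savings can be checked directly from the average computing times quoted above. The following sketch only reproduces that arithmetic; the function name is hypothetical.

```python
def relative_time_saving(approx_time, exact_time):
    """Fraction of computing time saved by the Approximated Model."""
    return 1.0 - approx_time / exact_time


# Average computing times in seconds, as reported in the text.
saving_real = relative_time_saving(673.0, 984.0)
saving_fict = relative_time_saving(17742.0, 22712.0)
```

The savings are about 0.316 for the real instances and 0.219 for the fictitious instances, which round to the 32% and 22% stated in the text.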
The previous statement holds since Expression (1.2), which is employed to approximate the GTC, relies on the combinations of existing ports that determine all possible groups of selected ports for each island. Conversely, the complexity of the Exact Formulation is less sensitive to the number of ports per island, since it depends on the combination of ports and demand locations through a linear formulation in Expression (2.2).
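The growth in the number of port groups per island can be sketched as follows. The code assumes that "all possible groups of selected ports" refers to every non-empty subset of the ports of an island; how Expression (1.2) actually indexes these groups is not shown in this section, and the function name is hypothetical.

```python
from itertools import combinations


def port_groups(ports):
    """Enumerate every non-empty subset of the ports of one island."""
    groups = []
    for size in range(1, len(ports) + 1):
        groups.extend(combinations(ports, size))
    return groups


# Number of groups for islands with one to four ports.
counts = {p: len(port_groups(range(p))) for p in (1, 2, 3, 4)}
```

Under this assumption, an island with p ports has 2^p − 1 groups (1, 3, 7, and 15 groups for one to four ports), so a four-port island contributes roughly twice as many groups as a three-port island.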
The most relevant result observed in
Table 7 is the significant improvement in the time required to provide a non-dominated point (NDP* for the Exact Formulation and NDP-1 for the Approximated Model), as measured by Δ3. As discussed in Section 4, NDP-1 is obtained after the GTC is recomputed and the resulting dominated points are removed from the set. For the real instances, this time reduction is, on average, 13.7%, whereas, for the fictitious instances, the time reduction is, on average, 61.1%. This result is strictly related to a better algorithm performance when the Exact Formulation is employed: with the Approximated Model, only 41.96% of the executions (N) provide a non-dominated point (i.e., NDP-1) for the real instances and 35.79% for the fictitious instances. In contrast, when employing the Exact Formulation, the AUGMECON2 algorithm provides, on average, a non-dominated point (NDP-1 = NDP*) in 82.62% of the executions for the real instances, and in 94.94% of the executions for the fictitious instances. In conclusion, although the quality of the set of non-dominated points obtained when employing the Approximated Model may be considered acceptable when observing the results in
Table 5, the number of non-dominated points obtained with similar computing times is significantly lower for the Approximated Model than for the proposed novel Exact Formulation (i.e., on average, the size of NDP-1 is around 45% of the size of NDP*, of which, only 40% finally belongs to NDP*).
5. Conclusions
This paper studies the Bi-Objective Insular Traveling Salesman Problem (BO-InTSP), which minimizes both the maritime transportation costs (MTC) incurred by a barge that visits a set of islands for freight collection (or distribution) purposes, and the ground transportation costs (GTC) incurred by the inhabitants inside the islands when moving the freight. A first formulation is proposed in [
39], which relies on an approximation of the GTC that assumes the existence of a set of centroids instead of using the actual demand locations. Naturally, the use of an approximated GTC may lead to an inappropriate set of solutions, manifesting the need for studying the quality of this approximation. Accordingly, this paper proposes a novel Exact Formulation for the studied problem by modelling the actual demand locations and an explicit assignment of these locations to the visited ports at each island. Subsequently, this research proposes and develops a systematic evaluation approach for comparing the results obtained with both the Exact and the Approximated Formulations, in which the bi-objective nature of the problem is taken into account. A key step of the comparison approach is the exact re-computation of the GTC for each point that is obtained with the Approximated Model. In other words, the Pareto Front obtained with the Approximated Model is projected into the objective space of the Exact Formulation.
Commonly, traditional multi-objective approaches compare the sets of non-dominated points generated by different approximated algorithms (i.e., heuristics) for a single problem or model formulation. In this research, we propose an enhanced analysis that compares models with different accuracy or aggregation levels, which may be relevant when trying to balance the effort needed to solve a problem either through an Exact Formulation or through an Approximated Model.
The results show significant differences when the set of non-dominated points obtained by the proposed Exact Formulation is compared with those points obtained by the previously studied Approximated Model. First, when the GTC is exactly recomputed for each point obtained by the Approximated Model, many of these solutions are not actually efficient and must be removed from the set, representing approximately 56% of the initial set obtained for the tested instances (51% for the real instances, and 62% for the fictitious instances). Additionally, approximately 40% of the points from the Approximated Pareto Front actually belong to the Exact Pareto Front (51% for the real instances and 29% for the fictitious ones), representing approximately 20% of the Exact Pareto Front (29% for the real instances and 10% for the fictitious ones).
The proposed novel Exact Formulation for the BO-InTSP yields significantly larger sets of non-dominated points in similar computing times when compared to the previous Approximated Model in [
39], which provides an Approximated Pareto Set whose cardinality is approximately 45% of the Exact Pareto Set. Thus, the novel proposed Exact Formulation clearly outperforms the Approximated Model, at least for the set of instances explored in this research. Nevertheless, the quality of the set of points obtained with the previous Approximated Model is acceptable considering the high values of its dominated areas and the low error values of its dominated solutions. Notice that this comparison cannot be performed without the proposed Exact Formulation.
Based on this research, it is worth highlighting the great relevance of the modelling approach for the GTC (in addition to multi-objective solution approaches in order to solve the underlying optimization models), since it significantly affects the quality of the obtained Pareto Front and also the consumed computing resources.
In this paper, the performance of different Pareto Fronts has been compared by developing a systematic approach that relies on a variety of state-of-the-art performance indicators. In this regard, well-known multi-objective methodologies are extended for comparing the Pareto Sets obtained with both Exact and Approximated Formulations for a given optimization problem. Thus, this paper proposes a systematic approach to provide a fair and thorough comparison between the respective sets. Moreover, this approach may be naturally extended to problems with three or more objective functions, denoting a significant contribution of this research.
Alternative approximations or modelling techniques for the GTC inside the islands, such as probabilistic and gravitational-based models, are suggested as relevant future research. In particular, a stochastic approach should be adopted when the real demand locations are of a random nature. Naturally, considering the bi-objective and NP-hard nature of the studied problem, multi-objective heuristic techniques for solving both formulations are highlighted as significant and necessary future research, particularly when larger instances are intended to be solved (e.g., MOEA/D, NSGA-II, and Pareto Local Search, and all of their variations and improvements [
75,
76,
77,
78,
79,
80]). Finally, some interesting extensions may be addressed, such as multi-period, multi-vehicle, and multi-depot scenarios.