#### *4.2. Heuristic Algorithms*

Heuristic algorithms can be simply described as a set of rules to follow. A rule can be as simple as taking whatever is best in a particular instance (see Algorithm 1). Many variations of heuristic algorithms have been used to solve both the MCLP and carsharing fleet management problems. Church and ReVelle [5] first proposed a heuristic algorithm that adds one facility location at a time. The latter has also been applied to shared fleet placement problems [20,21] and is used in the comparative study in this work as well.


This heuristic algorithm relies on an iterative search. The algorithm starts with an empty list of locations. In each iteration, every candidate location is evaluated according to the fitness function. The algorithm then adds the location with the highest fitness score to the list and removes it from the location pool so that it cannot be re-selected in subsequent iterations. The algorithm has a complexity of *O*(*sn*), where *s* is the number of (desired) stations and *n* is the number of street nodes. The pseudocode of the algorithm is shown in Algorithm 2.
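The iterative search described above can be sketched in a few lines of Python. The fitness function and demand values below are hypothetical placeholders, not the actual objectives of this work:

```python
def iterative_greedy(candidates, fitness, s):
    """Greedy placement: in each of s iterations, add the candidate that
    maximizes the fitness of the partial solution (O(s*n) evaluations)."""
    pool = set(candidates)
    solution = []
    for _ in range(s):
        # evaluate every remaining candidate together with the current solution
        best = max(pool, key=lambda loc: fitness(solution + [loc]))
        solution.append(best)
        pool.remove(best)  # prevent re-selection in subsequent iterations
    return solution

# toy usage: maximize covered demand, each location covering only itself
demand = {1: 10, 2: 50, 3: 30, 4: 5}
coverage = lambda sol: sum(demand[loc] for loc in sol)
selected = iterative_greedy(demand.keys(), coverage, 2)  # picks nodes 2 and 3
```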


Even though the algorithm was first introduced to solve single-objective optimization problems, there are several ways to adapt it to multi-objective problems, e.g., the weighted sum and ε-constraint methods. These variations can be extended further to yield an approximated front as a result. Using a weighted sum approach, each objective fitness is normalized as shown in Equation (1).

$$F\_n = \frac{F\_a - LB}{UB - LB} \tag{1}$$

where *Fn* is the normalized fitness and *Fa* is the actual fitness (e.g., coverage, walking distance, or bi-objective) before normalization. *UB* is the upper bound (the highest fitness) and *LB* is the lower bound (the lowest fitness). With this weighted sum approach, the priority of each objective can be adjusted as shown in Equation (2). The iterative search can be launched multiple times with different weight ratios for the optimization objectives. Assume, for instance, the following list of weights: [(0.1, 0.8, 0.1), (0.33, 0.33, 0.33), (0.1, 0.1, 0.8)]. With these weights, up to three solutions can be reached.

$$F\_t = \sum\_{i=1}^{m} w\_i F\_i \tag{2}$$

where *Ft* is the total fitness score from the weighted sum, *wi* is the weight associated with objective *i*, and *Fi* is the normalized fitness score of objective *i*. The granularity of the weights can be adjusted at the cost of higher computation, since more weight combinations require more executions of the algorithm. Once a certain number of solutions (decided by the decision maker) is collected, an approximated front can be constructed.
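Equations (1) and (2) translate directly into code. The bounds and fitness values below are hypothetical, chosen only to illustrate the computation:

```python
def normalize(f_actual, lb, ub):
    """Equation (1): map an actual fitness into [0, 1]."""
    return (f_actual - lb) / (ub - lb)

def weighted_sum(fitnesses, weights, bounds):
    """Equation (2): combine normalized objectives using weights.
    bounds[i] = (LB_i, UB_i) for objective i."""
    return sum(w * normalize(f, lb, ub)
               for f, w, (lb, ub) in zip(fitnesses, weights, bounds))

# the three weight vectors mentioned in the text; each launch of the
# iterative search with one vector yields (up to) one solution
weight_vectors = [(0.1, 0.8, 0.1), (0.33, 0.33, 0.33), (0.1, 0.1, 0.8)]
bounds = [(0, 100), (0, 100), (0, 100)]  # hypothetical per-objective bounds
score = weighted_sum((50, 75, 25), weight_vectors[0], bounds)
```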

In this work, we study six variants of heuristic algorithms based on the greedy and iterative versions. They are:


The purpose is to establish a baseline for comparison and to evaluate the performance of the state-of-the-art heuristic algorithms against other algorithms.

#### *4.3. Non-Dominated Sorting Genetic Algorithm-II (NSGA-II)*

In the Evolutionary Algorithm (EA) approach to problem solving, the well-known concept of 'metaheuristic' can be concisely defined as a higher-level procedure or strategy guiding a partial search. Hence, a global optimum is not guaranteed, but acceptable results are generally obtained. Metaheuristics usually involve a stochastic process, which makes them non-deterministic. NSGA-II is a metaheuristic optimization algorithm based on Pareto-dominance [45]. Pareto-dominance is defined as follows:

$$z \succ z' \Leftrightarrow \forall i \in \{1, 2, \ldots, n\},\ z\_i \leq z'\_i \ \wedge\ \exists j \in \{1, 2, \ldots, n\},\ z\_j < z'\_j$$

where *z* and *z′* are vectors of objectives in *Z* and *z* ≻ *z′* means that *z* dominates *z′*. If two selected solutions are both non-dominated, one of the parent solutions is selected uniformly at random. Other metaheuristic algorithms in this category are the Simple Evolutionary Algorithm for Multi-objective Optimization (SEAMO) [60], the Strength Pareto Evolutionary Algorithm II (SPEA2) [46], and the Pareto Envelope-based Selection Algorithm (PESA) [61]. NSGA-II was shown to be more efficient than these algorithms on a GSM antenna location problem (a variant of the FLP) [17].
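The dominance relation above, assuming minimization of all objectives, reduces to two element-wise comparisons:

```python
def dominates(z, z_prime):
    """z dominates z' (minimization): z is no worse in every objective
    and strictly better in at least one."""
    return (all(a <= b for a, b in zip(z, z_prime))
            and any(a < b for a, b in zip(z, z_prime)))

dominates((1, 2), (2, 2))  # True: better in objective 1, equal in objective 2
dominates((1, 3), (2, 2))  # False: the two vectors are mutually non-dominated
```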

Among these algorithms, NSGA-II is renowned due to its numerous proven applications. It is largely based on the Genetic Algorithm (GA), starting from population initialization, selection of parents, crossover, and mutation to obtain a new population of solutions. Individuals in both the parent and offspring populations are sorted according to their rank, and the best solutions are chosen to create a new population. If individuals have the same rank, a density estimation based on the crowding distance to the surrounding individuals of the same rank is used. A reference-point-based successor of NSGA-II, called NSGA-III, has been proposed with the intention of solving problems with three or more objectives [62]. Hence, NSGA-II is selected in this work, as the problem is bi-objective. The pseudocode of NSGA-II is shown in Algorithm 3.


Next, the solution encoding, population initialization, and evolutionary operators employed in NSGA-II to solve the fleet placement problem are presented in detail. More details on NSGA-II's specific operators can be found in [45].

**Solution Encoding:** A solution is a string of fleet locations denoted by 'id numbers' from OpenStreetMap. An example is shown in Figure 3, where blue dots are candidate fleet locations on the streets and the displayed numbers are OpenStreetMap ids. The example solution contains five locations (locations 1 to 5); it is naturally only one of many possibilities in this example area. This encoding is more suitable than a binary encoding given the size of the problem instances, which can be exceedingly large (more than 100,000 street nodes is possible in practice). The encoding is unordered: swapping genes in the chromosome does not change the fitness of the solution.

**Figure 3.** Example of a chromosome representing a possible solution. Street nodes are represented by IDs (numbers) and a solution consists of these IDs.

**Population Initialization:** A solution is initialized by randomly choosing (based on a uniform distribution law) street nodes from all street nodes in a problem instance. Note that each street node may be selected at most once in each solution during initialization.
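Sampling without replacement gives the duplicate-free initialization described above; a minimal sketch, with the node IDs and population size chosen arbitrarily:

```python
import random

def init_solution(street_nodes, s, rng=random):
    """Sample s distinct street-node IDs uniformly at random
    (each node selected at most once per solution)."""
    return rng.sample(list(street_nodes), s)

# hypothetical instance: 1000 street nodes, 10 stations, population of 50
population = [init_solution(range(1000), 10) for _ in range(50)]
```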

**Crossover:** A two-point crossover is adopted in this work. The process randomly selects two points in both solutions as starting and ending points for exchanging portions and recombines these portions to create two new solutions, as shown in Figure 4. Although this crossover process may introduce solutions with redundant placements, the fitness score calculation penalizes them (e.g., redundant locations lead to lower coverage), so those solutions are deemed low quality and eliminated in the next generation's selection. This keeps the execution time of the algorithm low.

**Figure 4.** Two-point crossover process.
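The exchange of segments between two cut points can be sketched as follows; note that, as stated above, duplicated locations are not repaired here but are filtered out later by selection:

```python
import random

def two_point_crossover(parent1, parent2, rng=random):
    """Exchange the segment between two random cut points,
    producing two children of the same length as the parents."""
    a, b = sorted(rng.sample(range(len(parent1) + 1), 2))
    child1 = parent1[:a] + parent2[a:b] + parent1[b:]
    child2 = parent2[:a] + parent1[a:b] + parent2[b:]
    return child1, child2

c1, c2 = two_point_crossover([1, 2, 3, 4, 5], [6, 7, 8, 9, 10])
```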

**Mutation:** The uniform mutation replaces a location with one drawn from the pool of all vehicle locations defined by street node IDs. Figure 5 illustrates the mutation operator, where *Sample* is a function that randomly picks one location from the pool and 1, 2, 3, ..., *n* denote all street node IDs. If the replacement already exists in the current solution, the process is repeated until a valid replacement is found. As with crossover, redundant solutions are eliminated by the selection process.

**Figure 5.** Uniform mutation process.
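The re-sampling loop for the uniform mutation can be sketched as below; the node pool here is hypothetical and is assumed to be larger than the solution, so the loop terminates:

```python
import random

def uniform_mutation(solution, node_pool, rng=random):
    """Replace one randomly chosen gene with a node sampled uniformly
    from the full pool, re-sampling while the replacement already
    exists in the solution (keeps the solution duplicate-free)."""
    mutant = list(solution)
    idx = rng.randrange(len(mutant))
    replacement = rng.choice(node_pool)
    while replacement in mutant:
        replacement = rng.choice(node_pool)
    mutant[idx] = replacement
    return mutant

mutated = uniform_mutation([1, 2, 3], list(range(100)))
```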

It is worth noting that evolutionary operators inherit known shortcomings such as local optima and plateaus, and the operators in this work are no exception. Several combinations of crossover and mutation operators were experimented with, and none proved superior in all cases. This work focuses on the implementation of the FPP model rather than on finding the most suitable parameters and operators for the metaheuristic algorithm (i.e., NSGA-II) under consideration. Therefore, adjusting and tuning the evolutionary operators may be necessary when applying them to instances that share few characteristics with the instances in this work.

#### *4.4. Multi-Objective Performance Metrics*

Several multi-objective quality metrics exist, which can be categorized based on the quality aspect that they assess, i.e., convergence (distance to the optima), diversity, and both convergence and diversity altogether [33]. In this work, we consider three commonly used indicators that measure complementary aspects of the yielded solutions, namely, Inverted Generational Distance (IGD) [63], Spread [45], and Hypervolume (HV) [33]. Since the exact Pareto front can only be computed on small instances, for large instances a reference front is obtained by combining the approximated Pareto fronts resulting from the heuristic and metaheuristic algorithms.

**Inverted Generational Distance (IGD)** [63]: This metric measures the distance between the Pareto front and the obtained approximated front. IGD is defined in Equation (3), where *di* is the Euclidean distance from point *i* in the Pareto front to the closest solution in the approximated front, and *n* is the number of solutions in the Pareto front. *IGD* = 0 indicates that the evaluated approximated front contains every solution of the optimal Pareto front.

$$IGD = \frac{\sqrt{\sum\_{i=1}^{n} d\_i^2}}{n} \ . \tag{3}$$
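Equation (3) can be computed directly from the two fronts, assuming both are given as lists of objective vectors:

```python
import math

def igd(approx_front, pareto_front):
    """Equation (3): square root of the summed squared distances from each
    Pareto-front point to its nearest approximated solution, divided by n."""
    def nearest(p, front):
        return min(math.dist(p, q) for q in front)
    n = len(pareto_front)
    return math.sqrt(sum(nearest(p, approx_front) ** 2
                         for p in pareto_front)) / n

# toy example: the approximated front misses the Pareto point (3, 4)
value = igd([(0.0, 0.0)], [(0.0, 0.0), (3.0, 4.0)])  # sqrt(0 + 25) / 2 = 2.5
```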

**Spread** [45]: This metric measures the diversity of the obtained approximated front and is defined as:

$$\Delta = \frac{d\_f + d\_l + \sum\_{i=1}^{N-1} \left| d\_i - \bar{d} \right|}{d\_f + d\_l + (N-1)\bar{d}} \ \tag{4}$$

where *di* is the Euclidean distance between consecutive solutions, $\bar{d}$ is the mean of these distances, and *df* and *dl* are the Euclidean distances to the *extreme* solutions of the Pareto front. A zero value indicates an ideal distribution, i.e., a perfect spread of the solutions in the evaluated set.
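A sketch of Equation (4), assuming the front is already sorted along the front and the true extreme points are supplied by the caller:

```python
import math

def spread(front, extreme_f, extreme_l):
    """Equation (4): Delta metric. `front` is a list of objective vectors
    sorted along the front; extreme_f / extreme_l are the extreme
    solutions of the Pareto front."""
    d = [math.dist(front[i], front[i + 1]) for i in range(len(front) - 1)]
    d_mean = sum(d) / len(d)
    d_f = math.dist(extreme_f, front[0])    # distance to first extreme
    d_l = math.dist(extreme_l, front[-1])   # distance to last extreme
    num = d_f + d_l + sum(abs(di - d_mean) for di in d)
    den = d_f + d_l + (len(front) - 1) * d_mean
    return num / den

# a perfectly uniform front reaching both extremes scores 0 (ideal spread)
value = spread([(0.0, 2.0), (1.0, 1.0), (2.0, 0.0)], (0.0, 2.0), (2.0, 0.0))
```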

**Hypervolume** [64]: This metric assesses both the convergence and the diversity of a Pareto front. It calculates the m-dimensional volume (in the objective space) covered by the solutions in the evaluated Pareto front *Q* with respect to a dominated reference point *W*. For each solution *i* ∈ *Q*, a hypercube *vi* is constructed with the reference point *W* and the solution *i* as diagonal corners. The hypervolume is the volume of the union of all hypercubes, as shown in Equation (5). The higher the hypervolume, the better the algorithm performed.

$$HV = \text{volume} \left( \bigcup\_{i=1}^{|Q|} v\_i \right) \tag{5}$$
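For the bi-objective case (minimization), the union of rectangles in Equation (5) can be computed by a sweep over the front sorted on the first objective; this sketch assumes the front is mutually non-dominated:

```python
def hypervolume_2d(front, ref):
    """2-D hypervolume (minimization): area dominated by the front and
    bounded by the reference point `ref`, accumulated slice by slice."""
    # keep only points that actually dominate the reference point
    pts = sorted(p for p in front if p[0] < ref[0] and p[1] < ref[1])
    hv = 0.0
    for i, (x, y) in enumerate(pts):
        next_x = pts[i + 1][0] if i + 1 < len(pts) else ref[0]
        hv += (next_x - x) * (ref[1] - y)   # rectangle owned by point i
    return hv

# two rectangles of area 2 overlapping in a unit square: union area 3
value = hypervolume_2d([(1.0, 2.0), (2.0, 1.0)], (3.0, 3.0))
```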

#### **5. Execution of the Proposed Model**

This section is composed of three parts. The first two parts present the process of building graphs from real street maps and the integration of demographic data into the graphs. The third part describes the parameters and the execution environment.

#### *5.1. Building Graph Instances*

Most public street maps are not available in a graph format. To build graphs, street map data and building footprints are obtained from "OpenStreetMap" [65] using the "OSMnx" [66] and "NetworkX" [67] tools. Then, a simplified graph is created that combines a street map and buildings, respecting the real positions of the street nodes and buildings in *V*. Each edge between a building and its nearest street node is built using the Open Source Routing Machine (OSRM) [54]. Edges are assigned a weight proportional to the walking distance extracted from OSRM. An example of a resulting instance is shown in Figure 6. Gray edges represent streets. The nodes on the streets represent street nodes, while the nodes outside the streets represent buildings in the area. Input data and parameters for all instances are summarized in Table 4.

**Figure 6.** LU2 instance. Blue nodes represent possible locations for carsharing stations. Yellow nodes represent residential buildings.

#### *5.2. Problem Instances*

The performance of PolySCIP, the heuristic algorithms (simple and iterative), and NSGA-II is compared on three real instances: LU1, LU2, and MU1, as shown in Table 4. LU1 is a small portion of Luxembourg City containing 63 street nodes and 47 buildings. LU2 consists of 2 districts of Luxembourg City, containing a total of 2026 street nodes and 1063 buildings. MU1 is an inner part of the city of Munich, which contains 16,075 street nodes and 21,816 buildings.

For the LU1 instance, the radius of the carsharing station (*r*) is set to zero and the maximum walking distance for users (*w*) is 150 m for all evaluated algorithms. The number of stations is set to four. The LU1 setup is ideal for observing and understanding the operational details of the evaluated algorithms.

The size of the LU2 graph reflects two city districts and is the smallest portion relevant for real-world planning (as deemed by the business expert). The maximum acceptable walking distance depends on the selected mode of transportation. Daniels and Mulley [53] show that people are willing to walk significantly longer to take a train than a bus, as long as they deem it worthwhile. For carsharing, a realistic walking-to-the-car distance (*w*) is around 500 m [68]. The population is selected to match the real numbers reported by the city of Luxembourg [69] and is distributed uniformly over residential buildings. The area covered by a carsharing station has a radius (*r*) of 100 m, while the number of stations is set to 10.

The MU1 instance aims to evaluate the performance of the algorithms on a city-wide scale. The population in each district is taken from municipal data and is distributed uniformly over residential buildings. We estimated the number of carsharing users at 2% of the total population. The station radius (*r*) and walking distance (*w*) are the same as in the LU2 instance, but the number of stations is increased to 100. In all three instances, carsharing users are distributed uniformly among the available buildings on the map. All of these settings have been defined by domain experts based on the study and a real deployment plan.

**Table 4.** Problem instances.


#### *5.3. Algorithms Implementation and Parameters*

PolySCIP (version 4.0) and the heuristic algorithms are deterministic and do not require any parameters apart from those shown in Table 4. As NSGA-II is evolution-based, it requires an initial population and parameters for the GA operators. Table 4 reveals the values adopted for these three instances. NSGA-II is executed 30 times due to its stochastic nature. We developed the heuristic algorithms and NSGA-II using Python 3.7 and DEAP (a library for metaheuristic algorithms) [70]. It is common practice to include seed solutions in the initial population of metaheuristic algorithms. In this work, solutions from the coverage-focused and distance-focused iterative heuristic algorithms are injected into the initial population as seeds. The configurations for NSGA-II are listed in Table 5.

**Table 5.** NSGA-II configuration parameters.

