**1. Introduction**

The Travelling Salesman Problem, known as TSP [1], is one of the most studied statements belonging to the combinatorial optimisation problems. In this, we are given a set of cities and the distances between them with which we must try to find the best route to travel all the towns, minimizing the total length.

Both the TSP and its more well-known derivative, the Vehicle Routing Problem (VRP), are routing problems with a grea<sup>t</sup> impact on most of the issues in our society. For this reason, and because both are NP-Hard [2], the scientific community has continued the search for a better formulation that makes their resolution efficient. However, unfortunately, we cannot use traditional search methods based on differentiability when defining the problem with discrete variables.

One of the models that allows us to write TSP like problems more generically is Quadratic Unconstrained Binary Optimization (QUBO) [3]. QUBO is a framework that enables us to model problems in a quadratic form subject to linear restrictions natively. However, with the help of penalty functions, it is possible to reformulate the tasks of order greater than two and inequality constraints to the QUBO model. Another characteristic that makes QUBO a very important modelling environment is its close connection with the Ising model [4]. The QUBO model constitutes a central problem for adiabatic quantum computing [5], which is solved through a physical process called quantum annealing [6,7].

It is known that the best current QUBO formulation of the TSP requires *N*<sup>2</sup> binary variables Appendix A.1. However, when we try to generalise this formulation to other set of problems such as VRP, we find that polynomial terms of order greater than two appear in these models. As QUBO modelling requires that the function to minimize must be quadratic, it is necessary to decrease the degree of these terms by introducing auxiliary variables, which greatly increases the number of required variables. This is crucial to

**Citation:** Gonzalez-Bermejo, S.; Alonso-Linaje, G.; Atchade-Adelomou, P. GPS: A New TSP Formulation for Its Generalizations Type QUBO. *Mathematics* **2022**, *10*, 416. https://doi.org/10.3390/ math10030416

Academic Editors: Fernando L. Pelayo, Mauro Mezzini and Armin Fügenschuh

Received: 9 December 2021 Accepted: 24 January 2022 Published: 28 January 2022

**Publisher's Note:** MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.

**Copyright:** © 2022 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https:// creativecommons.org/licenses/by/ 4.0/).

53

achieving good results through the solvers dedicated to it, especially if it is going to be implemented on a quantum computer.

Quantum annealing is the paradigm of using quantum processes to solve combinatorial optimization problems. This paradigm uses entropy as a target to force exploration, given that any function that smoothes the probability in the search space can have the same purpose according to the adiabatic theorem [8,9].

The D-Wave Quantum Processor Unit (QPU) is considered as a heuristic that minimizes the objective QUBO functions using a physically performed version of quantum annealing; this shows how the number of variables in the QUBO model is related to the number of qubits in a quantum computer [10].

The VRP encompasses two different problems: one in which the distance travelled by vehicles subject to capacity restrictions is minimized [11–14] and another, in which the time taken for cars to complete their routes is minimized. In this article, everything related to the VRP will optimize the time to complete these routes, equivalent to reducing the total of the distances travelled by all vehicles.

As we will explain later in the section dedicated to the VRP, to generalise a QUBO formulation from the TSP to the VRP, the objective function for calculating the distance travelled by the vehicles must be linear so that the formulation discussed above with *N*<sup>2</sup> variables cannot be used.

The most widely used QUBO model of the TSP that can represent distance linearly uses more than *N*<sup>3</sup> variables (native TSP formulation). However, there is indeed a formulation that uses *<sup>N</sup>*<sup>2</sup>*log*2(*N*) variables; this formulation is known as MTZ [15]. But generalising the MTZ formulation for the VRPs that minimises the maximum distances travelled by all the vehicles give us quadratic restrictions and, therefore, a penalty function of the order greater than two.

Our purpose in this work is to present a new QUBO model of the TSP in which the travelled distance calculation is linear, and uses only 3 *N*<sup>2</sup> variables, considerably improving the existing TSP models (both of *N*<sup>3</sup> and *<sup>N</sup>*<sup>2</sup>*log*2(*N*) variables). Furthermore, this new formulation of the TSP we refer to as GPS will be generalized to define an efficient formulation which consider the number of variables of a new VRP formulation. Unfortunately, after reviewing state of the art, we have not found a QUBO formulation of the VRP that minimises the maximum of the distances travelled by all the vehicles, thus we have not been able to make comparisons. When we talk about the number of variables in a model, we will describe it according to its dominant term, so for a model that, for example, requires *N*<sup>3</sup> + 3*N*<sup>2</sup> + 2*N* variables, we will say that it is modelling with *N*<sup>3</sup> variables since it gives us enough information on its scalability.

The document is organised as follows. Section 2 presents our main motivation behind this work. Section 3 shows previous work on the TSP algorithm and its derivatives. Section 4 presents the QUBO framework and its connection to quantum annealing. In Section 5, we recall the native formulation of TSP and the MTZ QUBO model. Section 6 presents our TSP proposal with the improvements in the numbers of variables. A generalisation of our contribution is seen in Section 7 where we propose our VRP into the QUBO model. Section 8 presents the obtained results, and finally, Section 9 concludes the work carried out, and we open ourselves to some lines of the future of the proposed model.

## **2. Motivation**

Our primary motivation is to find a suitable formulation that uses the minimum number of variables; and thus, the minimum number of qubits when implementing said models in quantum computers. This motivation is increased by solving the problem presented in our article [14] in which we desire that the mobile robots minimise the time, which is equivalent to reducing the maximum of the distances travelled by all the vehicles. What implies reducing the number of qubits necessary to implement this model in this era of very few qubits.

#### **3. Related Work**

In the mid-1920s, the following referenced articles [16,17], were the first articles to provide a solution to the minimal spanning tree (MST) problem. Based on these works, the mathematical researcher, Joseph B. Kruskal Jr, applied these solutions to the TSP [18], giving life to some of the first solutions to this problem that will arise during the next decades.

Towards the end of the sixties, the following article [19] offered a compilation and synthesis of the research on the travelling salesman problem. The authors began by defining the problem and presenting several relevant theorems. They also classified and detailed the solution techniques and computational results. Before that, in the mid-1960s, the TSP started to emerge in many different contexts. The following article [20] highlights some applications that began to occur in everyday life, such as vehicle routing or job shop scheduling problems. Other applications such as planning, logistics and the manufacture of electronic circuits became of particular interest.

By making a few small modifications to the original TSP, we could apply it in many fields such as SWP [21] and DNA sequencing [22,23] among others. In this last application, the concept of 'city' would come to be fragments of DNA and the idea of 'distance', a measure of similarity between the pieces of DNA. In many applications, additional restrictions such as resource limits or time windows make the problem considerably difficult.

Computationally, the TSP [24] is an NP-Hard problem within combinatorial optimization. As an NP-Hard problem, it is computationally complex, and heuristics are continually being developed to ge<sup>t</sup> as close as possible to the optimal solution. However, considering the computational complexity nature of these problems, the new approach that quantum computing takes is very promising.

Many works are related to the standard/native TSP or some related variant in a quantum environment within this new approach. For example, the referenced work [25], the authors introduced a different quantum annealing scheme based on a path-integral Monte Carlo process to address the symmetric version of the Travelling Salesman Problem (sTSP). In these other articles [26,27], the authors did a comparative study using the D-Wave platform to evaluate and compare the efficiency of quantum annealing with classical methods for solving standard TSP.

In this reference [28], several comparisons of heuristic techniques were made for some TSP Libraries (TSPLIB) [29] problems, both symmetric and asymmetric, and their results have been compared to other methods such as Self Organizing Maps and Simulated Annealing [30]. In both cases, the local search technique was applied to the results found with Wang's Recurrent Neural Network with "Winner Takes All" that improved the Self Organizing Maps [31]. Other techniques such as the co-adaptive neural network approach to the Euclidean Travelling Salesman Problem [32] equally important.

One of the generalizations of the TSP, known as the VRP, was studied on the D-Wave platform [33,34]. In tasks where routing and planning capacity (time) was required, the TSP with time windows (TSPTW) was generalized [35,36], and has high inherent complexity which presents enormous resolution difficulties. In the following references [21,37–39], the authors modelled combinatorial optimisation problems in which social workers visit their patients at their respective homes and attend to them at a specific time, called Social Workers' Problem (SWP). SWP is a significant problem because additional time constraints allow more realistic scenarios to be modelled than native TSP. The optimal or near-optimal solution for such issues is important in minimising distance and time and environmental problems such as reducing fuel consumption.

The generalization of the TSP that we will use in our work will be the VRP. However, there are other TSP derivatives, such as the Job Shop Scheduling Problem (JSSP) [40] that are not included in the study of this work.

During state of the art of these formulations carried out, we have found several articles [33,34,41] that solve the TSP and VRP (focusing on minimising distance and not time) for annealing computers [7,30,42]. However, the number of variables is still intractable for the current size of quantum computers. For this reason, we propose a new TSP formulation

with a representation of the linear distance that uses only 3*N*<sup>2</sup> variables, which we will use to outperform the current best VRP modelling in terms of the number of required variables. For example, a possible formulation of the VRP uses *N*<sup>3</sup> variables where *N* is the number of cities, thus with only 10 towns, we would go to 1000 necessary variables. In quantum computing, each of these variables can be represented with a qubit, and that is why computers possessing 1000 qubits would be needed to carry out these tasks. However, the gate-based computers that mark this era of quantum computing [43] have around 100 qubits making this task intractable today. The number of qubits is higher for computers based on quantum annealing, reaching 2000 qubits like the D-Wave computer. However, the correspondence between variables and qubits will not be one-to-one due to the architecture of these computers, so that we will have a smaller number of useful qubits. The following reference [44] deals with the topology and graph problem mapping on the D-Wave 2000Q QPU computer in detail.

#### **4. QUBO Model in Quantum Computing**

Quantum computing as a new computational paradigm can help solve a set of complex problems (routing, scheduling, banking problems, etc.) or solve tasks that respond to the law of quantum mechanics. However, before solving a problem, we first need to express it in a mathematical formulation that is largely compatible with the underlying physical hardware. This methodology is also useful for quantum computation. One of the frameworks that allows us to define said mathematical formulation to be solved in a quantum computer is the QUBO.

Adiabatic computation was born from the use of the adiabatic theorem [8,9] to perform the calculations using the tunnel effect to go from the global minimum of a simple Hamiltonian (A Hamiltonian system is a dynamic system governed by Hamilton equations. In physics, these active systems describe the evolution of a physical system, such as an electron in an electromagnetic field.) [45,46] to the global minimum of the problem of interest.

One of the market leaders for this type of computing is D-Wave, which roughly solves the quadratic unconstrained binary optimisation problem (QUBO). The QUBO formulation (1) is suitable for running a D-Wave architecture [47]; however, QUBO can be mapped to the Ising [6] model and thus be used in computers based on quantum gates, for example, IBMQ, Rigetti, Xanadu (strawberryfields), etc.

The problems that D-Wave quantum computers are prepared to solve are those that consist of finding the minimum of a function of the following form:

$$\sum\_{i=1}^{n} b\_{i} \mathbf{x}\_{i} + \sum\_{i=1}^{n} \sum\_{j=1}^{n} q\_{i,j} \mathbf{x}\_{i} \mathbf{x}\_{j} \tag{1}$$

where the variables *xi* ∈ {0, 1} and the coefficients *bi*, *qi*,*j* ∈ R.

We are then, given a problem, we need to model it with the above structure where the variables that form the solution will only take the values 0 or 1. Let us observe that, by taking the variables *xi* the values 0 or 1, it is true that *x*2*i* = *xi*. Therefore, we can group the linear terms with the quadratic terms and express the above equation in matrix format:

$$\mathbf{x}^{\dagger} \mathbf{Q} \mathbf{x}\_{\prime} \tag{2}$$

with *x* ∈ {0, 1}*n* and *Q* ∈ M*n*×*n* which is compactly representing the QUBO formulation. QUBO can be mapped into the Ising model with the change variable: *z* = 1 − 2*x*. Thus, we pass a binary variable (0, 1) to a spin variable (−1, <sup>1</sup>). Therefore, given a formulation of a problem to the QUBO, we can implement it and solve it in computers based on quantum gates, only applying the change of variable mentioned.

#### **5. TSP Formulation**

As discussed in the introduction, before presenting our GPS model in section four, we will analyze the native TSP model and the MTZ that we aim to improve in terms of the number of variables.

#### *5.1. Native Formulation*

In this section, we will recall the formulation of the native TSP [41]. This modelling, which has been defined in [48], despite appearing in a very natural way which facilitates its understanding, requires *N*<sup>3</sup> variables to be implemented.

The variables that appear in this model are the variables *xi*,*j*,*<sup>t</sup>* such that *i*, *j* ∈ {0, . . . , *N* + 1} and *t* ∈ {0, ... , *<sup>N</sup>*}. Let us consider that the variables *xi*,*i*,*<sup>t</sup>* do not exist in this model. The interpretation of the variables *xi*,*j*,*<sup>t</sup>* is simple, since *xi*,*j*,*<sup>t</sup>* = 1 if at instant *t* we traverse the edge that connects the cities *i* and *j*, and *xi*,*j*,*<sup>t</sup>*= 0 for all other cases.

 We can define the objective function of the native (Native in the sense of general, the most used) TSP [41] as:

$$\sum\_{\mu=0}^{N+1} \sum\_{\upsilon=0}^{N+1} \sum\_{t=0}^{N} \mathcal{X}\_{\mu,\upsilon,t} d\_{\mu,\upsilon}. \tag{3}$$

where *du*,*<sup>v</sup>* represents the distance between nodes *u* and *v*. This objective function is subject to a series of restrictions:

• Constraint 1. The salesman must leave each city once.

$$\text{For each } u \in \{0, \dots, N\} \colon \sum\_{v=1}^{N+1} \sum\_{t=0}^{N} x\_{u,v,t} = 1. \tag{4}$$

• Constraint 2. Each city must be reached once.

$$\text{For each } v \in \{1, \ldots, N+1\} \colon \sum\_{\mathbf{u}=0}^{N} \sum\_{t=0}^{N} \mathbf{x}\_{\mathbf{u}, \mathbf{v}, t} = 1. \tag{5}$$

	- **–** Imposing that once he leaves a city he cannot return to it. For each *u* ∈ {1, . . . , *N* + <sup>1</sup>}:

$$\sum\_{\nu=0}^{N+1} \sum\_{t=0}^{N} \sum\_{\nu=0}^{N+1} \sum\_{j=t+1}^{N} \chi\_{u,\nu,t} \chi\_{\overline{u},u,j} = 0. \tag{6}$$

**–** Imposing that once he arrives in a city, he must leave it. For each *t* ∈ {0, . . . , *N* − <sup>1</sup>}, *u*, *v* ∈ {0, . . . , *<sup>N</sup>*}:

$$\mathbf{x}\_{\mathfrak{u},\upsilon,t}(1-\sum\_{\upsilon=1}^{N+1}\mathbf{x}\_{\upsilon,\upsilon,t+1})=0.\tag{7}$$

This formulation requires *N*<sup>3</sup> variables. Next, we will analyse another model used to define the TSP which is less commonly used in quantum *annealing* articles.

#### *5.2. MTZ Formulation*

Recalling the idea of this formulation is to consider the variables *xi*,*j* = 1 if the edge that connects the cities *i* and *j* appears in the solution path, where *xi*,*j* = 0 for all other cases. Once we have these variables, we can establish order on the route by employing a set of variables that will represent the moment the salesman arrives at that city (the variable *ui*

expressed in binary format, will take the integer value *t* if the city *i* is reached in the *t*th position.). This model requires *<sup>N</sup>*<sup>2</sup>*log*2(*N*), greatly improving the number of variables in the general formulation. However, when implemented using *annealing* it presents surprisingly inaccurate results. This is because the *annealing* algorithm gets stuck trying to minimise the part of the objective function generated by the sub-tour's constraint [15], since the representation of integers in their binary format has the disadvantage that close numbers such as 2*n* − 1 and 2*n* differ by a large number of qubits, so from the *annealing* they are perceived as very different solutions.

Once the two most common QUBO models of the TSP have been presented, let us analyze the formulation with which we improve the number of variables of the previous two.

#### **6. GPS Formulation**

Now we are starting to present our work. To develop this model, we take the variables *xi*,*j*,*<sup>r</sup>* with *i*, *j* ∈ {0, ... , *N* + 1} and *r* ∈ {0, 1, <sup>2</sup>}. In all the modelling, the variables *xi*,*j*,*<sup>r</sup>* such that *i* = *j* are not considered. We work with directional edges, that is, if in the model the edge (*i*, *j*) appears, we understand that first we must go through node *i* and immediately after that we go to *j*. Let us analyse what each variable represents:


Let us, therefore, see some examples (Figure 1) in which these variables do not take the value zero:

**Figure 1.** Example of a TSP solution with six different cities. It begins at node 0, and the arrows indicate the order in which the towns will be visited.

In this particular case, *<sup>x</sup>*4,5,0 = 0 and *<sup>x</sup>*4,5,2 = 0 since the edge (4, 5) does appear in the solution. On the other hand *<sup>x</sup>*4,5,1 will also be 0 because although edge (4, 5) does appear in the graph, node 5 will be visited before node 4.

Let us, therefore, see examples in which these variables do not take the value zero:


From the definition of our variables, we can define the distance travelled through the following objective function as:

$$\sum\_{i=0}^{N+1} \sum\_{j=0}^{N+1} d\_{i,j} \mathbf{x}\_{i,j,1} \,. \tag{8}$$

The constraints that must be met are:

• Constraint 1: For each *i*, *j* one and only one of the 3 cases of *r* must be given, so

$$\text{For all } i, j \colon \sum\_{r=0}^{2} x\_{i, j, r} = 1. \tag{9}$$

• Constraint 2: Each node must be exited once.

$$\text{For each } i \in \{0, \dots, N\}; \sum\_{j=0}^{N+1} x\_{i,j,1} = 1. \tag{10}$$

• Constraint 3: Each node must be reached once.

$$\text{For each } j \in \{1, \dots, N+1\} \colon \sum\_{i=0}^{N} x\_{i,j,1} = 1. \tag{11}$$

• Constraint 4: If node *i* is reached before *j*, then node *j* is reached after *i*, so, for all *i*, *j* ∈ {0, . . . , *N* + 1} such that *i* = *j*:

$$
\pi\_{i,j,2} = 1 - \pi\_{j,i,2}.\tag{12}
$$

It would also have to be specified for *r* = 0 and *r* = 1, however this restriction is sufficient since by (9) it is implicit.

• Constraint 5: If node *i* is reached before node *j* and node *j* is reached before node *k*, then node *i* must be reached before *k*. This condition will prevent the route from returning to a node from which it had already exited, thus preventing cycles from forming. We then arrive at the penalty function Equation (13).

$$\sum\_{i=1}^{N} \sum\_{j=1}^{N} \sum\_{k=1}^{N} (\mathbf{x}\_{j,i,2}\mathbf{x}\_{k,j,2} - \mathbf{x}\_{j,i,2}\mathbf{x}\_{k,i,2} - \mathbf{x}\_{k,j,2}\mathbf{x}\_{k,i,2} + \mathbf{x}\_{k,i,2}).\tag{13}$$

With only the cases in which *i* = *j*, *i* = *k* and *j* = *k* will be taken in the summation and in the annex (Appendix B) we will provide the approach followed to arrive at it.

The following is deduced from the Equation (13). We have *xi*,*j*,<sup>2</sup> = 0 if *i* is reached before *j* and *xi*,*j*,<sup>2</sup> = 1 in the case where *j* is reached before *i*. Thus, with the previous equation we penalise these following cases in which *xi*,*j*,<sup>2</sup> = 0, *xj*,*k*,<sup>2</sup> = 0 and *xi*,*k*,<sup>2</sup> = 1 and *xi*,*j*,<sup>2</sup> = 1, *xj*,*k*,<sup>2</sup> = 1 and *xi*,*k*,<sup>2</sup> = 0 which lead to cases in which it would be forming cycles (for these two situations the value of the parentheses is 1 and for the rest 0). In (27) we have a similar situation for our VRP formulation, where we offer more details. For this condition, we must have directly constructed a penalty function that avoids erroneous cases without first posing linear conditions through which to generate its corresponding penalty function.

Formulated in this way we have managed to reduce the number of variables required from *N*<sup>2</sup> log2 *N* to 3*N*2, achieving very noticeable reductions when working with large problems. Once we have this formulation, let us see how we can generalise it to the new VRP formulation.

#### **7. New VRP Formulation**

This section develops our VRP into the QUBO model using the GPS formulation.

As discussed in the introduction, this model is optimal concerning the number of binary variables used. However, this generalisation does not appear as naturally as expected because it requires a delicate step to ge<sup>t</sup> the constraints of the Equation (27). To do this, we will detail each step and explain each of the constraints step by step.

*Original Formulation* 5*N*<sup>2</sup>*Q*

For this new VRP, we will consider that *N* is the number of cities and *Q* is the number of available vehicles. We first present the variables that will form the problem. We then take the following set of variables.

$$\text{in } \mathbf{x}\_{i,j,r,q} \text{ with } i, j \in \{0, \dots, N+1\}, r \in \{0, 1, 2, 3, 4\} \text{ and } q \in \{1, \dots, Q\} \tag{14}$$

In all the modelling, the variables *xi*,*j*,*<sup>r</sup>* such that *i* = *j* are not considered. The variables *i*, *j* refer to the cities must travel to, and the variable *q* refers to the vehicle. The nodes 0 and *N* + 1 correspond to the starting and ending points. Note that they may be the same node but we will separate them for convenience in the formulation. The values *di*,*<sup>j</sup>* with *i*, *j* ∈ {0, ... , *N* + 1} correspond to the distance between node *i* and *j*. Let us dive into the interpretation of each variable:


Even if no vehicle passes through the objects *i* and *j*, the formulation must establish an order between them. However, this restriction does not make the modelling meaningless, since we can assume that if the vehicles are ordered in the order of {1, ... , *Q*}, then *i* will be reached before *j* if the vehicle that passes through node *i* has a lower number than the one that passes through node *j*. Once the interpretation of each variable is explained, let us analyse the constraints that must be met.

• Constraint 1: For each *i*, *j*, *q*, one and only one of the possibilities must be met for *r*, so:

$$\text{For all } i, j, q; \sum\_{r=0}^{4} \mathbf{x}\_{i, j, r, q} = 1,\tag{15}$$

• Constraint 2: Each vehicle has to fulfill that it leaves the starting position. For this situation, we are going to impose that:

$$\text{For all } q; \sum\_{j=1}^{N+1} x\_{0,j,1,q} = 1, \text{} \tag{16}$$

No vehicle can return to the starting position from a city, so:

$$\text{For all } q \colon \sum\_{i=0}^{N+1} \mathbf{x}\_{i,0,1,q} = 0, \text{} \tag{17}$$

• Constraint 3: Every vehicle must finish in the final position. For this, it must be fulfilled that:

$$\text{For all } q \colon \sum\_{i=0}^{N} x\_{i,N+1,1,q} = 1, \tag{18}$$

No vehicle can leave the final position. We then have that:

$$\text{For all } q \colon \sum\_{j=0}^{N+1} x\_{N+1,j,1,q} = 0. \tag{19}$$

Vehicles that do not travel on any road will meet all constraints when taking the following condition:

$$\mathbf{x}\_{0,N+1,1,f} = 1.$$

• Constraint 4: The vehicle must leave once and only once from each city, then:

$$\text{For each } i \in \{1, \dots, N\}: \sum\_{q=1}^{Q} \sum\_{j=1}^{N+1} x\_{i,j,1,q} = 1. \tag{20}$$

• Constraint 5: The vehicle must arrive once and only once to each city, then:

$$\text{For each } j \in \{1, \dots, N\}: \sum\_{q=1}^{\underline{Q}} \sum\_{i=0}^{N} x\_{i,j,1,q} = 1. \tag{21}$$

• Constraint 6: The city *i* is reached before the city *j* does not depend on each vehicle. Therefore, for all the vehicles that either arrive at city *i* earlier than *j*, or arrive at city *j* earlier than *i*. Introducing the auxiliary variables *ai*,*j*, we have the following constraint. For all *i*, *j* ∈ {1, . . . , *<sup>N</sup>*}:

$$\sum\_{q=1}^{Q} x\_{i,j,0,q} + x\_{i,j,1,q} + x\_{i,j,3,q} = a\_{i,j}Q.\tag{22}$$

It will then be true that for each *i*, *j* or *ai*,*j* = 1, which means that the city *i* is reached earlier than the city *j* and therefore for each *q* we will have *xi*,*j*,*r*,*<sup>q</sup>* = 1 for any value of the *r* in which *i* is reached before *j*, or *ai*,*j* = 0, and we will have *xi*,*j*,*r*,*<sup>q</sup>* = 0 for all the vehicles and for values *r* where *i* is reached before *j*.

• Constraint 7: If the vehicle *q* arrives in the city *j*, then the vehicle *q* must leave the city *j*. For this we impose the constraint that for *i* ∈ {0, . . . , *<sup>N</sup>*}, *j* ∈ {1, . . . , *N*} and *q* ∈ {1, . . . , *Q*}:

$$
\chi\_{i,j,1,\emptyset}(1 - \sum\_{k=1}^{N+1} x\_{j,k,1,\emptyset}) = 0.\tag{23}
$$

Let us now impose the conditions that make vehicles run on a tour.

• Constraint 8: It must be fulfilled that either the vehicle pass through the city *i* before the *j* or arrive before to the city *j* rather than the city *i*. Therefore, it must be verified that, for *i* ∈ {0, . . . , *<sup>N</sup>*}, *j* ∈ {1, . . . , *N*} and *q* ∈ {1, . . . , *Q*}:

$$
\mathbf{x}\_{i,j,0,q} + \mathbf{x}\_{i,j,1,q} + \mathbf{x}\_{i,j,3,q} = 1 - (\mathbf{x}\_{j,i,0,q} + \mathbf{x}\_{j,i,1,q} + \mathbf{x}\_{j,i,3,q}).\tag{24}$$

• Constraint 9: If city *i* is reached before *j* and city *j* is reached before city *k*, then city *i* must be reached before city *k*. This condition will prevent the vehicle from returning to a city it has already passed through and therefore prevents a cycle from forming. To introduce this constraint, we will directly calculate a penalty function worth 0 in the correct cases and 1 in those that are not. To facilitate the understanding of the penalty function, we are going to take, for *i*, *j*, *k*, *q*, the following variables:

$$\begin{aligned} a\_{i,j} &= \mathbf{x}\_{i,j,0,1} + \mathbf{x}\_{i,j,1,1} + \mathbf{x}\_{i,j,3,1} \\ a\_{j,k} &= \mathbf{x}\_{j,k,0,1} + \mathbf{x}\_{j,k,1,1} + \mathbf{x}\_{j,k,3,1} \\ a\_{i,k} &= \mathbf{x}\_{i,k,0,1} + \mathbf{x}\_{i,k,1,1} + \mathbf{x}\_{i,k,3,1} \end{aligned} \tag{25}$$

Remember that it is not necessary to introduce these conditions because the constraint (22) establishes the correct values of the variables *ai*,*j*. Therefore, *ai*,*j* = 1 means that the city *i* is reached before the city *j* and the same with *j* and *k*. Also, it is very important to remember that due to the same constraint (22), we can take any of the vehicles as a reference. In this case, we have taken the first vehicle as a reference.

In this way, fixed *i*, *j*, *k*, we have the 3 variables *ai*,*j*, *aj*,*k*, *ai*,*k*. Remember that *ai*,*j*, *aj*,*k*, *ai*,*<sup>k</sup>* only take the values 0 or 1. Also, let us note that the cases that lead to values of the variables for which cycles can be formed and that we must discard are (*ai*,*j*, *aj*,*k*, *ai*,*<sup>k</sup>*)=(0, 0, 1) and (*ai*,*j*, *aj*,*k*, *ai*,*<sup>k</sup>*)=(1, 1, <sup>0</sup>).

In the case (0, 0, 1) we would have that the city *j* is reached after the *i*, the *k* after the *j*, and ye<sup>t</sup> the city *k* is reached rather than *i*, which is absurd. The case (1, 1, 0) cannot be given either, since it reaches *i* before *j* and *j* before *k*, so it cannot be that we also reach *k* before *i*. We therefore must construct a penalty function so that for *f*(*ai*,*j*, *aj*,*k*, *ai*,*<sup>k</sup>*) it holds that *f*(0, 0, 1) > 0, *f*(1, 1, 0) > 0 y *f*(*ai*,*j*, *aj*,*k*, *ai*,*<sup>k</sup>*) = 0 for all other cases. A function that satisfies these conditions is from the Equation (26).

$$f(a\_{i,j}, a\_{j,k}, a\_{i,k}) := a\_{i,j}a\_{j,k} - a\_{i,j}a\_{i,k} - a\_{j,k}a\_{i,k} + a\_{i,k}^2. \tag{26}$$

then, adding to the cost function the Equation (27)

$$\lambda \sum\_{i=1}^{N} \sum\_{j=1}^{N} \sum\_{k=1}^{N} (a\_{i,j}a\_{j,k} - a\_{i,j}a\_{i,k} - a\_{j,k}a\_{i,k} + a\_{i,k}^2)\_{,} \tag{27}$$

we will have that the best solutions will be those that comply with this constraint. • Constraint 10: The objective we seek is to minimise vehicle travel time. What we could do is see how long each vehicle takes to complete the route and try to minimise as much of the time as possible. However, this function soon becomes complex so we have decided to develop a different idea that simplifies the process and smoothes the objective function. If we impose the condition that all vehicles travel less distance than the distance travelled by vehicle number 1, we will have that minimising the maximum of the distances will be equivalent to minimising the distance travelled by the first vehicle. We then have the following condition. For each *q* ∈ {2, . . . , *Q*}:

$$\sum\_{i=0}^{N+1} \sum\_{j=0}^{N+1} d\_{i,j} \mathbf{x}\_{i,j,1,q} \le \sum\_{i=0}^{N+1} \sum\_{j=0}^{N+1} d\_{i,j} \mathbf{x}\_{i,j,1,1} \tag{28}$$

We transform this inequality into equality by taking once again *Dmax* := ∑*ni*=<sup>0</sup> max*j*{*di*,*j*} and the variables *bh*,*<sup>q</sup>* (the variables *bh*,*<sup>q</sup>* are like the sub tour's one in the MTZ slack variables and they are in their binary expression) in:

$$\sum\_{i=0}^{N+1} \sum\_{j=0}^{N+1} d\_{i,j} \mathbf{x}\_{i,j,1,q} + \sum\_{h=0}^{h\_{\text{max}}} 2^h b\_{h,q} - \sum\_{i=0}^{N+1} \sum\_{j=0}^{N+1} d\_{i,j} \mathbf{x}\_{i,j,1,1} = 0. \tag{29}$$

Under these conditions the function to be minimized corresponds to:

$$\sum\_{i=0}^{N+1} \sum\_{j=0}^{N+1} d\_{i,j} \mathbf{x}\_{i,j,1,1} \,. \tag{30}$$

This condition has the disadvantage that we are eliminating solutions where it is another vehicle that travels the longest distance. Let us explore how to avoid this problem and ge<sup>t</sup> more flexibility in the model to make it easier for the Quantum Annealing to find the optimum one. We can establish an auxiliary variable *D* and we set that the distance travelled by each vehicle must be less than this variable, that is to say:

$$\sum\_{i=0}^{N+1} \sum\_{j=0}^{N+1} d\_{i,j} \mathbf{x}\_{i,j,1,q} \le D \text{ , for all } q \in \{1, \dots, Q\}. \tag{31}$$

The variable *D* is an integer, so we must treat it in some way in order to include it in the model. As we explained in the introduction of the section dedicated to the formulation of the MTZ model Section 5.2, it is convenient to try to avoid the binary representation of integer variables. To do so, we can express *D* as a combination of the distances between edges by taking *D* = ∑ *N*+1 *i*=0 ∑ *N*+1 *j*=0 *xi*,*jbi*,*j*. Thus after imposing the constraint (31) we have that the function to minimize is *D*.

Thanks to this modelling of the new VRP we have been able to reduce the number of variables required to the order of 5 *<sup>N</sup>*<sup>2</sup>*Q*. However, we have managed to reduce it even further to 3 *<sup>N</sup>*<sup>2</sup>*Q*, which is detailed in Appendix A.2. However, we have preferred to present this other model due to its easy understanding.
