*3.2. Drone Path Planning Model Based on Risk Cost and Service Benefit*

The flight path of a drone performing logistics services must mitigate path risk cost while still ensuring that service is completed. Therefore, the objectives of logistics drone path planning include both risk mitigation and customer service. The integrated risk cost quantification model established in the previous section can be used for risk mitigation. However, customers' locations in cities often overlap with risk factors such as buildings, crowds, and roads, so service paths tend to pass through risk areas, and it is necessary to balance path risk cost against service benefit. Figure 7 depicts the impact of considering risk cost mitigation and service benefit on the path planning results. The solid white arrows indicate the shortest path, the white dashed arrow indicates the path that considers risk cost mitigation, and the white dotted line indicates the path that balances risk cost and service benefit, in which the most risk cost-effective path is adjusted to fulfil customer needs.

**Figure 7.** The path planning based on risk and customer.

In this section, our primary work is to establish a multi-drone path planning method that guides drones to a path with the highest service benefit and lowest risk cost under flight performance constraints such as energy consumption and step length. Furthermore, a global search strategy is proposed to solve for such paths.

#### 3.2.1. Service Benefits Modelling

Assume that each customer *j* has an initial requirement *C*<sup>0</sup><sub>*demand*−*j*</sub> that needs to be handled by drones. We also assume that a drone can only serve a customer within a certain distance of the customer's location. Therefore, for a customer *j*, the range that can be served is denoted as *s*(*p<sub>j</sub>*, *R*), where *p<sub>j</sub>* is the location of customer *j* and *R* is the radius of the acceptable service range. Service starts when a drone enters the service range of customer *j*. Each drone has a constant service speed *τ*. The remaining demand *C*<sub>*demand*−*j*</sub> of a customer served by *k* drones simultaneously over time Δ*t* is shown in Equation (12)

$$C\_{demand-j}^{t+\Delta t} = C\_{demand-j}^{t} - \tau k \Delta t \tag{12}$$

Assuming a nonlinear relationship between customer residual demand *C*<sub>*demand*−*j*</sub> and service revenue *C<sub>b</sub>*(*C*<sub>*demand*−*j*</sub>), this paper uses a sigmoid-like function to improve performance, as shown in Equation (13)

$$C\_b(C\_{demand-j}) = 1 - \exp\left[-\frac{(C\_{demand-j})^\chi}{C\_{demand-j} + \psi}\right] \tag{13}$$

where *χ* and *ψ* are control parameters. For each customer, the service revenue *C<sub>b</sub>*(*C*<sub>*demand*−*j*</sub>) decreases rapidly as the remaining demand *C*<sub>*demand*−*j*</sub> falls. This guarantees that serving the customer with the highest remaining demand generates the greatest revenue, thus increasing global customer service completion.
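As a minimal sketch, the demand update and revenue function can be written in Python as follows. It reads Equation (12) as the remaining demand falling by *τk*Δ*t* per time step. The function names, the clamping of demand at zero, and the values *χ* = 2 and *ψ* = 1 used below are illustrative assumptions, not values taken from this work.

```python
import math


def remaining_demand(c_now: float, tau: float, k: int, dt: float) -> float:
    """Eq. (12): demand left after k drones serve for a time dt.

    Clamping at zero is an assumption; the text does not state it.
    """
    return max(0.0, c_now - tau * k * dt)


def service_revenue(c: float, chi: float, psi: float) -> float:
    """Eq. (13): 1 - exp(-c**chi / (c + psi)), assuming psi > 0."""
    return 1.0 - math.exp(-(c ** chi) / (c + psi))


# Illustrative values only: a customer with more remaining demand yields more revenue.
high = service_revenue(10.0, chi=2.0, psi=1.0)
low = service_revenue(5.0, chi=2.0, psi=1.0)
```

With these assumed parameters, `high` exceeds `low`, which matches the statement that the customer with the highest remaining demand generates the greatest revenue.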

#### 3.2.2. Energy Consumption Modelling of Drones

Assuming that the ascent and descent of the drone are ignored and only straight-line flight is considered, the energy consumed in moving a distance *d* at a constant speed *v* is shown in Equation (14),

$$E\_{v} = P(w)\frac{d}{v},\ \Delta E\_{v} = P(w)\Delta t\tag{14}$$

where *P*(*w*) is the power of the drone moving at a constant speed *v*. For the *n*-rotor drone, its power is shown in Equation (15),

$$P(w) = (W + w)^{\frac{3}{2}} \sqrt{\frac{g^3}{2\rho\_A \varsigma n}} \tag{15}$$

where *W* is the self-weight of the drone, *w* is the weight of the load carried by the drone, *ρ<sub>A</sub>* is the density of air, *ς* is the area of the rotating blades, and *g* is the acceleration of gravity. The total energy available to the drone is shown in Equation (16),

$$E\_{total} = \eta C V\_n \tag{16}$$

where *η* is the energy conversion efficiency, *C* is the capacity of the cell, and *V<sub>n</sub>* is the nominal voltage of the *n* cells.

The drone departs with an empty load. As the drone services the customer, the drone's load increases while the customer's remaining demand decreases. After the current customer is served, the drone maintains the current load until it starts serving the next customer.

Assuming that the demand is proportional to the load and the scale factor is *ε*, then for a drone *i* serving *x* customers at the same time, the load varies with time, as shown in Equation (17),

$$w\_i^{t + \Delta t} = w\_i^t + \tau \varepsilon x \Delta t \tag{17}$$
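The energy model of Equations (14)–(17) can be sketched in Python as below. The sketch assumes that the denominator of Equation (15) is 2*ρ<sub>A</sub>ςn* and that the load grows by *τεx*Δ*t* per time step. The function names and the choice of SI units (mass in kg, area in m², energy in joules) are assumptions made for illustration.

```python
import math


def rotor_power(W: float, w: float, rho_a: float, blade_area: float,
                n_rotors: int, g: float = 9.81) -> float:
    """Eq. (15): power needed by an n-rotor drone carrying load w."""
    return (W + w) ** 1.5 * math.sqrt(g ** 3 / (2.0 * rho_a * blade_area * n_rotors))


def segment_energy(power: float, d: float, v: float) -> float:
    """Eq. (14): energy used to fly a distance d at constant speed v."""
    return power * d / v


def battery_energy(eta: float, capacity: float, v_nominal: float) -> float:
    """Eq. (16): usable energy from cell capacity and nominal voltage."""
    return eta * capacity * v_nominal


def load_after(w_now: float, tau: float, eps: float, x: int, dt: float) -> float:
    """Eq. (17): load after serving x customers at once for a time dt."""
    return w_now + tau * eps * x * dt
```

Because the power in Equation (15) grows with (*W* + *w*)<sup>3/2</sup>, each customer served makes the subsequent segments more expensive, which is why the load update and the energy budget have to be evaluated together along a path.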

#### 3.2.3. Global Path Planning Model

Based on the risk cost and service benefit quantification models, we introduce a cost-benefit matrix to measure the benefits and costs between any two points on the map. The map is represented as an *N* × *N* grid, and the cost-benefit matrix entry *TC<sub>mn</sub>* between any points *p<sub>m</sub>* and *p<sub>n</sub>* is shown in Equation (18)

$$TC\_{mn} = d\_{p\_m,p\_n} + \frac{M\_{benefit}}{1 + \sum\_{n \in s(p\_j, R)} C\_b(C\_{demand-j})} + M\_{risk} \int\_{(x, y) \in C} R\_{total}(x, y) \tag{18}$$

where *p<sub>m</sub>*, *m* ∈ {1, 2, ···, *N*<sup>2</sup>}, is the current position of the drone and *p<sub>n</sub>*, *n* ∈ {1, 2, ···, *N*<sup>2</sup>}, is the next position of the drone. *d*<sub>*p*<sub>*m*</sub>,*p*<sub>*n*</sub></sub> is the Euclidean distance between *p<sub>m</sub>* and *p<sub>n</sub>*. ∑<sub>*n*∈*s*(*p<sub>j</sub>*,*R*)</sub> *C<sub>b</sub>*(*C*<sub>*demand*−*j*</sub>) is the benefit generated by the demand of all customers that can be served at point *p<sub>n</sub>*. *M<sub>benefit</sub>* and *M<sub>risk</sub>* are the coefficients of service benefit and risk cost, which affect the path planning strategy. In practice, *M<sub>benefit</sub>* and *M<sub>risk</sub>* can be adjusted according to preference. For example, if the tolerance for risk cost is low, then *M<sub>risk</sub>* can be set to a higher value to amplify the impact of risk cost.
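A single entry of the cost-benefit matrix can be sketched as follows. The integration domain in Equation (18) is not spelled out in this section, so the sketch assumes that the risk is integrated along the straight segment from *p<sub>m</sub>* to *p<sub>n</sub>* and approximates it with a midpoint rule. The function names, the `risk` callable, the customer representation, and the default *χ* and *ψ* are all hypothetical.

```python
import math


def service_revenue(c, chi, psi):
    """Eq. (13), assuming psi > 0."""
    return 1.0 - math.exp(-(c ** chi) / (c + psi))


def edge_risk(p_m, p_n, risk, samples=20):
    """Midpoint-rule line integral of risk(x, y) along p_m -> p_n (assumed domain)."""
    length = math.dist(p_m, p_n)
    total = 0.0
    for s in range(samples):
        t = (s + 0.5) / samples
        x = p_m[0] + t * (p_n[0] - p_m[0])
        y = p_m[1] + t * (p_n[1] - p_m[1])
        total += risk(x, y)
    return total / samples * length


def cost_benefit(p_m, p_n, customers, risk, radius, m_benefit, m_risk,
                 chi=2.0, psi=1.0):
    """Eq. (18): distance + benefit term + weighted risk for the move p_m -> p_n.

    customers is a list of (position, remaining_demand) pairs.
    """
    benefit = sum(service_revenue(c, chi, psi)
                  for pos, c in customers
                  if math.dist(p_n, pos) <= radius)
    return (math.dist(p_m, p_n)
            + m_benefit / (1.0 + benefit)
            + m_risk * edge_risk(p_m, p_n, risk))
```

Under these assumptions, a move that ends within range of a customer with remaining demand has a smaller benefit term and therefore a lower cost than the same move with no customer in range.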

The goal of the present work is to plan a service path with minimum total cost. Following Equation (18), the total cost combines the travel distance, the risk cost, and the inverse of the service benefit. The objective function is shown in Equation (19)

$$\min: TC(P) = \sum\_{e\_{ir} \in P} TC(e\_{ir}), i > 0, r = i + 1 \tag{19}$$

where *P* is the flight path consisting of edges *e<sub>ir</sub>*, *TC*(*P*) is the total cost of the path *P*, and *TC*(*e<sub>ir</sub>*) is the cost of the edge *e<sub>ir</sub>*.

According to the drone energy consumption model, the energy available for flight is limited. Therefore, the logistics drone must complete the service and reach the endpoint before the planned available energy is consumed. The constraints are defined as

$$\forall l\_{ir} \ge l\_{\min}, e\_{ir} \in P, r = i + 1, i, r > 0 \tag{20}$$

$$E\_{consume} = \sum\_{e\_{ir} \in P} E\_{consume}^{ir} \le E\_{plan} \tag{21}$$

$$\frac{(x\_{i} - x\_{i-1}, y\_{i} - y\_{i-1})^{T} (x\_{i+1} - x\_{i}, y\_{i+1} - y\_{i})}{\left\| (x\_{i} - x\_{i-1}, y\_{i} - y\_{i-1}) \right\| \cdot \left\| (x\_{i+1} - x\_{i}, y\_{i+1} - y\_{i}) \right\|} \geq \cos \beta\_{\max} \tag{22}$$

Equation (20) is the minimum-length constraint for an edge between two adjacent nodes in the drone path, where *l*<sub>min</sub> is the minimum edge length and *l<sub>ir</sub>* is the length of edge *e<sub>ir</sub>*. Equation (21) requires that the total energy consumption of the drone not exceed the available energy, where *E<sub>consume</sub>* is the total energy consumption of path *P*, *E*<sup>*ir*</sup><sub>*consume*</sub> is the energy consumption of each edge *e<sub>ir</sub>* in path *P*, and *E<sub>plan</sub>* is the total available energy. Equation (22) constrains the maximum turning angle of the drone, where (*x*<sub>*i*−1</sub>, *y*<sub>*i*−1</sub>), (*x<sub>i</sub>*, *y<sub>i</sub>*), and (*x*<sub>*i*+1</sub>, *y*<sub>*i*+1</sub>) are the coordinates of three consecutive path points and *β*<sub>max</sub> is the maximum acceptable turning angle.
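The three constraints can be checked together for a candidate path, as in the following sketch. The function name `path_is_feasible`, the `edge_energy` callable, and the small numerical tolerance on the cosine comparison are assumptions introduced for illustration. The angle *β*<sub>max</sub> is taken in radians, and *l*<sub>min</sub> is assumed to be positive.

```python
import math


def path_is_feasible(points, edge_energy, l_min, e_plan, beta_max):
    """Check Eqs. (20)-(22) for a path given as a list of (x, y) points."""
    edges = list(zip(points, points[1:]))

    # Eq. (20): every edge must be at least l_min long.
    if any(math.dist(a, b) < l_min for a, b in edges):
        return False

    # Eq. (21): total energy over all edges must not exceed the planned budget.
    if sum(edge_energy(a, b) for a, b in edges) > e_plan:
        return False

    # Eq. (22): cosine of each turning angle must be at least cos(beta_max).
    for a, b, c in zip(points, points[1:], points[2:]):
        u = (b[0] - a[0], b[1] - a[1])
        v = (c[0] - b[0], c[1] - b[1])
        norm = math.hypot(*u) * math.hypot(*v)
        if norm == 0.0:
            return False  # degenerate edge, turning angle undefined
        cos_turn = (u[0] * v[0] + u[1] * v[1]) / norm
        if cos_turn < math.cos(beta_max) - 1e-12:
            return False
    return True
```

For example, a straight path passes the turning-angle check for any *β*<sub>max</sub>, whereas a right-angle turn fails whenever *β*<sub>max</sub> is below π/2.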

#### 3.2.4. Path Planning Algorithm

To solve the large-scale least-cost flow problem in this study, heuristic methods (e.g., the A\* algorithm) offer better performance in terms of computational time. The standard A\* algorithm generally uses the Manhattan or Euclidean distance to select the next move location. However, in the cost-benefit environment established in this paper, the cost of each grid cell is different and unevenly distributed, so considering only the distance cannot reflect the actual cost of the path. As the complexity of the environment increases, the traditional A\* algorithm has difficulty finding a suitable path and can deadlock. Therefore, the following path search rule is proposed to improve exploration of the environment, and the rule's effectiveness is verified in the experimental stage.

(1) Environmental exploration strategy

In this work, a heuristic factor is set according to the Boltzmann distribution to ensure a complete exploration of the environment. Suppose the drone is currently at the path point *p<sub>i</sub>*, *i* ∈ {1, 2, ···, *N*<sup>2</sup>}. The probability of the point *p<sub>r</sub>*, *r* ∈ {1, 2, ···, *N*<sup>2</sup>}, being selected as the next path point is calculated from the value *TC<sub>ir</sub>*, as shown in Equation (23)

$$p(i,r) = \frac{\exp\left[\frac{T}{TC\_{ir}}\right]}{\sum\_{k \in \mathcal{R}, k \neq i} \exp\left[\frac{T}{TC\_{ik}}\right]} \tag{23}$$

where *T* is the temperature parameter that controls the degree of environment exploration and ℛ is the set of all *N*<sup>2</sup> points in the map. At the beginning of exploration, since the drone has little information about the environment, a smaller *T* is set so that candidate points are selected more evenly and the drone can explore the environment quickly in the early stages. As exploration time increases and the drone gathers enough information about the environment, *T* is increased so that low-cost points are favoured and the algorithm converges within a specific time.
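The selection rule of Equation (23) can be sketched in Python as below. The function names and the dictionary representation of the candidate costs are hypothetical. Subtracting the largest exponent before calling `math.exp` is a numerical-stability step that leaves the probabilities of Equation (23) unchanged, and all costs are assumed to be positive.

```python
import math
import random


def boltzmann_probabilities(costs, temperature):
    """Eq. (23): map {candidate: TC_ik} to selection probabilities."""
    exponents = {k: temperature / tc for k, tc in costs.items()}
    top = max(exponents.values())
    # Shifting by the maximum avoids overflow and cancels in the ratio.
    weights = {k: math.exp(e - top) for k, e in exponents.items()}
    z = sum(weights.values())
    return {k: w / z for k, w in weights.items()}


def choose_next(costs, temperature, rng=None):
    """Sample the next path point according to Eq. (23)."""
    rng = rng or random.Random()
    probs = boltzmann_probabilities(costs, temperature)
    points = list(probs)
    return rng.choices(points, weights=[probs[p] for p in points], k=1)[0]
```

With *T* = 0 every candidate is equally likely, and as *T* grows the probability mass concentrates on the candidate with the smallest *TC*, which is the behaviour the temperature schedule described above relies on.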

(2) Original global path generation rules

A sequence of points forms a drone path. The calculation of the cost-benefit value *TC<sub>mn</sub>* for the drone moving between two points in the map is established in Equation (18). The path point exploration rule based on the cost-benefit value *TC<sub>mn</sub>* is established in Equation (23). Based on this, our global path planning is divided into two steps. Firstly, based on the cost-benefit value in the environment at the planning start time *t*<sub>0</sub>, a series of paths satisfying the constraints are iteratively generated according to the global search method (as shown in Algorithm 1), and the path with the optimal cost-benefit value is selected as the original global path. The second step performs local replanning on the basis of the original global path (as shown in Algorithm 2). The generation of the original global path is described as follows.

(1) For the *i*-th drone (UAV<sub>*i*</sub>), for each *episode* repeat (2)–(6).

(2) Initialise *Path<sub>i</sub>* to an empty list *Path<sub>i</sub>*[]. The initial position *P*<sub>0</sub> of the drone is the first point *Path<sub>i</sub>*[1] in *Path<sub>i</sub>*.

(3) For each step in each *episode*, repeat (4)–(5).

(4) For the current location point *p<sub>s</sub>*, select the next point *p*<sub>*s*+1</sub> according to the Boltzmann exploration strategy.

(5) Add *p*<sub>*s*+1</sub> to *Path<sub>i</sub>*[] as the (*s* + 1)-th path point *Path<sub>i</sub>*[*s* + 1]. Return to (3) until the target point is reached or the energy is exhausted.

(6) Finish this *episode*, set *episode* = *episode* + 1, and return to (2).

(7) When *episode* = *MAX*, the learning process ends and the current optimal *Path* is output.

The process of global path planning is defined in Algorithm 1.
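The episode loop in steps (1)–(7) can be sketched as follows. This is an illustration under stated assumptions rather than a reproduction of Algorithm 1: candidates are restricted to 4-connected grid neighbours, the temperature rises linearly from `t_start` to `t_end` over the episodes, every move is assumed to have positive cost and positive energy, and all function and parameter names are hypothetical.

```python
import math
import random


def grid_neighbours(p, n):
    """4-connected neighbours of p inside an n x n grid (assumed move set)."""
    x, y = p
    steps = [(x + 1, y), (x - 1, y), (x, y + 1), (x, y - 1)]
    return [(a, b) for a, b in steps if 0 <= a < n and 0 <= b < n]


def run_episode(start, goal, neighbours, cost, energy, e_plan, temperature, rng):
    """Steps (2)-(5): grow one path until the goal is reached or energy runs out."""
    path, total_cost, used = [start], 0.0, 0.0
    while path[-1] != goal:
        cur = path[-1]
        cands = [q for q in neighbours(cur) if used + energy(cur, q) <= e_plan]
        if not cands:
            return None  # energy exhausted before reaching the goal
        exps = [temperature / cost(cur, q) for q in cands]
        top = max(exps)
        weights = [math.exp(e - top) for e in exps]  # Boltzmann weights, Eq. (23)
        nxt = rng.choices(cands, weights=weights, k=1)[0]
        total_cost += cost(cur, nxt)
        used += energy(cur, nxt)
        path.append(nxt)
    return path, total_cost


def global_search(start, goal, neighbours, cost, energy, e_plan,
                  episodes=200, t_start=0.1, t_end=5.0, seed=0):
    """Steps (1), (6), (7): repeat episodes and keep the lowest-cost path."""
    rng = random.Random(seed)
    best = None
    for ep in range(episodes):
        frac = ep / max(1, episodes - 1)
        temperature = t_start + frac * (t_end - t_start)  # assumed linear schedule
        result = run_episode(start, goal, neighbours, cost, energy,
                             e_plan, temperature, rng)
        if result is not None and (best is None or result[1] < best[1]):
            best = result
    return best
```

A call such as `global_search((0, 0), (3, 3), lambda p: grid_neighbours(p, 4), cost, energy, e_plan)` returns the best `(path, total_cost)` pair found, or `None` if no episode reached the target within the energy budget.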


(3) Drone movement and local path replanning rules

Based on the original global path defined in Algorithm 1, we further establish the rules by which the drone moves along the original path, simulating its actual operation. During flight, new risk areas may appear on the map over time, causing the remaining part of the original global path to cross high-risk-cost areas, in which case the original path needs to be locally replanned. This happens because global path planning is carried out at time *t*<sub>0</sub>, and some risk zones do not exist at that time but appear at time *t* = *t*<sub>0</sub> + Δ*t* (e.g., a temporary gathering of pedestrians due to time-predictable activities). Such risk zones are handled by local path replanning rules during the actual flight of the UAV along the original global path. This does not require real-time path planning, only further pre-planning for new risk zones that are known to occur during flight. The process of local path replanning by the drones to avoid a newly generated risk zone is defined in Algorithm 2.

For the *i*-th drone (UAV<sub>*i*</sub>), *Scan* is executed after each step along the original path. After *Scan* is executed, there are two scenarios. In the first, new obstacles (including other drones) are discovered, and the cost-benefit matrix is recalculated to replan the subsequent path. In the second, the surrounding environment is unchanged, and the path stays the same. For each time interval Δ*t*, the step length of the drone movement is fixed as *Step*. If the distance between the current position and the next path point is less than *Step*, the drone moves directly to that path point.
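The movement rule and the scan-then-replan loop can be sketched as follows. This is not Algorithm 2 itself: `scan` and `replan` are hypothetical callbacks standing in for the obstacle detection and the recalculation of the cost-benefit matrix, and the `max_ticks` guard is an added safety limit.

```python
import math


def advance(position, waypoint, step):
    """Move at most `step` toward the waypoint; snap to it when within reach."""
    d = math.dist(position, waypoint)
    if d <= step:
        return waypoint
    f = step / d
    return (position[0] + f * (waypoint[0] - position[0]),
            position[1] + f * (waypoint[1] - position[1]))


def follow_path(path, step, scan, replan, max_ticks=10_000):
    """Fly along `path`, scanning after each step and replanning when needed.

    scan(position) returns True if a new obstacle or risk zone is found.
    replan(position, remaining) returns the new list of remaining waypoints.
    """
    position = path[0]
    remaining = list(path[1:])
    trace = [position]
    for _ in range(max_ticks):
        if not remaining:
            break
        position = advance(position, remaining[0], step)
        if position == remaining[0]:
            remaining.pop(0)  # waypoint reached
        trace.append(position)
        if remaining and scan(position):
            remaining = replan(position, remaining)
    return trace
```

When `scan` never reports a change, the drone simply follows the original waypoints in fixed steps, which corresponds to the second scenario described above.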

