1. Introduction
The oceans cover over 70% of the Earth’s surface, serving not only as a source of life and a treasure trove of resources, but also as a crucial domain for human activities, including marine fisheries, tourism, and resource exploitation. One reason for the prosperity of maritime businesses is that, despite the increasing intensity of marine transportation, the number of maritime accidents has not risen sharply, thanks to improvements in safety measures and technology. This is supported by a recent study showing that, although oil trade has grown overall, oil spills have decreased [1]. Nonetheless, once maritime accidents happen, they can result in significant human casualties, property losses, and severe environmental impacts. Hence, in order to promote the long-term development of maritime businesses, it has become imperative to research and implement maritime search and rescue missions [2]. To ensure the efficient execution of these missions, it is common to deploy a group of unmanned aerial vehicles (UAVs) to conduct surveillance over the target area and initiate rescue operations based on a real-time assessment of the disaster situation [3]. In maritime rescue operations, efficient path planning for UAVs becomes crucial. UAVs depart from the base station carrying a certain amount of rescue supplies, patrol multiple disaster sites, locate the targets in need of rescue, and drop rescue equipment such as lifebuoys. After completing the mission, the UAVs need to return to the base station. The path planning problem for UAVs in maritime search and rescue can be modeled as a vehicle routing problem (VRP) [4,5], aiming to find optimal paths that minimize search and rescue time while maximizing the effectiveness of the rescue efforts.
As a variant of the Traveling Salesman Problem (TSP) [6], the VRP [7] is an NP-hard combinatorial optimization problem that originated in the field of logistics. It was first introduced by Dantzig and Ramser [8] in 1959 to address the challenge of route planning and resource utilization for delivery vehicles. The fundamental idea of the VRP is to determine the optimal routes for a set of delivery locations and a given number of vehicles, aiming to fulfill all customer demands while minimizing vehicle travel distances. In recent years, in order to model real-world scenarios more accurately, many variants of the VRP [9,10] have been proposed, among which the VRP with time windows (VRPTW) [11,12] is a typical example. In the VRPTW, each customer has a specific time window within which the delivery should be made. The objective is to find optimal routes for vehicles to satisfy customer demands while respecting the time window constraints.
In real-world applications, the dynamic nature of logistics operations is an unavoidable issue that poses significant challenges [13,14]. For example, new search and rescue targets can emerge at any time, while other targets may become unreachable, either because rescue is impossible or because they have been confirmed to be safe. The inherent unpredictability of real-world VRPs has led to the extension of the static VRP to a category of problems known as the dynamic VRP (DVRP) [15,16]. In particular, this paper constructs and studies a dynamic vehicle routing problem with time windows (DVRPTW), since the rescue operation needs to be carried out within an urgent time frame [17].
Among the approaches for tackling the DVRPTW, meta-heuristics demonstrate efficient performance, as they draw inspiration from nature and can provide an approximately optimal solution within a reasonable time [9,18]. As a typical meta-heuristic, Ant Colony Optimization (ACO) [19] has been successfully applied to various VRPs. ACO mimics the distributed manner in which ants search for food, which closely resembles the DVRP, and it can accommodate uncertainty in the DVRP by introducing stochastic elements, leading to more robust solutions [20]. However, ACO faces a significant challenge when the environment changes, which has not yet been efficiently overcome: the pheromones from the previous environment tend to bias the algorithm towards the old optimal solution [21]. This makes it difficult for the algorithm to adapt and find the new optimal solution; once the algorithm converges, it may struggle to adjust to the changing environment. One potential remedy is to re-initialize the pheromone matrix, but treating each dynamic change as a complete restart is often inefficient.
Therefore, how to adjust the pheromone matrix to adapt to dynamic changes has become a crucial issue [20]. When only a small portion of the environment changes, most of the information in the old pheromone matrix remains relevant to the new environment, and only a small part of the pheromones needs to be adjusted. However, if the majority of the environment undergoes significant changes, it becomes necessary to re-initialize the entire pheromone matrix.
Several strategies have been proposed and combined with ACO to reduce the re-optimization time while efficiently maintaining high-quality output. These strategies can be divided into four categories: increasing diversity after a dynamic change [21], maintaining diversity during execution [22], memory-based schemes [23], and hybrid algorithms [24]. Among these strategies, solving the DVRP based on the immigration strategy [25] shows promising results. In the immigration strategy, some newly generated ants, called immigrant ants, replace some ants in the current population and improve the performance of the overall algorithm. Based on how the immigrants are generated, the strategy can be divided into the random generation of immigrants (RIACO), the generation of immigrants based on elitism (EIACO), and the generation of immigrants based on memory (MIACO) [25].
RIACO generates n random immigrants to replace the n worst-performing ants in the current environment. EIACO selects the best-performing ants as immigrants to replace the worst-performing ants. MIACO is suited to environments where the changes are cyclic; it stores several memorized immigrants and uses them to replace the corresponding ants when the environment changes.
However, the algorithms mentioned above are not efficient at addressing the DVRPTW. For example, MIACO may perform well in cyclic scenarios, but poorly in others. EIACO may also suffer from local optima issues, where the algorithm converges to a local optimal solution and fails to find the global optimum. RIACO typically relies on probabilistic models to address uncertainty, which may not always represent real-world uncertainties accurately. Hence, this paper designs a new strategy for adjusting the pheromone matrix to effectively address the random appearance and disappearance of rescue targets in a dynamic environment.
Furthermore, in order to enhance the population diversity and search abilities of ants, simulated annealing (SA) is incorporated into ACO for solving the DVRPTW. SA is a metaheuristic algorithm inspired by the annealing process in metallurgy. It mimics the slow cooling of a material, allowing its atoms to settle into a low-energy state. In the context of optimization, SA can help ACO escape local optima and explore a wider search space by accepting suboptimal solutions with a certain probability.
The dynamic generators [26,27] utilized in the literature are typically based on known optimal solutions, meaning that the optimal solution remains unchanged throughout the environmental changes. While this approach ensures that the algorithm’s performance can be compared against the optimal value, it does not accurately reflect real-world scenarios, where changes are often unpredictable. In this paper, in order to better simulate real-world scenarios, we design a random dynamic generator that accounts for the unpredictable nature of environmental changes.
Given the dynamic nature of logistics, the DVRPTW emerges as a critical concern and an opportunity for improving operational efficiency. Recent developments in real-time data acquisition and intelligent transportation systems have only increased the relevance and complexity of this pursuit. Despite various past efforts and the application of meta-heuristic algorithms such as ACO, there remains a gap in effectively adapting to real-time changes and unpredictable elements in the routing problem. In this regard, this paper addresses the following research questions:
- (1) How can the pheromone matrix of ACO be adapted more effectively so that it is responsive to changing environmental conditions without frequent complete restarts?
- (2) How can SA enhance the diversity and search capability of the ants in a given DVRPTW instance?
- (3) How can a dynamic generator be conceived, in light of the unpredictability of real-world scenarios, to make the simulation of environments realistic and hence offer a good testing ground for new algorithms?
This paper makes the following contributions:
- (1) A novel strategy is proposed to adapt the pheromone matrix and effectively handle randomly changing rescue targets in dynamic environments.
- (2) To enhance the overall performance of ACO for solving the DVRPTW, the powerful technique of SA is employed, and a local search operator is designed to further improve the generated routes.
- (3) Considering the limitations of existing dynamic benchmark generators in accommodating real-world scenarios, this paper designs a random dynamic generator that provides a more realistic simulation, thereby achieving a better alignment with actual conditions.
The rest of this article is organized as follows. Section 2 introduces the related work, including the basic VRP and its dynamic extensions, as well as the basic ideas of ACO and SA. Section 3 describes the proposed algorithm for solving the DVRPTW. Section 4 introduces the dynamic benchmark generator, which generates DVRPTW instances dynamically. Section 5 presents the experimental results and analysis. Finally, Section 6 gives the discussion and future work.
3. The Proposed Dynamic Ant Colony Optimization for the Dynamic Vehicle Routing Problem with Time Windows
To effectively address the proposed DVRPTW that is modeled based on maritime search and rescue, this paper designs a dynamic ACO, named DACO. One of the primary characteristics of DACO is that it dynamically adjusts the pheromone values in the matrix to adapt to the new conditions. This adaptation allows the ants to explore alternative routes that may be more efficient in the updated environment.
The proposed DACO is described in Algorithm 2. A dedicated variable, denoted R_best in the pseudocode below, records the best routes found during the evolutionary process. During the initialization stage (lines 1–4), the iteration counter i is initialized to 0, all customers except the depot are marked as unserved, and the initial pheromone value on each edge is set. Then, the optimization process begins. During this process, the environment may change: new rescue targets may emerge, while others, such as those that have already been rescued or are no longer accessible, may disappear from the list of active rescue points. If an environmental change is detected, the pheromone matrix is updated dynamically based on the proposed strategy (line 7), which is described in Section 3.2. If it is not the first generation, some targets in the current best routes may have become inaccessible, so R_best is also updated to ensure that the routes remain feasible (line 9).
Algorithm 2 The pseudocode of DACO for the DVRPTW.
Require: Graph G = (V, E), where V is the set of vertices, which represent the rescue targets and the base station of the UAVs, and E is the set of edges, with each edge’s length corresponding to the flight distance between its two vertices.
Ensure: A feasible solution R_best.
1: i ← 0; mark all targets as unserved;
2: for each edge (u, v) ∈ E do
3:     Set the initial pheromone value τ(u, v) ← τ0;
4: end for
5: while i < the maximum number of iterations do
6:     if an environmental change is detected then
7:         Update the pheromone matrix according to Algorithm 3;
8:         if i > 0 then
9:             Update R_best so that it remains feasible;
10:        end if
11:    end if
12:    for each ant do    ▹ Construct feasible routes
13:        Start a new route at the base station with a full payload; mark all targets as unserved for this ant;
14:        while unserved targets remain do
15:            Build the subset of candidate targets that satisfy the capacity and time-window constraints;    ▹ Filter the feasible targets
16:            if the subset is empty then
17:                Return to the base station to refill;
18:            else
19:                Select the next target using the SA-based rule (Section 3.3);
20:                Append the selected target to the current route and mark it as served;
21:            end if
22:        end while
23:        Calculate the total cost of the routes constructed by this ant;
24:    end for
25:    Record the iteration-best routes;    ▹ Record the best routes
26:    Compare the iteration-best routes with R_best and retain the better one;
27:    Apply the local search operator (Section 3.4) to R_best;
28:    Update the pheromone matrix based on R_best;
29:    i ← i + 1;
30: end while
31: return R_best;
Then, each ant, starting at the base station, constructs a feasible route. In this process, a subset of candidate rescue targets is first selected (line 15). The selection criteria are that the supplies required at these rescue targets must be within the remaining payload capacity of the UAV, and the targets must be reachable within the upper limit of their rescue waiting time. Candidates whose time windows would be violated, as well as those whose demands exceed the capacity of the UAV, are explicitly removed. If the subset is empty, the ant returns to the depot to refill (lines 16–17). Otherwise, a rescue target is selected from the subset. In traditional ACO, Equation (11) is used to decide the next target that the ant should visit. In the proposed DACO, the Metropolis rule of SA is combined with the selection probability of ACO to select the next target; the details are given in Section 3.3. This process is repeated until all targets have been added to the routes (lines 13–22). Then, the total cost (i.e., flight distance) of the routes is calculated (line 23), and the best routes with the minimal cost are recorded (line 25). The cost of the iteration-best routes is compared with that of R_best, and the better one is retained (line 26). After this, a local search operator is applied to further optimize the best routes (line 27); the details of the local search operator are introduced in Section 3.4. Then, the pheromone matrix is updated based on the best routes generated in the above steps (line 28), the counter is incremented by one, and a new iteration starts, i.e., the routes are constructed according to the updated pheromone matrix. When the maximum number of iterations is reached, the algorithm stops.
3.1. Solution Generation
Solution generation is performed as follows. Ants leave the base station and scan the targets that can be served subject to the time limit and the remaining capacity of the UAV. The selection probability of each target is computed according to Equation (11). Then, the next target to be served is determined either by the highest selection probability or randomly, as detailed in Section 3.3. Since infeasible candidates are rejected in the screening process, all generated solutions are feasible. If no servable targets are found, the ant returns to the base station, and a new ant is sent out to serve the remaining targets until all of them are served.
As shown in Figure 2, node 0 represents the base station, and nodes 1, 2, 3, 4, 5, and 6 represent the targets. Initially, the ant starts from node 0. After filtering, all targets meet the constraint conditions, so the selection probability is calculated based on Equation (11) and a target is selected; suppose target 1 is chosen. After filtering again, only targets 2, 4, and 6 meet the constraint conditions. Repeating the above steps, target 2 is chosen, followed by target 4. After that, no targets meet the constraint conditions, so the ant returns to node 0.
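To make the construction procedure concrete, the following is a minimal Python sketch of the feasibility filter and the probabilistic selection described above. The Target fields, the travel_time callable, and the tau/eta matrices are illustrative assumptions, and the selection weight uses the standard ACO rule (pheromone raised to alpha times heuristic raised to beta) only as a stand-in for Equation (11), which is not reproduced here.

```python
import random
from dataclasses import dataclass

@dataclass
class Target:
    idx: int          # node index (node 0 is the base station)
    demand: float     # supplies required at the rescue target
    due_time: float   # latest admissible service time (upper time-window bound)

def feasible_targets(current, unserved, load_left, time_now, travel_time):
    """Keep only targets whose demand fits the remaining payload and whose
    time window can still be met after flying there from the current node."""
    return [t for t in unserved
            if t.demand <= load_left
            and time_now + travel_time(current, t.idx) <= t.due_time]

def select_next(current, candidates, tau, eta, alpha=1.0, beta=2.0):
    """Roulette-wheel selection with weights proportional to pheromone and
    heuristic desirability (a stand-in for Equation (11))."""
    weights = [tau[current][t.idx] ** alpha * eta[current][t.idx] ** beta
               for t in candidates]
    r = random.uniform(0.0, sum(weights))
    acc = 0.0
    for t, w in zip(candidates, weights):
        acc += w
        if acc >= r:
            return t
    return candidates[-1]
```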
3.2. Dealing with Dynamic Changes
The dynamic changes in the DVRPTW considered in this paper are divided into two types: (1) new targets appear, and (2) existing targets disappear. The strategy for dealing with environmental changes is to set a flag bit for each node; when dynamic changes occur, only these flag bits need to be processed. The specific strategies are shown in Algorithm 3 and can be explained as follows:
- (1) If a new target appears, the pheromone matrix is expanded from the original n × n matrix to an (n + x) × (n + x) matrix, where x is the number of new targets. The values in the rows and columns corresponding to the new nodes are filled according to Equation (12), and the remaining entries are copied from the original matrix (lines 1–9).
- (2) If a target is canceled, the values in the rows and columns of the pheromone matrix corresponding to the canceled target are set to 0 (lines 10–17).
The algorithm will continue to evolve based on the modified pheromone matrix.
Algorithm 3 Update of the pheromone matrix.
1: if x new targets appear then
2:     for each new target u do
3:         for each node j in the current graph do
4:             Set τ(u, j) according to Equation (12);
5:             Set τ(j, u) according to Equation (12);
6:         end for
7:     end for
8:     Copy the remaining entries from the original pheromone matrix;
9: end if
10: if y targets cancel their requirements then
11:     for each canceled target do
12:         Locate the index s of the canceled target;
13:         for each node j in the current graph do
14:             Set τ(s, j) ← 0, τ(j, s) ← 0;
15:         end for
16:     end for
17: end if
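For illustration, a NumPy sketch of the two update cases in Algorithm 3 is given below. The value tau_new used for the added rows and columns is only a placeholder for Equation (12), which is not reproduced here.

```python
import numpy as np

def expand_pheromone(tau, num_new, tau_new):
    """Grow the matrix from n x n to (n + num_new) x (n + num_new); old entries
    are kept, and the new rows/columns are filled with tau_new (placeholder
    for Equation (12))."""
    n = tau.shape[0]
    grown = np.full((n + num_new, n + num_new), float(tau_new))
    grown[:n, :n] = tau   # the remaining part is filled by the original matrix
    return grown

def cancel_targets(tau, canceled):
    """Set the rows and columns of canceled targets to zero so that the
    corresponding edges no longer attract the ants."""
    for s in canceled:
        tau[s, :] = 0.0
        tau[:, s] = 0.0
    return tau
```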
In cases where there are dynamic changes, the current optimal routes may become infeasible. In order to ensure the existence of feasible routes at all times, the original optimal routes are updated as follows (see the sketch after the list):
- (1) For the addition of a new target, a new UAV is assigned to serve it. The reason for this operation is to avoid violating the constraint conditions, as assigning a new UAV ensures that the target can be serviced without conflicting with the constraints of the existing routes. Subsequently, during the optimization process, the targets served by the newly added UAV can be merged into the routes of the existing UAVs, thereby achieving a more optimal solution over time.
- (2) For the cancellation of a target, it is simply removed from its original route. In this case, the updated routes are guaranteed to remain compliant with the constraint conditions, so no additional checks regarding constraint violations are needed.
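A minimal sketch of this route repair is shown below, under the assumption that each route is stored as a list of node indices beginning and ending at the base station (index 0); these data structures are illustrative rather than the paper's implementation.

```python
def repair_routes(routes, new_targets, canceled_targets):
    """Keep the current best routes feasible after a dynamic change."""
    # (2) Canceled targets are simply removed from the routes they belong to.
    routes = [[v for v in route if v not in canceled_targets] for route in routes]
    # Drop routes that no longer visit any target.
    routes = [r for r in routes if any(v != 0 for v in r)]
    # (1) Each new target is served by a newly assigned UAV, which cannot
    # violate the constraints of the existing routes; later iterations may
    # merge these single-target routes into existing ones.
    for t in new_targets:
        routes.append([0, t, 0])
    return routes
```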
3.3. Improved Ant Colony Optimization Based on Simulated Annealing
The idea from SA that the acceptance probability is related to the temperature is applied to ACO. In the new algorithm, a random number x is generated first, which follows a uniform distribution. Then, x is compared with the probability calculated by Equation (13), which depends on the current iteration number i and the maximum number of iterations. If x is less than this probability, the node with the largest selection probability, calculated by Equation (11), is selected. Otherwise, roulette-wheel selection is used to randomly choose the next visited node.
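The following Python sketch illustrates this selection rule. The threshold below is only an assumed placeholder for Equation (13) (it simply grows with the iteration counter, so the search becomes greedier as the run proceeds), and the candidate probabilities are assumed to be the normalized values from Equation (11).

```python
import random

def sa_select(candidates, probabilities, iteration, max_iterations):
    """Choose the next target: greedily with probability p_i, otherwise by
    roulette-wheel selection over the ACO probabilities."""
    p_i = iteration / max_iterations      # assumed schedule, not Equation (13)
    if random.random() < p_i:
        # Exploit: pick the candidate with the largest selection probability.
        best = max(range(len(candidates)), key=lambda k: probabilities[k])
        return candidates[best]
    # Explore: roulette-wheel selection according to the probabilities.
    r = random.random()
    acc = 0.0
    for cand, p in zip(candidates, probabilities):
        acc += p
        if acc >= r:
            return cand
    return candidates[-1]
```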
3.4. Local Search Operator
This section proposes a local search operator to further optimize the best routes obtained in each generation. Firstly, the routes are classified as high-load or low-load according to their load, which is calculated as the total demand of the targets they serve. The division is given in Equation (14), which compares the current load of each route with the maximal load among all routes. If the load of a route is below the threshold defined in Equation (14), the route is considered low-load; otherwise, it is high-load.
For the high-load routes, an exchange strategy is applied. The exchange strategy borrows the idea of 2-OPT [58], but restricts the exchange to two routes: a random node from one route is chosen to be exchanged with a node of another route. If the new routes obtained through the exchange satisfy the constraints and have a lower total cost than the original routes, the new routes are accepted and the process terminates. Otherwise, the exchange is attempted a limited number of times.
Then, for the low-load routes, a split strategy is used. First, two low-load routes (e.g., A and B) are selected, and one of them (e.g., A) is chosen for splitting: a random node from A is selected and an attempt is made to insert it into the other route (i.e., B). The insertion starts from the first position of B, and after each insertion attempt, the constraint conditions, including the time windows and the capacity of the UAVs, are checked; the positions are tried sequentially until the end of the route. If no feasible position is found, another node from A is tried, until all nodes of A have been attempted. The goal is to reduce the number of routes by transferring the nodes of one low-load route to other eligible low-load routes.
As can be seen from Figure 3a, the red node is exchanged with the blue node, and in Figure 3b, the yellow node is inserted from its original route into another route.
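A compact sketch of the two strategies is given below. The load threshold is an assumed stand-in for Equation (14) (a fixed fraction of the maximal route load), routes are assumed to be lists of node indices with the base station (index 0) at both ends and at least one target in between, and feasible and cost are callables supplied by the caller.

```python
import random

def classify_routes(routes, demand, fraction=0.5):
    """Split routes into low-load and high-load by comparing each route's load
    with a fraction of the maximal load (placeholder for Equation (14))."""
    loads = [sum(demand[v] for v in r if v != 0) for r in routes]
    threshold = fraction * max(loads)
    low = [r for r, l in zip(routes, loads) if l < threshold]
    high = [r for r, l in zip(routes, loads) if l >= threshold]
    return low, high

def try_exchange(route_a, route_b, feasible, cost, attempts=10):
    """Exchange one randomly chosen target between two routes and keep the
    exchange if it is feasible and reduces the total cost."""
    for _ in range(attempts):
        i = random.randrange(1, len(route_a) - 1)   # skip the depot endpoints
        j = random.randrange(1, len(route_b) - 1)
        new_a, new_b = route_a[:], route_b[:]
        new_a[i], new_b[j] = route_b[j], route_a[i]
        if (feasible(new_a) and feasible(new_b)
                and cost(new_a) + cost(new_b) < cost(route_a) + cost(route_b)):
            return new_a, new_b
    return route_a, route_b
```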
3.5. Superiority of Dynamic Ant Colony Optimization
The proposed DACO has three advantages:
- (1) Quickly generating feasible solutions to handle dynamic changes: DACO can swiftly generate feasible solutions to address dynamic changes in the environment. This rapid response capability allows it to adjust route planning more flexibly in real-time situations, ensuring the algorithm’s applicability and practicality.
- (2) Incorporating the Metropolis rule of SA to select the next target: DACO integrates the Metropolis rule from SA when selecting the next target. Compared to traditional ACO, this enhancement significantly improves the diversity of the solutions. By exploring more possibilities in the solution space, DACO can avoid local optima and enhance the overall quality of the solutions.
- (3) Designing a local search operator to further optimize route performance: to further improve the performance of the optimal routes, DACO includes a local search operator. This operator fine-tunes the paths within a local scope, enabling adjustments based on the existing solutions to find better routes. This approach effectively enhances the efficiency and accuracy of route planning, ensuring the superiority of the final solutions.
These three improvements enable DACO to not only increase the algorithm’s response speed and flexibility, but also enhance solution diversity and optimization capability, making it perform excellently in solving dynamic-route-optimization problems.
6. Conclusions
This study preliminarily explores the application of DACO in maritime search and rescue, modeling the problem as a DVRPTW with randomly changing rescue targets. The results obtained from the experimental evaluation revealed several key findings. Firstly, DACO consistently outperformed five existing algorithms in terms of route quality, generating superior routes with minimal total costs across various dynamic scenarios. Additionally, the convergence curves demonstrated that DACO achieved faster convergence compared to the other algorithms, indicating its efficiency in finding optimal solutions within a shorter computational time. Furthermore, DACO exhibited greater stability in response to dynamic changes, highlighting its robustness in adapting to evolving environmental conditions. This is particularly evident in its ability to maintain high-quality routes even when faced with fluctuations in catastrophic situations. These findings underscore the effectiveness and superiority of DACO in addressing the DVRPTW.
In the future, research will focus on developing an online dynamic version to enhance the real-time optimization capability of DACO. This upgrade will ensure that DACO responds to environmental changes as they occur, thereby better handling emergencies and addressing the immediate needs of marine disaster events. Additionally, applying DACO to bi-layer maritime search and rescue, in which vessels are first deployed over a large search area and UAVs then depart from the vessels to conduct localized searches, will extend the approach to more complex and dynamic systems and motivate further research and development. Integrating DACO with other optimization techniques, such as machine learning methods, will also increase the adaptability and performance of the algorithms.