Building upon the system architecture design outlined in the previous section, this section will provide a detailed overview of each functional module within the system.
3.2.2. Task Allocation and Path Planning Module Design
Real-time scheduling and concurrent processing of multiple UAV tasks require two kinds of algorithms: UAV task allocation algorithms and reinforcement-learning-based path-planning algorithms. UAV task scheduling algorithms fall into two paradigms, centralized and distributed. In the centralized paradigm a central server makes the task allocation decisions, whereas in the distributed paradigm each UAV decides autonomously. To combine the advantages of both, this study proposes a hybrid model of centralized and distributed task allocation. Tasks are first allocated centrally, and whenever conditions change the system switches to distributed task reallocation, which reduces inter-UAV conflicts and improves execution efficiency. Once each UAV has its list of tasks to serve, a path-planning algorithm determines its flight route. The overall procedure is shown in Figure 3.
(1) Design of Centralized Task Allocation Strategy
The centralized task scheduling algorithm in this system uses a greedy strategy as its basic approach. A greedy strategy is simple and easy to implement, so the central server can produce a task allocation quickly.
The greedy strategy schedules the tasks with the highest service priority first, subject to the assigned UAV having enough energy to complete them. The pseudocode is shown in Algorithm 1.
The energy status of each UAV is a critical factor during task execution: assigning a task to a UAV with insufficient energy can delay the task or prevent it from being completed. The UAV set is therefore sorted in descending order of remaining energy so that UAVs with more energy are considered first. The task set is then sorted in descending order of priority and iterated over; for each task, the algorithm selects, among the UAVs whose remaining energy meets or exceeds the task's energy requirement, the one with the most remaining energy. If a suitable UAV is found, the task is assigned to it and its remaining energy is reduced accordingly. If no suitable UAV is found, the algorithm reports that no solution exists, and measures such as adding UAVs or recharging should be considered. Finally, the task-to-UAV assignment plan is returned.
Algorithm 1 The Greedy Algorithm-based UAV Task Scheduling Algorithm
input: $T$, set of tasks; $D$, set of UAVs; $e_t$, energy consumption of task $t$; $E_d$, remaining energy of UAV $d$; $p_t$, service priority of task $t$;
output: $A$, solution for UAV task allocation;
initialize: $A \leftarrow \emptyset$;
Sort the tasks in $T$ in descending order of service priority $p_t$;
for all $t \in T$ do
    Find the UAV $d^* \in D$ with $E_{d^*} \ge e_t$ such that $E_{d^*}$ is maximized;
    if the UAV $d^*$ has been found then
        $A \leftarrow A \cup \{(t, d^*)\}$;
        $E_{d^*} \leftarrow E_{d^*} - e_t$;
    else
        return No solution found.
    end if
end for
return $A$.
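Algorithm 1 can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the system's implementation: tasks and UAVs are identified by hypothetical string IDs, priorities, energy demands, and remaining energies are passed as plain dictionaries, and returning `None` stands in for the "No solution found" branch.

```python
from typing import Dict, List, Optional, Tuple

def greedy_allocate(
    priority: Dict[str, float],   # p_t: service priority of task t
    demand: Dict[str, float],     # e_t: energy consumption of task t
    energy: Dict[str, float],     # E_d: remaining energy of UAV d
) -> Optional[List[Tuple[str, str]]]:
    """Greedy allocation: highest-priority task first, to the feasible UAV with most energy."""
    remaining = dict(energy)      # copy so the caller's data is not mutated
    allocation: List[Tuple[str, str]] = []
    # Sort tasks in descending order of service priority.
    for task in sorted(priority, key=lambda t: priority[t], reverse=True):
        # UAVs whose remaining energy covers the task's demand.
        feasible = [d for d in remaining if remaining[d] >= demand[task]]
        if not feasible:
            return None           # "No solution found": add UAVs or recharge
        best = max(feasible, key=lambda d: remaining[d])  # maximize remaining energy
        allocation.append((task, best))
        remaining[best] -= demand[task]                   # update remaining energy
    return allocation
```

For example, with priorities `{"t1": 3, "t2": 5, "t3": 1}`, demands `{"t1": 40, "t2": 30, "t3": 50}`, and UAV energies `{"uav_a": 100, "uav_b": 60}`, the sketch assigns t2 and t1 to uav_a (leaving it 30) and t3 to uav_b.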
(2) Design of Distributed Task Redistribution Strategy
To improve task scheduling efficiency, reduce communication overhead and processing delay, and make the system more flexible, an edge-deployed distributed task redistribution algorithm is implemented. The algorithm reallocates tasks among UAVs dynamically according to their current states, which is especially useful in emergencies and improves task completion efficiency. It takes into account device resource constraints, network bandwidth limitations, task attributes, and system dynamics. The algorithm is based on an auction strategy: the task assignment problem is modeled as an auction in which UAVs are the bidders and tasks are the items being auctioned. Each UAV bids for tasks according to its own state and the task requirements, and each task is awarded to the UAV with the lowest bidding cost. Because the UAVs themselves take part in the allocation decisions, the auction-based algorithm reduces the system's dependence on a central server. While the system is running, task redistribution is adjusted to real-time conditions, so the auction adapts to changes in the operating environment. Awarding each task to the lowest-cost bidder keeps redistribution robust and improves both task completion efficiency and UAV resource utilization. The pseudocode of the auction-based task redistribution is given in Algorithm 2.
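Algorithm 2 Task Allocation Process for UAVs based on Auction Algorithm
input: set of UAVs $U$, set of tasks $T$;
output: solution for UAV task allocation;
initialize: initialize the task assignment collection $A \leftarrow \emptyset$;
while $T \neq \emptyset$ do
    Initialize the minimum cost: $c_{\min} \leftarrow \infty$;
    Initialize the UAV and task corresponding to the minimum cost: $u^* \leftarrow \text{null}$, $t^* \leftarrow \text{null}$;
    for each $u \in U$ do
        for each $t \in T$ do
            if UAV $u$ cannot execute task $t$ then
                Continue with the next iteration;
            end if
            Calculate the cost $c$ of UAV $u$ executing task $t$;
            if $c < c_{\min}$ then
                Update the minimum cost: $c_{\min} \leftarrow c$;
                Update the UAV corresponding to the minimum cost: $u^* \leftarrow u$;
                Update the task corresponding to the minimum cost: $t^* \leftarrow t$;
            end if
        end for
    end for
    if $u^* \neq \text{null}$ and $t^* \neq \text{null}$ then
        Assign task $t^*$ to UAV $u^*$: $A \leftarrow A \cup \{(t^*, u^*)\}$;
        Remove task $t^*$ from $T$;
    else
        End the loop.
    end if
end while
return the allocation result $A$.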
In the above pseudocode, the cost of a UAV executing a task combines three terms: a distance cost based on the distance between the UAV and the task location, an energy cost based on the energy the task requires, and a priority cost based on the task's service priority. Their sum is the UAV's total bidding cost for that task.
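A minimal Python sketch of Algorithm 2 follows. The specific cost formula, its weights (`w_dist`, `w_energy`, `w_prio`), the eligibility rule (a UAV must hold at least the task's energy demand), and the deduction of energy after each award are all assumptions made for illustration; the text only states that distance, energy, and priority costs are summed. UAV positions are kept fixed, a simplification a real system would need to revisit.

```python
import math
from typing import Dict, List, Optional, Tuple

Point = Tuple[float, float]

def bid_cost(uav_pos: Point, uav_energy: float, task_pos: Point,
             task_energy: float, task_priority: float,
             w_dist: float = 1.0, w_energy: float = 10.0, w_prio: float = 5.0) -> float:
    """Hypothetical bid: distance cost + energy cost + priority cost (weights are assumptions)."""
    distance_cost = w_dist * math.dist(uav_pos, task_pos)
    energy_cost = w_energy * task_energy / uav_energy  # share of remaining energy the task uses
    priority_cost = -w_prio * task_priority            # higher priority -> cheaper bid
    return distance_cost + energy_cost + priority_cost

def auction_allocate(uavs: Dict[str, Tuple[Point, float]],
                     tasks: Dict[str, Tuple[Point, float, float]]) -> List[Tuple[str, str]]:
    """uavs: id -> (position, remaining energy); tasks: id -> (position, energy demand, priority)."""
    energy = {u: e for u, (_, e) in uavs.items()}
    open_tasks = dict(tasks)               # tasks still up for auction
    allocation: List[Tuple[str, str]] = []
    while open_tasks:
        best_cost = math.inf
        best_uav: Optional[str] = None
        best_task: Optional[str] = None
        for u, (pos, _) in uavs.items():   # positions are not updated (simplification)
            for t, (t_pos, t_energy, t_prio) in open_tasks.items():
                if energy[u] <= 0 or energy[u] < t_energy:
                    continue               # assumed eligibility rule: enough remaining energy
                cost = bid_cost(pos, energy[u], t_pos, t_energy, t_prio)
                if cost < best_cost:       # keep the lowest bid seen so far
                    best_cost, best_uav, best_task = cost, u, t
        if best_uav is None or best_task is None:
            break                          # no UAV can take any remaining task
        allocation.append((best_task, best_uav))
        energy[best_uav] -= open_tasks[best_task][1]   # assumed energy bookkeeping
        del open_tasks[best_task]
    return allocation
```

Each round awards exactly one task, so the outer loop runs at most $|T|$ times and each round scans all $|U| \cdot |T|$ UAV-task pairs.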
(3) Design of Reinforcement Learning-based Path Planning Algorithm
The task scheduling algorithms above allocate tasks efficiently, but practical applications also require planning each UAV's flight path. During flight, a UAV must follow a route suited to its tasks so that they are completed well. Finding the shortest route through a set of task locations can be modeled as the Traveling Salesman Problem (TSP). Because the TSP is NP-hard, conventional exact methods demand substantial computation and scale poorly to large instances. Our system therefore adopts a reinforcement-learning-based path-planning approach built on the Pointer Network. Compared with conventional TSP solvers, the Pointer Network handles large-scale TSP instances well: once trained, it constructs a route directly in a single forward pass, which gives faster computation with competitive accuracy. By exploiting these strengths, the system overcomes the limitations of traditional techniques and efficiently handles complex UAV flight path planning.
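To make the cost of exact solving concrete, the sketch below enumerates every tour that starts at point 0. It is an illustrative baseline only, not part of the described system: with $n$ points it checks $(n-1)!$ orderings, so 10 points already mean 362,880 tours, which is why exact enumeration is impractical at scale. It remains useful for checking a learned planner's tours on small instances.

```python
import itertools
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, float]

def tour_length(points: Sequence[Point], order: Sequence[int]) -> float:
    """Length of the closed tour visiting points in `order` and returning to the start."""
    return sum(math.dist(points[order[i]], points[order[(i + 1) % len(order)]])
               for i in range(len(order)))

def brute_force_tsp(points: Sequence[Point]) -> Tuple[float, List[int]]:
    """Exact TSP by enumerating all (n-1)! tours that start at point 0."""
    n = len(points)
    best_len, best_order = math.inf, [0]
    for perm in itertools.permutations(range(1, n)):
        order = [0, *perm]
        length = tour_length(points, order)
        if length < best_len:          # keep the first shortest tour found
            best_len, best_order = length, order
    return best_len, best_order
```

On the unit square `[(0, 0), (1, 0), (1, 1), (0, 1)]` it returns length 4.0 with order `[0, 1, 2, 3]`.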
The Pointer Network is mainly applied to Sequence-to-Sequence (Seq2Seq) learning problems in which the output is formed by selecting elements of the input sequence. Its core idea is to produce, at each output position, a pointer to one element of the input sequence; the sequence of selected elements forms the final output. The Pointer Network consists of two main components, the Encoder and the Decoder, as shown in Figure 4. The Encoder maps the input sequence to a high-dimensional representation, and the Decoder generates the output sequence step by step from this representation. Unlike a traditional Seq2Seq model, the Pointer Network adds a pointer mechanism to the Decoder so that each output is chosen from the input sequence. Specifically, at each step the Decoder produces a probability distribution over the positions of the input sequence and uses it to decide which input element to output next. The model thus “points” directly to input elements instead of generating words from a fixed vocabulary, so the set of possible outputs grows with the length of the input. This makes the Pointer Network well suited to optimization problems such as the TSP, which asks for the shortest tour that visits every location exactly once and returns to the start.
The Encoder uses a Bidirectional Recurrent Neural Network (BiRNN) to transform the input sequence into an encoding matrix $E = [e_1, e_2, \dots, e_n]$, where $e_i$ is the encoding of the $i$-th element of the input sequence. The BiRNN consists of two Recurrent Neural Networks (RNNs) that read the input sequence from left to right and from right to left, respectively; the outputs of the two directions are concatenated to give the final encoding.
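The bidirectional encoding can be sketched in pure Python. This is a hedged illustration with untrained random weights: plain Elman RNN cells ($h_i = \tanh(W_x x_i + W_h h_{i-1})$) are assumed in place of whatever recurrent cell the system actually uses, and the helper names are hypothetical. Each $e_i$ concatenates the forward state after reading $x_1, \dots, x_i$ with the backward state after reading $x_n, \dots, x_i$.

```python
import math
import random
from typing import List, Sequence

Vector = List[float]
Matrix = List[List[float]]

def matvec(m: Matrix, v: Sequence[float]) -> Vector:
    """Matrix-vector product."""
    return [sum(w * x for w, x in zip(row, v)) for row in m]

def rand_matrix(rows: int, cols: int, rng: random.Random) -> Matrix:
    return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]

class SimpleRNN:
    """Elman RNN: h_i = tanh(W_x x_i + W_h h_{i-1}); an assumed stand-in for LSTM/GRU."""
    def __init__(self, in_dim: int, hid_dim: int, rng: random.Random):
        self.w_x = rand_matrix(hid_dim, in_dim, rng)
        self.w_h = rand_matrix(hid_dim, hid_dim, rng)
        self.hid_dim = hid_dim

    def run(self, xs: Sequence[Sequence[float]]) -> List[Vector]:
        h = [0.0] * self.hid_dim
        states = []
        for x in xs:
            h = [math.tanh(a + b) for a, b in zip(matvec(self.w_x, x), matvec(self.w_h, h))]
            states.append(h)
        return states

def birnn_encode(xs: List[Vector], hid_dim: int, seed: int = 0) -> List[Vector]:
    """Return e_1..e_n, each the concatenation of forward and backward hidden states."""
    rng = random.Random(seed)
    fwd = SimpleRNN(len(xs[0]), hid_dim, rng)
    bwd = SimpleRNN(len(xs[0]), hid_dim, rng)
    forward_states = fwd.run(xs)                 # reads x_1 ... x_n
    backward_states = bwd.run(xs[::-1])[::-1]    # reads x_n ... x_1, re-aligned to positions
    return [f + b for f, b in zip(forward_states, backward_states)]
```

With hidden size $h$, every $e_i$ has length $2h$, and the forward half of $e_1$ depends only on $x_1$.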
The Decoder is an RNN that takes as input its previous output and the encoding matrix produced by the Encoder. At each timestep it computes an attention distribution over the input sequence, indicating which input elements are relevant at the current position. This distribution is used to compute a weighted average of the encodings and indicates which element of the input sequence should be chosen as the next element of the output sequence.
In more detail, let $y_t$ denote the output at timestep $t$, $s_t$ the Decoder state vector at timestep $t$, $c$ the context vector, and $x$ the input representation. The computation of the Decoder can be expressed as:

$$s_t = f(s_{t-1}, y_{t-1}, c), \qquad p(y_t = i \mid y_1, \dots, y_{t-1}, x) = g(s_t, y_{t-1}, c),$$

where $f$ and $g$ are Multi-Layer Perceptron models and $p(y_t = i \mid y_1, \dots, y_{t-1}, x)$ is the probability of selecting the $i$-th input at timestep $t$. The input representation $x$ can be taken as the last state vector of the Encoder, $x = e_n$. At each step, the Decoder takes the previous outputs $y_1, \dots, y_{t-1}$ and the context vector $c$ and predicts the next output $y_t$. This continues until every element of the input sequence has been output.
In the Pointer Network, a Pointer Mechanism maps the probability distribution produced by the Decoder onto the input sequence. Given the current Decoder state $s_t$, the position of the pointer at timestep $t$ is

$$\text{pointer}_t = \arg\max_{i} \, p_i^t,$$

where $p_i^t$ is the probability that output $t$ points to input $i$:

$$p_i^t = \operatorname{softmax}(u^t)_i = \frac{\exp(u_i^t)}{\sum_{j=1}^{n} \exp(u_j^t)},$$

and $u_i^t$ measures the correlation between input $i$ and output $t$:

$$u_i^t = v^\top \tanh(W_1 e_i + W_2 s_t),$$

where $W_1$ and $W_2$ are weight matrices, $v$ is a weight vector, and $\tanh$ is the hyperbolic tangent function.
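The pointer equations translate directly into code. The sketch below computes $u_i^t = v^\top \tanh(W_1 e_i + W_2 s_t)$, normalizes the scores with a softmax, and takes the arg max as the pointer position. The weights in the usage example are hand-picked toy values, not trained parameters.

```python
import math
from typing import List, Sequence

Vector = List[float]
Matrix = List[List[float]]

def matvec(m: Matrix, v: Sequence[float]) -> Vector:
    return [sum(w * x for w, x in zip(row, v)) for row in m]

def softmax(scores: Sequence[float]) -> Vector:
    top = max(scores)                       # subtract the max for numerical stability
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def pointer_distribution(enc: Sequence[Vector], s_t: Vector,
                         w1: Matrix, w2: Matrix, v: Vector) -> Vector:
    """p^t over input positions: u_i^t = v^T tanh(W1 e_i + W2 s_t), p^t = softmax(u^t)."""
    w2s = matvec(w2, s_t)                   # W2 s_t is shared by every input i
    scores = [sum(vk * math.tanh(a + b) for vk, a, b in zip(v, matvec(w1, e_i), w2s))
              for e_i in enc]
    return softmax(scores)

def pointer(enc: Sequence[Vector], s_t: Vector,
            w1: Matrix, w2: Matrix, v: Vector) -> int:
    """Pointer position: the input index with the highest probability."""
    p = pointer_distribution(enc, s_t, w1, w2, v)
    return max(range(len(p)), key=p.__getitem__)
```

With $W_1 = I$, $W_2 = 0$, $v = (1, 0)$, and encodings $(1, 0), (0, 1), (-1, 0)$, the scores are $\tanh(1), 0, \tanh(-1)$, so the pointer selects index 0.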
With the Pointer Mechanism, the Decoder predicts each output from the previous output and the context vector and selects that output directly from the input sequence. The pseudocode for applying this model to the TSP is shown in Algorithm 3.
Algorithm 3 solves the problem in two stages, Encoder and Decoder. In the Encoder stage, the algorithm extracts a feature vector for each input point and feeds the feature vectors into the Encoder in sequence, obtaining the Encoder's output and state at each step. This stage encodes the feature information of the input points into the Encoder's hidden states for subsequent path planning.
In the Decoder stage, the path is generated iteratively. The Decoder state is first initialized to the last hidden state of the Encoder. At each iteration, the Decoder state, the previous output, and the set of Encoder outputs are fed into the Pointer Mechanism, which computes a weight score for each point and normalizes the scores into a probability distribution. The algorithm samples the index of the next point to visit from this distribution, restricted to points not yet on the path so that each point is visited exactly once, appends the point to the path, and accumulates the path length. The Decoder state is then updated and its output computed. The loop continues until the path contains every point in the set.
Finally, the distance from the last point back to the starting point is added to the path length, giving the total length of the closed tour. The model returns this length, the shortest path distance found by the model, together with the order in which the points are visited; in this system, that order is the order in which a UAV serves its assigned task locations.
Algorithm 3 Pointer Network solves the TSP problem
input: TSP input point set $P$ of size $n$;
output: shortest path distance $d$, shortest path $path$;
Perform feature extraction on the point set $P$ to obtain the feature vectors $x_1, \dots, x_n$;
Initialize the Encoder's state $h_0$;
for $i = 1$ to $n$ do
    Input the feature vector $x_i$ and the previous Encoder state $h_{i-1}$ into the Encoder;
    Compute the Encoder's output $e_i$ and state $h_i$;
end for
Initialize the Decoder's state $s_0 \leftarrow h_n$;
Initialize the path length $L \leftarrow 0$ and $path \leftarrow [\,]$;
while $|path| < n$ do
    Input the Decoder's state $s_{t-1}$, the last output $y_{t-1}$, and the Encoder output set $\{e_1, \dots, e_n\}$ into the Pointer Mechanism;
    Calculate the weight score $u_i^t$ of each point $i$ with the Pointer Mechanism's scoring function;
    Normalize the weight scores with the softmax function to obtain a probability distribution $p^t$;
    Sample from $p^t$, over the points not yet in $path$, the index $k$ of the next point to visit;
    Add the distance from the last point in $path$ to $P_k$ to $L$, and append $P_k$ to $path$;
    Update the Decoder's state $s_t$ and compute the output $y_t$;
end while
Add the distance from the last point back to the starting point: $d \leftarrow L + \operatorname{dist}(path_n, path_1)$;
return the shortest path distance $d$ and the shortest path $path$.
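An end-to-end sketch of Algorithm 3 is given below. Its weights are random and untrained, so the tours it produces are valid but not short; training is out of scope here. The sketch also makes several assumptions not stated in the text: point coordinates serve as the feature vectors, Elman RNN cells replace the actual Encoder and Decoder cells, the Decoder is fed the encoding of the previously chosen point as its input, and already-visited points are masked out before sampling.

```python
import math
import random
from typing import List, Sequence, Tuple

Point = Tuple[float, float]
Vector = List[float]
Matrix = List[List[float]]

def matvec(m: Matrix, v: Sequence[float]) -> Vector:
    return [sum(w * x for w, x in zip(row, v)) for row in m]

def rand_matrix(rows: int, cols: int, rng: random.Random) -> Matrix:
    return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]

def rnn_step(w_x: Matrix, w_h: Matrix, x: Sequence[float], h: Vector) -> Vector:
    """One Elman RNN step (an assumed stand-in for LSTM/GRU cells)."""
    return [math.tanh(a + b) for a, b in zip(matvec(w_x, x), matvec(w_h, h))]

def pointer_net_tsp(points: Sequence[Point], hid: int = 8, seed: int = 0,
                    greedy: bool = False) -> Tuple[float, List[int]]:
    """Decode one TSP tour with an UNTRAINED pointer network (random weights)."""
    rng = random.Random(seed)
    n = len(points)
    enc_wx, enc_wh = rand_matrix(hid, 2, rng), rand_matrix(hid, hid, rng)
    dec_wx, dec_wh = rand_matrix(hid, hid, rng), rand_matrix(hid, hid, rng)
    w1, w2 = rand_matrix(hid, hid, rng), rand_matrix(hid, hid, rng)
    v = [rng.uniform(-0.5, 0.5) for _ in range(hid)]

    # Encoder stage: the feature vector of each point is its (x, y) coordinates.
    h: Vector = [0.0] * hid
    enc: List[Vector] = []
    for p in points:
        h = rnn_step(enc_wx, enc_wh, p, h)
        enc.append(h)

    # Decoder stage: start from the last Encoder state.
    s, y_prev = h, [0.0] * hid
    path: List[int] = []
    length = 0.0
    while len(path) < n:
        s = rnn_step(dec_wx, dec_wh, y_prev, s)     # update Decoder state
        w2s = matvec(w2, s)
        scores = []
        for i, e_i in enumerate(enc):
            if i in path:
                scores.append(-math.inf)            # mask points already visited
            else:
                scores.append(sum(vk * math.tanh(a + b)
                                  for vk, a, b in zip(v, matvec(w1, e_i), w2s)))
        top = max(scores)
        exps = [math.exp(u - top) for u in scores]  # exp(-inf) == 0.0 for masked points
        total = sum(exps)
        probs = [e / total for e in exps]           # softmax distribution p^t
        if greedy:
            k = max(range(n), key=probs.__getitem__)
        else:
            k = rng.choices(range(n), weights=probs)[0]
        if path:
            length += math.dist(points[path[-1]], points[k])
        path.append(k)
        y_prev = enc[k]                             # feed the chosen point back in
    length += math.dist(points[path[-1]], points[path[0]])  # close the tour
    return length, path
```

Because visited points receive zero probability, every returned `path` is a permutation of the point indices, whatever the weights; only training can make the resulting tours short.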