Article

A Multi-Area Task Path-Planning Algorithm for Agricultural Drones Based on Improved Double Deep Q-Learning Net

by Jian Li 1,2, Weijian Zhang 1, Junfeng Ren 1, Weilin Yu 1, Guowei Wang 1,*, Peng Ding 1, Jiawei Wang 1 and Xuen Zhang 1

1 College of Information Technology, Jilin Agricultural University, Changchun 130118, China
2 Bioinformatics Research Center of Jilin Province, Changchun 130118, China
* Author to whom correspondence should be addressed.
Agriculture 2024, 14(8), 1294; https://doi.org/10.3390/agriculture14081294
Submission received: 14 June 2024 / Revised: 1 August 2024 / Accepted: 4 August 2024 / Published: 5 August 2024
(This article belongs to the Section Digital Agriculture)

Abstract

With the global population growth and increasing food demand, the development of precision agriculture has become particularly critical. In precision agriculture, accurately identifying areas of nitrogen stress in crops and planning precise fertilization paths are crucial. However, traditional coverage path-planning (CPP) typically considers only single-area tasks and overlooks CPP for multi-area tasks. To address this problem, this study proposes a Regional Framework for Coverage Path-Planning for Precision Fertilization (RFCPPF) for crop protection UAVs in multi-area tasks. This framework includes three modules: nitrogen stress spatial distribution extraction, multi-area task environmental map construction, and coverage path-planning. First, Sentinel-2 remote-sensing images are processed using the Google Earth Engine (GEE) platform, and the Green Normalized Difference Vegetation Index (GNDVI) is calculated to extract the spatial distribution of nitrogen stress. A multi-area task environmental map is then constructed to guide multiple UAV agents. Subsequently, improvements based on the Double Deep Q Network (DDQN) are introduced, incorporating Long Short-Term Memory (LSTM) and dueling network structures. Additionally, a multi-objective reward function and a state and action selection strategy suitable for stress-area plant protection operations are designed. Simulation experiments verify the superiority of the proposed method in reducing redundant paths and improving coverage efficiency. The proposed improved DDQN achieved an overall step count that is 60.71% of MLP-DDQN and 90.55% of the Breadth-First Search–Boustrophedon Algorithm (BFS-BA). Additionally, the total repeated coverage rate was reduced by 7.06% compared to MLP-DDQN and by 8.82% compared to BFS-BA.

1. Introduction

One of the primary goals of the United Nations is to establish sustainable agricultural development targets to enhance global agricultural productivity. This is aimed at meeting the increasing food supply demands of the growing population while avoiding the over-exploitation of natural resources [1]. By 2050, the world’s population is expected to reach 9.7 billion, with a corresponding increase in global food demand of approximately 50%, and the number of people facing hunger is predicted to rise by 8% [2,3]. To address the needs of this growing population, agricultural systems are under significant pressure [4,5]. Due to various complex factors, large areas of farmland are being converted into residential, commercial, and industrial land uses [6,7]. With the decreasing availability of farmland, improving crop yields and maintaining food quality to meet population growth demands has become an urgent issue [8,9]. Crop yields are influenced by various factors, including soil moisture and the levels of nutrients such as nitrogen, phosphorus, and potassium [10]. Therefore, real-time monitoring of crop stress and timely management are essential and are currently the focus of precision agriculture [11,12]. To maintain crop quality and increase yields, it is crucial to manage stressed crops efficiently and cost-effectively. With the rapid development of automatic control technologies, UAVs, owing to their light weight, small size, low power consumption, and high maneuverability, have been widely used in smart agriculture [13,14,15], remote-sensing monitoring [16,17], disaster rescue [18], regional surveillance [19], and forest fire monitoring [20,21]. For efficient and convenient operations in stress areas, UAV path-planning algorithms based on Coverage Path-Planning (CPP) have become a significant research direction.
CPP, as a key research area in robotic motion planning, aims to explore all required positions within a scene [22]. Common algorithms to meet these requirements include heuristic algorithms, meta-heuristic algorithms, graph-based algorithms, and artificial intelligence algorithms. Traditional algorithms include Dijkstra’s algorithm [23,24] and the A-star algorithm [25]. More advanced techniques include genetic algorithms [26], particle swarm algorithms [27], and neural network models [28]. Among neural network-based algorithms, deep learning [29] and reinforcement learning [30] methods are currently the most used. The application of Deep Q-learning Net (DQN) in coverage path-planning is mainly due to its efficiency and flexibility in handling complex decision-making tasks [31]. Compared to traditional reinforcement learning algorithms, deep Q-learning has the following advantages: (1) it uses deep neural networks to process high-dimensional input data, allowing the model to learn and extract effective features directly from raw sensory data; (2) DQN stabilizes the training process through the experience replay mechanism and target network, reducing correlation issues and gradient explosion/vanishing problems during training; (3) DQN optimizes decision strategies through continuous interaction with the environment, performing well in environments that are difficult to accurately describe or unknown to the model. Therefore, this study adopts the Double Deep Q-learning algorithm for coverage path-planning and introduces directional topology to enhance algorithm stability and convergence.
Recent research has widely applied single-UAV and multi-UAV [32] path-planning algorithms in CPP. Xing et al. used the state matrix corresponding to pre-processed grid maps as input to deep neural networks, proposing a deep reinforcement learning-based coverage path-planning method for unmanned surface vehicles (USVs) [33]. Theile et al. utilized map-like input channels to provide spatial information to agents through convolutional network layers, training Double Deep Q Networks (DDQNs) to make balanced control decisions for UAVs between limited power budgets and coverage objectives [29]. These improvements have achieved ideal results; however, the target environment areas in these studies are relatively small. For larger areas, multi-agent path-planning is a better choice, as coordinated work among multiple agents [34,35] can achieve coverage of larger, more complex areas.
Currently, many methods use multiple agents to collaborate within the entire map space [36], and various multi-formation control algorithms have been proposed by numerous scholars [37,38]. Wu et al. proposed a multi-PIW maritime SAR ship coverage path-planning framework (SARCPPF) using stochastic particle simulation methods and a sea-area-scale drift trajectory prediction model to predict drift paths. By establishing a hierarchical probabilistic environment map model, multiple SAR units were effectively guided for search and rescue tasks [39]. To improve the safety and coverage of deep-sea mining systems, Xing et al. proposed a cluster coverage path-planning strategy based on traditional algorithms and DQN. Experimental results showed that their proposed deep-sea mining cluster path-planning strategy performed better in terms of safety, coverage range, and coverage rate [40]. Zhu et al. proposed a two-stage coordination (TSC) strategy consisting of a high-level leader module and a low-level action executor. The high-level leader module provides topological and geometric information about the environment to the robots, and based on the observed information and environmental topology, the low-level action executor module takes primitive actions to achieve sub-goals. Experiments demonstrated that their proposed algorithm had excellent efficiency, scalability, and generalization [41].
In the field of UAV precision agriculture, the primary goal is to optimize the shortest path for full coverage, planning optimal paths for one or more UAVs [42]. Li et al. designed a boundary reduction algorithm for special situations, such as ditches and canals in the working terrain; for the special nature of concave polygon areas, they designed a topological mapping-based algorithm to judge concave points and plan paths for routes with special concave points [43]. Mukhamediev et al. proposed a genetic algorithm-based heterogeneous UAV group coverage path-planning method, which can calculate flight tasks to solve coverage problems for areas of different shapes and allow the selection of the best UAV subset from an available equipment pool [44]. Apostolidis et al. proposed a new method addressing the efficiency issues of grid-based CPP methods in practical UAV applications. Their proposed CPP method includes three different coverage modes: geographic grid coverage mode, optimized coverage mode, and full coverage mode, capable of handling the most complex shapes and concave areas with obstacles, ensuring complete coverage without sharp turns and overlapping trajectories [45].
Although these studies have extensively investigated CPP issues in precision agriculture, they mainly focus on single-area task CPP without considering multi-area task CPP. A “multi-area task” refers to the coordinated execution of agricultural operations across multiple distinct regions within a larger farming area. Considering multi-area task CPP scenarios is crucial, especially for large-scale crop stress problems where single-area task operations cannot meet practical demands. Therefore, this study addresses the CPP problem in multi-area tasks, utilizing the Google Earth Engine (GEE) platform to obtain the spatial distribution of nitrogen stress, create grid maps, and integrate deep reinforcement learning techniques into multi-area task CPP planning, achieving higher success rates within limited timeframes.
The main novelties of this study are summarized as follows:
  • A Regional Framework for Coverage Path-Planning for Precision Fertilization (RFCPPF) is proposed, including nitrogen stress spatial distribution extraction based on GEE, multi-area task environmental map construction, and coverage path-planning.
  • Improvements to the Double Deep Q Network (DDQN) are proposed, incorporating Long Short-Term Memory (LSTM) networks and dueling network structures. Additionally, a multi-objective reward function and tailored state and action selection strategies for stress area plant protection operations are designed.
  • Deep learning and directional topology are integrated into CPP for specific precision fertilization scenarios, achieving the goal of precision fertilization in multi-area tasks and providing a demonstration case for search path-planning in complex environments for precision agriculture.
The rest of this paper is organized as follows: Section 2 introduces the proposed RFCPPF, covering the extraction of stress areas and the creation of multi-area task maps based on GEE, and presents the improved Double Deep Q-learning algorithm. Section 3 presents simulation experiments and results analysis. Finally, Section 4 concludes the paper and discusses future prospects.

2. Materials and Methods

RFCPPF utilizes the GEE platform and Sentinel-2 satellite imagery along with the GNDVI to create an environmental map of multi-area tasks. This environmental map is then divided into smaller maps, which are assigned to UAVs executing CPP tasks. Multiple UAVs employ deep reinforcement learning to complete their respective coverage tasks, achieving precise fertilization operations across multiple stress areas. The overall framework of the proposed method is shown in Figure 1.

2.1. Mission Map Creation Based on GEE

The study area selected for this research is Youyi County, located in the Sanjiang Plain of Heilongjiang Province, northeastern China. The Sanjiang Plain, formed by long-term alluvial processes of the Heilongjiang, Ussuri, and Songhua rivers, is geographically situated between 130.21° E to 135.01° E longitude and 45.01° N to 48.32° N latitude. Due to large-scale agricultural development, approximately 70% of the freshwater wetlands in the Sanjiang Plain have been converted into farmland, making it an important grain production base in China. Rice, corn, and soybeans are the primary crops in the Sanjiang Plain, occupying over 90% of the total planting area. The images used in this study were collected between 20 June 2023 and 30 July 2023. This period coincides with critical growth stages for many crops, making it an optimal time for data collection. Sampling during this period allows for the early detection of nutrient deficiencies in crops and the timely implementation of corrective measures.
GEE is a cloud-based geographic information processing platform with over 40 years of Petabyte-scale remote-sensing data, suitable for large-scale remote-sensing analysis and monitoring. GEE provides significant convenience for researchers in large-scale spatial and long-term temporal research. The platform offers over 100 types of satellite remote-sensing data and their derivatives for free, including Sentinel-1 satellite SAR data with a revisit period of 6 days and 10 m spatial resolution and Sentinel-2 satellite optical remote-sensing data with a revisit period of 5 days and 10 m spatial resolution, as well as data from other satellites like Landsat and MODIS. As a high spatial resolution remote-sensing satellite available on the GEE platform, Sentinel-2 has become an important foundational data source for path-planning in recent years [46,47]. For creating stress area task maps, this study uses Sentinel-2 remote-sensing images to calculate GNDVI, performing a remote-sensing inversion. The parameters of the Sentinel-2 data are shown in Table 1.
Burns et al. demonstrated a significant correlation between GNDVI and nitrogen fertilizer application rates, with the highest inversion accuracy for calculating nitrogen content in maize crops, reaching up to 0.88 [48]. Typically, a negative GNDVI value indicates lower nitrogen content in crops, suggesting that additional nitrogen fertilizer may be needed. Therefore, this study uses GNDVI to create a task map of stress areas. The GNDVI calculation formula is shown in Equation (1):
$\mathrm{GNDVI} = \frac{B_8 - B_3}{B_8 + B_3}$ (1)
This study is based on the GEE platform, utilizing Sentinel-2 remote-sensing images to calculate the Green Normalized Difference Vegetation Index (GNDVI). The cloud masking algorithm provided by GEE is used to remove clouds from the remote-sensing images to avoid their impact on the results. The GNDVI analysis was conducted to assess nitrogen stress in crops, and the nitrogen stress map was converted into a two-dimensional grid map. The inversion results are illustrated in Figure 2.
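As a reference for this processing step, the following is a minimal sketch of an equivalent Earth Engine (Python API) workflow. It is not the authors' exact script; the area-of-interest rectangle and the handling of the compositing window are illustrative assumptions.

import ee

ee.Initialize()

# Placeholder extent roughly covering the study area; the actual geometry is an assumption.
aoi = ee.Geometry.Rectangle([131.5, 46.5, 132.2, 47.0])

def mask_clouds(img):
    # Bits 10 and 11 of the Sentinel-2 QA60 band flag opaque and cirrus clouds.
    qa = img.select('QA60')
    clear = qa.bitwiseAnd(1 << 10).eq(0).And(qa.bitwiseAnd(1 << 11).eq(0))
    return img.updateMask(clear)

def add_gndvi(img):
    # GNDVI = (B8 - B3) / (B8 + B3), Equation (1)
    return img.addBands(img.normalizedDifference(['B8', 'B3']).rename('GNDVI'))

collection = (ee.ImageCollection('COPERNICUS/S2_SR')
              .filterBounds(aoi)
              .filterDate('2023-06-20', '2023-07-30')
              .map(mask_clouds)
              .map(add_gndvi))

# Mean GNDVI composite over the growing-season window used in the study.
mean_gndvi = collection.select('GNDVI').mean().clip(aoi)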
This figure plots the average GNDVI values in Youyi County from 20 June 2023 to 30 July 2023 to represent the average nitrogen content in crops during the growing season. We selected Area C, a region with notably severe nitrogen deficiency, as the focus of this study’s task area.
After obtaining the remote-sensing inversion map, an adaptive threshold calculation was performed using Otsu’s method [49] to classify the GNDVI of Figure 2C to determine the precise distribution of task areas. The optimal GNDVI threshold is calculated, and based on this threshold, the map is divided into healthy areas (non-task areas) and nitrogen stress areas (task areas). The formula for Otsu’s method is shown below:
$\sigma_B^2(t) = \omega_1(t)\,\omega_2(t)\,[\mu_1(t) - \mu_2(t)]^2$ (2)
where $t$ represents the threshold, $\omega_1(t)$ and $\omega_2(t)$ are the probabilities of the two classes separated by the threshold $t$, and $\mu_1(t)$ and $\mu_2(t)$ are the means of the two classes.
Morphological opening operations were employed to remove noise from the classified map, resulting in a refined map. The transformed map is represented as a square with side length $N$ ($N \in \mathbb{N}$). This task map was then divided into four equally sized sub-maps. Within the task map, the UAV’s operational environment is categorized into non-task areas, task areas, and boundaries. Thus, the task map can be represented using a tensor $M \in \mathbb{B}^{N \times N \times 3}$, with $\mathbb{B} = \{0, 1\}$; the three channels of $N \times N \times 3$ encode the different parts of the task map, which are specified as follows: (1) task areas; (2) non-task areas; and (3) boundaries.
The process of creating the task map is illustrated in Figure 3.
In this study, each sub-task map is assigned to a specific UAV as its designated task area. The sub-task map is structured as a 20 × 20 grid, with each grid cell representing a unit area that the UAV will cover in a single move. Boundaries are introduced around each sub-task map to prevent overlap between UAVs’ operational areas (e.g., a UAV assigned to sub-task Map 1 inadvertently covering areas assigned to sub-task Map 2) and to ensure safety by minimizing the risk of collisions between UAVs.
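A minimal sketch of this map-construction step is given below, assuming the GNDVI inversion result is available as a NumPy array. The 20 × 20 sub-map size and the cell codes (1 = task, 0 = non-task, 4 = boundary) follow the description above, while the simple cropping step and the choice of treating below-threshold GNDVI as nitrogen-stressed are simplifying assumptions.

import numpy as np
from skimage.filters import threshold_otsu
from skimage.morphology import opening

def build_task_maps(gndvi, n=40, sub=20):
    """Return four bordered (sub+2) x (sub+2) sub-task maps: 1 = task, 0 = non-task, 4 = boundary."""
    # Otsu's method (Equation (2)) splits the GNDVI histogram into two classes;
    # cells below the threshold are treated here as nitrogen-stressed task cells.
    t = threshold_otsu(gndvi)
    task = (gndvi < t).astype(np.uint8)
    # Morphological opening removes small, isolated noise pixels from the classification.
    task = opening(task, np.ones((3, 3), dtype=np.uint8))
    # Crop (or resample) to an N x N working grid; simple cropping is used as a placeholder.
    grid = task[:n, :n]
    sub_maps = []
    for r in range(0, n, sub):
        for c in range(0, n, sub):
            block = grid[r:r + sub, c:c + sub]
            # Pad each sub-map with boundary cells so the UAVs' operational areas cannot overlap.
            sub_maps.append(np.pad(block, 1, constant_values=4))
    return sub_maps  # one sub-task map per UAV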

2.2. Improved Double Deep Q-Learning

2.2.1. UAV Agent and Environmental Status

In the two-dimensional grid map, each UAV covers one grid cell per movement. According to the task requirements and the division of corresponding areas, each sub-map is assigned to a specific UAV during the mission execution. Each map has a designated starting point for the UAV. The objective for each UAV is to cover the task areas of its sub-map as completely as possible while minimizing paths through non-task areas and avoiding collisions with boundaries to ensure maximum operational efficiency. The mission is considered complete when the UAV has covered all designated task areas. In the context of precise fertilization for stress areas, the CPP task goal is to ensure that the UAV reaches the specified sub-map, navigates through multi-area tasks, and achieves full coverage of the task regions.
The objective of the Coverage Path-Planning (CPP) task for precision fertilization in stress areas is to ensure that UAVs reach their assigned sub-maps, navigate through multiple task areas, and achieve complete coverage. In this process, the next action of the UAV depends on the state and action of the previous step, forming a Markov Decision Process (MDP). An MDP is characterized by the interaction between the agent (UAV) and the environment, and it includes three key components: the agent’s state (S), action (A), and reward (R). For the UAV agent’s own state, at each discrete time step, the agent receives the current state, selects an action to transition to a new state, and receives a corresponding reward based on this transition. The new state is then acquired by the agent, and this cycle is repeated continuously. This iterative process ensures that UAVs make informed decisions to optimize coverage. The specific steps of this process are illustrated in Figure 4.
After simplifying the environmental map to obtain a grid map, further processing is conducted. The current sub-environment of each UAV is defined as a state matrix $M_{1 \ldots 4}$ containing $p \times p$ elements, as shown in Equation (3):
$M_{1 \ldots 4} = \begin{bmatrix} a_{(1,1)} & \cdots & a_{(1,p)} \\ \vdots & \ddots & \vdots \\ a_{(p,1)} & \cdots & a_{(p,p)} \end{bmatrix}$ (3)
where $p$ represents the side length of the sub-map and $a_{(i,j)}$ indicates the environmental state at position $(i, j)$ of the sub-map. Different values are used to represent the various states, as shown in Table 2.
To represent the reward, this study defines the topological relationship $T(UAV, M)$ between the UAV agent and the grid map as follows:
$T(UAV, M) = \begin{cases} a_{(i,j)} = 0, & EQ_{(x,y)}(2, 0) \\ a_{(i,j)} = 1, & EQ_{(x,y)}(2, 1) \\ a_{(i,j)} = 4, & EQ_{(x,y)}(2, 4) \end{cases}$ (4)
In this context, $EQ$ denotes the current status of the UAV agent: $EQ_{(x,y)}(2, 0)$ indicates that the UAV agent is currently covering a non-task area, $EQ_{(x,y)}(2, 1)$ signifies that it is currently covering a task area, and $EQ_{(x,y)}(2, 4)$ represents the scenario where it is covering a boundary. Following the establishment of the grid map representing precision fertilization tasks, if the UAV initiates movement and executes action $A_i$, its spatial position changes. Consequently, the corresponding state matrix is also modified, transitioning from the current state $S_i$ to the subsequent state $S_{i+1}$, as described in Equation (5).
$S_{i+1} = \begin{cases} x^-\ (-1, 0), & A_i = 0 \\ x^+\ (+1, 0), & A_i = 1 \\ y^-\ (0, -1), & A_i = 2 \\ y^+\ (0, +1), & A_i = 3 \\ x^-, y^+\ (-1, +1), & A_i = 4 \\ x^+, y^+\ (+1, +1), & A_i = 5 \\ x^-, y^-\ (-1, -1), & A_i = 6 \\ x^+, y^-\ (+1, -1), & A_i = 7 \end{cases}$ (5)
In this scenario, $A_i \in \{0, 1, 2, \ldots, 7\}$ represents the eight movement directions of the UAV, namely: left, right, down, up, top-left, top-right, bottom-left, and bottom-right. In the interaction between multiple UAV agents and the environment, each UAV agent selects an appropriate action based on the current state of the environment and executes it according to its own strategy. Once the action is executed, the environment transitions to a new state, and the agent receives a reward value associated with that action. The UAV agent uses the reward received after each action to evaluate and adjust its strategy, aiming to achieve higher rewards in future decisions. The multi-UAV decision-making process is illustrated in Figure 5.

2.2.2. Reward Function

Setting an appropriate reward function is crucial to meeting the objectives of precision fertilization Coverage Path-Planning tasks, enabling UAVs to achieve their goals efficiently within a short timeframe. In precision fertilization path-planning, UAV agents need to cover the entire task area while minimizing redundant paths and time spent in non-task areas. Because UAVs operate in the air, they can move freely without needing to consider obstacles during operations. To encourage UAV agents to cover all task areas as completely as possible, the reward function is designed so that the fewer the remaining task areas, the higher the reward for covering a task area. The reward function is therefore defined as follows:
$R = \begin{cases} 0.05 + 0.004 \cdot e^{\frac{C_{i,ones}}{S_{ones}}}, & T(UAV, M) = EQ_{(x,y)}(2, 1) \\ -0.03, & T(UAV, M) = EQ_{(x,y)}(2, 0) \end{cases}$ (6)
In this context, $C_{i,ones}$ represents the number of covered task areas at state $S_i$, and $S_{ones}$ denotes the total number of task areas in the sub-map. This configuration ensures that the final reward for each of the four sub-maps remains within 10 and facilitates more effective weight updates after smaller rewards are returned.
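A minimal sketch of a single environment transition, combining the motion model of Equation (5) with the reward of Equation (6), is shown below. The cell codes follow Table 2 as used in the text (0 = non-task, 1 = task, 4 = boundary); marking a covered task cell as non-task afterwards is an assumption made for illustration.

import math

# Action index -> (dx, dy): left, right, down, up, top-left, top-right, bottom-left, bottom-right
MOVES = {0: (-1, 0), 1: (1, 0), 2: (0, -1), 3: (0, 1),
         4: (-1, 1), 5: (1, 1), 6: (-1, -1), 7: (1, -1)}

def step(grid, pos, action, covered_ones, total_ones):
    x, y = pos
    dx, dy = MOVES[action]
    nx, ny = x + dx, y + dy
    cell = grid[ny][nx]
    if cell == 4:                                  # boundary: episode terminates (see Equation (8))
        return (nx, ny), 0.0, True, covered_ones
    if cell == 1:                                  # task cell: exponential coverage bonus
        grid[ny][nx] = 0                           # assumption: covered task cells become non-task
        covered_ones += 1
        reward = 0.05 + 0.004 * math.exp(covered_ones / total_ones)
    else:                                          # non-task cell: small penalty
        reward = -0.03
    done = covered_ones == total_ones
    return (nx, ny), reward, done, covered_ones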

2.2.3. Action Selection Strategy

In the Double Deep Q-Network (DDQN), it is crucial to balance the concepts of exploitation and exploration. Exploitation involves selecting the optimal action for the agent by maximizing the values of all known state–action pairs. Conversely, exploration refers to when the agent randomly selects actions from its action set. While exploitation can help maximize expected returns in real-time, it may lead to local optima. On the other hand, exploration aids in achieving maximization of total returns in the long run.
This study adopts a method to balance exploitation and exploration in the action selection strategy to achieve a global optimal solution. In the initial stages of reinforcement learning, the agent explores with a higher probability. As the learning process progresses, the probability of exploration gradually decreases, while the probability of exploitation increases gradually. This study employs a strategy called ε -greedy.
At each time step, a random number $RN_{sample} \in (0, 1)$ is generated. The decision to explore or exploit is based on comparing $RN_{sample}$ with the current value of $\varepsilon$. If $RN_{sample} > \varepsilon$, exploitation is performed by selecting the action with the highest Q-value in the current state; otherwise, exploration is conducted by randomly selecting an action. Thus, by gradually decreasing $\varepsilon$, the agent explores more in the early stages of training and exploits more in the later stages, effectively balancing exploration and exploitation and gradually approaching the global optimal solution. The dynamic adjustment of $\varepsilon$ is given in Equation (7).
$\varepsilon = \varepsilon_{end} + (\varepsilon_{start} - \varepsilon_{end}) \cdot e^{-\frac{S_T}{\varepsilon_{decay}}}$ (7)
Here, $\varepsilon$ represents the probability of randomly selecting an action, $\varepsilon_{start}$ denotes the initial exploration probability, $\varepsilon_{end}$ denotes the final exploration probability, $\varepsilon_{decay}$ represents the rate of exploration probability decay, and $S_T$ represents the current number of training steps. To alleviate the training burden, this study sets four termination conditions $ST_{done}$:
  • Termination of the current training episode when the agent reaches the boundary;
  • Termination of the current training episode when the agent reaches the maximum number of steps (200);
  • Termination of the current training episode when the agent consecutively fails to score more than 80 times;
  • Termination of the current training episode when the agent covers all task areas in the submap;
These conditions are formulated as follows:
$ST_{done} = \begin{cases} T(U, M) = EQ_{(x,y)}(2, 4), & S_{i+1} = ST_{done} \\ ST_{\max} > 200, & S_{i+1} = ST_{done} \\ NonR_{\max} > 80, & S_{i+1} = ST_{done} \\ C_{i,ones} = S_{ones}, & S_{i+1} = ST_{done} \end{cases}$ (8)
where $ST_{\max}$ represents the maximum number of steps for the UAV agent, and $NonR_{\max}$ represents the maximum number of consecutive steps without scoring. This configuration helps reduce the training burden and prevents the UAV agent from getting stuck in local optima.
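A minimal sketch of the ε-greedy schedule of Equation (7) and the resulting action choice is given below; the start, end, and decay values are placeholders standing in for the settings of Table 3, and the policy network is assumed to map a state tensor to the eight action Q-values.

import math
import random
import torch

def epsilon_at(step, eps_start=0.9, eps_end=0.05, eps_decay=20000):
    # Equation (7): exponentially anneal epsilon from eps_start towards eps_end.
    return eps_end + (eps_start - eps_end) * math.exp(-step / eps_decay)

def select_action(policy_net, state, step, n_actions=8):
    eps = epsilon_at(step)
    if random.random() > eps:                        # exploit: greedy action from the policy network
        with torch.no_grad():
            q_values = policy_net(state)
            if isinstance(q_values, tuple):          # recurrent networks may also return a hidden state
                q_values = q_values[0]
            return int(q_values.argmax(dim=1).item())
    return random.randrange(n_actions)               # explore: uniformly random action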

2.2.4. DDQN for Precision Fertilization Navigation with UAVs

Utilizing a DDQN, this research addresses precision fertilization navigation challenges with UAVs. In cooperative path-planning involving multiple mobile robots, value-centered deep reinforcement learning methods, including DQN and DDQN, directly provide the Q-value function. This direct approach reduces adaptability in complex environments, increases computational costs, and lowers learning efficiency. Hence, the study extends DDQN by incorporating the dueling network [50] and LSTM [51], enhancing the algorithm’s focus on obstacle navigation through neural network structure enhancement. This modification separates the Q-value function into distinct streams for state value and advantage functions, enabling more accurate value estimation.
The improved DDQN network consists of convolutional layers followed by a dueling architecture that separates the estimation of the state value function $V(s)$ and the advantage function $A(s, a)$:
$Q(s, a; \theta, \alpha, \beta) = V(s; \theta, \beta) + \left( A(s, a; \theta, \alpha) - \frac{1}{|A|} \sum_{a'} A(s, a'; \theta, \alpha) \right)$ (9)
where $\theta$ represents the parameters of the convolutional layers, $\alpha$ represents the parameters of the advantage stream, and $\beta$ represents the parameters of the value stream.
The network architecture is composed of three convolutional layers with GELU activation functions, a fully connected layer with a GELU activation function, an LSTM layer for capturing temporal dependencies, and separate fully connected layers for the value and advantage streams. The path-planning model is shown in Figure 6.
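A minimal PyTorch sketch of the network described above is given below; the channel counts, hidden sizes, and the 22 × 22 input grid (a 20 × 20 sub-map plus boundary) are illustrative assumptions rather than the exact configuration used in this study.

import torch
import torch.nn as nn

class DuelingLSTMDQN(nn.Module):
    def __init__(self, in_channels=3, grid=22, n_actions=8, hidden=256):
        super().__init__()
        # Three convolutional layers with GELU activations extract spatial features from the grid map.
        self.conv = nn.Sequential(
            nn.Conv2d(in_channels, 32, 3, padding=1), nn.GELU(),
            nn.Conv2d(32, 64, 3, padding=1), nn.GELU(),
            nn.Conv2d(64, 64, 3, padding=1), nn.GELU(),
        )
        self.fc = nn.Sequential(nn.Linear(64 * grid * grid, hidden), nn.GELU())
        # LSTM layer captures temporal dependencies across successive states.
        self.lstm = nn.LSTM(hidden, hidden, batch_first=True)
        self.value = nn.Linear(hidden, 1)              # V(s) stream
        self.advantage = nn.Linear(hidden, n_actions)  # A(s, a) stream

    def forward(self, x, hc=None):
        # x: (batch, channels, grid, grid); hc: optional LSTM hidden/cell state carried between steps.
        z = self.conv(x).flatten(1)
        z = self.fc(z).unsqueeze(1)                    # add a length-1 time dimension for the LSTM
        z, hc = self.lstm(z, hc)
        z = z.squeeze(1)
        v, a = self.value(z), self.advantage(z)
        # Equation (9): combine the streams with the mean-advantage baseline.
        q = v + a - a.mean(dim=1, keepdim=True)
        return q, hc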
LSTM networks are a type of recurrent neural network (RNN) that are well-suited for learning from data sequences. Unlike standard RNNs, LSTMs are designed to avoid the problem of long-term dependency, which is a common issue in traditional RNNs due to the vanishing gradient problem. LSTM achieves this by introducing a set of gates that control the flow of information and maintain a cell state, which can carry information across many time steps. The output of the LSTM layer is given by:
$h_i = \mathrm{LSTM}(S_i, h_{i-1})$ (10)
where $h_i$ denotes the output state of the LSTM. The Q-value update formula is:
$Q(S_i, A_i) \leftarrow Q(S_i, A_i) + \alpha_l \left( R_{i+1} + \gamma \max_a Q(S_{i+1}, a) - Q(S_i, A_i) \right)$ (11)
where $\alpha_l$ is the learning rate, $\gamma$ is the discount factor, and $\max_a Q(S_{i+1}, a)$ is the maximum Q-value over all possible actions in the next state $S_{i+1}$.
The loss function used to update the network parameters is based on the Temporal Difference error between the predicted Q-values and the target Q-values. The Huber loss is used to mitigate the effect of outliers:
$L(\theta) = \mathbb{E}_{(S, A, R, S') \sim D} \left[ \mathrm{Huber}\left( Q(S, A; \theta) - y \right) \right]$ (12)
where the target $y$ is computed as:
$y = R + \gamma \max_{a'} Q(S', a'; \theta')$ (13)
Here, $\theta'$ represents the parameters of the target network, which are periodically updated with the parameters of the policy network $\theta$. The Huber loss is defined as:
$\mathrm{Huber}(x) = \begin{cases} \frac{1}{2} x^2, & \text{if } |x| \le \delta \\ \delta \left( |x| - \frac{1}{2} \delta \right), & \text{otherwise} \end{cases}$ (14)
where δ is a threshold parameter.
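A minimal sketch of one learning update following Equations (12)–(14) is given below; the replay-buffer sampling, the tensor layout of the batch, and the optimizer are assumed to be set up elsewhere, and PyTorch's smooth L1 loss is used as the Huber loss.

import torch
import torch.nn.functional as F

def ddqn_update(policy_net, target_net, optimizer, batch, gamma=0.99):
    # batch: tensors (states, actions, rewards, next_states, done flags) sampled from replay memory.
    states, actions, rewards, next_states, done = batch
    q, _ = policy_net(states)
    q = q.gather(1, actions.unsqueeze(1)).squeeze(1)          # Q(S, A; theta)
    with torch.no_grad():
        next_q, _ = target_net(next_states)
        # Equation (13): y = R + gamma * max_a Q(S', a; theta'); a fully decoupled double-DQN
        # target would instead evaluate the policy network's argmax action with the target network.
        y = rewards + gamma * next_q.max(dim=1).values * (1.0 - done)
    loss = F.smooth_l1_loss(q, y)                             # Huber loss, Equation (14)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()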
The path-planning model proposed in this study, based on DDQN, calculates the optimal fertilization path through interactive learning between the agent and the precision fertilization environment model. The pseudocode for this process is presented in Algorithm 1.
Algorithm 1: Training algorithm of improved DDQN
Input: learning rate $\alpha_l$, maximum episode $L_m$, maximum step size $ST_{\max}$, maximum step size with no reward $NoR_{\max}$, batch size, memory length, threshold $\varepsilon$
1.  Initialize device, hyperparameters, and other variables.
2.  Initialize neural networks.
3.   Define optimizer and scaler
4.   Initialize replay memories for each map.
5.   Initialize reward and loss tracking variables.
6.  Define helper functions.
7.   If $RN_{sample} > \varepsilon$:
8.   Use policy network to select greedy action
9.  Else: Select random action
10. for episode in range($L_m$):
11.  for t = 1 to $ST_{\max}$ do
12.   Select action using epsilon-greedy policy based on DDQN predictions:
13.     $A_i = \begin{cases} \text{random action}, & \text{if } RN_{sample} \le \varepsilon \\ \arg\max_A Q(S, h, c, A; \theta), & \text{otherwise} \end{cases}$
14.   Calculate target Q-values using DDQN:
15.     $Q_{target} = Q(S', A; \theta_{target})$
16.   Apply attention mechanism to DDQN outputs.
17.   Compute DDQN loss:
18.     $L(\theta) = \mathbb{E}_{(S, A, R, S')} \left[ \mathrm{Huber}\left( Q(S, A; \theta) - y \right) \right]$
19.    Update DDQN networks using backpropagation:
20.    Update target networks periodically:
21.   if t mod update_target_interval == 0:
22.    Update current state: $S_i \leftarrow S_{i+1}$
23.    Continue to the next timestep if the episode is not done.
24.  end for
25. end for
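Tying the previous sketches together, the following is a minimal outline of the outer loop of Algorithm 1; the environment object, the to_tensors helper that stacks sampled transitions into batch tensors, and all hyperparameter values are assumptions rather than the exact settings of Table 3.

import random
from collections import deque

def train(env, policy_net, target_net, optimizer, episodes=100_000,
          max_steps=200, batch_size=64, memory_len=50_000, sync_every=1_000):
    memory = deque(maxlen=memory_len)      # replay memory for this sub-map
    global_step = 0
    for _ in range(episodes):
        state, done = env.reset(), False
        for _ in range(max_steps):
            action = select_action(policy_net, state, global_step)          # epsilon-greedy choice
            next_state, reward, done = env.step(action)
            memory.append((state, action, reward, next_state, done))
            if len(memory) >= batch_size:
                ddqn_update(policy_net, target_net, optimizer,
                            to_tensors(random.sample(memory, batch_size)))  # to_tensors: assumed helper
            if global_step % sync_every == 0:                               # periodic target-network sync
                target_net.load_state_dict(policy_net.state_dict())
            state, global_step = next_state, global_step + 1
            if done:
                break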

2.3. Experimental Setup

Environment Simulation: All simulation experiments in this study were conducted on a desktop computer equipped with a 13th Gen Intel(R) Core(TM) i5-13600KF 3.50 GHz CPU (Intel Corporation, Santa Clara, CA, USA), an NVIDIA GeForce RTX 4080 GPU with 16 GB of Video RAM (NVIDIA Corporation, Santa Clara, CA, USA), 32 GB of system RAM, and the Windows 10 operating system. The experiments were programmed using Python.
In this study, the effectiveness of the three algorithms in path-planning is evaluated using three metrics: the number of steps, coverage rate, and repeated coverage rate.
  • Step: The number of optimal moves required by the algorithm to complete the task.
  • Coverage (%): This metric quantifies the proportion of the mission area that has been effectively covered by the UAV, as shown in Equation (15):
$\mathrm{Coverage}(\%) = \left( \frac{\text{Number of Mission Grids Covered}}{\text{Total Number of Mission Grids}} \right) \times 100$ (15)
  • Repeated coverage (%): The percentage of grid cells that are revisited during the path-planning process, as shown in Equation (16):
$\mathrm{Repeated\ Coverage}(\%) = \left( \frac{\text{Number of Revisited Grids}}{\text{Total Number of Mission Grids}} \right) \times 100$ (16)
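A minimal sketch of how these metrics can be computed is given below, assuming the planned path is a sequence of visited grid coordinates and the task map marks mission cells with 1.

from collections import Counter
import numpy as np

def evaluate_path(path, task_map):
    mission = {tuple(c) for c in np.argwhere(task_map == 1)}
    visits = Counter(tuple(p) for p in path)
    covered = sum(1 for cell in mission if visits[cell] > 0)
    revisited = sum(1 for cell, n in visits.items() if n > 1)   # grids entered more than once
    coverage = 100.0 * covered / len(mission)                   # Equation (15)
    repeated_coverage = 100.0 * revisited / len(mission)        # Equation (16)
    return len(path), coverage, repeated_coverage               # steps, coverage %, repeated coverage %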
Algorithm Parameters: The parameter settings for the proposed algorithm in this study are presented in Table 3.

3. Results and Discussion

3.1. Results

This study compares the proposed improved DDQN algorithm with the boustrophedon algorithm (BA), which is commonly used in the CPP domain, as well as with an MLP-DDQN algorithm based on the models proposed by Li et al. [52] and Wu et al. [39]. The MLP-DDQN consists of three fully connected layers. BA is a strategy in which a mobile robot covers an area by sweeping from left to right or from right to left [45]. Its advantage lies in providing a systematic approach that ensures every part of the area is covered while reducing redundancy and improving efficiency. It is typically used in scenarios such as agricultural spraying, search and rescue missions, and environmental monitoring. To adapt BA for optimal application in our proposed precision fertilization environment, we incorporated a Breadth-First Search (BFS) to ensure fair task distribution [53].
We tested these three algorithms in four different environments, with the path-planning results shown in Figure 7, Figure 8 and Figure 9. Both the improved DDQN and BFS-BA algorithms achieved complete coverage of the task areas according to our predefined pathfinding rules. However, while the MLP-DDQN achieved full coverage in a 10 × 10 grid, it failed to meet the task requirements for full coverage in a 20 × 20 grid due to the increased complexity of the environment. All three algorithms exhibited some degree of path repetition, and the BFS-BA algorithm, due to its priority search method, showed instances of grid skipping. The proposed improved DDQN algorithm demonstrated superior performance in terms of path coverage, total steps, and path repetition.
In Figure 7, Figure 8 and Figure 9, the dark blue grid cells represent the boundary, the light blue grid cells represent the mission area, the white grid cells represent the non-mission area, the blue square represents the drone’s starting point, the blue dot represents the path endpoint, and the repeated paths of each algorithm are marked with red boxes.
To quantitatively evaluate the aforementioned three methods, this study calculated the path-planning steps, repetition rate, and coverage rate required for each algorithm to achieve full coverage of the multi-area task. For fairness, the BFS-BA algorithm’s skipped steps due to grid hopping were included as actual steps. The results are shown in Table 4.
The results suggest that the proposed improved DDQN algorithm offers promising performance in terms of lower repetition rates and fewer steps required for full coverage compared to the other algorithms. Specifically, our method consistently achieved 100% coverage with minimal repetition and fewer steps across all tested scenarios.
Although the BFS-BA algorithm does not require training and can quickly achieve comprehensive coverage and task area path-planning, it encounters challenges in balancing redundant paths. Under complex conditions, this algorithm may generate a large number of repetitive searches and grid hopping, making it difficult to meet the path-planning needs in multi-area tasks with complex scenarios. This limitation is evident from the higher repetition rates and increased number of steps required, as shown in the results.
The MLP-DDQN performed the least effectively across all three metrics. Even after 100,000 training iterations, it failed to achieve full coverage, frequently terminating under the third condition specified in Equation (8). This indicates that the MLP-DDQN algorithm struggles with convergence and efficiency in complex path-planning tasks.
In contrast, the proposed improved DDQN algorithm shows notable advantages. It demonstrates a total step count that is 60.71% of MLP-DDQN and 90.55% of BFS-BA. The overall repetition rate is 7.06% lower than that of MLP-DDQN and 8.82% lower than that of BFS-BA. These results highlight the potential of the improved DDQN algorithm in reducing redundant paths and optimizing the overall path-planning process.
From the table, it is evident that the improved DDQN algorithm consistently outperformed the other algorithms in all maps. Specifically, our method achieved 100% coverage with the lowest repetition rate and fewer steps across all tested scenarios. For instance, in Map 3, our method required only 83 steps with no repeated coverage, while BFS-BA and MLP-DDQN required 91 and 116 steps, respectively, with significantly higher repetition rates.
Overall, the data clearly demonstrate the effectiveness of the improved DDQN algorithm in optimizing path-planning for UAVs in multi-area tasks, highlighting its robustness and efficiency in complex environments.

3.2. Discussion

To better compare the two algorithms, this study saved the reward every 10 iterations and then calculated the mean, maximum, and minimum values for every 100 rewards as a group. Figure 10 illustrates the reward curves of the improved DDQN algorithm and the MLP-DDQN.
As shown in the figure, the reward curve of our proposed improved DDQN gradually increases during training and stabilizes after 20,000 episodes, achieving a higher final reward. The superior performance of our improved DDQN algorithm can be attributed to the LSTM and Dueling Network architectures, which enhance learning efficiency by capturing time dependencies using LSTM. In multi-agent tasks with long-term dependency relationships, the LSTM network can comprehensively learn the complex interactions between agents, thereby enhancing state representation.
The loss values of the two algorithms are depicted in Figure 11. With the increase in training steps, the loss values of both algorithms significantly decrease and then stabilize. This indicates that our improved DDQN algorithm gradually learns the optimal policy and value function during training, demonstrating the effectiveness and reliability of the model.
Overall, the results indicate that the proposed framework and the corresponding improved DDQN algorithm can effectively accomplish multi-area CPP tasks. However, our model still has certain limitations. For instance, as shown in Table 4 and Figure 7b, the improved DDQN algorithm exhibited a 1.82% repetition rate in the path on Map 2. Although relatively low, this indicates room for improvement.
Additionally, our algorithm primarily considers multi-area CPP tasks under ideal conditions without accounting for obstacle-laden environments. In recent years, numerous researchers have studied UAV path-planning in obstacle-laden environments [54,55,56]. Yu et al. addressed the issue of agricultural vehicles lacking intelligent obstacle avoidance capabilities by embedding an improved neural network model into the Double DQN architecture, constructing an obstacle avoidance speed controller. Their model’s effectiveness was validated through simulation and field experiments [57]. Guo et al. constructed a distributed DRL framework to solve UAV navigation in highly dynamic obstacle environments, dividing the navigation task into two subtasks and using LSTM networks to guide UAV path-planning [58]. Gök proposed a model extended with Prioritized Experience Replay (PER) based on the Dueling Double Deep Q-Network (D3QN) for safe UAV path-planning in dynamic obstacle environments [59]. Wang et al. addressed the issue of unknown environments due to radar failure or communication interruption by combining the Faster R-CNN model with a data storage mechanism in a Deep Q-Network, designing a new replay memory data storage mechanism to train more efficient agents [60].
Although these studies focus on various obstacle scenarios encountered by agents during task execution, they do not consider the multi-area CPP problem. Similarly, considering that UAVs performing CPP tasks cannot always be in ideal environments, obstacles will pose significant threats during missions [61,62]. In future work, we will focus on exploring the impact of obstacles in multi-area CPP tasks, incorporating obstacles into the environment. We will also improve the proposed algorithm to meet the needs of both obstacle avoidance and CPP tasks simultaneously.

4. Conclusions and Future Work

This study successfully integrates deep reinforcement learning technology into the multi-area path-planning of agricultural UAVs, establishing a framework suitable for multi-area tasks, referred to as RFCPPF. The framework comprises three key modules: nitrogen stress spatial distribution extraction, multi-area task environmental map construction, and coverage path-planning. By processing Sentinel-2 remote-sensing images through the GEE platform, precise monitoring of crop nitrogen stress conditions is achieved. The DDQN-based path-planning algorithm, improved by incorporating LSTM and a dueling network, effectively guides multiple UAVs to complete precise fertilization tasks across various stress areas, thereby enhancing the operation success rate.
In the experiments, the proposed improved DDQN achieved a total step count that is 60.71% of MLP-DDQN and 90.55% of BFS-BA, demonstrating significant improvements in efficiency. Furthermore, the total repeated coverage rate was reduced by 7.06% compared to MLP-DDQN and 8.82% compared to BFS-BA, indicating a notable decrease in redundant movements.
Although this study has achieved promising results under the assumption of a static search environment, several limitations remain:
  • The proposed algorithm still encounters some instances of path repetition, indicating the need for further optimization to enhance its efficiency;
  • The study was conducted in a relatively ideal environment without considering the presence of obstacles. However, for drones performing tasks, obstacles are a significant threat. Thus, addressing multi-area CPP tasks in the presence of obstacles is crucial.
In future work, we plan to refine the model further and adjust parameters to enhance our results. The objectives are to reduce the repetition rate and improve the efficiency of the path-planning process. We will explore advanced techniques and fine-tune the reward function to better guide the agent’s actions, ensuring more optimal path selection. Additionally, we will consider the complexity of the environment, including the introduction of obstacles in the multi-area CPP task maps and accounting for potential dynamic changes in obstacles. Furthermore, we will explore deploying this framework on drones to assess its practical application value, aiming to further improve the efficiency and intelligence of precision agriculture operations.

Author Contributions

Conceptualization, W.Z., J.L. and G.W.; methodology, G.W. and W.Y.; software, J.R., P.D. and W.Y.; validation, W.Z., J.W. and X.Z.; formal analysis, W.Z. and W.Y.; investigation, X.Z.; resources, W.Z. and W.Y.; data curation, J.R. and P.D.; writing—original draft preparation, W.Z.; writing—review and editing, J.L.; visualization, J.W.; supervision, G.W.; project administration, J.L.; funding acquisition, J.L. All authors have read and agreed to the published version of the manuscript.

Funding

This research was funded by the Changchun Science and Technology Development Program, grant number 21ZGN26 and by the Jilin Province Science and Technology Development Program, grant number 20230508026RC.

Institutional Review Board Statement

Not applicable.

Data Availability Statement

The original contributions presented in the study are included in the article; further inquiries can be directed to the corresponding author.

Conflicts of Interest

The authors declare no conflict of interest.

Abbreviations

DDQN: Double Deep Q-learning Net
CPP: Coverage Path Planning
RFCPPF: Regional Framework for Coverage Path Planning for Precision Fertilization
GEE: Google Earth Engine
GNDVI: Green Normalized Difference Vegetation Index
LSTM: Long Short-Term Memory
BFS-BA: Breadth-First Search-Boustrophedon Algorithm
DQN: Deep Q-learning Net
S: State
A: Action
R: Reward
ℕ: Natural Numbers

References

  1. Pawlak, K.; Kołodziejczak, M. The Role of Agriculture in Ensuring Food Security in Developing Countries: Considerations in the Context of the Problem of Sustainable Food Production. Sustainability 2020, 12, 5488. [Google Scholar] [CrossRef]
  2. Gu, D.; Andreev, K.; Dupre, M.E. Major Trends in Population Growth around the World. China CDC Wkly. 2021, 3, 604. [Google Scholar] [CrossRef]
  3. Van Dijk, M.; Morley, T.; Rau, M.L.; Saghai, Y. A Meta-Analysis of Projected Global Food Demand and Population at Risk of Hunger for the Period 2010–2050. Nat. Food 2021, 2, 494–501. [Google Scholar] [CrossRef]
  4. Edwards, C.A. The Importance of Integration in Sustainable Agricultural Systems. In Sustainable Agricultural Systems; CRC Press: Boca Raton, FL, USA, 2020; pp. 249–264. [Google Scholar]
  5. Prăvălie, R.; Patriche, C.; Borrelli, P.; Panagos, P.; Roșca, B.; Dumitraşcu, M.; Nita, I.-A.; Săvulescu, I.; Birsan, M.-V.; Bandoc, G. Arable Lands under the Pressure of Multiple Land Degradation Processes. A Global Perspective. Environ. Res. 2021, 194, 110697. [Google Scholar] [CrossRef] [PubMed]
  6. Hidayat, A.R.T.; Hasyim, A.W.; Prayitno, G.; Harisandy, J.D. Farm Owners’ Perception toward Farmland Conversion: An Empirical Study from Indonesian Municipality. Environ. Res. Eng. Manag. 2021, 77, 109–124. [Google Scholar] [CrossRef]
  7. Saputra, R.A.; Tisnanta, H.S.; Sumarja, F.X.; Triono, A. Agricultural Land Conversion for Housing Development and Sustainable Food Agricultural Land. Tech. Soc. Sci. J. 2022, 37, 216. [Google Scholar] [CrossRef]
  8. Clough, Y.; Kirchweger, S.; Kantelhardt, J. Field Sizes and the Future of Farmland Biodiversity in European Landscapes. Conserv. Lett. 2020, 13, e12752. [Google Scholar] [CrossRef] [PubMed]
  9. Folberth, C.; Khabarov, N.; Balkovič, J.; Skalský, R.; Visconti, P.; Ciais, P.; Janssens, I.A.; Peñuelas, J.; Obersteiner, M. The Global Cropland-Sparing Potential of High-Yield Farming. Nat. Sustain. 2020, 3, 281–289. [Google Scholar] [CrossRef]
  10. Grzyb, A.; Wolna-Maruwka, A.; Niewiadomska, A. Environmental Factors Affecting the Mineralization of Crop Residues. Agronomy 2020, 10, 1951. [Google Scholar] [CrossRef]
  11. Segarra, J.; Buchaillot, M.L.; Araus, J.L.; Kefauver, S.C. Remote Sensing for Precision Agriculture: Sentinel-2 Improved Features and Applications. Agronomy 2020, 10, 641. [Google Scholar] [CrossRef]
  12. Sishodia, R.P.; Ray, R.L.; Singh, S.K. Applications of Remote Sensing in Precision Agriculture: A Review. Remote Sens. 2020, 12, 3136. [Google Scholar] [CrossRef]
  13. Singh, P.K.; Sharma, A. An Intelligent WSN-UAV-Based IoT Framework for Precision Agriculture Application. Comput. Electr. Eng. 2022, 100, 107912. [Google Scholar] [CrossRef]
  14. Maddikunta, P.K.R.; Hakak, S.; Alazab, M.; Bhattacharya, S.; Gadekallu, T.R.; Khan, W.Z.; Pham, Q.-V. Unmanned Aerial Vehicles in Smart Agriculture: Applications, Requirements, and Challenges. IEEE Sens. J. 2021, 21, 17608–17619. [Google Scholar] [CrossRef]
  15. Boursianis, A.D.; Papadopoulou, M.S.; Diamantoulakis, P.; Liopa-Tsakalidi, A.; Barouchas, P.; Salahas, G.; Karagiannidis, G.; Wan, S.; Goudos, S.K. Internet of Things (IoT) and Agricultural Unmanned Aerial Vehicles (UAVs) in Smart Farming: A Comprehensive Review. Internet Things 2022, 18, 100187. [Google Scholar] [CrossRef]
  16. Lyu, X.; Li, X.; Dang, D.; Dou, H.; Wang, K.; Lou, A. Unmanned aerial vehicle (uav) remote sensing in grassland ecosystem monitoring: A systematic review. Remote Sens. 2022, 14, 1096. [Google Scholar] [CrossRef]
  17. Zhang, H.; Wang, L.; Tian, T.; Yin, J. A Review of Unmanned Aerial Vehicle Low-Altitude Remote Sensing (UAV-LARS) Use in Agricultural Monitoring in China. Remote Sens. 2021, 13, 1221. [Google Scholar] [CrossRef]
  18. Shamsoshoara, A.; Afghah, F.; Razi, A.; Mousavi, S.; Ashdown, J.; Turk, K. An Autonomous Spectrum Management Scheme for Unmanned Aerial Vehicle Networks in Disaster Relief Operations. IEEE Access 2020, 8, 58064–58079. [Google Scholar] [CrossRef]
  19. Van Cuong, N.; Hong, Y.-W.P.; Sheu, J.-P. UAV Trajectory Optimization for Joint Relay Communication and Image Surveillance. IEEE Trans. Wirel. Commun. 2022, 21, 10177–10192. [Google Scholar] [CrossRef]
  20. Sudhakar, S.; Vijayakumar, V.; Kumar, C.S.; Priya, V.; Ravi, L.; Subramaniyaswamy, V. Unmanned Aerial Vehicle (UAV) Based Forest Fire Detection and Monitoring for Reducing False Alarms in Forest-Fires. Comput. Commun. 2020, 149, 1–16. [Google Scholar] [CrossRef]
  21. Sharma, A.; Singh, P.K. UAV-based Framework for Effective Data Analysis of Forest Fire Detection Using 5G Networks: An Effective Approach towards Smart Cities Solutions. Int. J. Commun. Syst. 2021, 2021, e4826. [Google Scholar] [CrossRef]
  22. Chen, J.; Du, C.; Zhang, Y.; Han, P.; Wei, W. A Clustering-Based Coverage Path Planning Method for Autonomous Heterogeneous UAVs. IEEE Trans. Intell. Transp. Syst. 2021, 23, 25546–25556. [Google Scholar] [CrossRef]
  23. Tan, C.S.; Mohd-Mokhtar, R.; Arshad, M.R. A Comprehensive Review of Coverage Path Planning in Robotics Using Classical and Heuristic Algorithms. IEEE Access 2021, 9, 119310–119342. [Google Scholar] [CrossRef]
  24. Qin, Y.; Fu, L.; He, D.; Liu, Z. Improved Optimization Strategy Based on Region Division for Collaborative Multi-Agent Coverage Path Planning. Sensors 2023, 23, 3596. [Google Scholar] [CrossRef] [PubMed]
  25. Kumar, K.; Kumar, N. Region Coverage-Aware Path Planning for Unmanned Aerial Vehicles: A Systematic Review. Phys. Commun. 2023, 59, 102073. [Google Scholar] [CrossRef]
  26. Jing, W.; Deng, D.; Wu, Y.; Shimada, K. Multi-Uav Coverage Path Planning for the Inspection of Large and Complex Structures. In Proceedings of the 2020 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), Las Vegas, NV, USA, 25–29 October 2020; pp. 1480–1486. [Google Scholar]
  27. Bezas, K.; Tsoumanis, G.; Angelis, C.T.; Oikonomou, K. Coverage Path Planning and Point-of-Interest Detection Using Autonomous Drone Swarms. Sensors 2022, 22, 7551. [Google Scholar] [CrossRef] [PubMed]
  28. Fevgas, G.; Lagkas, T.; Argyriou, V.; Sarigiannidis, P. Coverage path planning methods focusing on energy efficient and cooperative strategies for unmanned aerial vehicles. Sensors 2022, 22, 1235. [Google Scholar] [CrossRef] [PubMed]
  29. Theile, M.; Bayerlein, H.; Nai, R.; Gesbert, D.; Caccamo, M. UAV Coverage Path Planning under Varying Power Constraints Using Deep Reinforcement Learning. In Proceedings of the 2020 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), Las Vegas, NV, USA, 25–29 October 2020; pp. 1444–1449. [Google Scholar]
  30. Le, A.V.; Veerajagadheswar, P.; Thiha Kyaw, P.; Elara, M.R.; Nhan, N.H.K. Coverage Path Planning Using Reinforcement Learning-Based TSP for hTetran—A Polyabolo-Inspired Self-Reconfigurable Tiling Robot. Sensors 2021, 21, 2577. [Google Scholar] [CrossRef]
  31. Hu, W.; Yu, Y.; Liu, S.; She, C.; Guo, L.; Vucetic, B.; Li, Y. Multi-UAV Coverage Path Planning: A Distributed Online Cooperation Method. IEEE Trans. Veh. Technol. 2023, 72, 11727–11740. [Google Scholar] [CrossRef]
  32. Fang, X.; Xie, L.; Li, X. Distributed Localization in Dynamic Networks via Complex Laplacian. Automatica 2023, 151, 110915. [Google Scholar] [CrossRef]
  33. Xing, B.; Wang, X.; Yang, L.; Liu, Z.; Wu, Q. An Algorithm of Complete Coverage Path Planning for Unmanned Surface Vehicle Based on Reinforcement Learning. J. Mar. Sci. Eng. 2023, 11, 645. [Google Scholar] [CrossRef]
  34. Fang, X.; Li, X.; Xie, L. 3-D Distributed Localization with Mixed Local Relative Measurements. IEEE Trans. Signal Process. 2020, 68, 5869–5881. [Google Scholar] [CrossRef]
  35. Fang, X.; Li, X.; Xie, L. Angle-Displacement Rigidity Theory with Application to Distributed Network Localization. IEEE Trans. Autom. Control 2020, 66, 2574–2587. [Google Scholar] [CrossRef]
  36. Ni, J.; Gu, Y.; Tang, G.; Ke, C.; Gu, Y. Cooperative Coverage Path Planning for Multi-Mobile Robots Based on Improved K-Means Clustering and Deep Reinforcement Learning. Electronics 2024, 13, 944. [Google Scholar] [CrossRef]
  37. Fang, X.; Xie, L.; Li, X. Integrated Relative-Measurement-Based Network Localization and Formation Maneuver Control. IEEE Trans. Autom. Control 2023, 69, 1906–1913. [Google Scholar] [CrossRef]
  38. Fang, X.; Xie, L. Distributed Formation Maneuver Control Using Complex Laplacian. IEEE Trans. Autom. Control 2023, 69, 1850–1857. [Google Scholar] [CrossRef]
  39. Wu, J.; Cheng, L.; Chu, S.; Song, Y. An Autonomous Coverage Path Planning Algorithm for Maritime Search and Rescue of Persons-in-Water Based on Deep Reinforcement Learning. Ocean Eng. 2024, 291, 116403. [Google Scholar] [CrossRef]
Figure 1. The overall process of this algorithm.
Figure 2. GNDVI remote-sensing inversion results of the study area. (A) shows the overall GNDVI inversion results for the study area, (B) highlights one of the regions with the most severe nitrogen deficiency, and (C) depicts the area used for the simulation experiments.
Figure 3. Mission map production process (white area represents the boundary, gray and dark green areas represent the mission area, and black area represents the non-mission area).
Figure 4. MDP Decision Process.
Figure 5. Autonomous decision-making process of multiple drones (white area represents the boundary, gray and dark green areas represent the mission area, and black area represents the non-mission area).
Figure 6. Structure of the improved DDQN.
Figure 7. The path-planning results of the improved DDQN algorithm proposed in this study (the red box highlights the repeated path).
Figure 8. The path-planning results of the MLP-DDQN algorithm (the red box highlights the repeated path).
Figure 9. The path-planning results of the BFS-BA (the red box highlights the repeated path).
Figure 10. Reward trends with training episodes.
Figure 11. Average loss trends with training steps.
Table 1. Sentinel-2 band parameters.

Band    | Color               | Wavelength (nm) | Resolution (m)
Band 1  | Coastal             | 433–453         | 60
Band 2  | Blue                | 458–523         | 10
Band 3  | Green               | 543–578         | 10
Band 4  | Red                 | 650–680         | 10
Band 5  | Vegetation red edge | 698–713         | 20
Band 6  | Vegetation red edge | 734–748         | 20
Band 7  | Vegetation red edge | 765–785         | 20
Band 8  | NIR                 | 785–900         | 10
Band 8a | Vegetation red edge | 855–875         | 20
Band 9  | Water vapor         | 930–950         | 60
Band 10 | SWIR (Cirrus)       | 1365–1385       | 60
Band 11 | SWIR                | 1565–1655       | 20
Band 12 | SWIR                | 2100–2280       | 20
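The nitrogen-stress layer is derived from GNDVI, which combines the Green (Band 3) and NIR (Band 8) bands listed in Table 1. The snippet below is a minimal sketch of such a computation with the Google Earth Engine Python API; the collection ID, date range, cloud threshold, and region of interest are illustrative placeholders rather than the exact parameters used in this study.

```python
# Minimal GNDVI sketch with the Earth Engine Python API (illustrative only).
# GNDVI = (NIR - Green) / (NIR + Green), i.e., the normalized difference of B8 and B3.
import ee

ee.Initialize()

# Placeholder region of interest (not the study area's actual coordinates).
region = ee.Geometry.Rectangle([125.0, 43.5, 125.2, 43.7])

s2 = (ee.ImageCollection('COPERNICUS/S2_SR_HARMONIZED')
      .filterBounds(region)
      .filterDate('2023-06-01', '2023-07-31')                    # assumed date range
      .filter(ee.Filter.lt('CLOUDY_PIXEL_PERCENTAGE', 20))       # assumed cloud filter
      .median())

gndvi = s2.normalizedDifference(['B8', 'B3']).rename('GNDVI')
```

The resulting GNDVI image can then be thresholded to separate nitrogen-stressed pixels from healthy ones before the mission map is built.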
Table 2. Sub-map status corresponding to different values.

Value of a_i(p, p) | State
0 | Non-mission areas
1 | Mission areas
2 | The current location of the UAV
4 | Map boundaries
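As a concrete illustration of this encoding, the following is a minimal sketch (an assumed implementation, not the authors' code) of a sub-map grid whose cells take the values listed in Table 2; the grid size, mission cells, and UAV position are arbitrary examples.

```python
# Illustrative sub-map encoding: 0 = non-mission area, 1 = mission area,
# 2 = current UAV location, 4 = map boundary (values from Table 2).
import numpy as np

def make_submap(height, width, mission_cells, uav_pos):
    grid = np.zeros((height, width), dtype=np.int8)          # 0: non-mission areas
    grid[0, :] = grid[-1, :] = grid[:, 0] = grid[:, -1] = 4  # 4: map boundaries
    for r, c in mission_cells:
        grid[r, c] = 1                                        # 1: mission areas
    grid[uav_pos] = 2                                         # 2: current UAV location
    return grid

submap = make_submap(8, 8, mission_cells=[(3, 3), (3, 4), (4, 3)], uav_pos=(1, 1))
```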
Table 3. The parameters of the improved DDQN algorithm proposed in this study.

Parameter    | Value     | Description
L_m          | 100,000   | Maximum number of episodes
ST_max       | 200       | Maximum step size
NonR_max     | 80        | Maximum number of steps without reward
γ            | 0.95      | Discount factor
ε_start      | 1.0       | Initial value of the exploration rate
ε_end        | 0.1       | Final value of the exploration rate
ε_decay      | 5000      | Number of steps over which the exploration rate decays
B            | 128       | Batch size
M            | 100,000   | Replay memory size
LR           | 1 × 10⁻⁴  | Learning rate
Layers_conv1 | 32        | Number of neurons in conv1
Layers_conv2 | 64        | Number of neurons in conv2
Layers_conv3 | 128       | Number of neurons in conv3
Layers_LSTM  | 128       | Number of neurons in the LSTM layer
n            | 5         | Target network update frequency
N_actions    | 8         | Number of output neurons (actions)
Optimizer    | Adam      | Optimizer
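To make the configuration in Table 3 concrete, the sketch below assembles a dueling, LSTM-augmented Q-network with three convolutional layers (32/64/128 filters), a 128-unit LSTM, and eight output actions, optimized with Adam at a learning rate of 1 × 10⁻⁴. Kernel sizes, pooling, input shape, and the sequence handling are assumptions for illustration and do not reproduce the exact architecture of this study.

```python
# Illustrative PyTorch sketch of a dueling DDQN online network with an LSTM,
# sized according to Table 3 (conv filters 32/64/128, LSTM 128, 8 actions).
import torch
import torch.nn as nn

class DuelingLSTMDQN(nn.Module):
    def __init__(self, in_channels=1, n_actions=8, lstm_units=128):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(in_channels, 32, kernel_size=3, padding=1), nn.ReLU(),
            nn.Conv2d(32, 64, kernel_size=3, padding=1), nn.ReLU(),
            nn.Conv2d(64, 128, kernel_size=3, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),            # -> (batch*steps, 128)
        )
        self.lstm = nn.LSTM(input_size=128, hidden_size=lstm_units, batch_first=True)
        self.value = nn.Linear(lstm_units, 1)                 # state-value stream
        self.advantage = nn.Linear(lstm_units, n_actions)     # advantage stream

    def forward(self, x, hidden=None):
        # x: (batch, seq_len, channels, H, W) -> per-step CNN features -> LSTM
        b, t = x.shape[:2]
        feats = self.features(x.flatten(0, 1)).view(b, t, -1)
        out, hidden = self.lstm(feats, hidden)
        v = self.value(out[:, -1])
        a = self.advantage(out[:, -1])
        q = v + a - a.mean(dim=1, keepdim=True)               # dueling aggregation
        return q, hidden

net = DuelingLSTMDQN()
optimizer = torch.optim.Adam(net.parameters(), lr=1e-4)       # LR from Table 3
gamma = 0.95                                                   # discount factor from Table 3
```

In standard DDQN training, a target network with the same structure would be synchronized from this online network every n = 5 updates, matching the target network update frequency in Table 3.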
Table 4. The number of steps, the repeated coverage rate, and the coverage rate of the path-planning results.

Map   | Algorithm | Step | Repeated Coverage (%) | Coverage (%)
Map 1 | Ours      | 121  | 0.00%  | 100%
Map 1 | BFS-BA    | 137  | 12.84% | 100%
Map 1 | MLP-DDQN  | 150  | 1.83%  | 53.21%
Map 2 | Ours      | 118  | 1.82%  | 100%
Map 2 | BFS-BA    | 135  | 10.91% | 100%
Map 2 | MLP-DDQN  | 155  | 5.45%  | 56.36%
Map 3 | Ours      | 83   | 0.00%  | 100%
Map 3 | BFS-BA    | 91   | 13.92% | 100%
Map 3 | MLP-DDQN  | 116  | 12.66% | 78.48%
Map 4 | Ours      | 109  | 0.00%  | 100%
Map 4 | BFS-BA    | 113  | 0.00%  | 100%
Map 4 | MLP-DDQN  | 170  | 12.12% | 59.60%
Sum   | Ours      | 431  | 0.50%  | 100%
Sum   | BFS-BA    | 476  | 9.32%  | 100%
Sum   | MLP-DDQN  | 591  | 7.56%  | 60.71%