Applications of Multi-Robotic Arms to Assist Agricultural Production: A Review

Gai, Xiaojian; Xu, Chang; Liu, Yajia; Feng, Qingchun; Wang, Shubo

doi:10.3390/agriengineering7060192

by Xiaojian Gai ¹, Chang Xu ¹, Yajia Liu ², Qingchun Feng ³ and Shubo Wang ¹,*

¹ School of Automation, Qingdao University, Qingdao 266071, China
² College of Science, China Agricultural University, Beijing 100083, China
³ Intelligent Equipment Research Center, Beijing Academy of Agriculture and Forestry Sciences, Beijing 100097, China
* Author to whom correspondence should be addressed.

AgriEngineering 2025, 7(6), 192; https://doi.org/10.3390/agriengineering7060192

Submission received: 22 May 2025 / Revised: 6 June 2025 / Accepted: 9 June 2025 / Published: 16 June 2025

Abstract

With the modernization of agricultural production, single-arm robotic systems can no longer meet the needs of future agricultural development. To further improve operational efficiency, the collaborative operation of multiple robotic arms has become an active research topic. This paper first addresses the task allocation problem in the collaborative operation of agricultural multi-robotic arms and summarizes the main algorithms currently in use, including the genetic algorithm and the particle swarm algorithm, with respect to work area division and task sequencing. On this basis, it then analyzes the path planning problem of agricultural multi-robotic arms, summarizing the key techniques used in current research, including heuristic algorithms, rapidly exploring random trees (RRT), and reinforcement learning algorithms, with a focus on recent applications of reinforcement learning to agricultural robotic arms. In summary, agricultural multi-robotic arm systems can support agricultural mechanization and intelligent production.

Keywords:

agricultural multi-robotic arms; task allocation; path planning; reinforcement learning; heuristic algorithm

1. Introduction

With the advance of modernization and automation in agricultural production, robotic arms have been widely used across agricultural production because of their precise and stable operation [1,2,3,4,5]. Driven by technologies such as sensors, computers, and artificial intelligence, agricultural robotic arms have progressed in information perception, precise control, and job execution [6,7,8]. To break through the efficiency bottleneck of single robotic arms, more researchers are combining multiple robotic arms to complete operational tasks jointly. In related fields such as agricultural picking and planting, the collaborative operation of multi-robotic arms has been shown to improve operational efficiency. The 24-arm robot designed by Agrobot and the multi-arm robots of FFR have increased efficiency by up to 8 times compared with manual labor, and the single-fruit picking time has been reduced to less than 1.5 s [9,10,11]. Although agricultural multi-robotic arms offer significant efficiency advantages, they face several technological challenges as the number of arms increases, including task allocation and path planning among the arms [12,13].

In order to ensure complete coverage of the operating range by multi-robotic arms, it is inevitable that the working space of the robotic arms will overlap, which increases the possibility of collisions between the robotic arms. A study by Luo et al. [14] showed that in the overlapping area of multiple arms, robot arms are prone to collision. In addition, the scheduling problem between robotic arms is also a key factor affecting the efficiency of multi-arm collaborative operations. In actual agricultural work environments, task distribution is often uneven. For example, in the field of fruit harvesting, the distribution of fruits is often limited by the canopy structure of fruit trees [15,16,17]. Meanwhile, the different traversal orders of multi-robotic arms on the target points of the operation could also lead to significant differences in operation time. Kurtzer et al. [18] planned the sequence of tasks for harvesting robots and were able to reduce time costs by 12%. Therefore, in order to improve the overall efficiency of agricultural multi-robotic arm systems, it is particularly important to construct a scientific and efficient task allocation strategy for agricultural multi-robotic arms.

The task allocation problem of multi-robotic arms is a complex NP-hard problem (NP: non-deterministic polynomial time). It can in principle be solved with operations research methods [19], but it is difficult to obtain a solution this way within a limited time. Many scholars therefore regard heuristic algorithms, such as collaborative evolutionary algorithms, as effective for multi-robot task allocation and scheduling in smart farms [20]. With the development of machine vision, reinforcement learning algorithms have also provided new options for solving task allocation problems [21,22]. However, the effective allocation and scheduling of agricultural multi-robotic arms remains a major challenge.

Building on reasonable task allocation, researchers have also studied the path planning problem of agricultural multi-robotic arms in depth. Path planning for agricultural robotic arms is one of the core technologies of agricultural automation and underpins precision agriculture and smart agriculture [23,24,25]. A reasonable path planning algorithm must let the robotic arm avoid obstacles effectively and find the shortest feasible path under task constraints. The traditional A* and Dijkstra algorithms perform well in structured tasks but adapt poorly to the complex, dynamic agricultural environment. Many scholars have used intelligent heuristic algorithms such as the artificial potential field method, genetic algorithm, and ant colony algorithm to solve the path planning problem of multi-robotic arms in agricultural scenarios [26,27,28,29]. Sampling-based path planning has become a common approach to complex multi-robotic arm path planning because it adapts well to high-dimensional spaces [30,31]. In particular, path planning algorithms based on reinforcement learning have become a frontier direction because of their strong environmental perception and autonomous decision-making abilities, providing new technical support for the application of agricultural multi-robotic arms [32].

In response, this paper collects and analyzes the relevant literature to explore how to improve the task allocation efficiency of agricultural multi-robotic arms for different types of robotic arms, and how to avoid resource waste and task conflicts among the arms. It then comprehensively reviews path planning algorithms for multi-robotic arms and analyzes the advantages and disadvantages of current algorithms. In particular, the review of reinforcement learning algorithms in multi-robotic arm path planning provides a theoretical reference for promoting the application of agricultural multi-robotic arms in agricultural production.

This paper is structured as follows: Section 2 describes the task allocation problem of agricultural multi-robotic arms collaborative operation, including work area division, operation sequence, and planning sequence issues. Section 3 summarizes the application of intelligent heuristic algorithms in agricultural multi-robotic arms path planning, including obstacle avoidance strategies and trajectory optimization. Section 4 summarizes the prospects and applications of reinforcement learning algorithms in multi-arm path planning. Finally, a summary of this paper is provided in Section 5.

2. Task Allocation for Agricultural Multi-Robotic Arms

Scientific task allocation is a prerequisite for the efficient collaborative operation of agricultural multi-robotic arm systems and has therefore attracted extensive attention from researchers. Current task allocation for multi-robotic arms in agriculture faces the following main difficulties. Firstly, because different types of robotic arms differ in motion mode and structure, ensuring that the multi-robotic arm system completely covers the agricultural operation range is the primary issue for completing tasks comprehensively [33,34,35]. Secondly, the absence of efficient harvesting strategies leads to a heightened risk of collisions, which can severely disrupt collaborative workflows [36]. In addition, scheduling the multi-robotic arm system reasonably and traversing task target points efficiently are core issues that restrict the development of efficient multi-robotic arm collaboration [21].

This paper provides a systematic overview of the structural characteristics of various types of robotic arms. It summarizes the approaches used to categorize different workspaces into indoor and outdoor tasks, as well as collaborative work strategies. Additionally, the paper reviews and analyzes current mainstream task allocation algorithms in detail. These algorithms mainly include reinforcement learning algorithms and heuristic algorithms, such as genetic algorithms and ant colony algorithms.

2.1. Division of Working Areas for Multi-Robotic Arms

In the collaborative operation of agricultural multi-robotic arms, the system should divide the working area of the robotic arms scientifically according to the terrain characteristics and crop distribution of the agricultural environment. Dividing the area reasonably effectively reduces blind spots in the operation and avoids repeated work by the robotic arms [37]. Meanwhile, because different types of robotic arms differ significantly in structure, degrees of freedom, and working radius, the appropriate type of robotic arm must be selected for each specific usage scenario.

2.1.1. Robotic Arm Classification

As the core component of agricultural intelligent equipment, agricultural robotic arms can be divided into Cartesian robotic arms, articulated robotic arms, and cylindrical robotic arms according to the different crops harvested and planting methods, as shown in Table 1 [38].

The Cartesian robot arm is composed of three mutually perpendicular linear motion axes, and its workspace, spanned by these three axes, is a rectangular cuboid. On the one hand, the arm moves along three linear axes and therefore has high positioning accuracy. Zahid et al. [39] used a three-degree-of-freedom Cartesian robot arm to cut agricultural branches; its high-precision positioning effectively met the requirements on cutting points and angles. On the other hand, Cartesian robot arms rely on fixed tracks and frames and cannot adapt to high-density or unstructured environments. Zhang et al. [40] developed a four-degree-of-freedom robot arm with an infrared laser, but in older, lush orchards its success rate was only 65.2%. The cylindrical agricultural robotic arm is designed around a cylindrical coordinate system; its cylindrical workspace is swept by rotational and linear telescopic movements, making it suitable for vertical or radial motion scenarios. Yoshida et al. [41] designed a double cylindrical robotic arm for picking in V-shaped trellis orchards, which reaches fruits at different heights through telescopic motion. However, in more complex orchard environments, the linear extension and contraction of cylindrical robotic arms is easily blocked by obstacles, and such arms lack obstacle avoidance capabilities.

Unlike the two types above, articulated robotic arms are more flexible and are widely used in agriculture. An articulated robotic arm is typically composed of multiple transmission joints, and its workspace is typically composed of multiple spherical regions. The functions the arm can perform vary with the number of rotating joints. Most low-degree-of-freedom robotic arms are simple to control and highly stable, and are commonly used in crop planting, field management, and similar applications. Wei et al. [42] designed a three-degree-of-freedom articulated robot arm to replace manual wheat fertilization and used an angle sensor to control the joint angle, achieving a conveying efficiency of 0.833 kg/s. Because of their flexibility and adaptability to complex environments, high-degree-of-freedom robot arms are widely used in fruit and vegetable picking, flower planting, and other fields. Li et al. [43] used the z-arm four-axis cooperative robot arm to pick cherry, with an average fruit picking time of 10.4 s. Xiong et al. [44] designed a strawberry-picking robot that uses an industrial 5-DOF serial robot arm and binocular vision, with an average single-fruit picking time of 7.5 s. For a more complex orchard environment, Zhu et al. [45] used a six-degree-of-freedom robot arm for apple picking, with a success rate of 96.0%.

In fact, all three types of robot arms are widely used in both outdoor and indoor farms. However, because indoor space is limited, most indoor robot arms are smaller and more compact and are widely used for standardized crops such as strawberries and tomatoes. Most outdoor robot arms are large and suited to large-scale farmland scenes.

Table 1. Advantages and disadvantages of different types of robot arms.

Applied Crops	Sensor Model	Producing Area	Structure	Advantage	Disadvantage	Ref.
Cotton	-	Clemson University team	Cartesian robotic arm	Simple structure, high control accuracy, and good rigidity	Only linear operations can be performed, and the space utilization rate is low.	[46]
Apple	SN04-N	America	Cartesian robotic arm	Simple structure, high control accuracy, and good rigidity		[39]
Apple	Intel RealSense D435	Universal robots	Multi-cylindrical robotic arm	Compact structure, easier to solve spatial trajectories	Complex structure; arm end error increases with the increase in arm length	[41]
Eggplant	Prosilica GC2450C, Mesa SwissRanger SR4000	Kinova Robotics	Dual joint robotic arm	Flexible movements, good obstacle avoidance performance, and the ability to perform complex and precise tasks	Low structural stiffness and more complex driving control	[47]

2.1.2. Regional Division

For each type of robot arm, achieving full coverage of the operating range is the premise of efficient operation of a multi-robotic arm system.

Xiong et al. [48] used double Cartesian robot arms for apple picking and modeled the apple picking area as a rectangle to match the cuboid workspace of the Cartesian robot arm, as shown in Figure 1a. Feng et al. [25] also used Cartesian robot arms for picking. The apple tree crown is divided equally into nine rectangular areas, comprising four single-arm operation areas, four double-arm operation areas, and one four-arm operation area, to ensure complete coverage of the operating range, as shown in Figure 1b. Compared with the Cartesian robot arm, the articulated robot arm is more flexible and widely used in agricultural multi-arm operations. Cui et al. [49] designed a two-arm picking robot for kiwifruit picking on a trellis, using two UR5 articulated robot arms. To fit the matrix area of the trellis space, an elliptical workspace was designed. Jiang et al. [50] also modeled the grape picking area as an ellipse and divided it evenly according to the grape growth area, as shown in Figure 1d, reducing the work area as much as possible while ensuring that no fruit is missed. Zhang et al. [51] divided the operating range of six robot arms equally according to the reachable area of the articulated robot arm, as shown in Figure 1c.

Figure 1. Equal division of multi-robot arms based on working area. (a) Division of dual Cartesian robot arms [48]; (b) division of four Cartesian robot arms [25]; (c) division of six joint robot arms [51]; (d) division of double joint robot arms [50].

In practice, dividing work areas equally according to the working environment is often not optimal. Because tasks are unevenly distributed among agricultural robot arms, arms with dense tasks run under high load for long periods while others sit idle for lack of work, resulting in significant differences in workload between the arms.

Arikapudi et al. [52] divided the region according to the number of fruits in it, as shown in Figure 2a. The working boundaries of the multi-robot arms are determined from the frequency histogram and cumulative distribution function of the fruit to reduce the impact of uneven task distribution. Increasing the overlapping area of the arms is also a feasible scheme. Lammers et al. [53] set the gap between the robot arms to 7 mm to maximize the overlapping area and reduce the difference in workload between arms. Barnett et al. [37] further expanded the common workspace of multiple Cartesian robot arms by sharing the x-axis among them, as shown in Figure 2b.
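As a minimal sketch of dividing a region by fruit count rather than by equal width, the following Python example places band boundaries where the empirical cumulative distribution of fruit heights reaches k/N for N arms. The function names, the one-dimensional height model, and the sample data are illustrative assumptions and are not taken from [52].

```python
def balanced_boundaries(fruit_heights, num_arms):
    """Split the height axis into num_arms bands holding near-equal fruit counts."""
    if num_arms < 1:
        raise ValueError("num_arms must be at least 1")
    ordered = sorted(fruit_heights)
    n = len(ordered)
    if n < num_arms:
        raise ValueError("need at least one fruit per arm")
    boundaries = []
    for k in range(1, num_arms):
        # Index where the empirical cumulative distribution reaches k / num_arms.
        idx = (k * n) // num_arms
        # Place the boundary midway between the two neighbouring fruits.
        boundaries.append((ordered[idx - 1] + ordered[idx]) / 2.0)
    return boundaries


def count_per_band(fruit_heights, boundaries):
    """Count the fruits that fall into each band defined by sorted boundaries."""
    counts = [0] * (len(boundaries) + 1)
    for h in fruit_heights:
        band = sum(1 for b in boundaries if h > b)
        counts[band] += 1
    return counts


# Hypothetical fruit heights in metres, clustered near the top of the canopy.
heights = [0.5, 0.6, 0.7, 1.8, 1.9, 2.0, 2.1, 2.2, 2.3]
bounds = balanced_boundaries(heights, num_arms=3)
workload = count_per_band(heights, bounds)
```

With clustered fruit, equal-width bands would leave some arms nearly idle, whereas count-based boundaries keep the per-arm workload close to uniform.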

2.2. Operation Sequence of Agricultural Multi-Robot Arms

Scientifically planning the operation sequence of agricultural multi-robot arms can significantly shorten the waiting time caused by poor task handover and effectively prevent collisions between the robot arms.

Cui et al. [49] started from the distance between the robot arms: while the arm on one side picks, the arm on the other side moves as close as possible to its next picking target to shorten the waiting time. Yu's team [54] designed a humanoid apple-picking robot that harvests apples in turn according to the coordinates of the end effector in the workspace. Xiong et al. [55] used a primary–secondary control algorithm to keep the distance between the two arms above the safe distance in real time and so prevent collision. The harvesting sequence of strawberries is shown in Figure 3a. Henry et al. [56] controlled the safe distance between arms to ensure safe operation. Although these methods ensure operational safety from the perspective of distance, they sacrifice some of the efficiency of collaborative operation. Jiang et al. [50] implemented a left-right segmentation strategy along the central axis of the camera’s field of view, dividing the workspace into a left-arm work area and a right-arm work area so that the two robot arms operate independently without colliding. The harvesting sequence of the double robot arms is shown in Figure 3b. Feng et al. [25] judged whether the picking tasks of two robots would collide according to the time intervals in which the different robots enter the shared area, which effectively reduced interference and collisions and improved operation efficiency, as shown in Figure 3c.
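The time-interval idea described above can be illustrated with a small interval-overlap check. This is only a sketch under assumed conventions: each arm's use of the shared area is a hypothetical (enter, leave) pair in seconds, and `margin` is an assumed safety buffer; it does not reproduce the exact rule used in [25].

```python
def intervals_conflict(a, b, margin=0.0):
    """Return True if two (enter, leave) occupancy intervals overlap within a margin."""
    a_in, a_out = a
    b_in, b_out = b
    return a_in < b_out + margin and b_in < a_out + margin


def delay_to_clear(first, second, margin=0.0):
    """Delay to add to `second` so it enters the shared area after `first` has left."""
    if not intervals_conflict(first, second, margin):
        return 0.0
    return first[1] + margin - second[0]


# Hypothetical schedule: arm A occupies the shared area from 2 s to 5 s,
# arm B would like to occupy it from 4 s to 7 s.
arm_a = (2.0, 5.0)
arm_b = (4.0, 7.0)
delay = delay_to_clear(arm_a, arm_b, margin=0.5)
arm_b_shifted = (arm_b[0] + delay, arm_b[1] + delay)
```

Delaying only the conflicting arm keeps both arms working in their exclusive areas for the rest of the time, which is one way to limit the efficiency loss of purely distance-based safety rules.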

2.3. Task Allocation Algorithm

In the development of modern intelligent agriculture, the dynamic planning of cooperative multi-robot arm operation has become a key problem restricting the large-scale application and efficiency of multi-robot arms. Cooperative operation of multiple robot arms can be formulated as a multiple traveling salesman problem. The key to solving it is traversing the target points efficiently according to their distribution. Many researchers use heuristic algorithms to solve such large-scale, complex problems, and reinforcement learning algorithms provide a further strategy. The specific task planning algorithms, division methods, and average operation times are shown in Table 2.

2.3.1. Intelligent Heuristic Algorithm

A heuristic algorithm is an algorithm based on intuitive or indirect experience, which searches for the optimal solution by constructing a heuristic function. At present, the algorithms commonly used to solve the task allocation of multi-robot arms mainly include the simulated annealing algorithm, ant colony algorithm, genetic algorithm, and so on.

The simulated annealing algorithm is derived from the annealing process of solids. It searches for the optimal solution while the temperature is lowered to a termination temperature, and its performance is easily affected by the initial temperature and cooling coefficient [57]. Zhang et al. [51] studied the initial temperature, number of inner-loop iterations, and cooling coefficient of the simulated annealing algorithm in depth. Convergence was fastest with an initial temperature of 3000 °C and a cooling coefficient of 0.99, and the search efficiency of the optimized algorithm improved by about 20%. Feng et al. [20] proposed AOMTSP-GA, based on a genetic algorithm, to solve the picking sequence problem of multi-robot arms. Its selection operator fuses the best-individual and roulette strategies to further improve the convergence speed of the genetic algorithm. Compared with the random traversal algorithm, the traversal time is shortened by 40.97%. However, under dense operating conditions, it does not plan effectively for fruit covered by leaves. To address crop occlusion, Barnett et al. [37] prioritized harvesting occluded fruits to prevent them from falling off during picking. Yang et al. [58] proposed a genetic ant colony algorithm (IGAACMO) to solve the mushroom-picking sequence problem in dense environments. The pheromone concentration of the ant colony algorithm is used to choose the picking order, which improves the efficiency of multi-robot cooperation while maintaining the success rate of the algorithm. When there are many targets, the average success rate is as high as 97%.
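To make the role of the initial temperature and cooling coefficient concrete, the sketch below applies a generic simulated annealing loop to a single-arm picking order. The default values 3000 and 0.99 echo the figures reported for [51], but the segment-reversal move, termination temperature, inner-loop count, and fruit coordinates are assumptions for illustration and not the authors' implementation.

```python
import math
import random


def tour_length(points, order):
    """Length of the open path that visits the points in the given order."""
    return sum(math.dist(points[a], points[b]) for a, b in zip(order, order[1:]))


def anneal_order(points, t0=3000.0, cooling=0.99, t_end=1e-3, inner_loops=20, seed=0):
    """Search for a short picking order with simulated annealing."""
    rng = random.Random(seed)
    order = list(range(len(points)))
    current = tour_length(points, order)
    best_order, best = order[:], current
    t = t0
    while t > t_end:
        for _ in range(inner_loops):
            # Neighbour move: reverse a randomly chosen segment of the order.
            i, j = sorted(rng.sample(range(len(points)), 2))
            candidate = order[:i] + order[i:j + 1][::-1] + order[j + 1:]
            cand_len = tour_length(points, candidate)
            delta = cand_len - current
            # Always accept improvements; accept worse orders with
            # probability exp(-delta / t), which shrinks as t cools.
            if delta < 0 or rng.random() < math.exp(-delta / t):
                order, current = candidate, cand_len
                if current < best:
                    best_order, best = order[:], current
        t *= cooling  # geometric cooling schedule
    return best_order, best


# Hypothetical 2D fruit positions in metres.
fruits = [(0, 0), (3, 1), (1, 0), (2, 2), (0, 2), (3, 3), (1, 3), (2, 0)]
picked_order, picked_length = anneal_order(fruits)
```

In this toy setting the temperature should be on the same scale as typical changes in path length; a very high initial temperature mainly lengthens the early random-walk phase, which is one reason the parameter choice affects convergence speed.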

2.3.2. Reinforcement Learning Algorithm

Although heuristic algorithms have significantly improved the efficiency of task allocation for agricultural multi-robotic arms, they still suffer from high computational difficulty and slow convergence. The reinforcement learning algorithm proposed by Minsky provides a new research direction for multi-robotic arm task allocation [59]. With its powerful online learning and dynamic decision-making capabilities, reinforcement learning offers a promising solution for the efficient collaborative operation of multiple agricultural robotic arms.

Xie et al. [22] proposed a multi-robotic arm task allocation algorithm based on proximal policy optimization (PPO). The reward function combines a time reward, an action reward, and a conflict penalty to guide model training, and the visiting sequence of the robotic arm is determined by selecting the appropriate picking action in the action space. Guo et al. [32] integrated a long short-term memory (LSTM) network during the data collection phase to store the past action sequences of the robotic arm and the historical picking records of apples, improving the agent’s ability to capture long-term dependencies, and combined it with PPO to control the task planning of multi-robotic arms. Li et al. [60] adopted a multi-agent reinforcement learning training strategy in which a centralized controller integrates the state information of each robot arm to improve training efficiency. In the real orchard environment, the average single-fruit picking time is 5.8 s. Li et al. [36] tuned the discount factor and learning rate used for training, which improved the convergence speed of the proximal policy optimization model. In addition, an incremental training method is adopted: multi-robot arm training starts in environments with fewer targets and gradually migrates to environments with more targets, which greatly improves the generalization ability of the model. Gong et al. [61] used the soft actor–critic (SAC) algorithm for task allocation of multi-robot arms. A pre-allocation module assigns entity units to each task point, effectively avoiding locally optimal solutions.
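The following sketch shows how a shaped reward with time, action, and conflict terms might be combined, together with the standard PPO clipped surrogate term for a single sample. The weights, function names, and the exact form of the reward are hypothetical; only the clipped surrogate follows the published PPO objective, and neither function reproduces the models in [22] or [36].

```python
def step_reward(travel_time, picked, conflict, w_time=1.0, r_pick=10.0, p_conflict=20.0):
    """Shaped reward: time cost, bonus for a successful pick, penalty for a conflict.

    All weights are illustrative assumptions.
    """
    reward = -w_time * travel_time
    if picked:
        reward += r_pick
    if conflict:
        reward -= p_conflict
    return reward


def clipped_surrogate(ratio, advantage, eps=0.2):
    """PPO clipped surrogate for one sample.

    `ratio` is the new-to-old policy probability ratio for the chosen action,
    `advantage` is its estimated advantage, and `eps` is the clipping range.
    """
    clipped_ratio = max(1.0 - eps, min(1.0 + eps, ratio))
    return min(ratio * advantage, clipped_ratio * advantage)


# A pick that took 2 s without conflict, and a 1 s move that caused a conflict.
good_step = step_reward(2.0, picked=True, conflict=False)
bad_step = step_reward(1.0, picked=False, conflict=True)
```

The clipping limits how far a single update can move the policy, which is why tuning choices such as the learning rate and discount factor interact with convergence speed.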

Table 2. Multi-robotic arms task allocation method based on heuristic algorithm and reinforcement learning.

Target	Number	Task Allocation Method	Average Time	Efficiency Improvement	Ref.
Peach	2	Simulated annealing algorithm	-	1.20 times	[51]
Apple	4	Genetic algorithm	7.12 s	1.96 times	[25]
Peach	6	Allocation of work units for waiting fruit load	4 s	1.5 times	[52]
Fruit	4	AOMTSP-GA	3.15 s	4.28 times	[20]
Fungus		IGAACMO	1183 pcs/h	2 times	[58]
Apple	4	PPO	5.8 s	1.33 times	[36]
Apple	2	LSTM-PPO	6.26 s	1.17 times	[32]
Strawberry	2	Active obstacle separation strategy	6.1 s	1.23 times	[55]

3. Path Planning of Agricultural Multi-Robot Arms Based on Intelligent Algorithm

In the complex, dynamic agricultural scene, a multi-robot arm system is subject during operation to the dual constraints of natural terrain and crop species. Therefore, on the basis of task allocation, planning an appropriate path for the multi-robot arm system is the premise for each arm to reach its target position accurately and complete its task. Unlike single-arm path planning, multi-robot arm path planning must not only plan a collision-free optimal path to the target point but also resolve mutual interference between arms in their overlapping areas, which places high demands on the real-time performance, robustness, and collaborative optimization ability of agricultural multi-robot arm path planning algorithms [62].

Scholars around the world have conducted extensive research on the particular challenges of agricultural multi-robot arm path planning. At present, the main path planning algorithms for multi-robot arms fall roughly into two categories: heuristic algorithms that use empirical rules, and sampling-based path planning algorithms, which mainly include the probabilistic roadmap (PRM) and the rapidly exploring random tree (RRT) algorithm.

3.1. Heuristic Algorithm

Heuristic algorithms are built on empirical rules or on observations summarized from natural phenomena. When facing complex NP problems, they use heuristic information to reduce the search space, so they are suitable for real-time path planning of multi-robot arms. This heuristic information mainly comes from the following sources: natural laws, terrestrial organisms, aquatic organisms, and aerial organisms. The common algorithms are shown in Figure 4. The trajectories and planning times of the arm path planning algorithms are shown in Table 3.

Figure 4. Sources of intelligent heuristic algorithms [63,64,65,66].

In a single-arm independent operation area, only one collision-free path needs to be planned. As a traditional heuristic algorithm, the artificial potential field method is inspired by natural physical phenomena [67]. It is a local path search algorithm that uses an attractive field toward the target and a repulsive field around obstacles to guide the robot arm toward the best path. The traditional artificial potential field method searches for paths with a two-dimensional potential, which adapts poorly to the complex agricultural operating environment. Zhang et al. [68] extended the two-dimensional potential of the artificial potential field method to three-dimensional space for path planning of an apple-picking robot arm, and the arm obtained a smoother obstacle avoidance curve. Chen et al. [69] used the simulated annealing algorithm to search for the optimal parameters of the artificial potential field and set up virtual obstacles to solve the local minimum problem, which significantly improved the obstacle avoidance speed of the robot arm. However, in agricultural environments with denser obstacles, the planning efficiency drops significantly.
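A minimal three-dimensional artificial potential field planner can be sketched as follows. It uses the classic quadratic attractive potential and the inverse-distance repulsive potential with influence radius `rho0`; the gains, step size, point obstacles, and end-effector-only model are assumptions for illustration and do not correspond to the planners in [68] or [69].

```python
import math


def apf_step(pos, goal, obstacles, k_att=1.0, k_rep=0.05, rho0=0.3, step=0.02):
    """Move one fixed-length step along the combined attractive and repulsive force."""
    # Attractive force: negative gradient of 0.5 * k_att * |pos - goal|^2.
    force = [k_att * (g - p) for p, g in zip(pos, goal)]
    for obs in obstacles:
        diff = [p - o for p, o in zip(pos, obs)]
        rho = math.sqrt(sum(d * d for d in diff))
        if 0.0 < rho < rho0:
            # Negative gradient of 0.5 * k_rep * (1/rho - 1/rho0)^2,
            # active only inside the influence radius rho0.
            gain = k_rep * (1.0 / rho - 1.0 / rho0) / rho ** 3
            force = [f + gain * d for f, d in zip(force, diff)]
    norm = math.sqrt(sum(f * f for f in force))
    if norm == 0.0:
        return list(pos)  # stuck at an equilibrium (local minimum)
    return [p + step * f / norm for p, f in zip(pos, force)]


def plan_path(start, goal, obstacles, max_iters=2000, tol=0.03, **kwargs):
    """Iterate potential field steps until the goal is within tol or iterations run out."""
    path = [list(start)]
    for _ in range(max_iters):
        if math.dist(path[-1], goal) < tol:
            break
        path.append(apf_step(path[-1], goal, obstacles, **kwargs))
    return path


# Hypothetical scene: one point obstacle (e.g. a branch) beside the straight line to the fruit.
goal = (1.0, 0.0, 0.0)
branch = (0.5, 0.2, 0.0)
path = plan_path((0.0, 0.0, 0.0), goal, [branch])
```

If an obstacle lies exactly on the line between start and goal, this plain formulation can stall where the two forces cancel, which is the local minimum problem that virtual obstacles and parameter search are meant to address.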

The ant colony algorithm, with its distributed search characteristics and global search capability, offers a way to improve the efficiency of path planning for multi-robotic arms. To improve the efficiency of a robotic arm picking red flowers, Guo et al. [70] tuned the initial parameters of the ant colony algorithm with a genetic algorithm to speed up convergence. To alleviate the errors caused by the positive feedback of the ant colony algorithm, they also established a minimum threshold for the pheromone level, which increases the chance of discovering the global optimum. In the simulated picking experiment, the path length is 1.33–7.85% shorter than with the traditional ant colony algorithm. Meng et al. [71] proposed an elite smooth ant colony algorithm to reduce the dependence of harvesting robotic arms on local paths. Introducing attenuation coefficients into the pheromone update strategy effectively prevents the robotic arm from becoming stuck in locally optimal solutions. Yan et al. [72] also improved the pheromone distribution of the ant colony algorithm by dynamically adjusting the pheromone decay rate with path length and height and by using the beetle antennae search algorithm for secondary optimization of the initial path. In experiments, the success rate of path planning is as high as 96%.
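The effect of a minimum pheromone threshold can be seen in a generic pheromone update: evaporation, deposit along each ant's tour, then a lower clamp. The parameter names (`rho`, `q`, `tau_min`) and values are assumptions for illustration and are not the settings used in [70].

```python
def update_pheromone(pheromone, tours, lengths, rho=0.1, q=1.0, tau_min=0.01):
    """Evaporate, deposit along each tour, then clamp every edge to at least tau_min."""
    n = len(pheromone)
    # Evaporation on every edge.
    updated = [[(1.0 - rho) * pheromone[i][j] for j in range(n)] for i in range(n)]
    # Each ant deposits q / length on the edges of its tour (symmetric graph).
    for tour, length in zip(tours, lengths):
        for a, b in zip(tour, tour[1:]):
            updated[a][b] += q / length
            updated[b][a] += q / length
    # The lower bound keeps rarely used edges selectable, which limits
    # the runaway effect of positive feedback.
    return [[max(tau_min, v) for v in row] for row in updated]


# Three nodes, uniform initial pheromone, one ant that walked 0 -> 1 -> 2.
tau = [[0.01] * 3 for _ in range(3)]
tau_next = update_pheromone(tau, tours=[[0, 1, 2]], lengths=[2.0], rho=0.5)
```

Without the clamp, the unused edge between nodes 0 and 2 would decay toward zero and effectively disappear from later searches.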

In the overlapping area of multiple arms, cooperative operation must not only account for static obstacles in the workspace but also avoid the moving paths of the other robot arms in real time [12]. Ling et al. [73] defined the potential collision relationships between the parts of a double robot arm, for example allowing the left and right forearms to operate together while enforcing strict obstacle avoidance at the waist and shoulders, which effectively avoids the occurrence of special tracks. Bao et al. [74] proposed a citrus-picking system based on two robots, with one robot arm responsible for target recognition and the other for picking; the optimal path is solved by a genetic algorithm. Gao et al. [75] improved the fitness function of the genetic algorithm with a weighted summation method that considers multiple optimization objectives at the same time, meeting the path planning requirements of multi-robot arms. Xiong et al. [55] designed a monorail multi-arm system to pick strawberries. Inspired by the way human hands actively push aside surrounding obstacles, they proposed a path planning strategy in which the robot arm actively separates obstacles to reduce picking failures caused by strawberry leaves. In double-arm operation mode, picking a single fruit takes about 4.6 s.

During operation, multi-robotic arms must autonomously adjust their motion trajectories to satisfy motion constraints. Cao et al. [76] used B-spline curves to interpolate path points, which allows flexible configuration of robotic arm velocity and acceleration and ensures stable motion. To overcome the dynamic singularity problem of robotic arms, Wang et al. [77] used constrained particle swarm optimization (PSO) for coordinated trajectory planning of dual-arm space robots. Huang et al. [78] addressed trajectory generation for double-arm apple picking by using quintic spline interpolation between the starting and ending joint positions, effectively improving the accuracy and reliability of double-arm picking.
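As a sketch of fifth-order interpolation between a starting and an ending joint position, the function below evaluates the standard rest-to-rest quintic polynomial, which has zero velocity and acceleration at both ends. The boundary conditions, single-joint scope, and function name are assumptions; the method in [78] may use different constraints.

```python
def quintic_profile(q0, qf, duration, t):
    """Joint position, velocity and acceleration at time t for a rest-to-rest quintic."""
    # Normalised time, clamped to [0, 1].
    s = min(max(t / duration, 0.0), 1.0)
    dq = qf - q0
    # q(s) = q0 + dq * (10 s^3 - 15 s^4 + 6 s^5) satisfies
    # q(0) = q0, q(1) = qf with zero velocity and acceleration at both ends.
    pos = q0 + dq * (10 * s**3 - 15 * s**4 + 6 * s**5)
    vel = dq * (30 * s**2 - 60 * s**3 + 30 * s**4) / duration
    acc = dq * (60 * s - 180 * s**2 + 120 * s**3) / duration**2
    return pos, vel, acc


# Hypothetical joint move from 0 rad to 1 rad in 2 s, sampled at start, midpoint and end.
start_state = quintic_profile(0.0, 1.0, 2.0, 0.0)
mid_state = quintic_profile(0.0, 1.0, 2.0, 1.0)
end_state = quintic_profile(0.0, 1.0, 2.0, 2.0)
```

Because velocity and acceleration are continuous and vanish at both ends, the joint starts and stops smoothly, which is the property that makes quintic profiles attractive for picking motions.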

Table 3. Comparison of robot path planning based on heuristic algorithm.

Algorithm	Working Time	Success Rate	Path Trajectory	Ref.
Ant Colony Algorithm	72.245 s	-		[70]
Elite Ant Colony Algorithm	3.83 s	73.3%		[71]
-	30 s	87.5%	-	[73]
Genetic Algorithm	158.9 s	82%		[74]
Active Separation Path Planning	4.6 s	75%		[55]

3.2. Path Planning Algorithm Based on Probability Sampling

In multi-robotic arm collaborative operation in agriculture, the computational complexity of heuristic algorithms increases exponentially with the number of robotic arms, making it difficult to design suitable heuristic functions in a short period of time. Path planning algorithms based on probability sampling provide a novel approach to multi-robotic arm path planning because of their efficiency and flexibility in complex environments. These algorithms mainly use random sampling in the workspace and search for the optimal path through the sampling points. The algorithms commonly used for multi-robotic arm path planning include the probabilistic roadmap (PRM) algorithm and the rapidly exploring random tree (RRT) algorithm.

3.2.1. Probability Roadmap

The probability roadmap involves random sampling in the workspace and planning the path from the starting point to the endpoint based on the distance of the sampling points. Therefore, probabilistic roadmap algorithms are highly dependent on accurate modeling of the environment.
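The sample-connect-search procedure described above can be sketched as follows. This is a simplified illustration rather than any cited implementation: it assumes a 2-D unit-square workspace, circular obstacles given as `(center, radius)` pairs, a fixed connection radius, and Dijkstra search over the roadmap; all names and parameter values are hypothetical.

```python
import heapq
import math
import random

def segment_hits_circle(a, b, center, radius):
    """True if the straight segment a-b passes through the circle."""
    dx, dy = b[0] - a[0], b[1] - a[1]
    len2 = dx * dx + dy * dy
    t = 0.0 if len2 == 0 else max(0.0, min(1.0, ((center[0] - a[0]) * dx + (center[1] - a[1]) * dy) / len2))
    return math.hypot(a[0] + t * dx - center[0], a[1] + t * dy - center[1]) < radius

def collision_free(a, b, obstacles):
    return not any(segment_hits_circle(a, b, c, r) for c, r in obstacles)

def prm_plan(start, goal, obstacles, n_samples=200, radius=0.3, seed=0):
    rng = random.Random(seed)
    nodes = [start, goal]                       # index 0 = start, index 1 = goal
    while len(nodes) < n_samples + 2:           # keep only samples in free space
        p = (rng.random(), rng.random())
        if all(math.dist(p, c) >= r for c, r in obstacles):
            nodes.append(p)
    edges = {i: [] for i in range(len(nodes))}  # connect nearby, mutually visible nodes
    for i in range(len(nodes)):
        for j in range(i + 1, len(nodes)):
            d = math.dist(nodes[i], nodes[j])
            if d <= radius and collision_free(nodes[i], nodes[j], obstacles):
                edges[i].append((j, d))
                edges[j].append((i, d))
    dist, parent, heap = {0: 0.0}, {}, [(0.0, 0)]
    while heap:                                 # Dijkstra from start to goal
        d, u = heapq.heappop(heap)
        if u == 1:
            break
        if d > dist.get(u, math.inf):
            continue
        for v, w in edges[u]:
            if d + w < dist.get(v, math.inf):
                dist[v], parent[v] = d + w, u
                heapq.heappush(heap, (d + w, v))
    if 1 not in dist:
        return None                             # roadmap does not connect start and goal
    path = [1]
    while path[-1] != 0:
        path.append(parent[path[-1]])
    return [nodes[i] for i in reversed(path)]
```

Because every edge is checked against the obstacle model when the roadmap is built, the quality of the result depends directly on how accurately the obstacles are modelled, which is the dependence noted above.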

Cai et al. [79] used binocular vision to obtain the three-dimensional coordinates of fruits and obstacles and constructed a virtual picking environment to ensure the accuracy of robotic arm path planning. Chen et al. [80] studied robotic arm paths in narrow-passage environments and proposed an improved probabilistic roadmap (PRM) in which a virtual force applied to the sample points optimizes their distribution. In tests on an 8-degree-of-freedom robotic arm, the time required to find a feasible path was shorter than with the traditional PRM algorithm. Cheng et al. [81] used Gaussian mixture models to model the work environment and further optimized the path by considering both Euclidean distance and probability distance. However, this work only verified the feasibility of the path, without fully considering whether the path is safe. Sepúlveda et al. [47] fully considered the influence of a dynamic environment on dual-arm harvesting and used stochastic trajectory optimization for motion planning (STOMP) to optimize the paths of the dual robotic arms. By generating noisy random trajectories and iteratively refining them according to their cost, the path quality of the dual robotic arms is improved, which makes this kind of dynamic path planning more suitable than traditional roadmap methods. In experiments, the success rate of dual-arm picking reached 91.67%.

In fact, modeling the working environment is difficult in complex agricultural environments. Therefore, the probability roadmap method is difficult to apply in more complex orchard environments.

3.2.2. Rapidly Exploring Random Trees

Unlike the PRM, the method of rapidly exploring random trees generates a tree structure and finds the optimal path based on the tree structure. It has been widely applied in the path planning problem of high-dimensional space robot arms in the past few decades.

To address the low planning efficiency and long iteration time of traditional RRT algorithms, Cao et al. [82] used the idea of target gravity to guide the exploration tree towards the target point and selected the optimal search step size to improve the path search capability of the robotic arm. Ye et al. [83] adjusted an adaptive obstacle-avoidance coefficient while introducing target gravity to address path search failures caused by too many obstacles, and experimental results verified the effectiveness of this collision-free motion planning method for robotic arms. Liu et al. [84] proposed an attractive step size and a step-size bisection method for robotic arm path planning, which is used to solve path searches in repulsive fields. In real environments, the path length was reduced by 17.88%, which gives the method an advantage when facing obstacles with larger occlusion areas. Li et al. [85] proposed an adaptive step-size RRT* algorithm with a dynamic step-size adjustment mechanism: the step size is increased when there are fewer obstacles and decreased when there are more, which ensures the reliability and accuracy of the path planning process for the robotic arm.
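Two of the ideas discussed above, pulling the tree towards the target and adapting the step size to the surrounding obstacles, can be combined in a small sketch. It is not the algorithm of any cited paper: the goal bias probability stands in for "target gravity", the step size is simply tied to the clearance of the node being extended, and the 2-D unit-square workspace with circular obstacles is an assumption made to keep the example short.

```python
import math
import random

def segment_hits(a, b, obstacles):
    """True if the straight segment a-b enters any circular obstacle."""
    for (cx, cy), r in obstacles:
        dx, dy = b[0] - a[0], b[1] - a[1]
        len2 = dx * dx + dy * dy
        t = 0.0 if len2 == 0 else max(0.0, min(1.0, ((cx - a[0]) * dx + (cy - a[1]) * dy) / len2))
        if math.hypot(a[0] + t * dx - cx, a[1] + t * dy - cy) < r:
            return True
    return False

def clearance(p, obstacles):
    """Distance from p to the nearest obstacle surface (infinite if none)."""
    return min((math.dist(p, c) - r for c, r in obstacles), default=math.inf)

def rrt_plan(start, goal, obstacles, goal_bias=0.2, min_step=0.02, max_step=0.1,
             goal_tol=0.05, max_iter=3000, seed=0):
    rng = random.Random(seed)
    nodes, parent = [start], {0: None}
    for _ in range(max_iter):
        # With probability goal_bias, sample the goal itself (pull towards target).
        target = goal if rng.random() < goal_bias else (rng.random(), rng.random())
        i_near = min(range(len(nodes)), key=lambda i: math.dist(nodes[i], target))
        near = nodes[i_near]
        d = math.dist(near, target)
        if d == 0:
            continue
        # Adaptive step: short near obstacles, long in open space.
        step = min(max_step, max(min_step, clearance(near, obstacles)), d)
        new = (near[0] + step * (target[0] - near[0]) / d,
               near[1] + step * (target[1] - near[1]) / d)
        if segment_hits(near, new, obstacles):
            continue
        nodes.append(new)
        parent[len(nodes) - 1] = i_near
        if math.dist(new, goal) <= goal_tol and not segment_hits(new, goal, obstacles):
            path, i = [goal], len(nodes) - 1
            while i is not None:                # walk back to the root
                path.append(nodes[i])
                i = parent[i]
            return path[::-1]
    return None                                 # no path found within max_iter
```

The returned path is feasible but generally not smooth or shortest, which is why such paths are usually post-processed or replaced by an asymptotically optimal variant such as RRT*.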

The key to improving the convergence speed of the RRT algorithm is selecting appropriate exploration nodes. Wang et al. [86] used a third point between the starting point and the target point as the guide point for a four-way search and used sampling with probability P instead of purely random sampling to increase search efficiency. Wang et al. [87] proposed gradually refining the sampling area to constrain the expansion nodes of the spanning tree; boundary nodes are used as new extension nodes to prevent the random tree from falling into a local optimum.

Tahir et al. [88] studied the bidirectional RRT* algorithm extensively and added a guided bidirectional search tree to the path search process to accelerate the convergence of the algorithm. The characteristics, trajectory and planning time of the robot arms are shown in Table 4 below.

In the field of multi-arm path planning, Zhu et al. [89] proposed an attractive RRT algorithm with adaptive step size, which establishes a step-size norm between the configuration space and the workspace and adjusts the step size according to the target position. Kim et al. [90] reduced the dimension of the path planning space according to the task requirements to speed up the path search of the dual robot arm. Cui et al. [49] deployed a multi-machine distributed system for the integrated control of the two robot arms and used the RRT algorithm and a collision detection library for collision-free path planning. The average picking success rate was 82.10%, but there was still a risk of collision in the area shared by both arms. To prevent collisions between the two arms in the shared area, Shi et al. [91] combined RRT with the A* heuristic algorithm to calculate the path cost of each random point. The two arms are designated as main and auxiliary arms to avoid dynamic and static obstacles, which greatly improves the path search ability of the robot arms.

In addition, the Batch Informed Trees (BIT*) algorithm proposed by Gammell et al. [92] strikes a balance between graph search algorithms and sampling-based algorithms, achieving optimal sampling-based path planning through heuristic search of implicit random geometric graphs and providing new ideas for path planning of agricultural multi-robotic arms. Ma et al. [93] established both forward and backward trees during the initialization phase, conducted alternating searches, and applied fifth-degree polynomial interpolation to the joint path to further improve the safety of the robotic arm’s motion trajectory. The average execution time was 7.32 s with an average success rate of 90%, both reported as better than those of traditional path search algorithms.

Table 4. Comparison of robot path planning based on RRT algorithm.

Degree of Freedom	Crop	Improved Algorithm	Picking Success Rate	Path Time	Ref.
-	Litchi	tRRT	100%	0.447 s	[82]
6	Litchi	AtBi-RRT	100%	0.53 s	[83]
6	Fruit	PIB-RRT	85%	7.84 s	[88]
7	Tangerine	TO-RRT	-	0.07 s	[84]

4. Path Planning of Agricultural Multi-Robot Arms Based on Reinforcement Learning Algorithm

Reinforcement learning algorithms perform well in complex environments and tasks and have strong learning and autonomous decision-making abilities, which provides a new idea for solving the path planning problem of multi-robot arms. At this stage, research on reinforcement-learning-based path planning mostly focuses on single agricultural arms, and research on agricultural multi-robot arms is relatively scarce.

The path planning training of the robot arm is mainly determined by the state space, action space, and reward function. According to different training methods, reinforcement learning path planning algorithms can be classified as follows: the reinforcement learning algorithm based on strategy, the algorithm based on Q value, and the reinforcement learning algorithm based on actor–critic (A–C) [94]. The development of the reinforcement learning algorithm is shown in Figure 5.

4.1. Strategy Based Reinforcement Learning Algorithm

The strategy-based (policy-based) reinforcement learning algorithm learns the optimal control strategy of the robot arm directly through continuous interaction between the robot arm and the environment. Training mainly collects the state–action–reward sequences of the robot arm over the whole process and uses them to construct the strategy network, which then guides the path planning of the agricultural robot arm. At present, proximal policy optimization (PPO) and distributed proximal policy optimization (DPPO) are the most widely used in the path planning of agricultural robot arms [95,96].

The PPO algorithm was proposed by OpenAI in 2017. It uses importance sampling with a clipped surrogate objective so that samples collected under the previous policy can be reused for several updates, which effectively alleviates the low sample efficiency and unstable training of the traditional policy gradient algorithm when updating the policy. However, when there are samples with large rewards in the experience buffer, the model easily falls into a local optimum. Qi et al. [97] decomposed the rewards into multi-dimensional frequency rewards and the sum of exponentially weighted components to reduce the impact of large reward samples. A heuristic idea is introduced into the reward function, which greatly speeds up convergence at the initial stage of model training. Building on the PPO algorithm, DPPO introduces distributed parallel sampling, which further improves training efficiency.
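The stabilising mechanism in PPO is its clipped surrogate objective, which can be sketched in a few lines. The function below works on plain Python lists of log-probabilities and advantage estimates; the function name and the default clipping range of 0.2 are illustrative choices, not values from the review.

```python
import math

def ppo_clip_objective(logp_new, logp_old, advantages, clip_eps=0.2):
    """Mean clipped surrogate objective (to be maximised).

    logp_new / logp_old: log-probabilities of the taken actions under the
    current policy and under the policy that collected the samples.
    """
    total = 0.0
    for ln, lo, adv in zip(logp_new, logp_old, advantages):
        ratio = math.exp(ln - lo)                              # importance ratio
        clipped = min(max(ratio, 1.0 - clip_eps), 1.0 + clip_eps)
        total += min(ratio * adv, clipped * adv)               # pessimistic bound
    return total / len(advantages)
```

Taking the minimum of the clipped and unclipped terms removes the incentive to move the new policy far from the old one within a single update, which is what keeps repeated updates on the same batch stable.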

Yang et al. [98] combined this with distributed sampling, in which multiple agents search the environment in parallel, significantly reducing the temporal correlation of samples. Generalized advantage estimation is used to optimize the proximal policy optimization algorithm and further reduce the sample demand, which effectively addresses the low sampling efficiency of robot arms in path planning. In addition, Yang et al. [99] combined multi-agent technology with the PPO algorithm, setting up nine identical and independent agents that share reward and punishment signals to further improve sample utilization. Lin et al. [100] studied the path planning of the picking robot arm in depth. To address the sparse reward problem during training, they used the idea of an artificial potential field to design the reward and punishment mechanism: the robot arm is rewarded when it gradually approaches the target point or when the angle between the robot arm and the target point decreases, and punished otherwise. The success rate of obstacle avoidance is as high as 97.5%.
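The advantage-estimation step mentioned above, commonly known as generalized advantage estimation (GAE), can be sketched as a backward pass over one trajectory. The interface is an assumption for illustration: `values` holds one more entry than `rewards` (the bootstrap value of the final state), and the default `gamma` and `lam` are typical but arbitrary.

```python
def gae(rewards, values, gamma=0.99, lam=0.95):
    """Generalized advantage estimates for one trajectory.

    values must have len(rewards) + 1 entries; the last one is the value
    estimate of the state reached after the final reward.
    """
    advantages = [0.0] * len(rewards)
    running = 0.0
    for t in reversed(range(len(rewards))):
        delta = rewards[t] + gamma * values[t + 1] - values[t]   # one-step TD error
        running = delta + gamma * lam * running                  # discounted sum of TD errors
        advantages[t] = running
    return advantages
```

With `lam=0` the estimate reduces to the one-step TD error, and with `lam=1` it becomes the full discounted return minus the value baseline; intermediate values trade bias against variance.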

In practice, however, constructing the strategy network in dynamic agricultural scenarios is very complex and convergence is difficult. The training cost of robot arms is also too high for these algorithms to be widely used in the path planning of agricultural robot arms.

4.2. Reinforcement Learning Based on Q Value

In reinforcement learning, the strategy can be updated not only through policy networks but also through Q functions. The Q function, also known as the action-value function, is the expected cumulative reward that an agent can receive in the future after taking a certain action in a certain state. Q-learning (Q-L) and deep Q-network (DQN) algorithms are often used in agriculture for the path planning of robotic arms.

4.2.1. Q-Learning Algorithm

The Q-L algorithm will obtain the maximum expected value of future cumulative rewards based on the reward and punishment for selecting the next action in a certain state.
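The update rule behind this description can be shown on a toy problem. The sketch below uses a hypothetical five-cell corridor with the goal in the last cell, standing in for a discretised workspace; the environment, reward of 1 at the goal, and hyperparameters are all assumptions made for illustration.

```python
import random

N_STATES, GOAL = 5, 4          # corridor cells 0..4, goal in the last cell
ACTIONS = (-1, +1)             # action 0 = move left, action 1 = move right

def step(state, action_idx):
    """Deterministic toy environment: returns (next_state, reward, done)."""
    nxt = min(max(state + ACTIONS[action_idx], 0), N_STATES - 1)
    return nxt, (1.0 if nxt == GOAL else 0.0), nxt == GOAL

def train(episodes=500, alpha=0.5, gamma=0.9, epsilon=0.2, seed=0):
    rng = random.Random(seed)
    q = [[0.0, 0.0] for _ in range(N_STATES)]      # the Q-table
    for _ in range(episodes):
        s, done = 0, False
        while not done:
            if rng.random() < epsilon:             # explore
                a = rng.randrange(2)
            else:                                  # exploit, random tie-breaking
                best = max(q[s])
                a = rng.choice([i for i, v in enumerate(q[s]) if v == best])
            s2, r, done = step(s, a)
            # Q-learning target: reward plus discounted best next-state value.
            target = r if done else r + gamma * max(q[s2])
            q[s][a] += alpha * (target - q[s][a])
            s = s2
    return q
```

After training, the greedy action in every non-goal cell is "move right". The table has one row per discrete state, which illustrates why the approach scales poorly to the continuous joint spaces of real arms.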

To improve the convergence efficiency of traditional Q-L algorithms and reduce training time, Li et al. [101] studied greenhouse environments and adaptively adjusted the step size of harvesting robots based on the position of obstacles. Compared to traditional Q-L algorithms, the average training time was reduced by 34.46%. Xie et al. [102] combined Q-L with an ant colony algorithm to dynamically update the pheromone matrix and iteratively adapt the path search strategy in the Q-table, effectively reducing the problem of becoming stuck in local optima. Li et al. [103] added a distance metric to Q-L to guide the robotic arm towards the target in order to avoid the curse of dimensionality in path planning. Liu et al. [104] proposed a reinforcement learning strategy guided by expert experience to improve the learning efficiency of the robotic arm in the early stages of training. The training path in an unstructured environment is shown in Figure 6. The choice of action selection strategy for robotic arms is equally important. Wang et al. [105] drew on the short-term convergence of the SARSA algorithm and combined it with the Q-L algorithm to indirectly influence action strategies based on the Q-value. Maoudj et al. [106] initialized the Q-table using prior knowledge of the environment and, to reduce the search space, proposed a new selection strategy that further accelerates the convergence of the algorithm.

The Q-L algorithm requires discretization of the state space and action space during training, which to some extent limits its ability to represent complex continuous spaces. In actual agricultural robotic arm path planning, the motion of the robotic arm is often continuous, and the state information in the environment may also be continuously changing. Discretizing it may lose some important information, affecting the accuracy and effectiveness of path planning. Therefore, it is not widely used in agricultural robotic arms.

4.2.2. Deep Q-Learning

In order to handle higher dimensional and continuous state spaces, a new algorithm DQN is formed by combining deep learning algorithms with Q-L. The DQN algorithm approximates the Q function in Q-learning through deep neural networks, enabling it to handle complex environments that traditional Q-learning cannot handle.
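In standard notation (not taken from the review), the network with parameters θ is trained by minimising the squared temporal-difference error against a periodically updated target network with parameters θ⁻, over transitions drawn from a replay buffer D:

```latex
L(\theta) = \mathbb{E}_{(s, a, r, s') \sim \mathcal{D}}
  \left[ \left( r + \gamma \max_{a'} Q(s', a'; \theta^{-}) - Q(s, a; \theta) \right)^{2} \right]
```

Here γ is the discount factor. The maximisation over a' still assumes a finite action set, so DQN handles continuous states but requires the arm's actions to be discretised.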

To address the slow convergence and poor path planning performance of traditional DQN networks, Li et al. [107] proposed an improved ERDQN algorithm, which recalculates the Q value by recording the frequency of repeated states: the more times a state is repeated, the lower the probability of its next occurrence. To a certain extent, this reduces the risk of the network converging to a local optimum and reduces the number of training rounds required for convergence. Wang et al. [108] reconstructed the experience replay pool to speed up the early training of the model; at the same time, a path evaluation function is constructed and the optimal path is selected.

4.3. Actor–Critic Algorithm

The actor–critic algorithm is most widely used in practical agriculture, combining the advantages of value function-based methods and strategy-based methods. It can effectively handle the continuous motion problem of agricultural robotic arms during operation and efficiently extract multi-dimensional information such as obstacle and target point positions. Common actor–critic (AC) algorithms include soft actor–critic (SAC), deep deterministic policy gradient (DDPG), twin delayed deep deterministic policy gradient algorithm (TD3), etc. The specific algorithm features are shown in Table 5.

4.3.1. Soft Actor–Critic

The SAC algorithm is based on the AC algorithm and introduces a maximum entropy objective function to encourage strategies to maintain diversity. To reduce the training time of neural networks, Yang et al. [109] modified the network structure of SAC by adding a residual-like structure, which to some extent enhanced the network’s expressive power. They also used the multi-step TD error to optimize the reward mechanism and reduce the likelihood of vanishing gradients.
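In standard notation (not taken from the review), the maximum entropy objective adds an entropy bonus, weighted by a temperature α, to the expected reward:

```latex
J(\pi) = \sum_{t=0}^{T} \mathbb{E}_{(s_t, a_t) \sim \rho_\pi}
  \left[ r(s_t, a_t) + \alpha \, \mathcal{H}\big(\pi(\cdot \mid s_t)\big) \right],
\qquad
\mathcal{H}\big(\pi(\cdot \mid s_t)\big) = -\mathbb{E}_{a \sim \pi(\cdot \mid s_t)}\left[ \log \pi(a \mid s_t) \right]
```

A larger α favours exploration through more random actions, while α approaching 0 recovers the usual expected-return objective.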

Prianto et al. [110] proposed an SAC algorithm based on hindsight experience replay for a multi-robot arm system with periodic motion. A deep neural network is used to store the positions of moving obstacles over a past period of time, and hindsight experience replay is used to make effective use of the training data. Xiong et al. [111] improved the actor–critic network by combining it with a long short-term memory (LSTM) network, which is used to extract the features of the current state. The choice of the temperature parameter plays a decisive role in the training process, so Xiong et al. [111] designed the neural network to adjust this parameter automatically according to changes in the reward; the parameter converges when the optimal strategy is reached.

On the other hand, the design of the reward function is key to the path quality of the robot arm. Tao et al. [112] designed a non-sparse reward function to counter reward sparsity in the early training of the agent, guiding the agent towards the target according to the obstacle distance and the target guidance angle. Xiong et al. [113] introduced the concepts of target attraction and obstacle repulsion from the artificial potential field method into the design of the reward and punishment mechanism and converted the obstacle-range punishment into a single-direction punishment. In 200 random picking experiments in a simulation environment built in Unity, the success rate of the directional-penalty obstacle avoidance method was 97.5%, which is 11% higher than that of the ordinary reward function method.
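A dense reward inspired by the artificial potential field idea can be sketched as follows. This is a generic illustration, not the directional-penalty function of the cited work: the gains, the influence distance, the terminal rewards of ±10, and the spherical obstacle model are all assumptions.

```python
import math

def shaped_reward(prev_pos, pos, target, obstacles,
                  influence=0.15, k_att=1.0, k_rep=0.5, goal_tol=0.02):
    """Dense reward for one end-effector step (all constants hypothetical).

    obstacles: list of (center, radius) spheres; positions are 3-D tuples.
    """
    if math.dist(pos, target) <= goal_tol:
        return 10.0                                   # reached the target
    # Attraction: positive when the step reduces the distance to the target.
    reward = k_att * (math.dist(prev_pos, target) - math.dist(pos, target))
    for center, radius in obstacles:
        gap = math.dist(pos, center) - radius
        if gap <= 0.0:
            return -10.0                              # collision
        if gap < influence:                           # repulsion inside the influence zone
            reward -= k_rep * (influence - gap) / influence
    return reward
```

Because every step receives a signal, the agent gets feedback long before it first reaches a fruit, which is the motivation for non-sparse rewards in early training.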

4.3.2. Twin Delayed Deep Deterministic Policy Gradient Algorithm

The twin delayed deep deterministic policy gradient (TD3) algorithm also uses the AC architecture and updates the value function estimate through temporal differences. Proposed by Fujimoto et al. in 2018, TD3 provides a new direction for the path planning problem of robotic arms. Yang et al. [114] used the TD3 algorithm to conduct path planning for a green walnut harvesting robotic arm. By using the hindsight experience replay (HER) mechanism, the exploration ability of the agent is significantly improved, and the problem of sparse rewards in the early stage of model training is effectively alleviated.
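The temporal-difference target used by TD3 can be sketched for a scalar action as below. The target networks are passed in as plain callables, which is an assumption made so the example runs without a deep learning library; the noise and clipping values are common defaults, not values from the review.

```python
import random

def td3_target(reward, done, next_state, actor_target, critic1_target, critic2_target,
               gamma=0.99, noise_std=0.2, noise_clip=0.5, max_action=1.0, rng=random):
    """TD target with target-policy smoothing and clipped double-Q learning."""
    # Target policy smoothing: add clipped Gaussian noise to the target action.
    noise = max(-noise_clip, min(noise_clip, rng.gauss(0.0, noise_std)))
    action = max(-max_action, min(max_action, actor_target(next_state) + noise))
    # Clipped double-Q: use the smaller of the two target critic estimates.
    q_next = min(critic1_target(next_state, action), critic2_target(next_state, action))
    return reward + gamma * (0.0 if done else 1.0) * q_next
```

Both critics are regressed towards this shared target, while the actor and the target networks are updated less frequently than the critics, which is the "delayed" part of the name.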

4.3.3. Deep Deterministic Policy Gradient

The deep deterministic policy gradient (DDPG) algorithm proposed by Lillicrap et al. provides an effective method for solving complex reinforcement learning problems in continuous action spaces and is another option for the path planning of agricultural robotic arms.

To solve problems such as low training efficiency and slow convergence caused by the high proportion of illegal strategies in the traditional DDPG algorithm, Dong et al. [115] adopted an adaptive exploration method based on the epsilon–greedy algorithm. Dynamically adjusting the exploration factor and rationally allocating the probabilities of exploration and exploitation provides algorithmic support for its application in agricultural robotic arms. Lin et al. [116] adopted the recurrent neural network LSTM to store and remember the past states of the environment and retrieve implicit information, using three-dimensional line segments to represent obstacles. They also set an appropriate reward threshold to ensure that the robotic arm does not collide. Zheng et al. [117] conducted an in-depth study on the trajectory problem of the picking robotic arm, limiting the trajectory planning to the picking plane to reduce the environmental complexity. In addition, the strategies learned in the single-obstacle scenario are transferred to the mixed-obstacle scenario, and the prior knowledge obtained is used to guide the path planning of the robotic arm in complex unstructured environments.
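One way to adjust an exploration factor dynamically is sketched below. The schedule is hypothetical and is not the method of the cited work: it assumes epsilon decays exponentially with the episode count and is additionally scaled down as a recent success rate rises, and that exploration means replacing the actor's scalar action with a uniformly random one.

```python
import math
import random

def adaptive_epsilon(episode, success_rate, eps_min=0.05, eps_max=1.0, decay=0.01):
    """Exploration probability that shrinks with training time and recent success."""
    scheduled = eps_min + (eps_max - eps_min) * math.exp(-decay * episode)
    return max(eps_min, scheduled * (1.0 - success_rate))

def select_action(policy_action, epsilon, low=-1.0, high=1.0, rng=random):
    """Epsilon-greedy choice between a random action and the actor's action."""
    if rng.random() < epsilon:
        return rng.uniform(low, high)      # explore
    return policy_action                   # exploit
```

Early in training, or whenever the arm keeps failing, most actions are random; once the success rate rises, the actor's own output is used almost always.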

Table 5. Path comparison of robot arm based on reinforcement learning.

Crop	Algorithm	Success Rate	Planning Time	Ref.
Guava	Recurrent deep reinforcement learning	90.90%	29 ms	[116]
Apple	Trajectory planning	-	260 s	[117]
Litchi	DDPG	96.7%	124 s	[115]
Walnut	HER-TD3	80.0%	-	[114]

In summary, there are three main types of path planning algorithms for agricultural robot arms: those based on intelligent heuristics, those based on probability sampling, and those based on reinforcement learning. Algorithms based on intelligent heuristics, such as the artificial potential field method and the genetic algorithm, have high computational complexity and cannot easily guarantee an optimal solution. They are difficult to apply directly to the path planning of agricultural robot arms but can assist and improve other algorithms. Path planning algorithms based on probabilistic sampling perform well in high-dimensional environments. They usually do not need to build an environment model and have good real-time performance and scalability. However, the path quality is relatively low and often requires further optimization. Benefiting from the self-learning and decision-making abilities of reinforcement learning, reinforcement learning path planning algorithms provide an innovative approach to complex agricultural environments and have become an important topic of current research. However, most current studies are still at the single-arm stage, and path planning for multi-robotic arms in agriculture retains great potential for future work.

5. Conclusions

Multi-robot systems have a wide range of applications in agricultural production. This paper systematically reviews recent research advances in agricultural multi-robot systems, addressing issues such as low efficiency in cooperative tasks, difficulties in task allocation, and poor path planning quality. It provides an overview of algorithms for cooperative task allocation and of path planning strategies for dynamic environments, with the aim of promoting the future large-scale application of multi-robot systems in smart agriculture.

Section 2 focuses on the task allocation problem for agricultural multi-robot systems, systematically discussing the division of work areas and task planning for Cartesian and joint robotic arms in different agricultural scenarios. Based on the existing literature, current multi-robot task allocation algorithms mainly include genetic algorithms, ant colony algorithms, and reinforcement learning algorithms. Among these, genetic algorithms, with their excellent search capabilities and adaptive characteristics, have become the most widely used intelligent heuristic algorithms, playing a key role in the orderly operation of agricultural multi-robot systems, while ant colony algorithms and reinforcement learning algorithms have promising research prospects.

Section 3 summarizes the path planning strategies for multi-robot systems in complex agricultural scenarios that build on multi-robot task allocation, reviewing heuristic algorithms such as genetic algorithms and particle swarm algorithms as well as sampling-based algorithms such as RRT. Heuristic algorithms such as genetic and particle swarm algorithms are widely applied in the harvesting of strawberries, kiwifruit, and other crops. The rapidly exploring random tree algorithm, with its adaptability to high-dimensional spaces, provides solutions for fruit harvesting in complex orchard environments.

Section 4 reviews cutting-edge reinforcement learning algorithms, including PPO, DQN, and AC algorithms, and their application in the path planning of agricultural robots. Thanks to the strong self-learning and decision-making capabilities of reinforcement learning, path planning efficiency is significantly improved compared to traditional heuristic algorithms. However, these algorithms generally face challenges such as high training costs and difficulties in convergence. Additionally, current research is mainly focused on single-robot path planning, with relatively little research on multi-robot path planning. Nevertheless, reinforcement learning algorithms have broad prospects in the path planning of multi-robot systems.

The use of multiple robot systems in agriculture not only demonstrates the progress in agricultural machinery technology but also signifies a crucial shift toward advanced intelligent automation in the field. As technology continues to evolve, these robot systems will play a key role in modernizing agriculture and promoting sustainable development. However, the suitability of such systems for farmers’ investments will ultimately depend on whether the prices of agricultural products justify the costs associated with these investments.

Author Contributions

Conceptualization, X.G. and S.W.; methodology, X.G. and Q.F.; formal analysis, X.G. and C.X.; investigation, Y.L. and C.X.; data curation, S.W. and Q.F.; writing—original draft preparation, X.G.; supervision, S.W. and Y.L.; project administration, X.G. All authors have read and agreed to the published version of the manuscript.

Funding

This research is funded by Shandong Provincial Natural Science Foundation, grant number ZR2024QC084.

Data Availability Statement

Data are contained within the article.

Conflicts of Interest

The authors declare no conflicts of interest.

References

Ji, W.; Huang, X.; Wang, S.; He, X. A comprehensive review of the research of the “Eye–Brain–Hand” harvesting system in smart agriculture. Agronomy 2023, 13, 2237.
Yerebakan, M.O.; Hu, B. Human–robot collaboration in modern agriculture: A review of the current research landscape. Adv. Intell. Syst. 2024, 6, 2300823.
Han, C.; Lv, J.; Dong, C.; Li, J.; Luo, Y.; Wu, W.; Abdeen, M.A. Classification, advanced technologies, and typical applications of end-effector for fruit and vegetable picking robots. Agriculture 2024, 14, 1310.
Zhang, M.; Wang, S. Agricultural Unmanned Systems: Empowering Agriculture with Automation. Agronomy 2024, 14, 1203.
Xu, L.; Zhao, S.; Ma, S.; Niu, C.; Yan, C.; Lu, C. Optimized design and experiment of the precise obstacle avoidance control system for a grape interplant weeding machine. Trans. Chin. Soc. Agric. Eng. 2021, 37, 31–39.
Zhan, J.; Jiang, Y. Industrialization Trends and Multi-arm Technology Direction of Harvesting Robots. Trans. Chin. Soc. Agric. Mach. 2024, 55, 1–17.
Chen, J.; Ma, W.; Liao, H.; Lu, J.; Yang, Y.; Qian, J.; Xu, L. Balancing Accuracy and Efficiency: The Status and Challenges of Agricultural Multi-Arm Harvesting Robot Research. Agronomy 2024, 14, 2209.
Zhai, C.; Yang, S.; Wang, X.; Zhang, C.; Song, J. Status and prospect of intelligent measurement and control technology for agricultural equipment. Trans. Chin. Soc. Agric. Mach. 2022, 53, 1–20.
AGROBOT Meet the E-Series-the First Pre-Commercial Robotic Harvesters for Gently Harvest Strawberries [EB/OL]. Available online: https://www.agrobot.com/e-series (accessed on 20 July 2024).
FF Robotics [EB/OL]. Available online: https://www.ffrobotics.com (accessed on 20 July 2024).
Kumar, S.; Mohan, S.; Skitova, V. Designing and implementing a versatile agricultural robot: A vehicle manipulator system for efficient multitasking in farming operations. Machines 2023, 11, 776.
Wu, Q.; Zhao, H.; Chen, X.; Zhao, Y. Review of technology, application status and development trend in multi-arm cooperative robots. J. Mech. Eng. 2023, 59, 1–16.
He, Z.; Ma, L.; Wang, Y.; Wei, Y.; Ding, X.; Li, K.; Cui, Y. Double-arm cooperation and implementing for harvesting kiwifruit. Agriculture 2022, 12, 1763.
Luo, J.W.; Xu, J.; Hou, Y.; Xu, H.; Wu, W.; Zhang, H.T. Task-Oriented Collision Avoidance in Fixed-Base Multi-manipulator Systems. In Proceedings of the Intelligent Robotics and Applications: 13th International Conference, ICIRA 2020, Kuala Lumpur, Malaysia, 5–7 November 2020; Proceedings 13. Springer International Publishing: Berlin/Heidelberg, Germany, 2020; pp. 76–87.
Levy, A.; Livingston, T.; Wang, C.; Achor, D.; Vashisth, T. Canopy density, but not bacterial titers, predicts fruit yield in huanglongbing-affected sweet orange trees. Plants 2023, 12, 290.
Guo, H.; Miao, Z.; Ji, J.C.; Pan, Q. An effective collaboration evolutionary algorithm for multi-robot task allocation and scheduling in a smart farm. Knowl.-Based Syst. 2024, 289, 111474.
Ladon, T.; Chandel, J.S.; Sharma, N.C.; Verma, P. Optimizing apple orchard management: Investigating the impact of planting density, training systems and fertigation levels on tree growth, yield and fruit quality. Sci. Hortic. 2024, 334, 113329.
Kurtser, P.; Edan, Y. Planning the sequence of tasks for harvesting robots. Robot. Auton. Syst. 2020, 131, 103591.
Wu, R.; Yin, Y.; Xu, K. Multi-task Collaborative Modeling and Assignment Method of Manipulator. Mech. Sci. Technol. Aerosp. Eng. 2020, 39, 433–437.
Li, T.; Qiu, Q.; Zhao, C.J.; Xie, F. Task planning of multi-arm harvesting robots for high-density dwarf orchards. Trans. CSAE 2021, 37, 1–10.
Cui, Y.; Xu, Z.; Zhong, L.; Xu, P.; Shen, Y.; Tang, Q. A task-adaptive deep reinforcement learning framework for dual-arm robot manipulation. IEEE Trans. Autom. Sci. Eng. 2024, 22, 466–479.
Xie, F.; Guo, Z.; Li, T.; Feng, Q.; Zhao, C. Dynamic Task Planning for Multi-Arm Harvesting Robots Under Multiple Constraints Using Deep Reinforcement Learning. Horticulturae 2025, 11, 88.
Xuhai, Y.; Wenhao, Z.; Yufeng, L.; Xiaochen, Q.; Qian, Z. Review of path planning algorithms for picking manipulator. J. Chin. Agric. Mech. 2023, 44, 161.
Gao, R.; Zhou, Q.; Cao, S.; Jiang, Q. Apple-picking robot picking path planning algorithm based on improved PSO. Electronics 2023, 12, 1832.
Feng, Q.; Zhao, C.; Li, T.; Chen, L.; Guo, X.; Xie, F.; Xiong, Z.; Chen, K.; Liu, C.; Yan, T. Design and test of a four-arm apple harvesting robot. Trans. Chin. Soc. Agric. Eng. (Trans. CSAE) 2023, 39, 25–33.
Orisatoki, M.; Amouzadi, M.; Dizqah, A. A heuristic informative-path-planning algorithm to map unknown areas and a benchmark solution. In Proceedings of the 2024 IEEE Conference on Control Technology and Applications (CCTA), Northumbria University, Newcastle upon Tyne, UK, 21–23 August 2024; IEEE: New York, NY, USA, 2024; pp. 254–261.
Dijkstra, E. A note on two problems in connexion with graphs. In Edsger Wybe Dijkstra: His Life, Work, and Legacy; Association for Computing Machinery: New York, NY, USA, 2022; pp. 287–290.
Lambora, A.; Gupta, K.; Chopra, K. Genetic algorithm-A literature review. In Proceedings of the 2019 International Conference on Machine Learning, Big Data, Cloud and Parallel Computing (COMITCon), Faridabad, India, 14–16 February 2019; IEEE: New York, NY, USA, 2019; pp. 380–384.
Wu, S.; Li, Q.; Wei, W. Application of ant colony optimization algorithm based on triangle inequality principle and partition method strategy in robot path planning. Axioms 2023, 12, 525.
Jiang, L.; Liu, S.; Cui, Y.; Jiang, H. Path planning for robotic manipulator in complex multi-obstacle environment based on improved_RRT. IEEE/ASME Trans. Mechatron. 2022, 27, 4774–4785.
Xu, T. Recent advances in Rapidly-exploring random tree: A review. Heliyon 2024, 10, e32451.
Guo, Z.; Fu, H.; Wu, J.; Han, W.; Huang, W.; Zheng, W.; Li, T. Dynamic Task Planning for Multi-Arm Apple-Harvesting Robots Using LSTM-PPO Reinforcement Learning Algorithm. Agriculture 2025, 15, 588.
Jin, T.; Han, X. Robotic arms in precision agriculture: A comprehensive review of the technologies, applications, challenges, and future prospects. Comput. Electron. Agric. 2024, 221, 108938.
Fountas, S.; Mylonas, N.; Malounas, I.; Rodias, E.; Hellmann Santos, C.; Pekkeriet, E. Agricultural robotics for field operations. Sensors 2020, 20, 2672.
Guo, Z.; Yin, C.; Wu, X.; Cheng, Q.; Wang, J.P.; Zhou, H.P. Research status and prospect of key technologies of fruit picking manipulator. Jiangsu J. Agric. Sci. 2024, 40, 1142–1152.
Li, T.; Xie, F.; Zhao, Z.; Zhao, H.; Guo, X.; Feng, Q. A multi-arm robot system for efficient apple harvesting: Perception, task plan and control. Comput. Electron. Agric. 2023, 211, 107979.
Barnett, J.; Duke, M.; Au, C.K.; Lim, S.H. Work distribution of multiple Cartesian robot arms for kiwifruit harvesting. Comput. Electron. Agric. 2020, 169, 105202.
Gou, Y.; Yan, J.; Zhang, F.; Sun, C.Y.; Xu, Y. Research Progress on Vision System and Manipulator of Fruit Picking Robot. Comput. Eng. Appl. 2023, 59, 13–26.
Zahid, A.; Mahmud, M.S.; He, L.; Choi, D.; Heinemann, P.; Schupp, J. Development of an integrated 3R end-effector with a cartesian manipulator for pruning apple trees. Comput. Electron. Agric. 2020, 179, 105837.
Zhang, K.; Lammers, K.; Chu, P.; Li, Z.; Lu, R. An automated apple harvesting robot—From system design to field evaluation. J. Field Robot. 2024, 41, 2384–2400.
Yoshida, T.; Onishi, Y.; Kawahara, T.; Fukao, T. Automated harvesting by a dual-arm fruit harvesting robot. Robomech. J. 2022, 9, 19. [Google Scholar] [CrossRef]
Wei, L.; Wang, Q.; Niu, K.; Bai, S.; Wei, L.; Qiu, C.; Han, N. Design and Test of Seed–Fertilizer Replenishment Device for Wheat Seeder. Agriculture 2024, 14, 374. [Google Scholar] [CrossRef]
Li, X.; Chen, W.; Wang, Y.; Yang, S.; Wu, H.; Zhao, C. Design and experiment of an automatic cherry tomato harvesting system based on cascade vision detection. Trans. Chin. Soc. Agric. Eng. 2023, 39, 136–145. [Google Scholar]
Xiong, Y.; Peng, C.; Grimstad, L.; From, P.J.; Isler, V. Development and field evaluation of a strawberry harvesting robot with a cable-driven gripper. Comput. Electron. Agric. 2019, 157, 392–402. [Google Scholar] [CrossRef]
Zhuang, M.; Li, G.; Ding, K. Obstacle avoidance path planning for apple picking robotic arm incorporating artificial potential field and A* algorithm. IEEE Access 2023, 11, 100070–100082. [Google Scholar] [CrossRef]
Fue, K.G.; Porter, W.M.; Barnes, E.M.; Rains, G.C. An extensive review of mobile agricultural robotics for field operations: Focus on cotton harvesting. AgriEngineering 2020, 2, 150–174. [Google Scholar] [CrossRef]
Sepúlveda, D.; Fernández, R.; Navas, E.; Armada, M.; González-De-Santos, P. Robotic aubergine harvesting using dual-arm manipulation. IEEE Access 2020, 8, 121889–121904. [Google Scholar] [CrossRef]
Xiong, Z.; Feng, Q.; Li, T.; Xie, F.; Liu, C.; Liu, L.; Guo, X.; Zhao, C. Dual-Manipulator Optimal Design for Apple Robotic Harvesting. Agronomy 2022, 12, 3128. [Google Scholar] [CrossRef]
Cui, Y.; Ma, L.; He, Z.; Zhu, Y.; Wang, Y.; Li, K. Design and experiment of dual manipulators parallel harvesting platform for kiwifruit based on optimal space. Trans. Chin. Soc. Agric. Mach. 2022, 53, 132–143. [Google Scholar]
Jiang, Y.; Liu, J.; Wang, J.; Li, W.; Peng, Y.; Shan, H. Development of a dual-arm rapid grape-harvesting robot for horizontal trellis cultivation. Front. Plant Sci. 2022, 13, 881904. [Google Scholar] [CrossRef]
Zhang, H.; Li, X.; Wang, L.; Liu, D.; Wang, S. Construction and optimization of a collaborative harvesting system for multiple robotic arms and an end-picker in a trellised pear orchard environment. Agronomy 2023, 14, 80. [Google Scholar] [CrossRef]
Arikapudi, R.; Vougioukas, S.G. Robotic Tree-fruit harvesting with arrays of Cartesian Arms: A study of fruit pick cycle times. Comput. Electron. Agric. 2023, 211, 108023. [Google Scholar] [CrossRef]
Lammers, K.; Zhang, K.; Zhu, K.; Chu, P.; Li, Z.; Lu, R. Development and evaluation of a dual-arm robotic apple harvesting system. Comput. Electron. Agric. 2024, 227, 109586. [Google Scholar] [CrossRef]
Yu, X.; Fan, Z.; Wang, X.; Wan, H.; Wang, P.; Zeng, X.; Jia, F. A lab-customized autonomous humanoid apple harvesting robot. Comput. Electr. Eng. 2021, 96, 107459. [Google Scholar] [CrossRef]
Xiong, Y.; Ge, Y.; Grimstad, L.; From, P.J. An autonomous strawberry-harvesting robot: Design, development, integration, and field evaluation. J. Field Robot. 2020, 37, 202–224. [Google Scholar] [CrossRef]
Williams, H.; Jones, M.; Nejati, M.; Seabright, M.; Bell, J.; Penhall, N.; Barnett, J.; Duke, M.; Scarfe, A.; Ahn, H.; et al. Robotic kiwifruit harvesting using machine vision, convolutional neural networks, and robotic arms. Biosyst. Eng. 2019, 181, 140–156. [Google Scholar] [CrossRef]
Shi, K.; Wu, Z.; Jiang, B.; Karimi, H.R. Dynamic path planning of mobile robot based on improved simulated annealing algorithm. J. Frankl. Inst. 2023, 360, 4378–4398. [Google Scholar] [CrossRef]
Yang, S.; Jia, B.; Yu, T.; Yuan, J. Research on multiobjective optimization algorithm for cooperative harvesting trajectory optimization of an intelligent multiarm straw-rotting fungus harvesting robot. Agriculture 2022, 12, 986. [Google Scholar] [CrossRef]
Kulathunga, G. A reinforcement learning based path planning approach in 3D environment. Procedia Comput. Sci. 2022, 212, 152–160. [Google Scholar] [CrossRef]
Li, T.; Xie, F.; Qiu, Q.; Feng, Q. Multi-arm robot task planning for fruit harvesting using multi-agent reinforcement learning. In Proceedings of the 2023 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), Detroit, MI, USA, 1–5 October 2023; IEEE: New York, NY, USA, 2023; pp. 4176–4183. [Google Scholar]
Gong, A.; Yang, K.; Lyu, J.; Li, X. A two-stage reinforcement learning-based approach for multi-entity task allocation. Eng. Appl. Artif. Intell. 2024, 136, 108906. [Google Scholar] [CrossRef]
Feng, Z.; Hu, G.; Sun, Y.; Soon, J. An overview of collaborative robotic manipulation in multi-robot systems. Annu. Rev. Control. 2020, 49, 113–127. [Google Scholar] [CrossRef]
Chou, J.S.; Molla, A. Recent advances in use of bio-inspired jellyfish search algorithm for solving optimization problems. Sci. Rep. 2022, 12, 19157. [Google Scholar] [CrossRef] [PubMed]
Nadimi-Shahraki, M.H.; Zamani, H.; Asghari Varzaneh, Z.; Mirjalili, S. A systematic review of the whale optimization algorithm: Theoretical foundation, improvements, and hybridizations. Arch. Comput. Methods Eng. 2023, 30, 4113–4159. [Google Scholar] [CrossRef]
Yuan, X.; Yuan, X.; Wang, X. Path planning for mobile robot based on improved bat algorithm. Sensors 2021, 21, 4389. [Google Scholar] [CrossRef]
Liu, Y.; As’arry, A.; Hassan, M.K.; Hairuddin, A.A.; Mohamad, H. Review of the grey wolf optimization algorithm: Variants and applications. Neural Comput. Appl. 2024, 36, 2713–2735. [Google Scholar] [CrossRef]
Liu, Y.; Ren, Y.; Wang, J.; Zhao, L.; Wang, Q.; Shan, J. Path Planning for Mobile Robot Based on Improved Artificial Potential Field Method. In Proceedings of the 2023 China Automation Congress (CAC), Chongqing, China, 17–19 November 2023; pp. 4757–4762. [Google Scholar]
Xie, J.; Zhang, Z.; Wei, Z.; Ma, S. Simulation of apple picking path planning based on artificial potential field method. IOP Conf. Ser. Earth Environ. Sci. 2019, 252, 052148. [Google Scholar] [CrossRef]
Chen, Z.; Ma, L.; Shao, Z. Path planning for obstacle avoidance of manipulators based on improved artificial potential field. In Proceedings of the 2019 Chinese Automation Congress (CAC), Hangzhou, China, 22–24 November 2019; IEEE: New York, NY, USA, 2019; pp. 2991–2996. [Google Scholar]
Guo, H.; Qiu, Z.; Gao, G.; Wu, T.; Chen, H.; Wang, X. Safflower Picking Trajectory Planning Strategy Based on an Ant Colony Genetic Fusion Algorithm. Agriculture 2024, 14, 622. [Google Scholar] [CrossRef]
Meng, X.; Zhu, X. Autonomous obstacle avoidance path planning for grasping manipulator based on elite smoothing ant colony algorithm. Symmetry 2022, 14, 1843. [Google Scholar] [CrossRef]
Yan, B.; Quan, J.; Yan, W. Three-Dimensional Obstacle Avoidance Harvesting Path Planning Method for Apple-Harvesting Robot Based on Improved Ant Colony Algorithm. Agriculture 2024, 14, 1336. [Google Scholar] [CrossRef]
Ling, X.; Zhao, Y.; Gong, L.; Liu, C.; Wang, T. Dual-arm cooperation and implementing for robotic harvesting tomato using binocular vision. Robot. Auton. Syst. 2019, 114, 134–143. [Google Scholar] [CrossRef]
Bao, X.; Shi, X.; Ma, X.; Leng, J.; Ma, Z.; Ren, M.; Li, S. Design and experiment of citrus picking system based on dual robot collaboration. J. Eng. 2024, 2024, e12419. [Google Scholar] [CrossRef]
Sheng, G.; Jie, Z.; He, C. Genetic algorithm-based path planning of coordinated multi-robot manipulators. In Proceedings of the IEEE International Conference on Robotics, Intelligent Systems and Signal Processing, Changsha, China, 8–13 October 2003; IEEE: New York, NY, USA, 2003; Volume 2, pp. 763–767. [Google Scholar]
Cao, X.; Yan, H.; Huang, Z.; Ai, S.; Xu, Y.; Fu, R.; Zou, X. A multi-objective particle swarm optimization for trajectory planning of fruit picking manipulator. Agronomy 2021, 11, 2286. [Google Scholar] [CrossRef]
Wang, M.; Luo, J.; Yuan, J.; Walter, U. Coordinated trajectory planning of dual-arm space robot using constrained particle swarm optimization. Acta Astronaut. 2018, 146, 259–272. [Google Scholar] [CrossRef]
Huang, W.; Miao, Z.; Wu, T.; Guo, Z.; Han, W.; Li, T. Design of and Experiment with a Dual-Arm Apple Harvesting Robot System. Horticulturae 2024, 10, 1268. [Google Scholar] [CrossRef]
Cai, J.; Wang, F.; Lü, Q.; Wang, J. Real-time path planning for citrus picking robot based on SBL-PRM. Trans. Chin. Soc. Agric. Eng. (Trans. CSAE) 2009, 25, 158–162. [Google Scholar]
Chen, G.; Luo, N.; Liu, D.; Zhao, Z.; Liang, C. Path planning for manipulators based on an improved probabilistic roadmap method. Robot. Comput.-Integr. Manuf. 2021, 72, 102196. [Google Scholar] [CrossRef]
Cheng, Q.; Zhang, W.; Liu, H.; Zhang, Y.; Hao, L. Research on the path planning algorithm of a manipulator based on GMM/GMR-MPRM. Appl. Sci. 2021, 11, 7599. [Google Scholar] [CrossRef]
Cao, X.; Zou, X.; Jia, C.; Chen, M.; Zeng, Z. RRT-based path planning for an intelligent litchi-picking manipulator. Comput. Electron. Agric. 2019, 156, 105–118. [Google Scholar] [CrossRef]
Ye, L.; Duan, J.; Yang, Z.; Zou, X.; Chen, M.; Zhang, S. Collision-free motion planning for the litchi-picking robot. Comput. Electron. Agric. 2021, 185, 106151. [Google Scholar] [CrossRef]
Liu, C.; Feng, Q.; Tang, Z.; Wang, X.; Geng, J.; Xu, L. Motion planning of the citrus-picking manipulator based on the TO-RRT algorithm. Agriculture 2022, 12, 581. [Google Scholar] [CrossRef]
Li, X.; Yang, J.; Wang, X.; Fu, L.; Li, S. Adaptive Step RRT*-Based Method for Path Planning of Tea-Picking Robotic Arm. Sensors 2024, 24, 7759. [Google Scholar] [CrossRef]
Wang, Y.; Liu, D.; Zhao, H.; Li, Y.; Song, W.; Liu, M.; Tian, L.; Yan, X. Rapid citrus harvesting motion planning with pre-harvesting point and quad-tree. Comput. Electron. Agric. 2022, 202, 107348. [Google Scholar] [CrossRef]
Wang, X.; Luo, X.; Han, B.; Chen, Y.; Liang, G.; Zheng, K. Collision-free path planning method for robots based on an improved rapidly-exploring random tree algorithm. Appl. Sci. 2020, 10, 1381. [Google Scholar] [CrossRef]
Hui, L.; Shiyi, Z.; Yunpeng, D.; Weidong, J.; Yue, S. Orchard Robot Motion Planning Algorithm Based on Improved Bidirectional RRT. Nongye Jixie Xuebao/Trans. Chin. Soc. Agric. Mach. 2022, 53. [Google Scholar]
Yang, L.; Da, X. Cooperative path planning of dual-arm robot based on attractive force self-adaptive step size RRT. Robot 2020, 42, 606–616. [Google Scholar]
Kim, D.; Lim, S.; Lee, D.; Lee, J.; Han, C. An RRT-based motion planning of dual-arm robot for (dis)assembly tasks. In Proceedings of the IEEE ISR 2013, Seoul, Republic of Korea, 24–26 October 2013; pp. 1–6. [Google Scholar]
Shi, W.; Wang, K.; Zhao, C.; Tian, M. Obstacle avoidance path planning for the dual-arm robot based on an improved RRT algorithm. Appl. Sci. 2022, 12, 4087. [Google Scholar] [CrossRef]
Gammell, J.; Srinivasa, S.; Barfoot, T. Batch informed trees (BIT*): Sampling-based optimal planning via the heuristically guided search of implicit random geometric graphs. In Proceedings of the 2015 IEEE International Conference on Robotics and Automation (ICRA), Seattle, WA, USA, 26–30 May 2015; IEEE: New York, NY, USA, 2015; pp. 3067–3074. [Google Scholar]
Ma, P.; Zhu, A.; Chen, Y.; Tu, Y.; Mao, H.; Song, J.; Wang, X.; Su, S.; Li, D.; Dong, X. Multi objective motion planning of fruit harvesting manipulator based on improved BIT* algorithm. Comput. Electron. Agric. 2024, 227, 109567. [Google Scholar] [CrossRef]
Han, D.; Mulyana, B.; Stankovic, V.; Cheng, S. A survey on deep reinforcement learning algorithms for robotic manipulation. Sensors 2023, 23, 3762. [Google Scholar] [CrossRef]
Cheng, Y.; Guo, Q.; Wang, X. Proximal Policy Optimization with Advantage Reuse Competition. IEEE Trans. Artif. Intell. 2024, 5, 3915–3925. [Google Scholar] [CrossRef]
Wang, J.; Sun, H.; Zhu, C. Vision-based autonomous driving: A hierarchical reinforcement learning approach. IEEE Trans. Veh. Technol. 2023, 72, 11213–11226. [Google Scholar] [CrossRef]
Qi, C.; Wu, C.; Lei, L.; Li, X.; Cong, P. UAV path planning based on the improved PPO algorithm. In Proceedings of the 2022 Asia Conference on Advanced Robotics, Automation, and Control Engineering (ARACE), Qingdao, China, 26–28 August 2022; IEEE: New York, NY, USA, 2022; pp. 193–199. [Google Scholar]
Yang, S.; Wu, D.; Pan, Y.; He, Y. Research on Manipulator Control Based on Improved Proximal Policy Optimization Algorithm. In Proceedings of the 2022 34th Chinese Control and Decision Conference (CCDC), Hefei, China, 15–17 August 2022; IEEE: New York, NY, USA, 2022; pp. 4301–4306. [Google Scholar]
Bo, Y.; Kun, W.; Xiang, M. Research on motion control method of manipulator based on reinforcement learning. Comput. Eng. Appl. 2023, 59, 318–325. [Google Scholar]
Lin, J.; Wang, H.; Zou, X.; Zhang, P.; Li, C.; Zhou, Y.; Yao, S. Obstacle avoidance path planning and simulation of mobile picking robot based on DPPO. J. Syst. Simul. 2023, 35, 1692–1704. [Google Scholar]
Li, X.; Zhang, J.; Guo, X.; Wu, G. Reinforcement learning-based optimization algorithm for energy management and path planning of robot chassis. Trans. Chin. Soc. Agric. Eng. 2024, 40, 175–183. [Google Scholar]
Xie, T.; Zhou, Y. Ant colony enhanced Q-learning algorithm for mobile robot path planning. In Proceedings of the 2024 36th Chinese Control and Decision Conference (CCDC), Xi’an, China, 25–27 May 2024; IEEE: New York, NY, USA, 2024; pp. 5001–5006. [Google Scholar]
Low, E.S.; Ong, P.; Low, C.Y.; Omar, R. Modified Q-learning with distance metric and virtual target on path planning of mobile robot. Expert Syst. Appl. 2022, 199, 117191. [Google Scholar] [CrossRef]
Liu, Y.; Gao, P.; Zheng, C.; Tian, L.; Tian, Y. A deep reinforcement learning strategy combining expert experience guidance for a fruit-picking manipulator. Electronics 2022, 11, 311. [Google Scholar] [CrossRef]
Wang, Y.H.; Li, T.H.S.; Lin, C.J. Backward Q-learning: The combination of Sarsa algorithm and Q-learning. Eng. Appl. Artif. Intell. 2013, 26, 2184–2193. [Google Scholar] [CrossRef]
Maoudj, A.; Hentout, A. Optimal path planning approach based on Q-learning algorithm for mobile robots. Appl. Soft Comput. 2020, 97, 106796. [Google Scholar] [CrossRef]
Li, Q.; Ma, H.; Xiao, H. Robot Path Planning Based on Improved DQN Algorithm. Comput. Telecommun. 2024, 1, 37–41. [Google Scholar]
Wang, Y.; He, Z.; Cao, D.; Ma, L.; Li, K.; Jia, L.; Cui, Y. Coverage path planning for kiwifruit picking robots based on deep reinforcement learning. Comput. Electron. Agric. 2023, 205, 107593. [Google Scholar] [CrossRef]
Yang, J.; Ni, J.; Li, Y.; Wen, J.; Chen, D. The intelligent path planning system of agricultural robot via reinforcement learning. Sensors 2022, 22, 4316. [Google Scholar] [CrossRef]
Prianto, E.; Park, J.H.; Bae, J.H.; Kim, J.S. Deep reinforcement learning-based path planning for multi-arm manipulators with periodically moving obstacles. Appl. Sci. 2021, 11, 2587. [Google Scholar] [CrossRef]
Xiong, C.; Xiong, J.; Yang, Z.; Hu, W. Path planning method for citrus picking manipulator based on deep reinforcement learning. J. South China Agric. Univ. 2023, 44, 473–483. [Google Scholar]
Tao, B.; Kim, J.H. Deep reinforcement learning-based local path planning in dynamic environments for mobile robot. J. King Saud Univ.-Comput. Inf. Sci. 2024, 36, 102254. [Google Scholar] [CrossRef]
Xiong, J.; Li, Z.; Chun, S.; Zheng, Z. Obstacle avoidance planning of virtual robot picking path based on deep reinforcement learning. Nongye Jixie Xuebao/Trans. Chin. Soc. Agric. Mach. 2020, 51, 1–10. [Google Scholar]
Yang, S.; Xie, X.; Bing, Z.; Hao, J.; Zhang, X.; Yuan, D. Path Planning of Green Walnut Picking Robotic Arm Based on HER-TD3 Algorithm. Trans. Chin. Soc. Agric. Mach. 2024, 55, 113–123. [Google Scholar]
Dong, Y.; Zou, X. Mobile robot path planning based on improved DDPG reinforcement learning algorithm. In Proceedings of the 2020 IEEE 11th International Conference on Software Engineering and Service Science (ICSESS), Beijing, China, 16–18 August 2020; IEEE: New York, NY, USA, 2020; pp. 52–56. [Google Scholar]
Lin, G.; Zhu, L.; Li, J.; Zou, X.; Tang, Y. Collision-free path planning for a guava-harvesting robot based on recurrent deep reinforcement learning. Comput. Electron. Agric. 2021, 188, 106350. [Google Scholar] [CrossRef]
Chang, Z.; Po, G.; Hao, G.; Ye, T.; Yan, Z. Trajectory planning method for apple picking manipulator based on stepwise migration strategy. Nongye Jixie Xuebao/Trans. Chin. Soc. Agric. Mach. 2020, 51, 15–23. [Google Scholar]

Figure 2. Division of multi-robot arms working area based on task distribution. (a) Division of six Cartesian robot arms; (b) division of four Cartesian robot arms.

Figure 3. Picking sequence without fruits. (a) Four-arm apple harvesting sequence; (b) dual-arm grape picking sequence; (c) dual-arm strawberry picking sequence.

Figure 5. Timeline of reinforcement learning algorithm development.

Figure 6. Movement trajectory of an apple-picking robotic arm. (a) Path planning of a multi-DOF robot arm based on RRT; (b) path planning of a multi-DOF robot arm based on expert experience.

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

© 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
