Review

A Review on the High-Efficiency Detection and Precision Positioning Technology Application of Agricultural Robots

School of Intelligent Manufacturing and Electrical Engineering, Guangzhou Institute of Science and Technology, Guangzhou 510540, China
* Author to whom correspondence should be addressed.
Processes 2024, 12(9), 1833; https://doi.org/10.3390/pr12091833
Submission received: 23 June 2024 / Revised: 10 August 2024 / Accepted: 26 August 2024 / Published: 28 August 2024
(This article belongs to the Section Advanced Digital and Other Processes)

Abstract
The advancement of agricultural technology has increasingly positioned robotic detection and localization techniques at the forefront, ensuring critical support for agricultural development through their accuracy and reliability. This paper provides an in-depth analysis of various methods used in detection and localization, including UWB, deep learning, SLAM, and multi-sensor fusion. In the domain of detection, the application of deep algorithms in assessing crop maturity and pest analysis is discussed. For localization, the accuracy of different methods in target positioning is examined. Additionally, the integration of convolutional neural networks and multi-sensor fusion with deep algorithms in agriculture is reviewed. The current methodologies effectively mitigate environmental interference, significantly enhancing the accuracy and reliability of agricultural robots. This study offers directional insights into the development of robotic detection and localization in agriculture, clarifying the future trajectory of this field and promoting the advancement of related technologies.

1. Introduction

With the ongoing advancement of technology, agricultural robots have increasingly become a key component of modern agriculture. They play a vital role in enhancing production efficiency, reducing labor costs, and improving crop quality. In particular, their detection and positioning technologies represent advanced and critical methods in modern agriculture. Compared to traditional manual methods, these technologies provide greater accuracy and reliability, underscoring their substantial potential for application and development in the agricultural sector [1]. This is particularly evident in the precise sowing and fertilization functions of agricultural robots, which depend on advanced positioning technologies. By employing the Global Positioning System (GPS) and ground sensors, agricultural robots can navigate accurately between crop rows to ensure uniform sowing and fertilization. Furthermore, they can utilize various detection technologies for pest and disease monitoring, facilitating targeted spraying and treatment. In recent years, researchers have introduced various technological methods in this field, establishing a solid foundation for advancements in robot detection and positioning within agriculture.
Currently, scholars have carried out extensive analysis and research on various technical methods related to the detection and positioning of agricultural robots. In terms of robot detection, W. Li et al. [2] effectively improved the accuracy and recall rate in the rapid detection of small moving targets by combining the traditional EMA algorithm with the C2F module of the original YOLOv8. SU et al. [3] achieved real-time segmentation of blackgrass weeds between crop rows in wheat fields using a deep neural network (DNN) based on the geometric position of the blackgrass. R. Liu et al. [4] enhanced the capability of FRICH-RCNN in small target detection tasks in remote sensing images through online hard example mining, a feature pyramid structure, Soft-NMS, and RoI Align. V. Bajait et al. [5] utilized optimized deep learning methods to better identify different types of pests and diseases.
In the field of robot localization methods, Ultra-Wideband (UWB) technology is widely adopted owing to its low sensitivity to environmental conditions and its favorable power characteristics. This article discusses four localization methods, all utilizing UWB technology: RSSI, AOA, TOA, and enhanced TDOA. Additionally, it discusses a multi-sensor localization method based on graph-based extended RTAB-Map SLAM, which leverages data fusion capabilities and online assessment for precise robot positioning. Since most SLAM methods are either vision-based or lidar-based, comparing them proves challenging; consequently, RTAB-Map was extended to support both vision-based and lidar-based SLAM [6]. This approach integrates visual sensors for data analysis and laser sensors for accurate localization in agricultural robots. Building on this, Choudhury et al. [7] introduced SemanticSLAM, an unsupervised indoor localization solution that uses unique environmental features as landmarks and combines dead reckoning within a novel SLAM framework to minimize localization errors and convergence time. Mur-Artal et al. [8] developed a comprehensive SLAM system for monocular, stereo, and RGB-D sensors, ensuring long-term stable localization in expansive areas. Della Corte et al. [9] investigated optimization challenges for mobile platforms equipped with diverse sensors, simultaneously estimating kinematic parameters, sensor extrinsics, and delays while analyzing trajectory impacts on estimation accuracy. Furthermore, He et al. [10] utilized a dual-layer detection algorithm based on deep learning techniques for rapid and accurate positioning and detection of brown planthoppers. Ayan et al. [11] studied a convolutional neural network-based predictive model for comparative and classificatory analysis of various pests following localization detection.
The research outcomes indicate that robot detection and positioning bring accuracy and efficiency to overall agricultural production, primarily through techniques such as deep algorithms, UWB technology, multi-sensor fusion, and convolutional neural networks. This article provides an overview of the aforementioned technical methods and analyzes the experimental results reported in the respective technical fields, aiming to guide the future development direction of robot detection and positioning and to lay a foundation for exploring advanced technical methods.

2. Detection of Agricultural Pests in Farmland Based on Image Processing Algorithms

2.1. Pest Detection Based on the Faster R-CNN Algorithm

In recent years, with the development of deep learning, deep learning-based object detection methods have been applied to various visual detection tasks. In particular, deep convolutional neural networks (CNNs) have achieved remarkable results in object detection. Researchers have proposed numerous excellent algorithmic frameworks, with Faster-RCNN being one of the classic algorithms in the field of object detection. Due to its superior detection performance, Faster-RCNN has been widely applied in pest detection. Two-stage object detection methods based on Faster-RCNN first generate a large number of region proposals, then use convolutional neural networks to extract features for each proposal, and finally classify each region and localize bounding boxes. However, because of their insufficient detection speed, the practicality of these methods remains limited [12]. In recent years, researchers have improved Faster-RCNN by addressing its limitations, such as slow training speed and high computational resource demands. They have incorporated various strategies to enhance its accuracy and practicality in pest detection.
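To make the two-stage pipeline concrete, the following minimal sketch (illustrative only, not code from the cited studies) runs a COCO-pretrained Faster R-CNN with a ResNet-50 + FPN backbone in PyTorch/torchvision; the image path and score threshold are assumptions.

```python
# Minimal two-stage detection sketch: the RPN proposes regions internally,
# then the head classifies each proposal and refines its bounding box.
import torch
import torchvision
from torchvision.transforms.functional import to_tensor
from PIL import Image

# Faster R-CNN with a ResNet-50 + FPN backbone, pretrained on COCO.
model = torchvision.models.detection.fasterrcnn_resnet50_fpn(weights="DEFAULT")
model.eval()

# "pest_field.jpg" is a placeholder image path.
image = to_tensor(Image.open("pest_field.jpg").convert("RGB"))

with torch.no_grad():
    predictions = model([image])[0]

# Keep detections above an assumed confidence threshold.
keep = predictions["scores"] > 0.5
print(predictions["boxes"][keep], predictions["labels"][keep])
```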
Wang et al. [13] combined ResNet-50 with a Feature Pyramid Network (FPN) to extend the receptive field of the lower layers and enhance the algorithm’s ability to extract features from small objects. They employed ROI Align to provide more precise candidate frame regions for the network and incorporated Convolutional Block Attention Modules (CBAMs) into the feature extraction network to obtain more important features. The authors utilized the IP-102 dataset and employed web scraping techniques to capture additional natural scene insect images, forming a dataset containing 2485 insect pictures. Experimental results demonstrate that the improved algorithm outperforms the original Faster-RCNN and other mainstream methods in terms of both accuracy and speed. Data augmentation is another effective way to improve detection performance. Patel and Bhatt [14] investigated the impact of various data augmentation methods on the performance of Faster-RCNN for pest detection, as shown in Figure 1. The authors utilized a dataset of pest images captured with a regular smartphone for experimentation, converting the pest images into TFRecords files for use on the TensorFlow platform. The model employed Inception v2 as the feature extractor during training, and the authors tabulated and analyzed the data after the experiment. Experimental results demonstrate that data augmentation significantly enhances detection accuracy, particularly in scenarios with limited samples and complex environments. This underscores the importance and effectiveness of data augmentation in pest detection tasks. Developing a complete pest identification system is crucial for the practical application of detection algorithms. Zhang et al. [15] designed an end-to-end pest recognition system based on the Faster-RCNN framework, as illustrated in Figure 2. The system utilizes ResNet-50 to extract deep semantic features and incorporates a Feature Pyramid Network (FPN) to achieve multi-scale feature fusion, enhancing adaptability to pests of various sizes. The experimental setup utilized a GeForce GTX TITAN X 12GB GPU (NVIDIA Semiconductor Technology Inc., Santa Clara, CA, USA) and Ubuntu 16.04 (Kylin Information Technology Co., Ltd., Tianjin, China) as the operating system. The authors augmented a set of anchors based on the VOC dataset and combined them with real-world scenes to create their own dataset. This system introduces Region of Interest (ROI) processing to achieve multi-scale feature fusion after extracting deep semantic features. Experimental data indicate that the recognition accuracy of this system surpasses other methods, providing a valuable reference for pest detection. As data privacy and security receive increasing attention, Deng et al. [16] innovatively combined federated learning with an improved Faster R-CNN, proposing a distributed framework for the detection of various plant diseases and pests, as illustrated in Figure 3. This framework allows for collaborative training of more effective detection models while protecting data privacy. Additionally, the authors incorporated attention mechanisms and multi-scale feature fusion strategies into Faster R-CNN, further enhancing the model’s feature extraction and fusion capabilities. The authors applied the image sample expansion method proposed in their study to 445 photos from the original dataset to obtain a large number of images for the experimental dataset.
Experimental results demonstrate that this framework not only ensures data security but also enhances the model’s feature extraction and fusion capabilities, offering new insights for technological advancement.
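As an illustration of the attention modules mentioned above, the sketch below implements a simplified CBAM-style block (channel attention followed by spatial attention) in PyTorch; the channel count, reduction ratio, and kernel size are assumed values, and this is not the cited authors' implementation.

```python
# Simplified CBAM-style block: channel attention, then spatial attention.
import torch
import torch.nn as nn

class CBAM(nn.Module):
    def __init__(self, channels: int, reduction: int = 16, kernel_size: int = 7):
        super().__init__()
        # Channel attention: shared MLP over average- and max-pooled descriptors.
        self.mlp = nn.Sequential(
            nn.Linear(channels, channels // reduction),
            nn.ReLU(inplace=True),
            nn.Linear(channels // reduction, channels),
        )
        # Spatial attention: convolution over stacked channel-wise mean/max maps.
        self.spatial = nn.Conv2d(2, 1, kernel_size, padding=kernel_size // 2)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        b, c, _, _ = x.shape
        avg = self.mlp(x.mean(dim=(2, 3)))
        mx = self.mlp(x.amax(dim=(2, 3)))
        x = x * torch.sigmoid(avg + mx).view(b, c, 1, 1)
        spatial_in = torch.cat([x.mean(dim=1, keepdim=True),
                                x.amax(dim=1, keepdim=True)], dim=1)
        return x * torch.sigmoid(self.spatial(spatial_in))

features = torch.randn(1, 256, 32, 32)   # e.g., a backbone feature map
refined = CBAM(256)(features)
```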
Due to its superior detection performance, Faster-RCNN has become a research hotspot in the field of pest detection. Researchers have made significant progress by optimizing from multiple angles, including algorithm improvements, data augmentation, system construction, and federated learning. In the future, we still need to develop more lightweight deep learning algorithms that are better suited for small datasets, especially when dealing with smaller target pests [17]. Continuing to enhance the performance of Faster-RCNN in pest detection tasks will inject new momentum into the development of smart agriculture.

2.2. Pest Detection Based on the ResNet Algorithm and Its Improvements

The identification of harmful organisms in agriculture is crucial for smart farming. Automated monitoring of crop diseases and pests is a key component of agricultural resource management and optimization. Although standard CNN-based algorithms generally provide satisfactory results for object detection, methods specifically designed for detecting and identifying small objects like pests remain rare. Recently, with the rapid advancement of deep learning technologies, ResNet (Residual Network) and its derivative models have achieved significant success in the field of computer vision, thanks to ResNet’s ability to let information flow across layers without degradation and its powerful feature extraction capabilities [18]. Owing to the outstanding performance demonstrated by ResNet and its improved algorithms in pest detection tasks, they have garnered widespread attention from researchers.
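The residual connection that underpins this behavior can be sketched in a few lines of PyTorch; the block below is a generic illustration (channel count and layer layout are assumptions), not a specific model from the cited works.

```python
# A basic residual block: the identity shortcut lets information and gradients
# bypass the convolutional stack, which eases training of very deep networks.
import torch
import torch.nn as nn

class ResidualBlock(nn.Module):
    def __init__(self, channels: int):
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv2d(channels, channels, 3, padding=1, bias=False),
            nn.BatchNorm2d(channels),
            nn.ReLU(inplace=True),
            nn.Conv2d(channels, channels, 3, padding=1, bias=False),
            nn.BatchNorm2d(channels),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Skip connection: add the input back to the transformed features.
        return torch.relu(self.body(x) + x)

x = torch.randn(1, 64, 56, 56)
print(ResidualBlock(64)(x).shape)
```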
Gui et al. [19] proposed a non-destructive detection method for soybean pests based on hyperspectral images and an attention ResNet meta-learning model. The experimental setup included a model IPX-2M30 hyperspectral imaging instrument (Lingliang Optoelectronics Technology Co., Ltd., Shanghai, China), a CCD camera, an electronic control translation stage, four 150W halogen lamps, and a computer. The authors utilized the CAVE, iCVL, and NUS datasets as training datasets in this experiment. The method first employs hyperspectral imaging technology to capture soybean leaf images, then enhances ResNet’s ability to capture pest features using attention mechanisms, and finally utilizes meta-learning strategies to enable the model to quickly adapt in small sample scenarios. Figure 4 illustrates the overall framework of the attention ResNet network. Experimental results indicate that this method achieves high accuracy in soybean pest detection tasks while enabling non-destructive detection of plants, demonstrating significant practical value.
Traditional pest recognition methods require a large amount of annotated data, which poses a challenge for pest recognition systems. Dewi et al. [20] pre-trained ResNet models on a large-scale natural image dataset and evaluated the effectiveness of nine different pre-training algorithms using the ResNet model on two popular datasets: Deng et al.’s dataset and ImageNet. To reduce the risk of overfitting, they employed transfer learning strategies by applying the pre-trained models to insect pest identification, categorizing the pests into six classes for comparison, as illustrated in Figure 5. The computational environment used for model training in this experiment included an Nvidia GTX 3060 Super GPU accelerator (NVIDIA Semiconductor Technology Co., Ltd., Shanghai, China) and an AMD Ryzen 7 3700X CPU (Advanced Micro Devices (AMD) Inc., Santa Clara, CA, USA) with an 8-core processor and 32GB DDR4-3200 memory. Experimental results demonstrated that this method reduces the required data and improves the accuracy of pest recognition, providing new insights for pest recognition systems.
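The transfer learning strategy described above can be illustrated with the following sketch, which fine-tunes only a new classification head on an ImageNet-pretrained ResNet-50; the six-class setup, learning rate, and synthetic batch are assumptions for demonstration, not the cited authors' configuration.

```python
# Transfer-learning sketch: freeze the pretrained backbone, train a new head.
import torch
import torch.nn as nn
import torchvision

model = torchvision.models.resnet50(weights="IMAGENET1K_V2")
for p in model.parameters():                    # freeze the pretrained backbone
    p.requires_grad = False
model.fc = nn.Linear(model.fc.in_features, 6)   # new head for 6 assumed pest classes

optimizer = torch.optim.Adam(model.fc.parameters(), lr=1e-3)
criterion = nn.CrossEntropyLoss()

# Placeholder batch: images (N, 3, 224, 224) and integer class labels (N,).
images, labels = torch.randn(8, 3, 224, 224), torch.randint(0, 6, (8,))
loss = criterion(model(images), labels)
loss.backward()
optimizer.step()
```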
The attention mechanism enhances the model’s ability to capture key features by weighting and focusing on different parts of the input data. Hassan and Maji [21] combined the self-attention mechanism with ResNet, building on the ResNetV1 and ResNetV2 architectures, and proposed a novel model, ResNet50-SA, for pest identification. The model utilizes a self-attention mechanism to adaptively adjust the weights of features at different layers of ResNet, thereby better accommodating the characteristics of pest images. Experimental results demonstrated that this model significantly improved accuracy and robustness compared to traditional ResNet and other mainstream methods. Addressing the limitations of ResNet in identifying small insects, Wang et al. [22] proposed an improved ResNet neural network model, S-ResNet. The model, referencing the MS COCO dataset, enhances ResNet’s capability to extract and represent features of small target insects by introducing a feature pyramid structure and an attention mechanism. Additionally, the authors designed an IoU-based positive–negative sample partitioning strategy to alleviate the class imbalance issue during the training process. The hardware platform used in the experiment was a desktop computer equipped with an NVIDIA RTX 2060 GPU (NVIDIA Semiconductor Technology Inc., Santa Clara, CA, USA). Experimental results showed that S-ResNet could effectively identify small pests and outperformed the original ResNet and other improved methods. In addition to static images, video data are another commonly used data format in pest detection. Li et al. [23] employed an improved ResNet model to extract spatial features from video frames, then used Long Short-Term Memory (LSTM) networks to model the feature sequences and capture the temporal information in the video, and finally synthesized the processed frames back into a video. The authors used a model trained on images to detect relatively blurry videos. Additionally, they proposed a set of video evaluation metrics based on machine learning classifiers, which effectively reflected the quality of video detection in the experiments. The study utilized a dataset composed of images and videos of rice pests and diseases for training. The experiment utilized a Linux server running Ubuntu 16.04, which featured an Intel i7-6700K CPU (Intel Corporation, Santa Clara, CA, USA), 64 GB DDR4 RAM, and two NVIDIA Titan X GPUs (NVIDIA Corporation, Santa Clara, CA, USA). Experimental results indicated that this method effectively identified pest targets in videos, providing a powerful tool for real-time monitoring and control of rice pests and diseases.
ResNet and its improved algorithms have achieved remarkable results in the field of pest detection, thanks to their powerful feature extraction and representation capabilities. Researchers have continuously enhanced the performance of ResNet in pest detection tasks by introducing strategies such as attention mechanisms, feature pyramids, transfer learning, and meta-learning. Additionally, ResNet has been applied to a feature-map-guided pest detection and recognition method, demonstrating good adaptability and practicality [24]. In the future, the fusion of multi-disciplinary technologies is expected to further expand the application scope of ResNet in pest detection, contributing to the sustainable development of smart agriculture.

2.3. Strategies for Addressing the Limitations of Faster R-CNN and ResNet in Pest Detection

Faster-RCNN and ResNet are both classic algorithms in the field of object detection, each achieving excellent performance in pest detection tasks. However, challenges and areas for improvement remain due to the specific nature of agricultural environments and the practical needs of pest detection. Both Faster-RCNN and ResNet have notable limitations within their domains. Researchers have employed various methods to mitigate these limitations and enhance detection accuracy. This section will explore the improvements made to both algorithms and the advancements derived from integrating modifications between these two approaches.
Introducing attention mechanisms is one method to enhance model expressiveness. By learning dependencies between features, attention mechanisms adaptively adjust feature weights to highlight salient regions [25]. When integrated with Faster R-CNN and ResNet, attention mechanisms can better address pest detection challenges in complex backgrounds. For instance, a spatial attention module can be introduced after the ROI pooling layer to emphasize regions where pests are likely present by learning correlations between different areas [26]. Du et al. [27] proposed an improved Faster R-CNN-based action recognition method. This approach first detects all objects in an image and predicts interactions between them. During feature extraction, additional parameters are added to the sampling points of each convolutional kernel to enhance the network’s adaptability to complex scenes. In object detection, the attention mechanism is combined with the ResNet network, transitioning the network structure from “post-activation” to “pre-activation”, thus mitigating overfitting. For action prediction, the network focuses on instances within feature maps, detecting surrounding interacting objects based on their appearance features and attention weights, and predicting interaction scores between them. The experiment utilized the commonly used open-source annotation tool LabelMe in Python (Python v3.9) to generate a detection dataset for S. frugiperda based on its feeding habits and characteristics. The model was trained using an NVIDIA GPU 2080 (NVIDIA Semiconductor Technology Inc., Santa Clara, CA, USA) for deep learning computations. Experimental results indicate that, compared to traditional methods, the improved approach better detects actions in images, achieving a mean average precision (mAP) of 67.2%, representing an improvement of nearly 14 percentage points and demonstrating high experimental value.
Data augmentation is a crucial technique for enhancing model generalization and robustness. By applying transformations and combinations to original images to generate new training samples, data augmentation effectively expands the dataset size and improves the model’s adaptability [28]. Liu et al. [29] proposed the PD-IFRCNN model within the Faster R-CNN framework, integrating transfer learning and data augmentation techniques for three-stage pest detection. The authors examined the impact of class imbalance in their custom dataset on the cost of misclassification from two different perspectives: class balance and label balance. They also performed a comparative analysis with an SSD model using VGG16 as the feature extraction network. The experiment was conducted on the Ubuntu 18.04 operating system, using the TensorFlow-GPU-1.13.1 framework for deep learning algorithms. The experiment involved tracking and collecting images of six types of flower diseases during the growth process: Stephanotis blight, brown spot disease, leaf blight, etc. A custom dataset for Rhododendron leaf diseases was created. This method not only breaks away from traditional manual detection approaches for plant diseases but also improves the accuracy of identifying various common pests. Additionally, it adapts better to natural environments, providing new research insights for plant disease recognition. Underwater regions represent a limitation in pest detection areas; Krishnan et al. [30] employed UResNet combined with edge difference loss (EDL) and mean squared edge (MSE) loss for underwater image enhancement. This study utilized UnderwaterGAN (UWGAN) to generate the training dataset. The images generated from the experiments were evaluated using the UIQM metric. Subsequently, UResNet was employed to enhance the input images, and an asynchronous training mode was utilized to improve the functionality of multiple loss functions. Their method enhances image quality and improves visual outcomes of underwater images, offering a methodology for detecting pests below the water surface in rice fields.
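A typical augmentation pipeline of the kind referred to above might look like the following torchvision sketch; the specific transforms, parameter ranges, and file name are illustrative assumptions rather than the cited authors' settings.

```python
# Illustrative augmentation pipeline for expanding a small pest image dataset.
from torchvision import transforms

augment = transforms.Compose([
    transforms.RandomResizedCrop(224, scale=(0.7, 1.0)),
    transforms.RandomHorizontalFlip(p=0.5),
    transforms.ColorJitter(brightness=0.3, contrast=0.3, saturation=0.3),
    transforms.RandomRotation(degrees=15),
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.485, 0.456, 0.406],
                         std=[0.229, 0.224, 0.225]),
])

# Applying the pipeline to a PIL image yields a new training sample on each call:
# augmented = augment(Image.open("leaf_with_pest.jpg").convert("RGB"))
```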
Model lightweighting is critical for achieving real-time capability and deployability in pest detection. While Faster R-CNN and ResNet offer excellent performance, there is a current need to explore model compression and acceleration techniques to reduce model size and computational costs while maintaining performance. Krueangsai et al. [31] investigated the impact of shortcut connections on object recognition in lightweight, small-sized ResNet11-based Residual-networks-of-Residual-networks (RoR) models. They evaluated the recognition performance of traditional ResNet11 and RoR models with 1 and 2 shortcut levels (1L- and 2L-RoR) based on accuracy, recall, F1 score, and precision, using the CIFAR-100 image dataset. The experiments achieved high accuracy and demonstrated that 2L-RoR maintains superior recognition performance as the number of data categories increases. This model successfully improved the detection rate of small-sized pests. Common methods for model lightweighting include pruning, quantization, and knowledge distillation [32,33].
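Two of these lightweighting techniques, pruning and post-training quantization, can be sketched with built-in PyTorch utilities as shown below; the pruning ratio, target layers, and backbone choice are illustrative assumptions.

```python
# Lightweighting sketch: L1 unstructured pruning of convolution weights,
# followed by post-training dynamic quantization of linear layers.
import torch
import torch.nn.utils.prune as prune
import torchvision

model = torchvision.models.resnet18(weights=None)

# Prune 30% of the smallest-magnitude weights in every convolution layer.
for module in model.modules():
    if isinstance(module, torch.nn.Conv2d):
        prune.l1_unstructured(module, name="weight", amount=0.3)
        prune.remove(module, "weight")   # make the pruning permanent

# Dynamically quantize the fully connected layer(s) to int8 for inference.
quantized = torch.quantization.quantize_dynamic(
    model, {torch.nn.Linear}, dtype=torch.qint8
)
```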

2.4. Improvements and Applications of Faster R-CNN and ResNet Algorithms

Cao et al. [34] proposed an improved algorithm for small object detection based on Faster R-CNN. This algorithm operates in two stages. In the localization stage, they proposed an IoU-based improved loss function for bounding box regression and employed bilinear interpolation to enhance the Region of Interest (RoI) pooling operation, addressing localization bias. In the recognition stage, they utilized multi-scale convolutional feature fusion to enrich the feature maps with more information and applied an enhanced Non-Maximum Suppression (NMS) algorithm to avoid the loss of overlapping objects. The experiment selected the TT100K dataset jointly released by Tsinghua University and Tencent. The results indicate that the improved detection performance significantly surpasses that of the unaltered version. Detection of small targets remains one of the key aspects of pest detection. Yang et al. [35] proposed a target detection model based on a Multi-Attention Residual Network (MA-ResNet) to enhance detection capability and performance, particularly for multi-scale and small objects. Firstly, MA-ResNet was designed, and label smoothing was applied to the dataset. On this basis, MA-ResNet was used to replace the original VGG-16 feature extractor in Faster-RCNN, and different layers of MA-ResNet were utilized to construct a feature pyramid, resulting in an improved object detection model based on MA-ResNet within the Faster-RCNN framework. Object detection experiments were then conducted on the MS-COCO and PASCAL-VOC2012 datasets. The results indicate that the improved Faster-RCNN object detection model achieved an mAP of 78.4% on the PASCAL-VOC2012 dataset. Weeds growing uncontrollably in agricultural fields pose a significant challenge for pest detection. Thanh Le et al. [36] proposed a Faster R-CNN model utilizing the Inception-ResNet-V2 network to address weed problems in farmlands. The authors conducted weed detection in barley crops for wild radish and arctotheca. After acquiring images, they used “LabelImg” software (LabelImg v1.8.1) to draw bounding boxes around the weeds at different growth stages. In the context of localizing target weeds and estimating weed density, a portion of the FT_BRC dataset was comprehensively annotated and applied to Faster R-CNN models with various feature extractors. The FT_BRC dataset used in this experiment consists of 3380 images collected in complex field environments and various weather conditions. The experiment was conducted using the Ubuntu 18.04 LTS operating system and a GeForce GTX 1080 Ti GPU (NVIDIA Semiconductor Technology Inc., Santa Clara, CA, USA), with all models implemented in TensorFlow (Google LLC, Mountain View, CA, USA). Experimental results show that this model achieves higher accuracy than the average precision of other networks. Shi et al. [37] introduced a method using ResNet as the backbone feature extraction network in Faster R-CNN. They then constructed a ResNet-bidirectional Feature Pyramid Network (BiFPN) structure to enhance feature extraction and multi-scale feature fusion capabilities. The experiment used the URPC2018 dataset and applied k-means++ clustering (MATLAB 2021b) to generate a set of anchor boxes for enhancing the detection accuracy of the model. The experiment was conducted with hardware consisting of an Intel Xeon(R) CPU E5-2620 v4 @ 2.10 GHz processor, 32 GB of memory (Intel Corporation, Santa Clara, CA, USA), and an NVIDIA GeForce RTX 2080 (NVIDIA Semiconductor Technology Inc., Santa Clara, CA, USA) graphics card.
Experimental results demonstrate that the detection accuracy of the algorithm based on the improved Faster R-CNN increased by 8.26% compared to the original model. These results fully validate the effectiveness of the proposed algorithm.
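The IoU measure that underlies both the improved regression losses and the NMS step discussed above can be illustrated with the short torchvision-based sketch below; the box coordinates, scores, and thresholds are made-up values for demonstration.

```python
# IoU and NMS sketch; boxes are in (x1, y1, x2, y2) format.
import torch
from torchvision.ops import box_iou, nms

boxes = torch.tensor([[10., 10., 60., 60.],
                      [12., 12., 58., 62.],
                      [100., 100., 150., 150.]])
scores = torch.tensor([0.9, 0.8, 0.75])

iou_matrix = box_iou(boxes, boxes)              # pairwise IoU between all boxes
keep = nms(boxes, scores, iou_threshold=0.5)    # indices of boxes surviving NMS

# A simple IoU-based regression loss for one predicted / ground-truth pair: 1 - IoU.
pred = torch.tensor([[10., 10., 60., 60.]])
gt = torch.tensor([[12., 12., 58., 62.]])
iou_loss = 1.0 - box_iou(pred, gt).squeeze()
print(iou_matrix, keep, iou_loss)
```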
Identifying pests is crucial in agricultural pest prediction, which is vital for a stable agricultural economy and food security. Automated monitoring of crop pests has become a key method for managing and optimizing agricultural resources. Continuous optimization of detection algorithms can significantly advance pest detection technology and promote overall progress in the field of pest detection. For the task of pest detection in farmlands, improvements to Faster R-CNN and ResNet can be achieved through algorithm optimization, data augmentation, and model lightweighting, enabling real-time and efficient pest detection. Table 1 presents a comparative analysis of research outcomes on various algorithms for pest detection. These enhancements contribute to improved performance and practicality of pest detection systems, providing stronger technical support for the development of smart agriculture.
Based on the table and the associated literature, Faster R-CNN and ResNet neural networks exhibit distinct differences. The former focuses on object detection tasks, achieving detection through region proposal networks and classifiers. In contrast, ResNet is a general-purpose deep residual network that addresses the gradient vanishing problem in deep neural network training by incorporating residual blocks. Faster R-CNN is noted for its efficient object detection performance and real-time processing capability, whereas ResNet’s deeper network architecture enables it to learn more complex feature representations, thereby enhancing the model’s generalization ability.

3. Detection of Mature Crops Using Agricultural Robots

3.1. Maturity Detection Based on Image Processing

Agricultural robots play an increasingly critical role in enhancing production efficiency and reducing labor costs. The acquisition, storage, and processing of digital images have become very important applications in our daily lives. Among these roles, automated detection and recognition of mature crops using image processing techniques are crucial for enabling autonomous harvesting by agricultural robots. Typical image processing procedures include methods such as image acquisition and contrast enhancement [38]. In recent years, researchers have conducted extensive studies on this topic, achieving significant advancements.
Miao et al. [39] proposed an efficient tomato harvesting robotic system based on image processing and deep learning, as illustrated in Figure 6. Additionally, the authors designed a visual servo-based robotic arm control strategy to achieve precise localization and autonomous harvesting of tomatoes. The system performed tomato object detection using an enhanced YOLOv5 algorithm trained on the Pascal VOC and Microsoft COCO datasets. Subsequently, a convolutional neural network (CNN) was utilized to classify tomato maturity. The experiment was conducted using a laptop with hardware specifications including an Intel i7-6500 CPU @ 2.5 GHz processor, 16GB of RAM (Intel Corporation, Santa Clara, CA, USA), and a GeForce 960M GPU (NVIDIA Semiconductor Technology Inc., Santa Clara, CA, USA). Experimental results demonstrate that this system achieves the expected outcomes in terms of tomato detection accuracy, ripeness classification accuracy, and harvesting success rate, showcasing excellent practical performance.
In nature, many crops exhibit different color characteristics at various stages of maturity. Al-Mashhadani and Chandrasekaran [40] proposed a ripeness estimation algorithm based on color and shape features. First, color histograms and shape descriptors of fruits were extracted from RGB images. Data collection utilized two image sources. Subsequently, a decision tree algorithm was employed to classify maturity levels in specific datasets. Furthermore, the Pi Camera was mounted on the ground robot to achieve navigation and perform monitoring and data collection tasks. Experimental results demonstrate that the algorithm achieves high accuracy, providing a reliable decision-making basis for autonomous operations of agricultural robots. Visual inspection is a crucial initial step for agricultural robots to achieve autonomous harvesting. Puttemans et al. [41] presented a method for automatic visual detection of fruits to enable yield estimation and robotic harvesting. This study sequentially employed a three-camera stereo setup to capture strawberries from different perspectives to establish a dataset and collected training data for apples using side views. This method initially uses color threshold segmentation to extract regions of interest (ROI), followed by HOG feature extraction and fruit classification using SVM. Experimental results show that the method achieves an accuracy rate of over 90%, laying the foundation for practical applications of agricultural robots. Xiao et al. [42] reviewed the application of target detection and recognition technologies based on digital image processing and traditional machine learning in fruit and vegetable harvesting robots. This review introduces image preprocessing methods such as color threshold segmentation, shape analysis, and texture feature extraction, as well as the application of classical machine learning algorithms like Support Vector Machines (SVMs) and decision trees in fruit and vegetable recognition. This comprehensive review provides valuable insights for researchers seeking to understand the image processing technologies used in fruit and vegetable harvesting robots.
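The classical pipeline summarized above (color threshold segmentation, HOG features, and an SVM classifier) can be sketched as follows with OpenCV, scikit-image, and scikit-learn; the HSV thresholds, patch size, area filter, and file name are illustrative assumptions, not the cited authors' parameters.

```python
# Colour-threshold segmentation -> HOG features -> SVM classification sketch.
import cv2
from skimage.feature import hog
from sklearn.svm import SVC

image = cv2.imread("strawberry_plant.jpg")          # placeholder image path
hsv = cv2.cvtColor(image, cv2.COLOR_BGR2HSV)

# Keep reddish pixels as a rough "ripe fruit" mask (illustrative HSV bounds).
mask = cv2.inRange(hsv, (0, 120, 70), (10, 255, 255))
contours, _ = cv2.findContours(mask, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)

features, candidates = [], []
for c in contours:
    x, y, w, h = cv2.boundingRect(c)
    if w * h < 400:                                  # discard tiny blobs
        continue
    patch = cv2.resize(image[y:y + h, x:x + w], (64, 64))
    gray = cv2.cvtColor(patch, cv2.COLOR_BGR2GRAY)
    features.append(hog(gray, pixels_per_cell=(8, 8), cells_per_block=(2, 2)))
    candidates.append((x, y, w, h))

# An SVM would be trained beforehand on labelled fruit / non-fruit HOG vectors.
clf = SVC(kernel="linear")
# clf.fit(train_features, train_labels); predictions = clf.predict(features)
```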
Researchers have utilized traditional image processing algorithms and machine learning methods, such as color threshold segmentation, shape analysis, and SVM, to achieve automatic detection and ripeness estimation of fruit targets. However, color variations caused by lighting from different angles can still lead to recognition errors, which will be an important research direction in the future [43]. This research direction is significant for enhancing the practicality and robustness of agricultural robots.

3.2. Maturity Detection Based on the YOLO Algorithm

With the rapid advancement of artificial intelligence technology, object detection algorithms have been widely applied in the field of agricultural robotics. The YOLO algorithm, a single-stage object detection method, treats object detection as a regression problem, enabling direct prediction of object classes and generation of detection frames. It offers advantages such as fast inference speed and low model deployment costs. The YOLO algorithm, based on deep learning, excels in simultaneously predicting class probabilities and bounding boxes for each object involved in a single pass through the neural network [44], demonstrating significant potential in mature crop detection tasks. In recent years, researchers have conducted a series of studies on the YOLO algorithm, continuously advancing the intelligence of agricultural robots. This section explores the application of the YOLO algorithm in crop maturity detection.
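The single-pass character of YOLO can be illustrated with the following sketch, which loads the publicly available YOLOv5s model from the Ultralytics hub and runs one forward pass on an image; the weights are generic COCO weights and the image path is a placeholder, not a crop-maturity model from the cited studies.

```python
# Single-pass YOLO inference sketch using the public YOLOv5 hub model.
import torch

# Downloads YOLOv5s weights and code from the Ultralytics hub repository.
model = torch.hub.load("ultralytics/yolov5", "yolov5s", pretrained=True)

results = model("tomato_row.jpg")       # one forward pass yields boxes, scores, classes
detections = results.pandas().xyxy[0]   # DataFrame: xmin, ymin, xmax, ymax, confidence, class
print(detections.head())
```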
Gai et al. [45] optimized the YOLOv4 model by incorporating an attention mechanism and a feature pyramid structure, thereby enhancing the model’s ability to extract features of cherry fruits. Additionally, they employed data augmentation and transfer learning strategies to effectively mitigate the issue of insufficient samples in the cherry dataset. The dataset for this experiment consists of a total of 400 cherry images collected between 2016 and 2019. The hardware setup for the study utilized an Intel Core i7-7700 processor and an NVIDIA Tesla V100 GPU. Experimental results demonstrated that this algorithm achieved a precision of 95.2% in detection tasks, meeting the practical requirements of cherry-picking robots. Assessing crop growth status is a critical aspect of agricultural production. Paul et al. [46] collected sufficient training datasets over approximately three months using the Realme AI Quad Camera from experimental sites. They performed laboratory testing of target point coordinates with the YOLOv5 model and the RealSense D455 RGB-D camera. They then classified the growth stages of chili peppers based on their morphological features and used supervised algorithms for chili counting and tracking. They also designed a convolutional neural network-based algorithm for counting peppers, as illustrated in Figure 7, which was integrated into mobile devices to enable real-time identification. Experimental results demonstrated that this approach can accurately assess the growth status of peppers, providing robust support for the intelligent operations of agricultural robots.
Real-time performance is one of the key requirements in agricultural robot applications. Selvam et al. [47] proposed a real-time ripe palm fruit bunch detection method based on the YOLOv3 algorithm, as illustrated in Figure 8, to meet the operational requirements of palm harvesting robots. The authors annotated a dataset of oil palm images captured in oil palm plantations using an image annotation tool released by Darrenl on GitHub. They optimized the network architecture and training strategies of the YOLOv3 model; images were captured using a portable digital camera mounted on the robot, and digital image processing techniques were employed to extract color information for assessing maturity levels. After completing the preprocessing phase, the authors trained and tested the detection model using the Darknet framework and the pre-trained detection models provided by Darknet. They compiled and trained the detection model on their dataset, then created a software application as the system interface using Python libraries (Python v3.9.1). The application was integrated with the Darknet framework through Tkinter, and operations were conducted using Darknet commands. This work enhanced both the inference speed and detection accuracy of the model, demonstrating excellent practical performance. The olive harvesting process is time-consuming and labor-intensive. To address the issue of difficult fruit harvesting, Aljaafreh et al. [48] developed a real-time olive fruit detection method for olive harvesting robots based on the YOLO algorithm. They established a dataset comprising 1200 source images captured by an RGB camera with a resolution of 2736 × 3648 pixels. The authors compared the performance of YOLOv3, YOLOv4, and YOLOv5 models in olive fruit detection tasks and found that YOLOv5 exhibited significant advantages in both accuracy and speed. To further verify the results, the authors selected YOLOv5s and YOLOv5x, the smallest and largest variants, respectively. To further improve detection robustness, they introduced a fruit verification mechanism based on color and texture features, effectively reducing false detection rates. To evaluate the final detection results, they measured the mean average precision (mAP) at IoU thresholds ranging from [0.5:0.05:0.95]. Experimental results demonstrate that this method achieves stable detection of ripe olive fruits in complex environments, increasing the success rate of harvesting difficult-to-pick fruits and establishing a technical foundation for harvesting such challenging fruits.
The YOLO algorithm has achieved significant results in the detection of mature crops by agricultural robots. Researchers have continuously improved the performance of the YOLO algorithm in fruit and vegetable detection tasks through algorithmic enhancements, feature optimizations, and real-time capability enhancements. In the future, by integrating techniques such as multi-sensor information fusion, domain adaptation learning, and few-shot learning, it is expected to further expand the application scope of the YOLO algorithm in the field of agricultural robots and promote the intelligent development of modern agriculture.

3.3. Comprehensive Application of Agricultural Robots in Crop Maturity Detection

Crop maturity is a critical indicator for assessing crop growth conditions and determining the optimal harvest time. Traditional maturity assessment relies heavily on manual experience, which is often subjective and inefficient. These issues can lead to mature crops rotting in the field during busy agricultural periods, resulting in reduced yields, food waste, and increased costs. With the advancement of agricultural robotics technology, automated detection and analysis of crop maturity using computer vision and artificial intelligence algorithms have improved efficiency and reduced labor time. This technology has become an important means of enhancing the intelligence of agricultural production. This section explores navigation technologies, core technologies in agricultural robotics, computer vision technologies, and the application of UAV-based multispectral imaging in crop maturity detection.
Row navigation is a fundamental capability for agricultural robots performing tasks between rows. The review by Shi et al. [49] focuses on navigation and guidance methods for agricultural robots based on visual row detection algorithms and their applications in inter-row crop fields. The authors concentrated on vision-based row detection algorithms, such as the Hough Transform and Random Sample Consensus (RANSAC), as well as methods for row navigation using sensors like LiDAR and Global Navigation Satellite Systems (GNSSs). They also discuss the practical application of row detection navigation techniques in field management of different crops, such as corn and soybeans, providing references for the actual deployment of agricultural robots. The core technologies of agricultural robots directly impact their effectiveness in crop production applications. Yang et al. [50] comprehensively reviewed maturity assessment methods based on features such as color, texture, and spectrum, highlighting the core technologies of agricultural robots in crop production and the application of deep learning in maturity recognition. Additionally, they discussed the importance of multi-sensor information fusion and field environment perception technologies in enhancing the intelligence of agricultural robots. Computer vision technology is a key support for achieving agricultural automation. The review by Tian et al. [51] summarizes maturity estimation methods based on features such as color, shape, and size, as well as the research progress of machine learning algorithms in maturity classification in the field of agricultural automation. Predicting crop maturity using multispectral imagery is another research direction in crop maturity detection. Zhou et al. [52] conducted a detailed classification, selection, and computation of soybean varieties using visual grading methods. The authors visually assessed the maturity dates of soybean breeding lines from the early maturity stage (R7) to the full maturity stage (R8) and captured multispectral aerial imagery during this period. A total of 130 features were extracted from the five-band multispectral images. The maturity dates of the soybean lines were predicted and evaluated using a Partial Least Squares Regression (PLSR) model with 10-fold cross-validation. Twenty important image features were selected for estimation, and the rate of change of these features between data collection dates was calculated. The experiment involved mounting a RedEdge-M spectral camera on a UAV to capture multispectral images. Pix4D Mapper, a UAV image processing software (Pix4D Mapper v4.5.6), was used to generate orthomosaic images of the field. A custom dataset was created by removing background effects from these images. The computed data indicate the feasibility of predicting the maturity stage of soybean varieties based on multispectral images captured by unmanned aerial vehicles.
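The PLSR-with-cross-validation workflow described above can be sketched with scikit-learn as follows; the feature matrix, maturity targets, and number of latent components are synthetic placeholders rather than the cited study's data.

```python
# Sketch of maturity-date prediction with PLSR and 10-fold cross-validation.
import numpy as np
from sklearn.cross_decomposition import PLSRegression
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 130))      # 130 multispectral image features per plot
y = rng.uniform(0, 30, size=200)     # days to maturity (synthetic targets)

pls = PLSRegression(n_components=10)
scores = cross_val_score(pls, X, y, cv=10, scoring="r2")
print(scores.mean())                 # mean cross-validated R^2
```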
The comprehensive application of agricultural robots for crop maturity detection has become a crucial direction in the development of modern agriculture. Researchers are leveraging computer vision, machine learning, and other technologies to continuously improve the accuracy and efficiency of crop maturity detection. In the future, agricultural robotics technology will become increasingly mature and widespread, fostering interdisciplinary integration and close collaboration between industry, academia, and research institutions. Table 2 provides a comparative analysis of research outcomes on various algorithms for maturity detection. This will significantly enhance the intelligence and precision of agricultural production, contributing substantially to sustainable agricultural development.
Based on the table and related literature, image detection technology involves capturing images of target objects using imaging devices and then analyzing pixel distribution, brightness, color, and other information through image processing systems to achieve automatic detection and recognition of the objects. The core idea of the YOLO algorithm is to treat object detection as a regression problem, where a convolutional neural network can directly predict bounding boxes and class probabilities from the input image, employing a regression approach for detection. The former method is characterized by good reproducibility, high precision, and broad applicability, although hardware conditions significantly impact such detection. The latter method, YOLO, offers rapid detection speeds, enabling highly efficient performance.

4. Leveraging UWB Technology for Localization and Its Various Application Scenarios

Ultra-Wideband (UWB) technology is known for its extremely wide spectrum bandwidth and precise positioning capability; research on it began in the late 20th century, primarily for military and scientific purposes. By 2002, following FCC approval for widespread use, the technology entered extensive commercial application, and it became widely adopted in the 2010s across sectors such as smartphones, automotive electronic systems, and IoT devices. Future developments in UWB technology are expected to prioritize enhancing energy efficiency, improving security, and optimizing spectrum utilization. Table 3 provides a comparative performance analysis of UWB indoor positioning technology and various other techniques. This article introduces the primary positioning methods of UWB technology in agricultural settings. The integration of this technology into agricultural robotics positioning systems operates within the frequency range of 3.1 to 10.6 GHz. Current research indicates that indoor wireless positioning heavily relies on RF signals. UWB technology demonstrates significant differences in positioning methods compared to other technologies. Common ranging methods employed by UWB include Time of Flight (ToF), Single-Sided Two-Way Ranging (SS-TWR), and Double-Sided Two-Way Ranging (DS-TWR). Additionally, it utilizes four main positioning techniques: Received Signal Strength Indication (RSSI), Angle of Arrival (AoA), Time of Arrival (ToA), and Time Difference of Arrival (TDoA). Furthermore, UWB positioning has been enhanced through integration with complementary technologies.
Based on the table, UWB positioning technology exhibits substantially higher accuracy than the other technologies listed, with precision reaching the centimeter level. Although RFID can also achieve centimeter-level accuracy, its stability in positioning precision is considerably inferior to that of UWB. Additionally, UWB possesses robust anti-interference capabilities, efficient real-time transmission, stability, and strong signal penetration. However, its accuracy is susceptible to factors such as non-line-of-sight errors and multipath effects. These represent the current issues and challenges facing the technology.
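As a worked illustration of the DS-TWR ranging method listed above, the following sketch converts two round-trip and two reply intervals into a time of flight and then a distance; the timestamp values are illustrative assumptions.

```python
# Double-sided two-way ranging (DS-TWR) sketch: estimate time of flight from
# two round-trip and two reply intervals, then convert it to a distance.
C = 299_792_458.0  # speed of light, m/s

def ds_twr_distance(t_round1, t_reply1, t_round2, t_reply2):
    # Standard asymmetric DS-TWR estimator, robust to clock offset between devices.
    tof = (t_round1 * t_round2 - t_reply1 * t_reply2) / (
        t_round1 + t_round2 + t_reply1 + t_reply2
    )
    return tof * C

# Example intervals (seconds) with ~33 ns of flight time, i.e. roughly 10 m.
print(ds_twr_distance(300.066e-6, 300e-6, 250.066e-6, 250e-6))
```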

4.1. RSSI-Assisted Localization Method Based on Integrated UWB Technology

In agricultural and other indoor environments, UWB (Ultra-Wideband) technology has achieved precise ranging and enables position estimation through trilateration methods. However, in practical applications, the positioning accuracy can be influenced by the strength of its own signals. To achieve real-time localization, researchers currently integrate UWB technology with RSSI (Received Signal Strength Indication) positioning methods. This principle entails measuring the received signal strength, employing established signal attenuation models for distance calculation, and subsequently applying multilateration algorithms for target position determination. This approach effectively mitigates the influence of signal fluctuations on UWB technology precision.
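The principle described above can be illustrated with the following sketch, which inverts a log-distance path-loss model to obtain ranges and then solves a least-squares multilateration problem; the anchor layout, model parameters, and RSSI readings are illustrative assumptions.

```python
# RSSI ranging + multilateration sketch.
import numpy as np
from scipy.optimize import least_squares

P0, n = -45.0, 2.2          # assumed RSSI at 1 m (dBm) and path-loss exponent
anchors = np.array([[0.0, 0.0], [10.0, 0.0], [0.0, 10.0], [10.0, 10.0]])
rssi = np.array([-62.0, -70.0, -68.0, -75.0])

# Invert the log-distance model: RSSI = P0 - 10 * n * log10(d).
distances = 10 ** ((P0 - rssi) / (10 * n))

def residuals(p):
    # Difference between geometric anchor distances and RSSI-derived ranges.
    return np.linalg.norm(anchors - p, axis=1) - distances

estimate = least_squares(residuals, x0=np.array([5.0, 5.0])).x
print(estimate)
```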
Alvin-Ming-Song Chong et al. [53] integrate UWB-RSS positioning into Wi-Fi RSS fingerprint-based Indoor Positioning Systems (IPSs), combining signals from different wireless technologies to obtain location information. Experimental validation was conducted in designated environments equipped with both Wi-Fi and UWB anchors. Experimental results indicate that even with fewer RSS sampling points in indoor environments, the combined UWB positioning accuracy is twice that of the baseline approach. Laial Alsmadi et al. [54] proposed an improved filtering method for Received Signal Strength Indication (RSSI) and beacon weighting based on Kalman filtering. They fused RSSI measurements of beacon signals using Kalman filtering, effectively reducing the impact of signal variations on localization accuracy. Experimental results demonstrate that the enhanced RSSI localization can efficiently deploy beacons and achieve positioning accuracy within a few centimeters, thereby enhancing localization stability.
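The Kalman-filtering step used in these studies can be illustrated, under simplifying assumptions, with the minimal scalar filter below for smoothing a noisy RSSI stream; the process and measurement noise values and the sample readings are placeholders.

```python
# Minimal scalar Kalman filter for smoothing noisy RSSI measurements.
def kalman_smooth(rssi_samples, q=0.01, r=4.0):
    x, p = rssi_samples[0], 1.0          # initial state estimate and variance
    smoothed = []
    for z in rssi_samples:
        p += q                           # predict: variance grows by process noise
        k = p / (p + r)                  # Kalman gain
        x += k * (z - x)                 # update with the new measurement
        p *= (1 - k)
        smoothed.append(x)
    return smoothed

print(kalman_smooth([-63.0, -61.5, -70.2, -62.8, -64.1, -59.9]))
```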
Building on optimized RSSI localization methods in UWB technology and on Laial Alsmadi’s research, Jingjing Wang et al. [55] observed that, owing to noise and multipath effects in practical applications, RSSI-based localization often exhibits unstable performance, as illustrated in Figure 9. They therefore proposed a hybrid fingerprint localization technique based on RSS and CSI. This method preprocesses RSSI values using a combination of Kalman filtering and Gaussian functions and incorporates an enhanced CSI phase after linear transformation. This approach effectively eliminates spikes and noisy data, ensuring accurate and smoother RSSI output. Experimental results demonstrate that the algorithm performs well in noise resistance, fusion localization accuracy, and real-time filtering, representing a refined localization method with higher precision and reduced positioning errors after fusion correction.
Based on UWB technology, RSSI positioning research primarily focuses on the method’s susceptibility to environmental influences and on improvements aimed at obtaining smoothed values that mitigate the impact of environmental factors on received signals. This method is widely adopted in various commercial devices. In wireless network localization, UWB positioning base stations play a crucial role analogous to GPS “positioning satellites” and are essential components of positioning systems. These positioning anchors come in three types: LoRa, wired, and Wi-Fi. In communication, this method is implemented in UWB positioning communication base stations utilizing LoRa 2.4G narrowband IoT technology, noted for its low power consumption, strong stability, anti-interference capability, and long-distance transmission. RSSI positioning stands as one of the primary localization methods leveraging UWB technology. Additionally, we analyzed this positioning method by incorporating current research findings, integrating corresponding datasets, and conducting an application analysis based on devices in various environments. Table 4 lists the transmission loss model parameters for different environments.
From the table, it is evident that in both indoor and outdoor environments, the LOS (line-of-sight) transmission loss is lower than that of NLOS (non-line-of-sight), and the signal power is higher for LOS compared to NLOS. This suggests that the model performs better in LOS conditions than in NLOS conditions. Additionally, the model’s performance in indoor environments is marginally superior to that in outdoor environments for LOS conditions. Conversely, for NLOS conditions, the transmission loss is higher indoors compared to outdoors, while the signal power is lower indoors than outdoors. Overall, the model demonstrates that UWB positioning performs more effectively in indoor environments compared to outdoor environments.

4.2. The AOA (Angle of Arrival) Positioning Method Based on UWB (Ultra-Wideband) Technology

As a fundamental positioning technique in Ultra-Wideband (UWB) technology, Angle of Arrival (AOA) positioning depends on the signal’s Angle of Arrival to estimate distance. This algorithm calculates distances based on angles and operates on the principles of bidirectional ranging using three signals, thereby improving the positioning accuracy of UWB technology. Nevertheless, AOA positioning frequently encounters challenges from multipath effects and non-line-of-sight conditions in complex environments, resulting in positioning inaccuracies caused by signal reflection, refraction, diffraction, or physical obstructions that hinder reception. To mitigate these challenges, researchers have conducted numerous experiments.
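The underlying AOA relationship can be illustrated with a short worked example: for a two-antenna array with spacing d, the arrival angle follows from the measured phase difference as theta = arcsin(lambda * delta_phi / (2 * pi * d)). The sketch below uses assumed values for the carrier frequency, antenna spacing, and phase difference.

```python
# AOA from phase difference on a two-element antenna array (worked example).
import numpy as np

C = 299_792_458.0
freq = 6.5e9                      # assumed UWB carrier frequency (Hz)
wavelength = C / freq
d = wavelength / 2                # half-wavelength antenna spacing

delta_phi = np.deg2rad(45.0)      # assumed measured phase difference
theta = np.arcsin(wavelength * delta_phi / (2 * np.pi * d))
print(np.rad2deg(theta))          # estimated angle of arrival in degrees
```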
Xizhong Lou et al. [56] proposed a UWB-based single-anchor AOA 3D localization system, building upon advancements in the AOA algorithm. This system analyzes rectangular 2D quadrature antenna arrays to determine azimuth and elevation angles. By integrating AOA estimation with ToF ranging methods, the system assesses the accuracy of Indoor Positioning Systems utilizing AOA mechanisms conforming to Ultra-Wideband standards. Experimental results demonstrate that the enhanced AOA system effectively calculates tag height, thereby obtaining precise three-dimensional coordinates, and represents an economical localization solution. Kun Zhang et al. [57] proposed a method for joint Angle of Arrival (AOA) estimation in Ultra-Wideband MIMO systems. The method initially determines tag positions and subsequently integrates these data with other detection information for real-time transmission to the base station, facilitating precise localization. Experimental findings demonstrate that the use of MIMO technology enhances system spectral efficiency, improves overall performance, and achieves precise localization. The transmitting and receiving principles are depicted in Figure 10 and Figure 11, respectively.
Based on the research approach by Kun Zhang et al., Tianyu Wang et al. [58] proposed a residual network deep learning algorithm for AOA estimation in complex indoor environments. This method extracts information fully by utilizing the amplitude and arrival phase of channel impulse responses as inputs to the model. The experiment adheres to FCC standards, and the results suggest that this approach achieves superior localization accuracy compared to traditional AOA methods in practical scenarios.
This subsection focuses on optimizing AOA localization algorithms in UWB technology, considering the traditional AOA algorithm’s susceptibility to factors such as non-line-of-sight propagation, and progressively examines estimation errors and influences within UWB-based AOA localization. Moreover, this algorithm finds application in various commercial devices, particularly in automotive contexts, where it supports UWB-AOA positioning systems. Leveraging phased-array radar technology for precise angle and range measurements, this approach simplifies system complexity by reducing harnesses and connectors, thereby cutting costs and significantly enhancing reliability. In the realm of positioning, it serves both core IoT AOA positioning base stations and intelligent positioning beacons. Specific commercial applications of these methods, including their respective data parameters, are outlined in Table 5.
Overall, the coverage range of AOA positioning base stations is more than twice that of AOA asset-tracking beacons. Additionally, the former utilizes a standard Ethernet interface, thereby ensuring signal stability. In contrast, the latter is equipped with a Bluetooth interface, offering diverse signal connectivity. However, the signal coverage range of the asset-tracking beacons is significantly lower than that of the base stations and is inadequate for ranges around 10 m. Meanwhile, AOA asset-tracking beacons are more portable than the base stations, highlighting their distinct characteristics.

4.3. TOA and TDOA Localization Methods Incorporating UWB Technology

Compared with the two UWB positioning methods discussed earlier, TOA and TDOA exhibit both similarities and differences. TOA is a positioning method based on Time of Arrival, employing time-of-arrival measurements for ranging and positioning. TDOA, in contrast, is an enhancement of TOA that uses the Time Difference of Arrival as its basis. In TOA positioning, the tag’s signal arrives at different base stations at different times; TDOA computes the differences between these arrival times and thereby the differences in propagation distance, which improves the positioning accuracy of UWB technology to some extent.
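The distinction between the two measurement models can be summarized in a few lines of Python: TOA converts absolute flight times into ranges (and therefore needs the tag’s emission time), whereas TDOA uses only the differences between anchor arrival times, so the emission time cancels out. The helper functions and the noise-free example below are illustrative assumptions, not code from the cited works.

```python
import numpy as np

C = 299_792_458.0  # propagation speed of the UWB pulse [m/s]

def toa_ranges(arrival_times_s, emit_time_s):
    """TOA: with a synchronized tag clock, each absolute flight time gives a range."""
    return C * (np.asarray(arrival_times_s) - emit_time_s)

def tdoa_range_differences(arrival_times_s, ref_index=0):
    """TDOA: only the differences between arrival times at the (synchronized)
    anchors are used, so the tag's emission time cancels out."""
    t = np.asarray(arrival_times_s)
    dt = t - t[ref_index]
    return C * np.delete(dt, ref_index)   # range differences w.r.t. the reference anchor

# Example: tag at (7, 12) m, four anchors on a 20 m square, noise-free timestamps
anchors = np.array([(0, 0), (20, 0), (20, 20), (0, 20)], dtype=float)
tag = np.array([7.0, 12.0])
emit_time = 1.0e-3                                        # arbitrary emission instant [s]
arrivals = emit_time + np.linalg.norm(anchors - tag, axis=1) / C
print(toa_ranges(arrivals, emit_time))        # true anchor-tag distances [m]
print(tdoa_range_differences(arrivals))       # distance differences; emit time not needed
```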
To address the signal-power dependency of UWB positioning, Juri Sidorenko et al. [59] combined Time of Arrival (TOA) and Time Difference of Arrival (TDOA) measurement techniques to correct arrival times and time differences. Experimental results indicate that the TDOA measurement technique improves UWB’s correction of signal-power dependency and hardware delays, thereby correcting the time differences between signals transmitted at different times. The combined schematic is shown in Figure 12.
However, in TDOA localization algorithms, the nonlinear equations are often approximated as linear, leading to relatively low positioning accuracy. Dong Li et al. [60] tackled this challenge by employing swarm intelligence algorithms capable of handling nonlinearity. They proposed an improved Artificial Bee Colony (IABC) algorithm that integrates a dynamic Levy mechanism. They first modeled TDOA localization for indoor positioning and derived error likelihood coefficients, then dynamically improved the Levy mechanism by incorporating proportion values into the ABC algorithm, and finally used the error-mitigation function as the fitness criterion for the IABC algorithm, with the minimum fitness value indicating the optimal coordinates. Experimental results demonstrate that the algorithm achieves a best-case positioning accuracy of approximately 25 cm, effectively mitigating the nonlinear challenges in localization. In addition, TDOA requires tightly synchronized anchors, achieved either through wireless synchronization in small-scale deployments or through costly wired backbone infrastructure; both impede the widespread adoption of TDOA in large-scale areas. To address this challenge, Vecchia et al. [61] proposed a novel, purely wireless TDOA approach. This method uses TDMA scheduling to enable continuous multi-hop operation of the anchor infrastructure, generating synthesized timing information. Experimental results indicate that this improved approach achieves decimeter-level accuracy even in mobile scenarios.
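The fitness-based formulation behind the IABC approach can be sketched briefly: the sum of squared TDOA residuals serves as the fitness function, and the coordinates minimizing it are taken as the position estimate. For brevity, the sketch below uses SciPy’s differential evolution as a generic population-based optimizer in place of the improved artificial bee colony with the Levy mechanism; the geometry and values are illustrative assumptions.

```python
import numpy as np
from scipy.optimize import differential_evolution

def tdoa_fitness(p, anchors, range_diff):
    """Sum of squared TDOA residuals; the coordinates with the minimum fitness
    value are taken as the position estimate (the role the error-mitigation
    function plays in the IABC approach described above)."""
    d = np.linalg.norm(anchors - p, axis=1)
    return float(np.sum(((d[1:] - d[0]) - range_diff) ** 2))

anchors = np.array([(0, 0), (20, 0), (20, 20), (0, 20)], dtype=float)
tag = np.array([7.0, 12.0])
d = np.linalg.norm(anchors - tag, axis=1)
range_diff = d[1:] - d[0]                      # ideal, noise-free range differences [m]

# Population-based global minimization over the deployment area.
# differential_evolution is only a stand-in for the improved ABC optimizer.
result = differential_evolution(tdoa_fitness, bounds=[(0, 20), (0, 20)],
                                args=(anchors, range_diff), seed=0)
print(result.x, result.fun)                    # ~ [7, 12], fitness ~ 0
```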
This section provides an overview of the Time of Arrival (TOA) and Time Difference of Arrival (TDOA) algorithms. TOA measures the absolute arrival time of a signal at a base station, while TDOA computes the differences between arrival times at multiple base stations. Compared to TDOA, TOA positioning methods are more established. Recently, TDOA has found commercial application in signal-processing and computing modules; National Instruments’ latest embedded controllers, based on i7 processors, use this technology for high-speed digital transmission and precise time measurement. Conversely, TOA algorithms are widely employed in wireless communications for interference resistance and noise reduction.

5. Application of Multi-Sensor Fusion and Integration with Deep Learning Algorithms

With the advancement of technology, particularly in the information age, sensor technology plays an increasingly crucial role across various fields. Sensors serve as essential tools for perceiving the real world and converting it into data, indispensable in sectors such as military operations, aerospace exploration, healthcare systems, and intelligent transportation [62]. However, single sensors often struggle to comprehensively capture all necessary information in complex environments, leading to the development of multi-sensor fusion technology. Moreover, recent years have witnessed remarkable progress in deep learning algorithms, a significant branch of artificial intelligence, which has achieved notable successes in tasks like image recognition, speech processing, and natural language understanding [63]. Unlike traditional machine learning methods, deep learning utilizes hierarchical neural network structures to learn and extract high-level abstract features from data, demonstrating superior performance on intricate tasks. Consequently, the integration of multi-sensor fusion with deep learning algorithms has emerged as a prominent area of research, driving advancements in sensor technology and finding wide-ranging applications across diverse domains [64]. This article provides a detailed introduction to multi-sensor fusion technology in agriculture, alongside an expanded overview of its integration with deep learning algorithms. By leveraging visual sensors for perception, LiDAR for precise mapping and management of agricultural fields, and ultrasound sensors for obstacle avoidance and distance measurement in conjunction with visual sensors, enhanced outdoor localization capabilities for agricultural robots are demonstrated [65].

5.1. Fusion Localization Leveraging LiDAR and Vision Sensors Enhances Precision in Spatial Positioning

In the initial stages, laser sensors and vision sensors developed independently within their respective domains. Laser sensors were initially used for ranging, measurement, and scanning, including laser rangefinders widely adopted in industrial and military applications. Vision sensors, on the other hand, were primarily used for image capture, analysis, and recognition, employed in automation control, computer vision, and robotics. With technological advancements and increasing demands, laser sensors and vision sensors started to integrate gradually. In smart manufacturing, this combination enables precise positioning and detection, thereby enhancing production efficiency and quality control. This integration significantly improves the precision of positioning in agricultural robotics, maximizing agricultural production efficiency.
In agriculture, especially within orchard environments, the positioning capability of agricultural robots is frequently uncertain due to complex environmental factors such as dense foliage or other disturbances. Therefore, Young-Sik Shin et al. [66] proposed a Visual-LiDAR SLAM framework that integrates laser-ranging sensors with monocular cameras. This framework utilizes the combination of these two sensors to conduct measurements under conditions of sparse depth and a narrow field of view. Experimental results of this framework show substantial enhancements in both localization accuracy and robustness, and its introduction offers theoretical support for applying sensor fusion. Building on this approach, Hanwen Kang et al. [67] proposed a visual sensing and perception strategy utilizing LiDAR-camera fusion. They evaluate two state-of-the-art LiDAR-camera extrinsic calibration methods to acquire precise extrinsic matrices between the LiDAR and camera. By fusing point clouds with color images and employing a deep learning-based instance segmentation network, they achieve accurate fruit localization. Comprehensive experiments illustrate that integrating LiDAR-camera fusion technology significantly enhances the accuracy and robustness of fruit localization, thereby reducing measurement errors. Outdoor positioning, particularly in adverse weather conditions, presents a substantial challenge in agricultural environments. Nguyen Anh Minh Mai et al. [68] proposed a LiDAR-stereo fusion network and trained it on both original and augmented datasets collected in severe weather conditions. Their findings indicate that employing specific training strategies after sensor fusion improved positioning accuracy by 26.72% compared to previous methods, with only a 6.23% performance decrease, substantially enhancing positioning precision for agricultural robots. In a follow-up study the next year, Nguyen Anh Minh Mai et al. [69] used the KITTI dataset to compute average precision and demonstrated strong resilience to signal interference in adverse conditions by integrating stereo cameras with LiDAR sensors, thereby slightly improving positioning accuracy and precision.
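A simplified sketch of the LiDAR-camera fusion step is given below: LiDAR points are transformed into the camera frame with the extrinsic matrix, projected through the intrinsics, and the points that fall inside a fruit instance mask are averaged to yield a 3D fruit position. The function, matrices, and synthetic data are illustrative assumptions and greatly simplify the calibrated pipelines of the cited studies.

```python
import numpy as np

def fuse_lidar_camera(points_xyz, fruit_mask, K, T_cam_lidar):
    """Fuse a LiDAR point cloud with a per-pixel fruit instance mask.

    points_xyz  : (N, 3) points in the LiDAR frame [m]
    fruit_mask  : (H, W) boolean mask from the instance-segmentation network
    K           : (3, 3) camera intrinsic matrix
    T_cam_lidar : (4, 4) extrinsic transform (LiDAR frame -> camera frame)

    Returns the 3D centroid (camera frame) of the points that project inside
    the mask, i.e. a fused estimate of the fruit position.
    """
    pts_h = np.hstack([points_xyz, np.ones((len(points_xyz), 1))])
    pts_cam = (T_cam_lidar @ pts_h.T).T[:, :3]            # LiDAR -> camera frame
    z = pts_cam[:, 2]
    uv = (K @ pts_cam.T).T[:, :2] / z[:, None]            # pinhole projection
    u, v = np.round(uv[:, 0]).astype(int), np.round(uv[:, 1]).astype(int)

    h, w = fruit_mask.shape
    in_view = (z > 0) & (u >= 0) & (u < w) & (v >= 0) & (v < h)
    on_fruit = np.zeros(len(points_xyz), dtype=bool)
    on_fruit[in_view] = fruit_mask[v[in_view], u[in_view]]
    if not on_fruit.any():
        return None
    return pts_cam[on_fruit].mean(axis=0)

# Example with a synthetic 640x480 mask and a random point cloud
K = np.array([[600.0, 0, 320], [0, 600.0, 240], [0, 0, 1]])
T = np.eye(4)                                             # assume frames coincide
mask = np.zeros((480, 640), dtype=bool); mask[200:280, 280:360] = True
cloud = np.random.uniform([-1, -1, 2], [1, 1, 4], size=(5000, 3))
print(fuse_lidar_camera(cloud, mask, K, T))
```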
With advancements in technology, the integration of lidar sensors and vision sensors has become increasingly sophisticated. This section explores research issues related to their combined application in achieving localization in agriculture. Compared to using each sensor individually, their combination not only leverages their respective capabilities but also enables comprehensive and efficient data acquisition and processing, providing precise decision support for agricultural production. Additionally, multi-sensor fusion extends beyond localization to include tasks such as using lidar for terrain height measurement combined with crop growth data from vision sensors, enabling farmers to manage fertilization and irrigation with precision, thereby maximizing crop yield and minimizing resource waste. Sensor fusion technology facilitates real-time monitoring of soil moisture, temperature, and crop health, which supports timely pest outbreak alerts and enables precise preventive measures to protect crop growth and quality. These advantages underscore the benefits of multi-sensor fusion. However, current lidar and vision sensor technologies face challenges such as high costs, data processing complexity, and the need for high-precision localization. Future developments aim to reduce technology costs further, optimize data processing algorithms, and explore broader applications of sensor fusion technology in agriculture. Additionally, we performed comprehensive data analysis on the updated positioning technique, illustrated in Table 6.
Through comparison, it is evident that among the three types of LiDAR sensors, the semi-solid-state type is the most prevalent, whereas the mechanical type is costly and has a limited lifespan. The solid-state type is the least expensive, with OPA costing less than half of the other types. In the semi-solid-state category, both MEMS and prisms operate by reflecting laser beams. However, MEMS exhibits limited angular measurement capabilities and challenges with stitching, along with a relatively shorter lifespan, whereas prisms allow for long-distance detection but possess a complex structure susceptible to wear, with similar costs for both. In the solid-state category, flash LiDAR provides lower detection accuracy and a shorter range compared to OPA, which offers superior accuracy and faster scanning speed. Overall, OPA-type LiDAR within the solid-state category demonstrates a comparative advantage over other types.

5.2. Integration of Multi-Sensor Fusion Technology with Deep Learning Algorithms for Localization Applications in Agriculture

As noted at the opening of this section, single sensors struggle to capture all the information required in complex environments, while deep learning can extract high-level abstract features from data using hierarchical neural network structures. The combination of multi-sensor fusion and deep learning algorithms has therefore become a research hotspot, driving advancements in sensor technology across multiple domains. This subsection focuses on how this combination is applied to localization in agriculture: visual sensors provide perception, LiDAR supports precise mapping and management of agricultural fields, and ultrasonic sensors complement visual sensors for obstacle avoidance and distance measurement, together enhancing the outdoor localization capabilities of agricultural robots.
In complex environments, multi-sensor fusion technology demonstrates good adaptability but may fail due to external noise or specific geographical factors. Therefore, Meibo Lv et al. [70] proposed a multi-sensor fusion method for agricultural scenes in Changji, employing a loosely coupled Extended Kalman Filter (EKF) algorithm to mitigate external environmental interference. The method integrates an Inertial Measurement Unit (IMU), robot odometry (ODOM), a Global Navigation Satellite System (GNSS) receiver, and Visual-Inertial Odometry (VIO), utilizing visualization tools to simulate and analyze robot trajectories and errors. The experimental data, summarized in Table 7, indicate that this algorithm maintains higher accuracy and robustness during sensor failures.
According to the data in Table 7, both the average absolute errors and the root mean square deviations in the x, y, and z directions were analyzed. Fusion-groundtruth exhibits significantly lower x and y error values compared to MSCKF-groundtruth, achieving single-digit precision accuracy, which is markedly superior to the latter. However, IMU-groundtruth demonstrates clear dominance in the z-direction errors. Overall, the first method integrating multi-sensor fusion technology demonstrates superior positioning accuracy compared to the latter. Additionally, in the same year, Peng Gao et al. [71] addressed the impact of noise on positioning using an enhanced algorithm employing multi-sensor fusion and autoencoder neural networks. This approach integrates data from multiple sensors, RTK-GNSS, inertial measurement units (IMUs), and dual rotary encoders using an Extended Kalman Filter (EKF). Testing its positioning accuracy in three different environments demonstrates that, even under conditions where multi-sensor fusion technology fails, the system can still improve position estimation accuracy. This finding is crucial for future applications of multi-sensor fusion combined with deep learning algorithms in positioning. Apart from noise interference, agricultural positioning is frequently influenced by factors such as the presence of non-target crops. For example, in orchard environments, robot positioning may be impeded by tree canopy occlusion, resulting in diminished signal strength. To address these challenges, Bingjie Tang et al. [72] have proposed an innovative semi-structured global positioning and mapping system for orchard fruit trees, utilizing a multi-sensor fusion approach within simultaneous localization and mapping (SLAM). This system employs a SLAM framework that integrates cameras, inertial measurement units, and LiDAR sensors to improve odometry accuracy within complex orchard settings. Subsequently, it utilizes a YOLO-based fruit tree localization algorithm to detect adjacent fruit trees through image analysis and LiDAR point clouds. The robot pose, derived from SLAM, is used to refine the global positions of the detected fruit trees in real-time. Results indicate that the average root mean square error (RMSE) for odometry in pear and persimmon orchards is 6.04 cm and 14.59 cm, respectively, while the localization errors for fruit trees are 0.57 cm and 3.23 cm. This method facilitates precise localization and mapping of fruit trees in intricate environments. This approach enhances a framework previously proposed by Yibo Zhang et al. [73], which integrates LiDAR, visual, and inertial data with the EKF algorithm. In contrast to the latter approach, this method employs SLAM to estimate localization accuracy, enabling more precise obstacle calculations and a significant reduction in localization errors. This suggests that the method holds substantial potential for autonomous agricultural positioning and establishes a foundation for future advancements in this research area.
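To illustrate the loosely coupled fusion scheme discussed above, the following sketch implements a minimal EKF whose prediction step is driven by odometry increments and whose correction step uses GNSS position fixes. It covers only a planar [x, y, yaw] state and two of the sensors; the class name, noise settings, and simulated data are assumptions for illustration rather than the configuration used in the cited experiments.

```python
import numpy as np

class LooselyCoupledEKF:
    """Minimal loosely coupled EKF: wheel/visual odometry drives the prediction,
    GNSS position fixes drive the correction. State: [x, y, yaw]."""

    def __init__(self, x0, P0, Q, R):
        self.x, self.P, self.Q, self.R = np.array(x0, float), np.array(P0, float), Q, R

    def predict(self, d_forward, d_yaw):
        """Propagate with an odometry increment (distance travelled, heading change)."""
        x, y, yaw = self.x
        self.x = np.array([x + d_forward * np.cos(yaw),
                           y + d_forward * np.sin(yaw),
                           yaw + d_yaw])
        # Jacobian of the motion model, evaluated at the prior state
        F = np.array([[1, 0, -d_forward * np.sin(yaw)],
                      [0, 1,  d_forward * np.cos(yaw)],
                      [0, 0,  1]])
        self.P = F @ self.P @ F.T + self.Q

    def update_gnss(self, z_xy):
        """Correct with a GNSS position fix (x, y)."""
        H = np.array([[1, 0, 0], [0, 1, 0]], float)
        y_res = np.asarray(z_xy) - H @ self.x
        S = H @ self.P @ H.T + self.R
        K = self.P @ H.T @ np.linalg.inv(S)
        self.x = self.x + K @ y_res
        self.P = (np.eye(3) - K @ H) @ self.P

# Example: straight run with noisy odometry, corrected by GNSS at every step
ekf = LooselyCoupledEKF([0, 0, 0], np.eye(3) * 0.1, np.eye(3) * 0.01, np.eye(2) * 0.25)
rng = np.random.default_rng(0)
for k in range(20):
    ekf.predict(d_forward=0.5 + rng.normal(0, 0.02), d_yaw=rng.normal(0, 0.01))
    ekf.update_gnss([0.5 * (k + 1) + rng.normal(0, 0.5), rng.normal(0, 0.5)])
print(ekf.x)   # stays close to the true position (10, 0)
```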
The integration of multi-sensor fusion technology with deep learning algorithms is becoming increasingly prevalent across various domains. This section offers a concise overview of the application of these methods in agricultural localization, with a particular emphasis on aspects such as obstacle avoidance and noise reduction, thereby underscoring their role in mitigating noise and enhancing data accuracy. Moreover, advanced data analysis and decision support capabilities can be realized in additional areas of agriculture. For example, in real-time irrigation systems, deep learning algorithms can analyze sensor data and adjust irrigation volumes in real-time to accommodate fluctuating environmental conditions. This real-time responsiveness significantly improves resource efficiency and reduces operational costs. Nevertheless, despite the considerable potential offered by the integration of multi-sensor fusion and deep learning algorithms in practical applications, several challenges persist. Firstly, data fusion and processing necessitate significant computational resources, thereby imposing substantial demands on both hardware and software. Secondly, the quality and accuracy of the data directly influence the final analysis results, necessitating ongoing enhancements in data collection and processing methodologies. Finally, further research and development are required in relevant areas to reduce costs and enhance the accessibility of these technologies.

6. Comprehensive Applications of Agricultural Robots for Detection and Localization

With continuous breakthroughs in agricultural positioning and detection technologies, the integration of these technologies has become a prominent trend in precision agriculture. For example, agricultural robots equipped with GPS and sensor networks can efficiently perform tasks such as seeding and fertilizing. Some robots can adjust the amount of fertilizer and the depth of seeding based on real-time soil moisture data and crop growth status. This enhancement not only increases seeding accuracy but also optimizes resource utilization and minimizes waste. In orchards or vegetable greenhouses, smart agricultural robots utilize image recognition technology for pest and disease detection [74]. These robots are capable of scanning crops using cameras to identify disease spots and pests on the leaves. Detection results are transmitted in real-time to a central system that analyzes the data to devise the optimal treatment plan, including pesticide application or manual removal of infected leaves. In extensive fruit plantations, agricultural robots integrate positioning systems with image recognition technology for automated fruit harvesting. These robots evaluate fruit ripeness to ascertain the optimal harvesting time and utilize positioning technology to precisely locate the fruit. This not only enhances harvesting efficiency but also decreases dependence on human labor [75]. This paper offers a concise overview of the technologies employed in the integration of positioning and detection and examines these technologies in practical applications.

6.1. Detection of Soil and Crop Positioning and Planting Using Deep Learning Algorithms

Soil fertility is essential for crop production efficiency, and precise seeding entails placing seeds at the optimal depth and spacing to maximize germination and growth. Deep learning algorithms can improve precision seeding by analyzing soil conditions and predicting optimal seeding configurations [76]. Models can incorporate soil sensor data, weather forecasts, and historical crop performance to recommend precise seeding patterns. The integration of deep learning algorithms enhances the efficiency of soil management and seeding processes. The model offers precise predictions and recommendations, which aid in increasing crop yields, reducing labor demands, and optimizing resource use, thereby reducing costs for farmers. This section presents methodologies for soil testing to enhance crop seeding and boost growth efficiency.
To ensure optimal seedbed preparation, it is essential to assess and regulate the size of soil aggregates during tillage operations. This approach results in increased crop yields and more efficient use of resources. In this context, Rahim Azadnia and colleagues [77] proposed a machine vision system utilizing convolutional neural networks (CNN) for classifying soil texture images at depths of 20, 40, and 60 cm. They developed a CNN model comprising multiple layers and two blocks. Experimental results demonstrate that the CNN method can swiftly and precisely identify soil texture types on large-scale farms, thereby enhancing seedling placement through deep learning algorithms. However, the widespread adoption of soil detection and precision planting necessitates the timely acquisition of low-cost, high-quality soil and crop yield maps. To address this, Sami Khanal et al. [78] proposed a method that integrates remote sensing data with machine learning algorithms for the spatial prediction of soil properties and crop yields, providing a cost-effective and time-efficient solution. The study combined field data on five soil properties (i.e., soil organic matter (SOM), cation exchange capacity (CEC), magnesium (Mg), potassium (K), and pH) with corn yield data from yield monitoring, as well as multispectral aerial imagery and terrain data. Linear regression (LM) and five machine learning algorithms, including Random Forest (RF), were employed to develop models for predicting soil properties and corn yield. Experimental results indicated that neural networks achieved the highest accuracy for SOM (R2 = 0.64, RMSE = 0.44) and CEC (R2 = 0.67, RMSE = 2.35); Support Vector Machines (SVMs) were most effective for K (R2 = 0.21, RMSE = 0.49) and Mg (R2 = 0.22, RMSE = 4.57); while Gradient Boosting Machines (GBMs) were utilized for pH (R2 = 0.15, RMSE = 0.62). The results suggest that integrating remote sensing data with machine learning algorithms enables accurate detection of soil properties and effectively supports precision planting, thereby enhancing crop yields.
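As an illustration of the machine learning side of this workflow, the sketch below trains a Random Forest regressor to predict a soil property (here, SOM) from per-pixel spectral and terrain covariates and reports R2 and RMSE, mirroring the evaluation metrics quoted above. The data are synthetic and the feature set is an assumption; the sketch is not a reproduction of the cited experiments.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import train_test_split
from sklearn.metrics import r2_score, mean_squared_error

# Synthetic stand-in data: per-pixel spectral bands + terrain covariates -> SOM
rng = np.random.default_rng(42)
n = 2000
X = rng.normal(size=(n, 6))                 # e.g. 4 spectral bands, slope, elevation
som = 2.0 + 0.8 * X[:, 0] - 0.5 * X[:, 3] + 0.3 * X[:, 4] + rng.normal(0, 0.4, n)

X_tr, X_te, y_tr, y_te = train_test_split(X, som, test_size=0.3, random_state=0)
model = RandomForestRegressor(n_estimators=300, random_state=0).fit(X_tr, y_tr)

pred = model.predict(X_te)
print(f"R2   = {r2_score(y_te, pred):.2f}")
print(f"RMSE = {mean_squared_error(y_te, pred) ** 0.5:.2f}")
```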
Deep learning algorithms can classify soil based on visual features, including texture and color. Through training on diverse soil samples, the algorithms can predict soil properties and categorize them as sandy, clayey, or loamy. This classification aids farmers in understanding soil types across different fields, enabling targeted adjustments in agricultural practices. This section explores soil detection and precision seeding through deep learning algorithms, evaluating experimental results from relevant researchers to illustrate the method’s effectiveness in both soil detection and precise seed placement. Additionally, it offers insights into potential future research directions within this domain. Furthermore, automated seeding systems equipped with deep learning algorithms can execute seeding tasks with high precision. These systems utilize computer vision and deep learning models to navigate fields, evaluate soil conditions, and accurately place seeds. By minimizing manual intervention and seeding errors, these systems improve the efficiency and reliability of seeding operations.

6.2. Integration of Depth Cameras and Deep Learning Algorithms for Pest and Weed Localization and Detection

In modern agriculture, advancements in technology have significantly propelled the intelligent and automated evolution of agricultural production. The integration of depth algorithms and depth cameras reveals substantial potential for agricultural positioning and detection applications. In crop health monitoring, deep learning algorithms can analyze 3D images captured by depth cameras to assess crop growth status, pest and disease conditions, and nutrient needs. Convolutional neural networks (CNNs) can extract features from images, such as changes in leaf color and shape variations, thereby aiding in the assessment of crop health [79]. For instance, when a depth camera identifies anomalous leaf color in specific regions, the system can employ CNN algorithms to determine whether these changes correlate with prevalent diseases, thus offering targeted solutions. Depth cameras can also supply 3D structural data of the soil, which deep learning algorithms can analyze to predict soil moisture content and nutrient composition [80]. This paragraph primarily explores the specific applications of integrating depth cameras with deep learning algorithms in agricultural positioning and detection and examines related research to illustrate the sustainability of this integration.
In agricultural production, aside from intrinsic factors, adverse elements, such as pest damage and weed growth, also impact crop development. To localize and detect pests, Ching-Ju Chen et al. [81] utilized a depth camera to capture images of the stink bug and employed a Tiny-YOLOv3 neural network model, implemented on the NVIDIA Jetson TX2 embedded system, to identify T. papillosa in orchards. This configuration enables real-time determination of pest locations, facilitating the optimal route planning for drone pesticide application. Furthermore, the TX2 embedded platform transmits pest locations and occurrences to the cloud, enabling the monitoring and analysis of longan growth using computers or mobile devices. This method has been demonstrated to provide real-time insights into pest distribution and reduce pesticide usage, thus minimizing environmental impact. Building on this research, Francesco Betti Sorbelli et al. [82] in 2023 investigated the use of RGB cameras combined with computer vision algorithms for the localization and monitoring of the “brown marmorated stink bug.” Their team developed a specialized field image dataset, conducted an in-depth analysis of the captured images, and employed ML algorithms for target monitoring. Their results indicate that this combination optimizes the maximum likelihood model and corrects image defects, thereby effectively advancing pest localization and monitoring. In addition, to address the issue of weed proliferation, depth cameras can capture three-dimensional information of the scene and, when integrated with depth algorithms, accurately analyze the shape, size, and position of weeds. This facilitates effective differentiation between weeds and crops, thereby optimizing weed control strategies. In this context, Ke Xu and colleagues [83] proposed a framework based on multimodal information fusion, which re-encodes single-channel depth images into a novel three-channel image. Features are subsequently extracted using convolutional neural networks (CNN). Experimental results demonstrate that, compared to weed detection methods utilizing RGB images, this approach significantly enhances accuracy. Using weights of α = 0.4 for RGB images and β = 0.3 for depth images, the mean average precision (mAP) for grasses and broadleaf weeds is 36.1% and 42.9%, respectively, while the overall detection accuracy (IoG) is 89.3%. This method effectively improves weed localization and monitoring performance, thereby enabling precise weed control.
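The multimodal re-encoding and weighted fusion ideas can be sketched as follows: a single-channel depth image is expanded into three channels so that it can be fed to a standard CNN backbone, and the RGB and depth confidence maps are combined with the weights α = 0.4 and β = 0.3 reported above. The particular three-channel encoding and the synthetic data are assumptions; the cited framework’s encoding may differ.

```python
import numpy as np

def depth_to_three_channels(depth_m, d_min=0.3, d_max=3.0):
    """Re-encode a single-channel depth image as a 3-channel image so it can be
    fed to a standard CNN backbone. Channels: normalized depth, horizontal
    gradient, vertical gradient (one plausible encoding; not the paper's exact one)."""
    d = np.clip((depth_m - d_min) / (d_max - d_min), 0.0, 1.0)
    gx = np.gradient(d, axis=1)
    gy = np.gradient(d, axis=0)
    return np.stack([d, gx, gy], axis=-1).astype(np.float32)

def fuse_detections(score_rgb, score_depth, alpha=0.4, beta=0.3):
    """Weighted fusion of per-class confidence maps from the RGB branch and the
    depth branch, using the weights reported in the text (alpha=0.4, beta=0.3)."""
    return alpha * score_rgb + beta * score_depth

# Example on a synthetic 240x320 depth frame
depth = np.random.uniform(0.3, 3.0, size=(240, 320))
three_ch = depth_to_three_channels(depth)
fused = fuse_detections(np.random.rand(240, 320), np.random.rand(240, 320))
print(three_ch.shape, fused.shape)    # (240, 320, 3) (240, 320)
```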
Based on the extensive application of positioning and detection techniques in agricultural robots, this section primarily explores the integration of depth cameras and deep learning algorithms. It specifically examines how this integration improves the detection and localization of pests and weeds, thus facilitating precise pest and weed control. Additionally, depth cameras can acquire three-dimensional morphological data of crops, which, when combined with deep learning algorithms, enable the analysis of crop growth rates and morphological changes. Despite the promising potential for the application of depth cameras and algorithms in agriculture, several challenges remain. The high cost of depth cameras and their performance variability under varying environmental conditions, such as lighting changes and weather factors, constitute significant issues. Furthermore, deep learning algorithms demand substantial training data and computational resources, potentially constraining their adoption in certain small-scale agricultural applications. Nevertheless, as technology evolves, the cost of depth cameras is anticipated to decline progressively, while advancements in computational resources and algorithm optimization will facilitate the broader adoption of deep learning technologies. Intelligent and automated agricultural systems are projected to become increasingly efficient and cost-effective, fostering further innovation and transformation in agricultural production.

6.3. Localization and Harvesting of Crops Based on Depth Images and Deep Learning Algorithm Integration

Modern agriculture is experiencing a technological revolution through the integration of depth imaging and deep learning algorithms, which presents new opportunities for precision farming. Depth imaging technology delivers precise distance data, whereas deep learning algorithms extract valuable insights from these data, thereby enabling accurate crop positioning and automated harvesting [84]. This integration not only enhances harvesting efficiency but also optimizes resource utilization and minimizes labor costs. This section investigates the application of depth imaging and deep learning algorithms in crop positioning and harvesting and, by synthesizing findings from various studies, examines the practical applications of this approach. The integration of these technologies facilitates real-time analysis of depth images, precise crop location identification, and maturity assessment [85]. The model accurately identifies the exact location of each crop and assesses its suitability for harvesting. During this process, crop location data are passed to automated harvesting systems (e.g., robotic arms or drones), which execute precise operations in the field [86]. The system navigates to the target crop location and performs the harvesting task. This automated process reduces manual intervention and improves harvesting efficiency.
In harvesting robot research, vision serves as the primary external information source, with key tasks encompassing target recognition, positioning, and feature extraction. Significant challenges exist in visual extraction for specific crops, such as green peppers, whose color closely resembles the environmental background. Previous approaches, including spectral image classification and LED light reflection recognition, have clear limitations: the former suffers from a high misidentification rate and thus reduced accuracy, whereas the latter works well at night but performs poorly under daytime light interference. To address these challenges, Wei Ji et al. [87] introduced a target recognition approach grounded in ranking-based saliency detection. This approach enhances pepper images with an image-enhancement algorithm and extracts superpixels via energy-driven sampling (SEEDS). By integrating the saliency maps obtained from the top, bottom, left, and right boundaries and applying morphological operations to eliminate noise, the approach delineates the contours of green peppers. Experimental results demonstrate that it effectively identifies green peppers, achieving a recognition accuracy of 83.6% and a recall rate of 81.2%, significantly outperforming the other two methods and making it highly suitable for positioning detection in harvesting robots. In lychee orchards, lychee fruit clusters are dispersed randomly and appear irregularly, making it challenging to detect and localize multiple clusters simultaneously; this poses a significant challenge for vision-based harvesting robots operating continuously in natural environments. To address this challenge, Jinhui Li et al. [88] proposed a robust algorithm utilizing a field RGB-D camera (Figure 13). Their team employed the semantic segmentation method DeepLabv3 to classify RGB images into three categories: background, fruit, and branches. They then applied a preprocessing step to align the segmented RGB images and eliminate branches devoid of fruit. Subsequently, they processed the binary map of small branches using skeleton extraction and pruning operations, retaining only the primary branches. A non-parametric density-based spatial clustering method was employed to cluster pixels in the 3D space of the branch skeleton map, identifying fruiting branches within the same lychee cluster. Finally, principal component analysis was employed to fit a 3D line to each cluster, with the line representing the position of the fruiting branch. Experimental results indicated that the detection accuracy of lychee fruiting branches was 83.33%, the localization accuracy was 17.29 ± 24.57, and the time required to determine a single lychee fruiting branch was 0.464 s. This method enables rapid, accurate, and continuous harvesting.
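The branch-localization steps described for the lychee system can be approximated in a short sketch: 3D skeleton points are grouped with a density-based spatial clustering method (DBSCAN is used here as a generic stand-in), and a 3D line is fitted to each cluster with PCA, taking the first principal component as the branch direction. The parameters and synthetic data are illustrative assumptions, not the cited implementation.

```python
import numpy as np
from sklearn.cluster import DBSCAN
from sklearn.decomposition import PCA

def fit_branch_lines(skeleton_points, eps=0.05, min_samples=10):
    """Cluster 3D skeleton points of fruiting branches and fit a 3D line to each
    cluster with PCA, returning (centroid, direction) pairs.

    skeleton_points : (N, 3) points [m] sampled from the branch skeleton map
    """
    labels = DBSCAN(eps=eps, min_samples=min_samples).fit_predict(skeleton_points)
    lines = []
    for k in set(labels) - {-1}:                 # label -1 is DBSCAN noise
        cluster = skeleton_points[labels == k]
        pca = PCA(n_components=1).fit(cluster)
        centroid = cluster.mean(axis=0)
        direction = pca.components_[0]           # principal axis = branch direction
        lines.append((centroid, direction))
    return lines

# Example: two synthetic branches plus scattered noise points
rng = np.random.default_rng(1)
t = rng.uniform(0, 0.4, size=(200, 1))
branch1 = np.hstack([t, 0.2 * t, 1.0 + 0.05 * t]) + rng.normal(0, 0.005, (200, 3))
branch2 = np.hstack([0.5 + 0.1 * t, t, 1.2 - 0.3 * t]) + rng.normal(0, 0.005, (200, 3))
points = np.vstack([branch1, branch2, rng.uniform(0, 1.5, (30, 3))])
for c, d in fit_branch_lines(points):
    print("centroid", np.round(c, 2), "direction", np.round(d, 2))
```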
Crop localization and harvesting technologies based on depth images and deep learning algorithms represent the cutting edge of modern agriculture. Through precise data collection and the application of intelligent algorithms, the efficiency and accuracy of agricultural production have been significantly improved. This not only brings higher productivity to agricultural enterprises but also promotes sustainable development in agriculture. This section offers a brief overview of the localization and harvesting of representative vegetables and fruits, evaluating the research results that combine depth images and deep learning algorithms to validate the accuracy and effectiveness of these methods. Furthermore, the integration of these technologies allows adaptation to various types of crops and environmental conditions, providing high flexibility. However, despite the immense potential of depth images and deep learning algorithms in crop localization and harvesting, several challenges remain at the current research stage. Issues such as the quality of depth images being influenced by lighting conditions and camera precision may impact model training effectiveness and performance in practical applications. Additionally, training and inference of deep learning models require substantial computational resources, which can be a significant challenge for resource-limited agricultural enterprises. Future development directions include improving the quality and precision of depth images, reducing computational resource requirements, enhancing system robustness, and integrating depth images with other technologies (such as drones and the Internet of Things) to achieve more efficient agricultural production.

7. Discussion and Prospects

Agricultural robots, as a crucial component of modern agricultural technology, are progressively transforming traditional farming practices. By leveraging advanced positioning and detection technologies, agricultural robots can achieve automation, enhance productivity, and reduce manual labor. This paper primarily explores advanced technological methods employed in the positioning and detection of agricultural robots. Firstly, the paper offers a comprehensive introduction to the technological methods utilized in detection and positioning. In the context of detection, technologies such as CNN (convolutional neural networks) and YOLO (You Only Look Once) are described, accompanied by a comparative analysis of results from various researchers. Experimental comparisons reveal that convolutional neural networks and deep learning algorithms achieve superior recognition accuracy in agricultural detection. Regarding agricultural robot positioning, the paper compares this technology with alternative positioning techniques using tables and offers a detailed discussion of the four UWB (Ultra-Wideband) positioning methods and their extensions. The paper highlights that UWB technology provides high positioning accuracy. Additionally, the paper examines multi-sensor fusion technology and its integration with deep learning algorithms, demonstrating that multi-sensor fusion mitigates the limitations of individual sensors and that deep learning algorithms improve positioning accuracy when multi-sensor fusion technology encounters limitations. Finally, the paper investigates the application of agricultural robots in soil testing, seeding, pest and weed control, and harvesting, integrating detection with positioning techniques. The analysis reveals that deep learning algorithms play a crucial role in the detection and positioning of agricultural robots.
In summary, although the positioning and detection technologies of agricultural robots play a significant role in advancing agricultural modernization, they also encounter challenges such as environmental complexity, sensor accuracy, data processing, and cost-effectiveness [89]. Through technological innovation and ongoing optimization, agricultural robots are anticipated to achieve significant breakthroughs in precision agriculture and contribute to the sustainable development of global agriculture. In light of this, we propose the following recommendations:
(1)
Innovate and Revamp the Structure of Agricultural Robots: Explore practical structural innovations to enhance the adaptability and flexibility of agricultural robots in complex environments. Upholding the concept of low-carbon environmental protection, develop agricultural robots powered by clean energy sources. Concurrently, improve their battery life to ensure prolonged and efficient real-time monitoring of agricultural environments while adhering to principles of environmental conservation in farmland.
(2)
Enhance Autonomous Pest Management Capabilities: Improving the autonomous handling of pests by robots is crucial for protecting crops from infestations, a key function of detection and localization. Delve deeper into robotic capabilities for pest management to promptly eradicate pests, thereby reducing labor inputs and enhancing crop protection capabilities [90], steering agricultural robot development towards benefiting society.
(3)
Strengthen Infrastructure Development for Agricultural Robots: Effective detection and localization by agricultural robots rely on various sensors, with base station infrastructure being critical for signal strength. Strengthen the construction of base stations around farmlands to ensure comprehensive coverage, thereby providing robust support for the detection and localization capabilities of agricultural robots.

Author Contributions

The presented work was supervised by R.W. and S.W.; S.W.: conceptualization, software, and writing—original draft; R.W. and W.Z.: validation and writing—review and editing; L.C. and W.Z.: investigation and editing; Z.H.: validation. All authors have read and agreed to the published version of the manuscript.

Funding

This study was funded by the project “Research on Performance Degradation Mechanism and Life Prediction of Rolling Bearings Based on Recurrent Impact Characteristics” (2023-JC-YB-436).

Data Availability Statement

No new data were created or analyzed in this study.

Acknowledgments

The authors acknowledge the editors and reviewers for their constructive comments and support in this work.

Conflicts of Interest

The authors declare no conflicts of interest.

References

  1. Han, X.; Xu, L.; Peng, Y.; Wang, Z. Trend of Intelligent Robot Application Based on Intelligent Agriculture System. In Proceedings of the 2021 3rd International Conference on Artificial Intelligence and Advanced Manufacture (AIAM), Manchester, UK, 23–25 October 2021; pp. 205–209. [Google Scholar] [CrossRef]
  2. Li, W.; Zhang, Z.; Li, C.; Zou, J. Small Target Detection Algorithm Based on Two-Stage Feature Extraction. In Proceedings of the 2023 6th International Conference on Software Engineering and Computer Science (CSECS), Chengdu, China, 22–24 December 2023; pp. 1–5. [Google Scholar] [CrossRef]
  3. Su, D.; Qiao, Y.; Kong, H.; Sukkarieh, S. Real time detection of inter-row ryegrass in wheat farms using deep learning. Biosyst. Eng. 2021, 204, 198–211. [Google Scholar] [CrossRef]
  4. Liu, R.; Yu, Z.; Mo, D.; Cai, Y. An Improved Faster-RCNN Algorithm for Object Detection in Remote Sensing Images. In Proceedings of the 2020 39th Chinese Control Conference (CCC), Shenyang, China, 27–29 July 2020; pp. 7188–7192. [Google Scholar] [CrossRef]
  5. Bajait, V.; Malarvizhi, N. Recognition of suitable Pest for Crops using Image Processing and Deep Learning Techniques. In Proceedings of the 2022 4th International Conference on Advances in Computing, Communication Control and Networking (ICAC3N), Greater Noida, India, 16–17 December 2022; pp. 1042–1046. [Google Scholar] [CrossRef]
  6. Labbé, M.; Michaud, F. RTAB-Map as an open-source lidar and visual simultaneous localization and mapping library for large-scale and long-term online operation. J. Field Robot. 2019, 36, 416–446. [Google Scholar] [CrossRef]
  7. Abdelnasser, H.; Mohamed, R.; Elgohary, A.; Alzantot, M.F.; Wang, H.; Sen, S.; Choudhury, R.R.; Youssef, M. Semantic SLAM: Using Environment Landmarks for Unsupervised Indoor Localization. IEEE Trans. Mob. Comput. 2016, 15, 1770–1782. [Google Scholar] [CrossRef]
  8. Mur-Artal, R.; Tardos, J.D. ORB-SLAM2: An Open-Source SLAM System for Monocular, Stereo, and RGB-D Cameras. IEEE Trans. Robot. 2017, 33, 1255–1262. [Google Scholar] [CrossRef]
  9. Della Corte, B.; Andreasson, H.; Stoyanov, T.; Grisetti, G. Unified Motion-Based Calibration of Mobile Multi-Sensor Platforms with Time Delay Estimation. IEEE Robot. Autom. Lett. 2019, 4, 902–909. [Google Scholar] [CrossRef]
  10. He, Y.; Zhou, Z.; Tian, L.; Liu, Y.; Luo, X. Brown rice planthopper (Nilaparvata lugens Stal) detection based on deep learning. Precis. Agric. 2020, 21, 1385–1402. [Google Scholar] [CrossRef]
  11. Ayan, E.; Erbay, H.; Varçın, F. Crop pest classification with a genetic algorithm-based weighted ensemble of deep convolutional neural networks. Comput. Electron. Agric. 2020, 179, 105809. [Google Scholar] [CrossRef]
  12. Shi, F.; Liu, Y.; Wang, H. Target Detection in Remote Sensing Images Based on Multi-scale Fusion Faster RCNN. In Proceedings of the 2023 35th Chinese Control and Decision Conference (CCDC), Yichang, China, 20–22 May 2023; pp. 4043–4046. [Google Scholar] [CrossRef]
  13. Wang, Z.; Qiao, L.; Wang, M. Agricultural pest detection algorithm based on improved faster RCNN. In Proceedings of the International Conference on Computer Vision and Pattern Analysis (ICCPA 2021), Hangzhou, China, 19–21 November 2021; Volume 12158, pp. 104–109. [Google Scholar]
  14. Patel, D.; Bhatt, N. Improved accuracy of pest detection using augmentation approach with Faster R-CNN. In Proceedings of the IOP Conference Series: Materials Science and Engineering, Sanya, China, 12–14 November 2021; Volume 1042, p. 012020. [Google Scholar]
  15. Zhang, M.; Chen, Y.; Zhang, B.; Pang, K.; Lv, B. Recognition of pest based on faster rcnn. In Signal and Information Processing, Networking and Computers, Proceedings of the 6th International Conference on Signal and Information Processing, Networking and Computers (ICSINC), Guiyang, China, 13–16 August 2019; Springer: Singapore, 2020; pp. 62–69. [Google Scholar]
  16. Deng, F.; Mao, W.; Zeng, Z.; Zeng, H.; Wei, B. Multiple diseases and pests detection based on federated learning and improved faster R-CNN. IEEE Trans. Instrum. Meas. 2022, 71, 1–11. [Google Scholar] [CrossRef]
  17. Sudha, C.; JaganMohan, K.; Arulaalan, M. Real Time Riped Fruit Detection using Faster R-CNN Deep Neural Network Models. In Proceedings of the 2022 International Conference on Smart Technologies and Systems for Next Generation Computing (ICSTSN), Villupuram, India, 25–26 March 2022; pp. 1–4. [Google Scholar] [CrossRef]
  18. Quach, L.-D.; Quoc, N.P.; Thi, N.H.; Tran, D.C.; Hassan, M.F. Using SURF to Improve ResNet-50 Model for Poultry Disease Recognition Algorithm. In Proceedings of the 2020 International Conference on Computational Intelligence (ICCI), Bandar Seri Iskandar, Malaysia, 8–9 October 2020; pp. 317–321. [Google Scholar] [CrossRef]
  19. Gui, J.; Xu, H.; Fei, J. Non-destructive detection of soybean pest based on hyperspectral image and attention-resnet meta-learning model. Sensors 2023, 23, 678. [Google Scholar] [CrossRef]
  20. Dewi, C.; Christanto, H.J.; Dai, G.W. Automated identification of insect pests: A deep transfer learning approach using resnet. Acadlore Trans. Mach. Learn 2023, 2, 194–203. [Google Scholar] [CrossRef]
  21. Hassan, S.M.; Maji, A.K. Pest Identification based on fusion of Self-Attention with ResNet. IEEE Access 2024, 12, 6036–6050. [Google Scholar] [CrossRef]
  22. Wang, P.; Luo, F.; Wang, L.; Li, C.; Niu, Q.; Li, H. S-ResNet: An improved ResNet neural model capable of the identification of small insects. Front. Plant Sci. 2022, 13, 1066115. [Google Scholar] [CrossRef] [PubMed]
  23. Li, D.; Wang, R.; Xie, C.; Liu, L.; Zhang, J.; Li, R.; Wang, F.; Zhou, M.; Liu, W. A recognition method for rice plant diseases and pests video detection based on deep convolutional neural network. Sensors 2020, 20, 578. [Google Scholar] [CrossRef] [PubMed]
  24. Chen, M.; Chen, Y.; Guo, M.; Wang, J. Pest Detection and Identification Guided by Feature Maps. In Proceedings of the 2023 Twelfth International Conference on Image Processing Theory, Tools and Applications (IPTA), Paris, France, 16–19 October 2023; pp. 1–6. [Google Scholar] [CrossRef]
  25. Bari, B.S.; Islam, M.N.; Rashid, M.; Hasan, M.J.; Razman, M.A.M.; Musa, R.M.; Ab Nasir, A.F.; Majeed, A.P.A. A real-time approach of diagnosing rice leaf disease using deep learning-based faster R-CNN framework. PeerJ Comput. Sci. 2021, 7, e432. [Google Scholar] [CrossRef] [PubMed]
  26. Fan, J.; Lee, J.; Jung, I.; Lee, Y. Improvement of object detection based on faster R-CNN and YOLO. In Proceedings of the 2021 36th International Technical Conference on Circuits/Systems, Computers and Communications (ITC-CSCC), Jeju, Republic of Korea, 27–30 June 2021; pp. 1–4. [Google Scholar]
  27. Du, B.; Zhao, J.; Cao, M.; Li, M.; Yu, H. Behavior Recognition Based on Improved Faster RCNN. In Proceedings of the 2021 14th International Congress on Image and Signal Processing, BioMedical Engineering and Informatics (CISP-BMEI), Shanghai, China, 23–25 October 2021; pp. 1–6. [Google Scholar] [CrossRef]
  28. Lin, T.L.; Chang, H.Y.; Chen, K.H. The pest and disease identification in the growth of sweet peppers using faster R-CNN and mask R-CNN. J. Internet Technol. 2020, 21, 605–614. [Google Scholar]
  29. Liu, J.; Zhang, G.; Feng, B.; Hou, Y.; Kang, W.; Shen, B. A Method for Plant Diseases Detection Based on Transfer Learning and Data Enhancement. In Proceedings of the 2022 International Conference on High Performance Big Data and Intelligent Systems (HDIS), Tianjin, China, 10–11 December 2022; pp. 154–158. [Google Scholar] [CrossRef]
  30. Krishnan, H.; Lakshmi, A.A.; Anamika, L.S.; Athira, C.H.; Alaikha, P.V.; Manikandan, V.M. A Novel Underwater Image Enhancement Technique using ResNet. In Proceedings of the 2020 IEEE 4th Conference on Information & Communication Technology (CICT), Chennai, India, 3–5 December 2020; pp. 1–5. [Google Scholar] [CrossRef]
  31. Krueangsai, A.; Supratid, S. Effects of Shortcut-Level Amount in Lightweight ResNet of ResNet on Object Recognition with Distinct Number of Categories. In Proceedings of the 2022 International Electrical Engineering Congress (iEECON), Khon Kaen, Thailand, 9–11 March 2022; pp. 1–4. [Google Scholar] [CrossRef]
  32. Teng, Y.; Zhang, J.; Dong, S.; Zheng, S.; Liu, L. MSR-RCNN: A multi-class crop pest detection network based on a multi-scale super-resolution feature enhancement module. Front. Plant Sci. 2022, 13, 810546. [Google Scholar] [CrossRef] [PubMed]
  33. Du, L.; Sun, Y.; Chen, S.; Feng, J.; Zhao, Y.; Yan, Z.; Zhang, X.; Bian, Y. A novel object detection model based on faster R-CNN for spodoptera frugiperda according to feeding trace of corn leaves. Agriculture 2022, 12, 248. [Google Scholar] [CrossRef]
  34. Cao, C.; Wang, B.; Zhang, W.; Zeng, X.; Yan, X.; Feng, Z.; Liu, Y.; Wu, Z. An Improved Faster R-CNN for Small Object Detection. IEEE Access 2019, 7, 106838–106846. [Google Scholar] [CrossRef]
  35. Yang, L.; Zhong, J.; Zhang, Y.; Bai, S.; Li, G.; Yang, Y.; Zhang, J. An Improving Faster-RCNN with Multi-Attention ResNet for Small Target Detection in Intelligent Autonomous Transport With 6G. IEEE Trans. Intell. Transp. Syst. 2023, 24, 7717–7725. [Google Scholar] [CrossRef]
  36. Le, V.N.T.; Truong, G.; Alameh, K. Detecting weeds from crops under complex field environments based on Faster RCNN. In Proceedings of the 2020 IEEE Eighth International Conference on Communications and Electronics (ICCE), Phu Quoc Island, Vietnam, 13–15 January 2021; pp. 350–355. [Google Scholar] [CrossRef]
  37. Shi, P.; Xu, X.; Ni, J.; Xin, Y.; Huang, W.; Han, S. Underwater Biological Detection Algorithm Based on Improved Faster-RCNN. Water 2021, 13, 2420. [Google Scholar] [CrossRef]
  38. Altun, A.A.; Taghiyev, A. Advanced image processing techniques and applications for biological objects. In Proceedings of the 2017 2nd IEEE International Conference on Computational Intelligence and Applications (ICCIA), Beijing, China, 8–11 September 2017; pp. 340–344. [Google Scholar] [CrossRef]
  39. Miao, Z.; Yu, X.; Li, N.; Zhang, Z.; He, C.; Li, Z.; Deng, C.; Sun, T. Efficient tomato harvesting robot based on image processing and deep learning. Precis. Agric. 2023, 24, 254–287. [Google Scholar] [CrossRef]
  40. Al-Mashhadani, Z.; Chandrasekaran, B. Autonomous ripeness detection using image processing for an agricultural robotic system. In Proceedings of the 2020 11th IEEE Annual Ubiquitous Computing, Electronics & Mobile Communication Conference (UEMCON), New York, NY, USA, 28–31 October 2020; pp. 743–748. [Google Scholar]
  41. Puttemans, S.; Vanbrabant, Y.; Tits, L.; Goedemé, T. Automated visual fruit detection for harvest estimation and robotic harvesting. In Proceedings of the 2016 Sixth International Conference on Image Processing Theory, Tools and Applications (IPTA), Oulu, Finland, 12–15 December 2016; pp. 1–6. [Google Scholar]
  42. Xiao, F.; Wang, H.; Li, Y.; Cao, Y.; Lv, X.; Xu, G. Object detection and recognition techniques based on digital image processing and traditional machine learning for fruit and vegetable harvesting robots: An overview and review. Agronomy 2023, 13, 639. [Google Scholar] [CrossRef]
  43. Nguyen, T.T.; Parron, J.; Obidat, O.; Tuininga, A.R.; Wang, W. Ready or Not? A Robot-Assisted Crop Harvest Solution in Smart Agriculture Contexts. In Proceedings of the 2023 IEEE International Conference on Smart Computing (SMARTCOMP), Nashville, TN, USA, 26–30 June 2023; pp. 373–378. [Google Scholar] [CrossRef]
  44. Irham, A.; Kurniadi; Yuliandari, K.; Fahreza, F.M.A.; Riyadi, D.; Shiddiqi, A.M. AFAR-YOLO: An Adaptive YOLO Object Detection Framework. In Proceedings of the 2024 ASU International Conference in Emerging Technologies for Sustainability and Intelligent Systems (ICETSIS), Manama, Bahrain, 28–29 January 2024; pp. 594–598. [Google Scholar] [CrossRef]
  45. Gai, R.; Chen, N.; Yuan, H. A detection algorithm for cherry fruits based on the improved YOLO-v4 model. Neural Comput. Appl. 2023, 35, 13895–13906. [Google Scholar] [CrossRef]
  46. Paul, A.; Machavaram, R.; Kumar, D.; Nagar, H. Smart solutions for capsicum Harvesting: Unleashing the power of YOLO for Detection, Segmentation, growth stage Classification, Counting, and real-time mobile identification. Comput. Electron. Agric. 2024, 219, 108832. [Google Scholar] [CrossRef]
  47. Selvam, N.A.M.B.; Ahmad, Z.; Mohtar, I.A. Real time ripe palm oil bunch detection using YOLO V3 algorithm. In Proceedings of the 2021 IEEE 19th Student Conference on Research and Development (SCOReD), Kota Kinabalu, Malaysia, 23–25 November 2021; pp. 323–328. [Google Scholar]
  48. Aljaafreh, A.; Elzagzoug, E.Y.; Abukhait, J.; Soliman, A.-H.; Alja’afreh, S.S.; Sivanathan, A.; Hughes, J. A Real-Time Olive Fruit Detection for Harvesting Robot Based on YOLO Algorithms. Acta Technol. Agric. 2023, 26, 121–132. [Google Scholar] [CrossRef]
  49. Shi, J.; Bai, Y.; Diao, Z.; Zhou, J.; Yao, X.; Zhang, B. Row detection BASED navigation and guidance for agricultural robots and autonomous vehicles in row-crop fields: Methods and applications. Agronomy 2023, 13, 1780. [Google Scholar] [CrossRef]
  50. Yang, Q.; Du, X.; Wang, Z.; Meng, Z.; Ma, Z.; Zhang, Q. A review of core agricultural robot technologies for crop productions. Comput. Electron. Agric. 2023, 206, 107701. [Google Scholar] [CrossRef]
  51. Tian, H.; Wang, T.; Liu, Y.; Qiao, X.; Li, Y. Computer vision technology in agricultural automation—A review. Inf. Process. Agric. 2020, 7, 1–19. [Google Scholar] [CrossRef]
  52. Zhou, J.; Yungbluth, D.; Vong, C.N.; Scaboo, A.; Zhou, J. Estimation of the Maturity Date of Soybean Breeding Lines Using UAV-Based Multispectral Imagery. Remote Sens. 2019, 11, 2075. [Google Scholar] [CrossRef]
  53. Chong, A.-M.; Yeo, B.-C.; Lim, W.-S. Integration of UWB RSS to Wi-Fi RSS fingerprinting-based indoor positioning system. Cogent Eng. 2022, 9, 2087364. [Google Scholar] [CrossRef]
  54. Alsmadi, L.; Kong, X.; Sandrasegaran, K.; Fang, G. An Improved Indoor Positioning Accuracy Using Filtered RSSI and Beacon Weight. IEEE Sens. J. 2021, 21, 18205–18213. [Google Scholar] [CrossRef]
  55. Wang, J.; Park, J. An Enhanced Indoor Positioning Algorithm Based on Fingerprint Using Fine-Grained CSI and RSSI Measurements of IEEE 802.11n WLAN. Sensors 2021, 21, 2769. [Google Scholar] [CrossRef] [PubMed]
  56. Lou, X.; Ye, K.; Jiang, R.; Wang, S. Research and Implementation of Mono-Anchor AOA Positioning System Based on UWB. In Proceedings of the The International Conference on Natural Computation, Fuzzy Systems and Knowledge Discovery, Xi’an, China, 1–3 August 2020. [Google Scholar]
  57. Zhang, K.; Shen, C.; Aslam, B.U.; Long, K.; Chen, X.; Li, N. Simulation Optimization of AOA Estimation Algorithm Based on MIMO UWB Communication System. In Proceedings of the 2nd International Conference on Information Technologies and Electrical Engineering, Changsha, China, 6–7 December 2019. [Google Scholar] [CrossRef]
  58. Wang, T.; Man, Y.; Shen, Y. A Deep Learning Based AoA Estimation Method in NLOS Environments. In Proceedings of the 2021 IEEE Globecom Workshops (GC Wkshps), Madrid, Spain, 7–11 December 2021. [Google Scholar] [CrossRef]
  59. Sidorenko, J.; Schatz, V.; Scherer-Negenborn, N.; Arens, M.; Hugentobler, U. Error corrections for ultra-wideband ranging. IEEE Trans. Instrum. Meas. 2020, 69, 9037–9047. [Google Scholar] [CrossRef]
  60. Li, D.; Chen, L.; Hu, J.; Wu, H. Research on UWB Positioning Based on Improved ABC Algorithm. In Proceedings of the 2021 4th International Conference on Pattern Recognition and Artificial Intelligence (PRAI), Yibin, China, 20–22 August 2021. [Google Scholar] [CrossRef]
  61. Vecchia, D.; Corbalan, P.; Istomin, T.; Picco, G.P. TALLA: Large-scale TDoA Localization with Ultra-wideband Radios. In Proceedings of the IEEE 2019 International Conference on Indoor Positioning and Indoor Navigation (IPIN), Pisa, Italy, 30 September–3 October 2019; pp. 1–8. [Google Scholar] [CrossRef]
  62. Liu, Z.; Xiao, G.; Liu, H.; Wei, H. Multi-Sensor Measurement and Data Fusion. IEEE Instrum. Meas. Mag. 2022, 25, 28–36. [Google Scholar] [CrossRef]
  63. de Farias, C.M.; Pirmez, L.; Fortino, G.; Guerrieri, A. A multi-sensor data fusion technique using data correlations among multiple applications. Future Gener. Comput. Syst. 2018, 92, 109–118. [Google Scholar] [CrossRef]
  64. Senel, N.; Kefferpütz, K.; Doycheva, K.; Elger, G. Multi-Sensor Data Fusion for Real-Time Multi-Object Tracking. Processes 2023, 11, 501. [Google Scholar] [CrossRef]
  65. Xu, X.; Zhang, L.; Yang, J.; Cao, C.; Wang, W.; Ran, Y.; Tan, Z.; Luo, M. A Review of Multi-Sensor Fusion SLAM Systems Based on 3D LIDAR. Remote Sens. 2022, 14, 2835. [Google Scholar] [CrossRef]
  66. Shin, Y.-S.; Park, Y.S.; Kim, A. DVL-SLAM: Sparse depth enhanced direct visual-LiDAR SLAM. Auton. Robot. 2019, 44, 115–130. [Google Scholar] [CrossRef]
  67. Kang, H.; Wang, X.; Chen, C. Accurate fruit localization using high resolution LiDAR-camera fusion and instance segmentation. Comput. Electron. Agric. 2022, 203, 107450. [Google Scholar] [CrossRef]
  68. Mai, N.A.M.; Duthon, P.; Khoudour, L.; Crouzil, A.; Velastin, S.A. 3D Object Detection with SLS-Fusion Network in Foggy Weather Conditions. Sensors 2021, 21, 6711. [Google Scholar] [CrossRef]
  69. Mai, N.A.M.; Duthon, P.; Salmane, P.H.; Khoudour, L.; Crouzil, A.; Velastin, S.A. Camera and LiDAR analysis for 3D object detection in foggy weather conditions. In Proceedings of the International Conference on Pattern Recognition Systems (ICPRS), Saint-Etienne, France, 7–10 June 2022. [Google Scholar] [CrossRef]
  70. Lv, M.; Wei, H.; Fu, X.; Wang, W.; Zhou, D. A Loosely Coupled Extended Kalman Filter Algorithm for Agricultural Scene-Based Multi-Sensor Fusion. Front. Plant Sci. 2022, 13, 849260. [Google Scholar] [CrossRef]
  71. Gao, P.; Lee, H.; Jeon, C.-W.; Yun, C.; Kim, H.-J.; Wang, W.; Liang, G.; Chen, Y.; Zhang, Z.; Han, X. Improved Position Estimation Algorithm of Agricultural Mobile Robots Based on Multisensor Fusion and Autoencoder Neural Network. Sensors 2022, 22, 1522. [Google Scholar] [CrossRef]
  72. Tang, B.; Guo, Z.; Huang, C.; Huai, S.; Gai, J. A Fruit-Tree Mapping System for Semi-Structured Orchards based on Multi-Sensor-Fusion SLAM. IEEE Access, 2024; early access. [Google Scholar] [CrossRef]
  73. Zhang, Y.; Sun, H.; Zhang, F.; Zhang, B.; Tao, S.; Li, H.; Qi, K.; Zhang, S.; Ninomiya, S.; Mu, Y. Real-Time Localization and Colorful Three-Dimensional Mapping of Orchards Based on Multi-Sensor Fusion Using Extended Kalman Filter. Agronomy 2023, 13, 2158. [Google Scholar] [CrossRef]
  74. Ding, H.; Zhang, B.; Zhou, J.; Yan, Y.; Tian, G.; Gu, B. Recent developments and applications of simultaneous localization and mapping in agriculture. J. Field Robot. 2022, 39, 956–983. [Google Scholar] [CrossRef]
  75. Xie, B.; Jin, Y.; Faheem, M.; Gao, W.; Liu, J.; Jiang, H.; Cai, L.; Li, Y. Research progress of autonomous navigation technology for multi-agricultural scenes. Comput. Electron. Agric. 2022, 211, 107963. [Google Scholar] [CrossRef]
  76. Dyson, J.; Mancini, A.; Frontoni, E.; Zingaretti, P. Deep Learning for Soil and Crop Segmentation from Remotely Sensed Data. Remote Sens. 2019, 11, 1859. [Google Scholar] [CrossRef]
  77. Azadnia, R.; Jahanbakhshi, A.; Rashidi, S.; Khajehzadeh, M.; Bazyar, P. Developing an automated monitoring system for fast and accurate prediction of soil texture using an image-based deep learning network and machine vision system. Measurement 2021, 190, 110669. [Google Scholar] [CrossRef]
  78. Khanal, S.; Fulton, J.; Klopfenstein, A.; Douridas, N.; Shearer, S. Integration of high resolution remotely sensed data and machine learning techniques for spatial prediction of soil properties and corn yield. Comput. Electron. Agric. 2018, 153, 213–225. [Google Scholar] [CrossRef]
  79. Du, Y.; Mallajosyula, B.; Sun, D.; Chen, J.; Zhao, Z.; Rahman, M.; Quadir, M.; Jawed, M.K. A Low-cost Robot with Autonomous Recharge and Navigation for Weed Control in Fields with Narrow Row Spacing. In Proceedings of the 2021 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), Prague, Czech Republic, 27 September–1 October 2021; pp. 3263–3270. [Google Scholar] [CrossRef]
  80. Paul, S.; Jhamb, B.; Mishra, D.; Kumar, M.S. Edge loss functions for deep-learning depth-map. Mach. Learn. Appl. 2022, 7, 100218. [Google Scholar] [CrossRef]
  81. Chen, C.-J.; Huang, Y.-Y.; Li, Y.-S.; Chen, Y.-C.; Chang, C.-Y.; Huang, Y.-M. Identification of Fruit Tree Pests With Deep Learning on Embedded Drone to Achieve Accurate Pesticide Spraying. IEEE Access 2021, 9, 21986–21997. [Google Scholar] [CrossRef]
  82. Sorbelli, F.B.; Palazzetti, L.; Pinotti, C.M. YOLO-based detection of Halyomorpha halys in orchards using RGB cameras and drones. Comput. Electron. Agric. 2023, 213, 108228. [Google Scholar] [CrossRef]
  83. Xu, K.; Zhu, Y.; Cao, W.; Jiang, X.; Jiang, Z.; Li, S.; Ni, J. Multi-Modal Deep Learning for Weeds Detection in Wheat Field Based on RGB-D Images. Front. Plant Sci. 2021, 12, 732968. [Google Scholar] [CrossRef] [PubMed]
  84. Fu, L.; Majeed, Y.; Zhang, X.; Karkee, M.; Zhang, Q. Faster R–CNN–based apple detection in dense-foliage fruiting-wall trees using RGB and depth features for robotic harvesting. Biosyst. Eng. 2020, 197, 245–256. [Google Scholar] [CrossRef]
  85. Islam, M.H.; Wadud, M.F.; Rahman, M.R.; Alam, A.S. Greenhouse Monitoring and Harvesting Mobile Robot with 6DOF Manipulator Utilizing ROS, Inverse Kinematics and Deep Learning Models. Ph.D. Dissertation, Brac University, Dhaka, Bangladesh, 2022. Available online: http://hdl.handle.net/10361/21813 (accessed on 20 January 2022).
  86. Aghi, D.; Cerrato, S.; Mazzia, V.; Chiaberge, M. Deep Semantic Segmentation at the Edge for Autonomous Navigation in Vineyard Rows. In Proceedings of the 2021 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), Prague, Czech Republic, 27 September–1 October 2021; pp. 3421–3428. [Google Scholar] [CrossRef]
  87. Ji, W.; Gao, X.; Xu, B.; Chen, G.; Zhao, D. Target recognition method of green pepper harvesting robot based on manifold ranking. Comput. Electron. Agric. 2020, 177, 105663. [Google Scholar] [CrossRef]
  88. Li, J.; Tang, Y.; Zou, X.; Lin, G.; Wang, H. Detection of Fruit-Bearing Branches and Localization of Litchi Clusters for Vision-Based Harvesting Robots. IEEE Access 2020, 8, 117746–117758. [Google Scholar] [CrossRef]
  89. Adhikari, S.P.; Kim, G.; Kim, H. Deep Neural Network-Based System for Autonomous Navigation in Paddy Field. IEEE Access 2020, 8, 71272–71278. [Google Scholar] [CrossRef]
  90. Balaska, V.; Adamidou, Z.; Vryzas, Z.; Gasteratos, A. Sustainable Crop Protection via Robotics and Artificial Intelligence Solutions. Machines 2023, 11, 774. [Google Scholar] [CrossRef]
Figure 1. The basic structure of Faster R-CNN.
Figure 2. Faster R-CNN framework diagram.
Figure 3. Distributed framework diagram.
Figure 4. The overall framework of the attention ResNet network.
Figure 5. Transfer learning strategy model.
Figure 6. Ripeness estimation algorithm.
Figure 7. The convolutional neural network-based pepper counting algorithm.
Figure 8. Detection algorithm for ripe palm fruit bunches.
Figure 9. Indoor fingerprint localization architecture.
Figure 10. UWB MIMO system transmitter structure.
Figure 11. UWB MIMO system receiver structure.
Figure 12. The effect of signal power on TDOA ranging is shown in (a), and the effect of the hardware offset on the TDOA is shown in (b).
Figure 13. Algorithm flow chart of litchi fruit-bearing branch detection and localization based on RGB-D images.
Table 1. Comparative analysis of research outcomes from various algorithms in pest detection.

| Model | Research Object | Technical Characteristics and Performance Indicators | References |
|---|---|---|---|
| Faster R-CNN | Farmland in a natural environment | This study utilized an enhanced Faster R-CNN to identify agricultural pests under natural environmental conditions; the mAP increased from 62.88% to 87.7%, while the time consumption increased by 12.29 ms. | [13] |
| Faster R-CNN | White grub, Helicoverpa, and Spodoptera | The Faster R-CNN architecture proposed in this study, based on deep learning using TensorFlow (v2.6) for multi-class pest detection and classification, achieved an insect detection rate of 91.02%. | [14] |
| Faster R-CNN | Actual field scene | This study utilized an improved Faster R-CNN for pest detection, achieving 84.96% mAP, with a 5% accuracy improvement observed for 10 common pest samples. | [15] |
| Faster R-CNN | Apple | This study employed an enhanced Faster R-CNN to detect pests on apples in orchards, achieving a mean average precision (mAP) of 89.34%. The model showed improved fitting, with training speed increasing by 59%. | [16] |
| Faster R-CNN | Target detection and behavior prediction | This study employs an improved Faster R-CNN for action recognition, effectively detecting actions in images with an mAP of 67.2%, an increase of 14 percentage points. | [27] |
| Faster R-CNN | Plant disease detection | This study compares the improved Faster R-CNN before and after enhancement and finds that, with a training epoch of 20,000, PD-IFRCNN can enhance the model’s generalization capability without sacrificing detection accuracy, thus mitigating overfitting. | [29] |
| ResNet | Soybean | This study combined the ResNet network with an attention mechanism and designed the classifier as a multi-class Support Vector Machine (SVM) for learning. Experimental results demonstrate that the model achieved an accuracy of 94.57 ± 0.19%. | [19] |
| ResNet | Injurious insects | This study proposes a method to enhance automation by manually segmenting images or extracting features in the preprocessing stage. When ResNet-50 is applied in a transfer learning paradigm, the model achieves a typical classification accuracy of 99.40% in pest detection. | [20] |
| ResNet | Injurious insects | This study introduces the ResNet50-SA model, which combines the self-attention mechanism with ResNet. The model achieves a detection accuracy of 99.80% for specific pests. | [21] |
| ResNet | Aphid, red spider, locust, et al. | This study utilized the improved S-ResNet for specific pest identification. Compared to meta-models with depths of 18, 30, and 50 layers, this model achieved a 7% improvement in recognition accuracy. | [22] |
| ResNet | Rice plant diseases | This study employed an improved ResNet model for identifying and detecting rice diseases, achieving a positional accuracy of 88.90%. | [23] |
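The detectors compared in Table 1 are standard two-stage (Faster R-CNN) and residual (ResNet) architectures. As a point of reference only, the snippet below is a minimal sketch of running an off-the-shelf Faster R-CNN from torchvision on a single image; the generic COCO weights, the 0.5 confidence threshold, and the image path are placeholders, not the pest-specific models of the cited studies.

```python
# Minimal sketch: inference with a pretrained Faster R-CNN detector from torchvision.
# The weights and class indices are generic COCO placeholders; a pest detector would
# be fine-tuned on labeled pest images and use its own class names.
import torch
import torchvision
from torchvision.transforms.functional import to_tensor
from PIL import Image

model = torchvision.models.detection.fasterrcnn_resnet50_fpn(weights="DEFAULT")
model.eval()

image = Image.open("field_sample.jpg").convert("RGB")  # hypothetical input image
with torch.no_grad():
    predictions = model([to_tensor(image)])[0]  # dict with "boxes", "labels", "scores"

for box, label, score in zip(predictions["boxes"], predictions["labels"], predictions["scores"]):
    if score > 0.5:  # keep only confident detections
        print(f"class {label.item()}  score {score:.2f}  box {box.tolist()}")
```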
Table 2. Comparative study of research outcomes from various algorithms in maturity detection.

| Model | Research Object | Technical Characteristics and Performance Indicators | References |
|---|---|---|---|
| Image processing | Tomato | This experiment employed classical image processing methods combined with the YOLOv5 network for tomato harvesting, achieving a harvesting success rate of 90%. | [39] |
| Image processing | Tomato | This study employed computer vision and image processing techniques to detect the ripeness of tomatoes, enabling accurate differentiation between fully ripe and nearly ripe fruits. | [41] |
| Image processing | Strawberry and apple | This experiment utilized color thresholding for fruit detection and classification, achieving a detection accuracy of over 90%. | [42] |
| YOLO | Cherry fruit | This study employed an optimized YOLO-V4 algorithm for cherry fruit detection, achieving an accuracy of 95.2%. The YOLO-V4-Dense model exhibited a 0.15 increase in mAP compared to the YOLO-V4 model. | [45] |
| YOLO | Capsicum | This study explores various YOLO algorithms to enhance pepper harvesting. Among these, the YOLOv8s model achieved a mAP of 0.967 at a 0.5 Intersection over Union threshold, marking it as the most successful model. | [46] |
| YOLO | Palm oil | This study utilizes an enhanced version of the YOLOv3 algorithm for real-time detection and harvesting of oil palm bunches, achieving a 95.18% mAP at the 5000th iteration. | [47] |
| YOLO | Olive fruit | This study utilized an improved YOLOv5 network model for detecting olives, achieving high-precision detection of olive fruits with mAP_0.5 exceeding 0.75. | [48] |
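For the image-processing entries in Table 2, ripeness is typically inferred from fruit color. The sketch below illustrates one such approach, HSV color thresholding with OpenCV; the hue/saturation/value ranges, the 0.6 "ripe" cut-off, and the file name are illustrative assumptions, not values taken from the cited studies.

```python
# Minimal sketch: HSV color thresholding to estimate tomato ripeness from the
# fraction of red pixels. Thresholds and the ripeness cut-off are illustrative.
import cv2
import numpy as np

image = cv2.imread("tomato.jpg")                 # hypothetical input image (BGR)
hsv = cv2.cvtColor(image, cv2.COLOR_BGR2HSV)

# Red wraps around the hue axis, so two hue ranges are combined.
lower_red = cv2.inRange(hsv, np.array([0, 80, 60]), np.array([10, 255, 255]))
upper_red = cv2.inRange(hsv, np.array([170, 80, 60]), np.array([180, 255, 255]))
red_mask = cv2.bitwise_or(lower_red, upper_red)

ripe_ratio = np.count_nonzero(red_mask) / red_mask.size
print("ripe" if ripe_ratio > 0.6 else "not yet ripe", f"(red fraction = {ripe_ratio:.2f})")
```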
Table 3. Comparison of UWB technology with other positioning technologies.

| Positioning Technology | Positioning Accuracy | Spectral Range | Positioning Strengths and Weaknesses |
|---|---|---|---|
| Wi-Fi | 2~50 m | 2.4 GHz, 5 GHz | Low cost and strong communication capability, but susceptible to environmental interference. |
| Bluetooth | 2~10 m | 2.4~2.4835 GHz | Easy to integrate and popularize; however, it has limited transmission distances and lower stability. |
| ZigBee | 1~2 m | 2.4 GHz, 868/915 MHz | Low power consumption and cost-effective; however, it exhibits poor stability and susceptibility to environmental interference. |
| Infrared | 5~10 m | 0.3~400 THz | High positioning accuracy, but limited by short line-of-sight and transmission distances and susceptible to interference. |
| RFID | 0.05~5 m | LF: 120~150 kHz; HF: 13.56 MHz; UHF: 433 MHz, 800/900 MHz, 2.45 GHz, 5.8 GHz | Economical with high precision, yet limited by short positioning distance. |
| UWB | 6~10 cm | 3.1~10.6 GHz | Centimeter-level positioning accuracy, robust anti-interference capability, efficient real-time transmission, stability, and strong signal penetration; however, implementation costs are higher, and accuracy remains susceptible to non-line-of-sight errors, multipath effects, clock-synchronization challenges, and the spatial arrangement of base stations. |
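To illustrate the kind of range-based positioning that gives UWB its centimeter-level accuracy in Table 3, the sketch below solves a two-dimensional multilateration problem from noisy time-of-arrival ranges using nonlinear least squares; the anchor layout, true tag position, and 5 cm range noise are made-up inputs.

```python
# Minimal sketch: 2-D UWB-style multilateration from TOA range measurements to
# four anchors, solved with nonlinear least squares (scipy).
import numpy as np
from scipy.optimize import least_squares

anchors = np.array([[0.0, 0.0], [10.0, 0.0], [0.0, 10.0], [10.0, 10.0]])  # anchor positions (m)
true_tag = np.array([3.2, 6.7])                                           # assumed true tag position

rng = np.random.default_rng(0)
ranges = np.linalg.norm(anchors - true_tag, axis=1) + rng.normal(0, 0.05, len(anchors))  # noisy ranges

def residuals(p):
    """Difference between predicted and measured anchor-to-tag distances."""
    return np.linalg.norm(anchors - p, axis=1) - ranges

estimate = least_squares(residuals, x0=np.array([5.0, 5.0])).x
print("estimated tag position:", estimate, "error (m):", np.linalg.norm(estimate - true_tag))
```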
Table 4. Transmission loss model parameters in different environments.

| Application Environment | Transmission Loss Coefficient | Signal Power Value at Distance A (dBm) | Random Variable |
|---|---|---|---|
| Indoor, line-of-sight (LOS) | 1.7 | 5.07 | 2.22 |
| Indoor, non-line-of-sight (NLOS) | 4.58 | 3.64 | 3.51 |
| Outdoor, line-of-sight (LOS) | 1.76 | 4.86 | 0.83 |
| Outdoor, non-line-of-sight (NLOS) | 2.5 | 4.23 | 2 |
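The parameters in Table 4 correspond to a log-distance transmission (path) loss model. Assuming the three columns are the path loss exponent, the received power at a reference distance A (taken here as 1 m), and the standard deviation of the shadowing term, the sketch below inverts that model to turn a received signal strength into a range estimate; the −12 dBm measurement is a made-up example value.

```python
# Minimal sketch: inverting the log-distance path loss model
#   RSS(d) = RSS(d0) - 10 * n * log10(d / d0)
# to estimate range from a received signal strength. Parameter interpretation
# (n, reference power, reference distance) is an assumption, not from the source.
import math

def estimate_distance(rss_dbm, ref_power_dbm, n, d0=1.0):
    """Distance (m) implied by the log-distance model for a measured RSS."""
    return d0 * 10 ** ((ref_power_dbm - rss_dbm) / (10.0 * n))

# Indoor LOS parameters from Table 4, read as n = 1.7 and P(d0) = 5.07 dBm.
print(f"estimated range: {estimate_distance(rss_dbm=-12.0, ref_power_dbm=5.07, n=1.7):.1f} m")
```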
Table 5. Configuration parameters of the two types of positioning system devices.

| Category | Parameter | AOA Positioning Base Station | AOA Asset Tracking Beacon |
|---|---|---|---|
| Product parameters | Product model | CL-GA10-P | CL-TA40 |
| Product parameters | Network interface | 10/100M RJ45 | Power switch |
| Product parameters | Power supply | Standard PoE | Button cell battery |
| Technical specifications | Data interface | Standard Ethernet interface | Standard Ethernet and Bluetooth interfaces |
| Technical specifications | Installation | The maximum coverage radius extends to twice the installation height of the base station, with an upper limit for installation height of 10 m. | Within the permissible range, the height typically does not exceed 10 m. |
| General parameters | Positioning accuracy | 0.1–1 m | Less than 1 m |
| General parameters | Operating temperature | −20~60 °C | −20~60 °C |
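The base station in Table 5 performs angle-of-arrival (AOA) positioning from a single ceiling-mounted anchor. The sketch below shows only the underlying geometry: given the anchor height and a measured azimuth/elevation pair, simple trigonometry recovers the tag's horizontal position. The 4 m mounting height, 1 m tag height, and example angles are assumptions for illustration, not product specifications.

```python
# Minimal sketch: mono-anchor AOA positioning geometry. Elevation is measured
# downward from the anchor's horizontal plane toward the tag.
import math

def aoa_position(anchor_height_m, tag_height_m, azimuth_deg, elevation_deg):
    """Tag (x, y) relative to the floor point directly below the anchor."""
    drop = anchor_height_m - tag_height_m                       # vertical offset (m)
    horizontal = drop / math.tan(math.radians(elevation_deg))   # ground range from anchor (m)
    az = math.radians(azimuth_deg)
    return horizontal * math.cos(az), horizontal * math.sin(az)

x, y = aoa_position(anchor_height_m=4.0, tag_height_m=1.0, azimuth_deg=30.0, elevation_deg=50.0)
print(f"tag position: x = {x:.2f} m, y = {y:.2f} m")
```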
Table 6. Characteristics of the three forms of laser radar (LiDAR).

| Type | Architecture | Technical Features | Advantages | Disadvantages | Cost |
|---|---|---|---|---|---|
| Mechanical | Mechanical rotation | Mechanical components drive the launcher to rotate and pitch. | Fast scanning speed (5–20 revolutions per second); high accuracy. | Poor stability, high automotive-grade costs, manual assembly involvement, low reliability, and short lifespan. | Over USD 3000. |
| Semi-solid-state | MEMS | Utilizes a micro-scanning mirror to reflect laser light. | Reduced moving parts, improved reliability, lower cost. | Limited detection angles, significant difficulty in the adjustment links if splicing is required, and a short lifespan. | USD 500 to USD 1000. |
| Semi-solid-state | Mirror | Motor-driven mirror rotation while the transceiver module remains stationary. | Low power consumption, high accuracy, lower cost. | Low signal-to-noise ratio, limited detection distance, and restricted FOV (field of view) angle. | USD 500 to USD 1200. |
| Semi-solid-state | Prism | Uses polygonal, irregular mirrors for non-repetitive scanning. | High point cloud density, capable of long-range detection. | Complex mechanical structure prone to bearing or bushing wear. | USD 800. |
| Solid-state | FLASH | Employs short-pulsed lasers for large-area coverage, followed by high-sensitivity detectors for image detection. | Small size, large amount of information, simple structure. | Low detection accuracy and short detection distance. | Expected to drop below USD 500 after mass production. |
| Solid-state | OPA | Adjusts the phase of each phase shifter in the array to emit light in a specific direction using interferometric principles. | Fast scanning speed, high accuracy, lower cost after mass production. | Stringent requirements for materials and processes currently lead to high costs. | Expected to drop below USD 200 after mass production. |
Table 7. Average absolute errors and mean square deviations in the x, y, and z directions, used to assess the accuracy of the measurements.

| Comparison | Mean Absolute Error x (m) | Mean Absolute Error y (m) | Mean Absolute Error z (m) | Mean Square Deviation x (m) | Mean Square Deviation y (m) | Mean Square Deviation z (m) |
|---|---|---|---|---|---|---|
| Fusion vs. ground truth | 4.5948 | 14.9209 | 4.7763 | 2.6959 | 2.2211 | 2.2949 |
| MSCKF vs. ground truth | 20.9995 | 75.3966 | 54.2210 | 35.0502 | 96.6908 | 77.0708 |
| IMU vs. ground truth | 45.2532 | 75.0033 | 0.8808 | 25.8619 | 44.9099 | 0.2860 |
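The statistics in Table 7 are per-axis error measures of the estimated trajectories against ground truth. The sketch below computes the same kind of quantities (interpreting the deviation column as a root-mean-square error, which is an assumption) for a synthetic trajectory; the random data are placeholders for real fusion, MSCKF, and IMU outputs.

```python
# Minimal sketch: per-axis mean absolute error and root-mean-square error between
# an estimated trajectory and ground truth, as summarized in Table 7.
import numpy as np

rng = np.random.default_rng(1)
ground_truth = rng.uniform(0, 50, size=(500, 3))              # N x 3 positions (x, y, z), placeholder
estimate = ground_truth + rng.normal(0, 0.3, size=(500, 3))   # noisy estimate, placeholder

errors = estimate - ground_truth
mae = np.mean(np.abs(errors), axis=0)         # mean absolute error per axis
rmse = np.sqrt(np.mean(errors ** 2, axis=0))  # root-mean-square error per axis

for axis, m, r in zip("xyz", mae, rmse):
    print(f"{axis}: MAE = {m:.4f} m, RMSE = {r:.4f} m")
```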
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
