1. Introduction
The global challenges of climate change, population growth, and food security have driven significant advancements in modern agriculture. Researchers have introduced several mechanization technologies, including advanced farm machinery, autonomous navigation systems, artificial intelligence, sensing technologies, and communication tools, to enhance productivity and sustainability [1,2]. These innovations improve land utilization, labor efficiency, and resource management, and increase farmers’ profitability. Among these advancements, integrating autonomous navigation systems with machine vision and image processing has become crucial for intelligent farm machinery [3]. This technology enables precise movement, real-time crop monitoring, and efficient execution of field tasks, reducing labor dependency and operational costs [4,5].
Mobile robots, such as Unmanned Ground Vehicles (UGVs), are increasingly used in complex and unstructured environments, extending beyond traditional indoor settings. With increasing environmental complexity, fast and robust environment perception and recognition have become critical for autonomous robotic navigation [6,7]. These environments pose unique challenges for autonomous navigation due to dynamic conditions, irregular terrain, and dense vegetation. Adequate perception and recognition of surroundings are critical for safe and efficient movement [8]. These robots must distinguish between “Real” obstacles (e.g., tree trunks, humans) that require avoidance and “Fake” obstacles (e.g., branches, tall grass) that do not impede movement [9,10]. This distinction enhances decision-making, optimizes navigation, and improves safety and efficiency [11,12]. Existing research underscores the need for robust scene understanding in agricultural robotics, particularly in environments with varying illumination, unstructured boundaries, and unpredictable features [13,14].
An orchard navigation system comprises three key components: environment detection, path planning, and navigation control. Presently, the primary sensors used for environmental sensing include the Global Navigation Satellite System (GNSS), machine vision, Light Detection and Ranging (LiDAR), and multi-sensor fusion [15]. Machine learning and artificial intelligence models have recently been widely applied in real-world situations, such as feature extraction and target detection in field environments [16]. Similarly, artificial intelligence has garnered significant attention in traditional agriculture, enabling various activities and missions to be planned effectively using limited resources with minimal human interference [17].
Furthermore, deep learning has significantly enhanced obstacle detection in agricultural robotics, primarily through convolutional neural networks (CNNs). Models such as YOLO (You Only Look Once) have gained prominence for real-time obstacle detection. YOLOv3 balanced speed and accuracy, and subsequent optimized versions such as YOLOv4 and YOLOv5 are better suited to agricultural robotics [18,19,20,21,22,23,24]. Wang and Wei [6] stated that machine-learning approaches require large amounts of labeled training data, computation, and storage. Therefore, using deep learning algorithms necessitates support from powerful computational resources, such as high-performance graphics processors (GPUs) and high-capacity storage devices. However, despite these advancements, orchard environments remain semi-structured, requiring robots to promptly detect and avoid obstacles [25]. Dynamic conditions and environmental variability compound the challenge of detecting obstacles in such settings. To enhance robustness, several modifications to standard models have been proposed [26,27,28,29]. Recent studies have explored the integration of multimodal sensors (e.g., LiDAR, radar, and cameras) to mitigate environmental noise and occlusions, thereby improving detection accuracy, reducing false positives, and facilitating real-time decision-making [30,31,32,33,34,35]. Advanced obstacle avoidance algorithms, such as the Obstacle-Dependent Gaussian Potential Field (ODG-PF) model, have also been introduced to refine collision avoidance by evaluating obstacle proximity and collision probabilities [36].
Han et al. [37] combined 3D CNNs with Long Short-Term Memory (LSTM) networks to enhance obstacle recognition by integrating spatial and temporal features. While CNN-based architectures, such as Mask R-CNN and Faster R-CNN, have been explored for dynamic and multi-class detection tasks, these models and traditional deep learning approaches focus on object detection rather than classifying obstacles into functional categories. Although YOLOv5 and YOLOv8 offer high detection accuracy, they do not inherently differentiate between ‘Real’ and ‘Fake’ obstacles, making them suboptimal for autonomous navigation in semi-structured orchard environments [38,39].
Moreover, some recent works incorporate advanced sensor fusion techniques, integrating multiple sensing modalities, such as LIDAR, cameras, and radar, to enhance obstacle detection accuracy. However, these approaches often increase computational complexity and hardware costs, limiting their deployment on lightweight, mobile agricultural robots. Thus, a computationally efficient yet robust classification mechanism is required to enhance real-time obstacle differentiation without significantly increasing processing overhead.
Machine vision solutions often rely on depth cameras and deep learning algorithms for obstacle detection [18,19,20,40]. Feng et al. [41] developed a segmentation network with dual semantic-feature complementary fusion to segment diverse fake obstacles, while Liu et al. [42] proposed the YOLO-SCG model for faster and more precise target recognition. Koch et al. [43] evaluated the geometric features of potholes, and Matthies et al. [44] introduced a method for detecting fake obstacles using thermal infrared imaging. However, existing methods primarily focus on identifying obstacles rather than differentiating their impact on navigation.
In real-world orchard environments, a binary approach to obstacle detection is insufficient, as not all detected obstacles require the same level of avoidance. For instance, a tree trunk should trigger a different response than a flexible branch, which an autonomous robot can navigate without requiring a detour. However, current research lacks a dedicated real-time classification mechanism to separate obstacles into actionable categories [45,46]. This limitation can lead to unnecessary path deviations, reduced efficiency, and increased operational complexity for orchard robots.
This study enhances the YOLOv8 detection framework to address these challenges by integrating a lightweight CNN classifier with Ghost Modules and Squeeze-and-Excitation (SE) blocks. These modifications improve feature extraction while reducing computational overhead, ensuring efficient real-time deployment in orchard environments. By distinguishing between “Real” obstacles (e.g., tree trunks, humans) and “Fake” obstacles (e.g., branches, tall grass), our model minimizes unnecessary stops and detours, thereby improving navigation efficiency. Additionally, Hyperband optimization balances accuracy and speed for real-world applications.
Building on our previous work [47], which enhanced YOLOv8-based obstacle detection but lacked classification capabilities, we introduce a dedicated classification mechanism that distinguishes between obstacles that require avoidance and those that do not. To validate the effectiveness of our model, we evaluate it on multiple datasets, including orchard-based and campus datasets, as well as an external dataset [48] to ensure generalizability.
The key contributions include the following:
Utilizing Ghost Modules and SE blocks to enhance feature extraction while reducing computational overhead, facilitating deployment on energy-limited robotic platforms.
Ensuring model robustness by training on diverse datasets (orchard and campus environments) to improve adaptability to varying operational conditions.
Evaluating model generalization on an open dataset to ensure applicability to previously unseen obstacles [48].
Employing Hyperband optimization for efficient hyperparameter tuning, increasing accuracy while reducing training time for real-time applications.
Classifying obstacles into actionable categories (“Real” and “Fake”) to facilitate faster decision-making for collision avoidance and navigation.
Demonstrating superior performance compared to conventional classifiers and state-of-the-art models in terms of accuracy, efficiency, and reliability within practical orchard environments.
With these advancements, our work aims to significantly improve the reliability and efficiency of autonomous robots in orchard settings. By providing a robust solution for real-time obstacle classification, our model enhances robots’ ability to make informed decisions regarding obstacle avoidance and trajectory adjustments, ultimately improving safety and optimizing navigation in dynamic agricultural environments. Furthermore, this paper is structured as follows:
Section 1 introduces the research objectives, highlights the significance of obstacle classification, and identifies the gap addressed by our approach.
Section 2 presents the system description, navigation framework, experimental setup, dataset preparation, and model architecture design.
Section 3 details the experimental results, analyzes model performance compared to traditional classifiers, and evaluates its effectiveness in real-world orchard and campus environments.
Section 4 discusses the results, elaborates on strengths and limitations, explores practical implications, and suggests future research directions. Finally,
Section 5 concludes the paper with key insights and recommendations.
2. Materials and Methods
2.1. System Description and Vision System Overview
The teleoperated orchard management robot developed by our team integrates multiple high-precision sensors for accurate environmental monitoring. These sensors include the RS-LIDAR-16 for three-dimensional mapping, a high-resolution camera for visual data acquisition, a GNSS unit for precise positioning, and an IMU for motion sensing, as illustrated in
Figure 1. The RS-LIDAR-16 (Robosense, Shenzhen, China) was chosen for its cost-effectiveness and sufficient performance for navigation and mapping, serving a complementary role within the sensor suite. Its compatibility with other sensors facilitated real-time data processing. Although its contribution was limited in this study, it remains a valuable component for future research where its capabilities can be fully utilized.
The robot was manually operated via a wireless control interface, with sensor data recorded at a rate of 10 Hz for immediate processing and decision-making. The Intel RealSense D455 RGB-D camera (Santa Clara, CA, USA), positioned at a height of 1.2 m, captured high-resolution RGB images (1280 × 800 pixels) and depth data for accurate distance measurement. The high resolution enables better detail in obstacle features, which is crucial for accurate classification. Additionally, the camera’s frame rate of up to 90 fps ensures rapid data capture, supporting real-time processing and minimizing the likelihood of missed obstacles in dynamic environments.
The embedded computing platform comprised an Advantech MIC770Q industrial computer. This MIC770Q (Taipei, Taiwan) model is powered by an Intel 8-core i7-9700E processor, which features a built-in Intel UHD Graphics 630 card and has two 16GB DDR4 memory modules. After initial evaluations for real-time validation, we deployed the classification model on this local machine. This deployment required the configuration of essential libraries, including Python 3.8.20, PyTorch 1.13.1, YOLOv8, a CNN-based classification framework, and OpenCV for video processing. An Intel RealSense D455 camera captured real-time RGB images, significantly enhancing the system’s perception capabilities.
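For illustration, a minimal capture loop of the kind this deployment implies is sketched below using pyrealsense2 and OpenCV; the stream settings (1280 × 800 RGB at 30 fps) are assumptions for the example, and this is not the authors’ code.

```python
# Minimal sketch (assumed settings): grab RGB frames from an Intel RealSense D455
# with pyrealsense2 and hand them to OpenCV for display/processing.
import pyrealsense2 as rs
import numpy as np
import cv2

pipeline = rs.pipeline()
config = rs.config()
config.enable_stream(rs.stream.color, 1280, 800, rs.format.bgr8, 30)  # RGB stream (assumed 30 fps)
pipeline.start(config)

try:
    while True:
        frames = pipeline.wait_for_frames()
        color_frame = frames.get_color_frame()
        if not color_frame:
            continue
        image = np.asanyarray(color_frame.get_data())  # H x W x 3 BGR array
        cv2.imshow("D455 RGB", image)
        if cv2.waitKey(1) & 0xFF == ord("q"):
            break
finally:
    pipeline.stop()
    cv2.destroyAllWindows()
```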
The selection of the RGB-D camera underwent a rigorous evaluation, comparing potential alternatives such as the D415, D435i, and Azure Kinect. Key specifications—including depth range, resolution, and frame rate—were analyzed to determine the most suitable option.
Table 1 presents a comparative analysis, highlighting the advantages of the Intel RealSense D455 in complex orchard environments.
2.2. Subsystem Integration and Coordination
Subsystem integration was accomplished using the Robot Operating System (ROS), which enabled seamless communication among the robot’s sensors, navigation systems, and control components. ROS facilitated real-time data exchange, ensuring synchronized processing of sensor inputs from the Intel RealSense D455 camera.
Environmental data were continuously monitored via ROS topics and nodes for obstacle detection, with processed results feeding into the robot’s decision-making system. This integration allowed dynamic trajectory adjustments based on real-time environmental changes, ensuring efficient and safe navigation in complex orchard environments.
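The ROS wiring can be pictured with the following minimal rospy node; the topic names (/camera/color/image_raw, /nav/decision) and the classify_frame() stub are hypothetical, chosen only to illustrate the subscriber/publisher pattern described above.

```python
# Illustrative ROS node: subscribe to camera images, classify obstacles,
# and publish a navigation command. Topic names and the classifier are placeholders.
import rospy
from sensor_msgs.msg import Image
from std_msgs.msg import String
from cv_bridge import CvBridge

bridge = CvBridge()

def classify_frame(frame):
    """Placeholder for the YOLOv8 detection + CNN classification pipeline."""
    return "Fake"

def image_callback(msg, cmd_pub):
    frame = bridge.imgmsg_to_cv2(msg, desired_encoding="bgr8")  # ROS image -> OpenCV array
    label = classify_frame(frame)                               # "Real" or "Fake"
    cmd_pub.publish("stop" if label == "Real" else "continue")  # decision for the controller

def main():
    rospy.init_node("obstacle_classifier")
    cmd_pub = rospy.Publisher("/nav/decision", String, queue_size=1)
    rospy.Subscriber("/camera/color/image_raw", Image, image_callback, callback_args=cmd_pub)
    rospy.spin()

if __name__ == "__main__":
    main()
```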
2.3. Obstacle Classification and Navigation Framework
The robot’s navigation relied on a real-time obstacle classification model, distinguishing between Real and Fake obstacles. Real obstacles were physical obstructions requiring the robot to stop or alter its path, while Fake obstacles were negligible and did not affect navigation. This binary classification approach streamlined decision-making while minimizing computational demands, essential for real-time autonomous navigation. The classification procedures outlined in Algorithms 1 and 2 were integrated into the navigation system to enhance efficiency and responsiveness. The model underwent rigorous testing in both orchard and campus environments to ensure robust performance.
Algorithm 1: Obstacle Classification and Navigation Decision
Input: Detected Obstacles (O), Navigation Status
Output: Stop Signal, Continue Signal
1. Initialize Stop Signal ← false
2. Initialize Continue Signal ← true
3. For each obstacle o ∈ O (detected obstacles) do
4.   Classify(o) → obstacle_class  # classify the obstacle as Real or Fake
5.   If obstacle_class = “Real” then
6.     Set Stop Signal ← true  # real obstacle detected, stop the robot
7.     Break the loop, stop further processing
8.   Else if obstacle_class = “Fake” then
9.     Set Continue Signal ← true  # fake obstacle, continue robot movement
End for

Algorithm 2: Navigation Control Based on Obstacle Classification
Input: Detected Obstacles (O), Current Position (Poscurrent)
Output: Status (Moving or Stopped)
1. Initialize Status ← “Moving”
2. While Status = “Moving” do
3.   Detected ← Detect_Obstacles(Poscurrent)  # detect obstacles at the current position
4.   Stop_Signal, Continue_Signal ← Obstacle_Classification_and_Navigation_Decision(Detected)  # classify and decide navigation
5.   If Stop_Signal = true then
6.     Update Status ← “Stopped”  # stop the robot if a real obstacle is detected
7.   Else if Continue_Signal = true then
8.     Update Status ← “Moving”  # continue moving if obstacles are fake or absent
End while
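A minimal Python sketch of the decision logic in Algorithms 1 and 2 is given below; classify() and detect_obstacles() are hypothetical stand-ins for the YOLOv8 detector and the CNN classifier described later, not the authors’ actual implementation.

```python
def obstacle_decision(obstacles, classify):
    """Algorithm 1: return (stop_signal, continue_signal) for a set of detections."""
    stop_signal, continue_signal = False, True
    for obstacle in obstacles:
        if classify(obstacle) == "Real":
            stop_signal = True          # real obstacle: stop and skip further processing
            break
        # a "Fake" obstacle leaves continue_signal at True and the loop proceeds
    return stop_signal, continue_signal


def navigation_loop(detect_obstacles, classify, position):
    """Algorithm 2: keep the robot moving until a real obstacle is detected."""
    status = "Moving"
    while status == "Moving":
        detected = detect_obstacles(position)                   # detections at the current position
        stop_signal, _ = obstacle_decision(detected, classify)  # classify and decide
        status = "Stopped" if stop_signal else "Moving"
    return status
```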
2.4. Data Collection and Dataset Preparation
The teleoperated robot was deployed in two environments: Nanjing Agricultural University campus, which is heavily forested, and a sweet apple orchard in Yanghe Town, Suqian City, where fruit trees are approximately 3 m in height, with canopy widths of 1 to 1.5 m and row spacing of around 3 m. Data were collected in 2024 across different lighting conditions and seasons. Approximately 30,000 RGB images were acquired per environment to ensure robustness. These images captured diverse obstacles, including barriers, dead vegetation, fencing, fruit baskets, grass, live vegetation, people, stones, debris, tree trunks, and branches, as shown in
Figure 2.
Initial data processing involved a YOLOv8-based detection algorithm (Figure 3) to streamline classification by generating bounding boxes around potential obstacles [47]. These were manually verified for accuracy, particularly for partially occluded objects. A balanced dataset of 24,000 labeled bounding boxes, equally distributed between Real and Fake obstacles, was created for each environment. The dataset was split into 85% for training and 15% for unbiased testing. Further generalization testing utilized 2400 images from both environments.
To ensure dataset reliability, a comprehensive data-cleaning process was implemented. Images with severe motion blur, overexposed regions, or excessive occlusions were removed to maintain dataset integrity. Duplicate images captured under similar conditions were filtered out to prevent overfitting. Additionally, any mislabeled data points identified during manual verification were corrected. If an image contained significant missing or incomplete visual information due to sensor noise or transmission errors, it was either removed or reconstructed using adjacent frames when possible. To further enhance dataset quality, outlier detection was performed by identifying bounding boxes with extreme aspect ratios or inconsistent obstacle labeling, ensuring that objects were accurately represented in the dataset.
Data augmentation techniques were applied to improve generalization, including random rotations (±30°), brightness adjustments, horizontal flipping, and 20% shifts (Figure 4). All images were resized to a fixed resolution to maintain consistency. Additionally, pixel values were normalized between 0 and 1 to standardize network input.
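As one possible realization of these augmentations, a torchvision pipeline is sketched below; the jitter strength and flip probability are illustrative assumptions rather than the exact values used in the study.

```python
# Illustrative augmentation pipeline matching the operations described above:
# rotation within ±30°, brightness adjustment, horizontal flipping, 20% shifts,
# fixed-size resizing, and [0, 1] pixel normalization via ToTensor().
from torchvision import transforms

train_transforms = transforms.Compose([
    transforms.Resize((224, 224)),                               # fixed network input size
    transforms.RandomRotation(degrees=30),                       # random rotations within ±30°
    transforms.ColorJitter(brightness=0.3),                      # brightness adjustment (assumed strength)
    transforms.RandomHorizontalFlip(p=0.5),                      # horizontal flipping
    transforms.RandomAffine(degrees=0, translate=(0.2, 0.2)),    # up to 20% shifts
    transforms.ToTensor(),                                       # scales pixel values to [0, 1]
])
```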
2.5. Model Architecture Design
The proposed system follows a two-stage process: first, object detection using a YOLOv8n model to identify regions of interest [47], and second, a CNN-based classification model to categorize detected objects as Real or Fake obstacles. The CNN architecture integrates Ghost Modules and SE blocks to enhance computational efficiency and feature extraction. This integration is intentionally designed to balance lightweight processing and enhanced feature discrimination, which is essential for real-time operation in orchard environments.
Ghost Modules generate lightweight feature maps to reduce computational overhead, making them ideal for real-time applications in resource-constrained environments, as Shen et al. [49] demonstrated, while SE blocks recalibrate feature maps through channel-wise attention, improving interpretability and performance, as Hu et al. [12] discussed. The cooperation between the Ghost and SE modules allows the model to operate efficiently while maintaining high classification accuracy: the Ghost Module reduces computational complexity by generating simplified yet informative feature maps, whereas the SE Module enhances the network’s ability to focus on the most critical features through channel-wise attention. This combination balances accuracy and speed, which is crucial for real-time obstacle classification in orchard environments.
As depicted in
Figure 5, the CNN architecture begins with an input layer that processes RGB images resized to 224 × 224 pixels. The input image, with dimensions H × W × C (where H is the height, W is the width, and C represents the three color channels), undergoes standard preprocessing steps, such as normalization and resizing, before being fed into the network.
The overall architecture consists of convolutional layers with ReLU activation functions interspersed with max-pooling layers for dimensionality reduction. SE blocks and Ghost Modules follow these layers, strategically incorporated to optimize computation and enhance feature representation, particularly for the challenging task of obstacle detection in orchard environments. The coordinated use of these modules ensures robustness by improving feature extraction while reducing computational costs, making the model well-suited for deployment on energy-limited robotic platforms.
Fully connected layers were regularized with dropout to mitigate overfitting, while the final layer uses a sigmoid activation function for binary classification to distinguish between Real and Fake obstacles. By combining the computational efficiency of Ghost Modules with the feature-enhancing capabilities of SE blocks, the model ensures both accuracy and speed, making it well-suited for real-time applications in agricultural robotics.
2.5.1. Integration of Ghost Module
Convolutional layers serve as the primary feature extractors in the CNN architecture. Ghost Modules were integrated into these layers to optimize computational efficiency. Unlike traditional convolutions, which generate C output feature maps, Ghost Modules create fewer primary feature maps via standard convolutions and then augment these maps with “Ghost” feature maps produced by lightweight depth-wise convolutions. This strategy reduces computational load while preserving feature richness. Mathematically, the primary feature maps are expressed as follows:

$$Y' = X * f', \qquad Y' \in \mathbb{R}^{h' \times w' \times m}, \qquad m = \frac{C}{s}$$

where $s$ is the reduction ratio. The ghost feature maps are computed as follows:

$$y_{ij} = \Phi_{i,j}\left(y'_i\right), \qquad i = 1, \dots, m, \quad j = 1, \dots, s$$

The combined output is as follows:

$$Y = \left[y_{11}, y_{12}, \dots, y_{ms}\right]$$

Here, $Y'$ represents the primary feature maps, $y_{ij}$ represents the ghost feature maps, and $Y$ is the combined output. This approach reduces computation while preserving feature richness.
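A compact PyTorch sketch of a Ghost Module of this kind is shown below; the layer sizes and the default reduction ratio are illustrative, and this is not the exact module used in the proposed network.

```python
import torch
import torch.nn as nn

class GhostModule(nn.Module):
    """Primary convolution plus cheap depth-wise 'ghost' maps (reduction ratio s)."""
    def __init__(self, in_channels, out_channels, s=2, kernel_size=1, dw_size=3):
        super().__init__()
        primary_channels = out_channels // s               # m = C / s primary maps
        ghost_channels = out_channels - primary_channels   # remaining maps are generated cheaply
        self.primary = nn.Sequential(
            nn.Conv2d(in_channels, primary_channels, kernel_size,
                      padding=kernel_size // 2, bias=False),
            nn.BatchNorm2d(primary_channels),
            nn.ReLU(inplace=True),
        )
        self.cheap = nn.Sequential(
            nn.Conv2d(primary_channels, ghost_channels, dw_size,
                      padding=dw_size // 2, groups=primary_channels, bias=False),
            nn.BatchNorm2d(ghost_channels),
            nn.ReLU(inplace=True),
        )

    def forward(self, x):
        y_primary = self.primary(x)                    # standard convolution -> primary maps
        y_ghost = self.cheap(y_primary)                # depth-wise 'ghost' feature maps
        return torch.cat([y_primary, y_ghost], dim=1)  # combined output Y
```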
2.5.2. Integration of Squeeze-and-Excitation (SE) Module
The SE blocks are integrated following the Ghost Modules to enhance the model’s representational power by focusing on informative channels. The SE blocks recalibrate the feature maps through squeeze and excitation.
Squeeze Step: Global Average Pooling (GAP) is applied to each channel to compress the spatial dimensions into a squeezed vector $z$:

$$z_c = \frac{1}{H \times W} \sum_{i=1}^{H} \sum_{j=1}^{W} x_c(i, j)$$

where $x_c(i, j)$ represents the feature value at position $(i, j)$ in channel $c$.

Excitation Step: The squeezed vector $z$ is passed through fully connected layers to produce channel weights $s$:

$$s = \sigma\left(W_2 \, \delta\left(W_1 z\right)\right)$$

where $W_1$ and $W_2$ are learnable weight matrices, $\delta$ denotes the ReLU activation, and $\sigma$ denotes the sigmoid activation function. The recalibrated feature maps are obtained via channel-wise multiplication:

$$\tilde{x}_c = s_c \odot x_c$$

where $\odot$ denotes element-wise multiplication.
The SE block helps the model focus on the most informative features by recalibrating feature maps based on their importance, thus improving the model’s interpretability and performance.
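The squeeze-and-excitation recalibration above maps directly onto a few lines of PyTorch; the reduction ratio of 16 below is the common default from Hu et al., assumed here for illustration rather than taken from the paper.

```python
import torch.nn as nn

class SEBlock(nn.Module):
    """Channel-wise recalibration: squeeze (GAP) then excitation (FC-ReLU-FC-sigmoid)."""
    def __init__(self, channels, reduction=16):
        super().__init__()
        self.squeeze = nn.AdaptiveAvgPool2d(1)           # global average pooling -> z
        self.excite = nn.Sequential(
            nn.Linear(channels, channels // reduction),  # W1
            nn.ReLU(inplace=True),
            nn.Linear(channels // reduction, channels),  # W2
            nn.Sigmoid(),
        )

    def forward(self, x):
        b, c, _, _ = x.shape
        z = self.squeeze(x).view(b, c)        # squeezed vector z
        s = self.excite(z).view(b, c, 1, 1)   # channel weights s
        return x * s                          # channel-wise recalibration of feature maps
```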
2.5.3. Hyperband Optimization
The Hyperband algorithm was used for hyperparameter optimization to further enhance the model’s performance. This method efficiently explores the hyperparameter space to identify optimal values for key parameters such as learning rate, batch size, and dropout rate. As suggested in the original work [50], this technique dynamically allocates resources to the most promising configurations, ensuring the best possible model performance.
The optimization process involves running several configurations of the model with different hyperparameter values. These configurations were evaluated, and the top-performing ones were allocated more resources for further evaluation. The best combination of hyperparameters was then selected for training the final model, resulting in improved accuracy and efficiency.
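To make the resource-allocation idea concrete, a simplified successive-halving sketch (the core mechanism inside Hyperband) is given below; the train_and_score() function, search space, and budget values are hypothetical, and real tuning would typically rely on a library implementation.

```python
import random

def sample_config():
    """Draw one hyperparameter configuration from an illustrative search space."""
    return {
        "learning_rate": 10 ** random.uniform(-5, -3),
        "batch_size": random.choice([16, 32, 64]),
        "dropout": random.uniform(0.1, 0.5),
    }

def successive_halving(train_and_score, n_configs=27, min_epochs=2, eta=3, rounds=3):
    """Keep the best 1/eta of configurations each round, giving survivors more epochs."""
    configs = [sample_config() for _ in range(n_configs)]
    budget = min_epochs
    for _ in range(rounds):
        scored = [(train_and_score(cfg, epochs=budget), cfg) for cfg in configs]
        scored.sort(key=lambda pair: pair[0], reverse=True)      # higher validation accuracy first
        configs = [cfg for _, cfg in scored[: max(1, len(scored) // eta)]]
        budget *= eta                                            # promising configs get more resources
    return configs[0]
```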
2.6. Model Training and Evaluation Metric
The model was trained using the Adam optimizer with a learning rate of 0.0001, a batch size of 32, and multiple epochs. Regularization techniques were implemented to prevent overfitting, including L2 weight decay and dropout at a rate of 0.2. Early stopping was employed based on validation loss to optimize training efficiency.
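A condensed PyTorch training loop consistent with these settings might look as follows; the model, data loaders, weight-decay coefficient, and patience value are placeholders for illustration, not the study’s exact configuration.

```python
import copy
import torch
import torch.nn as nn

def train(model, train_loader, val_loader, epochs=50, patience=5, device="cuda"):
    model.to(device)
    criterion = nn.BCELoss()                                   # binary Real/Fake target (sigmoid output)
    optimizer = torch.optim.Adam(model.parameters(), lr=1e-4,
                                 weight_decay=1e-4)            # Adam + L2 weight decay (assumed value)
    best_loss, best_state, wait = float("inf"), None, 0
    for epoch in range(epochs):
        model.train()
        for images, labels in train_loader:
            images, labels = images.to(device), labels.float().to(device)
            optimizer.zero_grad()
            loss = criterion(model(images).squeeze(1), labels)
            loss.backward()
            optimizer.step()
        model.eval()
        val_loss, n = 0.0, 0
        with torch.no_grad():
            for images, labels in val_loader:
                images, labels = images.to(device), labels.float().to(device)
                val_loss += criterion(model(images).squeeze(1), labels).item() * len(labels)
                n += len(labels)
        val_loss /= n
        if val_loss < best_loss:                               # early stopping on validation loss
            best_loss, best_state, wait = val_loss, copy.deepcopy(model.state_dict()), 0
        else:
            wait += 1
            if wait >= patience:
                break
    if best_state is not None:
        model.load_state_dict(best_state)
    return model
```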
The performance evaluation encompassed key metrics, including accuracy, precision, recall, and F1-score, as outlined in Equations (7)–(10):

$$\text{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN} \quad (7)$$

$$P = \frac{TP}{TP + FP} \quad (8)$$

$$R = \frac{TP}{TP + FN} \quad (9)$$

$$F1 = \frac{2 \times P \times R}{P + R} \quad (10)$$

where $P$ is precision, $R$ is recall, true positives (TP) are correctly identified Real obstacles, false positives (FP) are Fake obstacles incorrectly classified as Real, true negatives (TN) are correctly identified Fake obstacles, and false negatives (FN) are Real obstacles incorrectly classified as Fake.

The training was conducted on a Google Colab GPU (https://colab.research.google.com/, accessed on 20 December 2024). Various visualization techniques were employed to analyze generalization performance, including bar graphs, radar charts, confusion matrices, and heat maps. A comparative study was performed against state-of-the-art models under identical conditions. Analysis of Variance (ANOVA) tests were used to analyze performance variations, followed by Tukey’s Honestly Significant Difference (HSD) post hoc analysis to validate statistical significance. Additionally, the overall workflow of obstacle classification and navigation is illustrated in
Figure 6, which outlines the integration of classification and decision-making steps for autonomous navigation in orchard environments.
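For reference, the metrics in Equations (7)–(10) can be computed from confusion-matrix counts as in the short sketch below (pure Python; the example counts are illustrative only).

```python
def classification_metrics(tp, fp, tn, fn):
    """Compute accuracy, precision, recall, and F1-score from confusion-matrix counts."""
    accuracy = (tp + tn) / (tp + tn + fp + fn)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = 2 * precision * recall / (precision + recall) if (precision + recall) else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}

# Illustrative counts (not results from the paper): 940 Real correctly detected,
# 60 Fake mistaken for Real, 990 Fake correctly identified, 10 Real missed.
print(classification_metrics(tp=940, fp=60, tn=990, fn=10))
```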
4. Discussion
4.1. Performance Analysis of Model Generalization Across Different Datasets
The model demonstrated strong adaptability across individual datasets, achieving a precision, recall, and F1-score of 0.96 on the orchard dataset, which included complex conditions. For Fake obstacles, it recorded a precision of 0.94, recall of 0.99, and F1-score of 0.96. For Real obstacles, the precision was 0.99, recall 0.94, and F1-score 0.96.
In contrast, the campus dataset showed slightly lower performance, with all metrics averaging 0.91. Specifically, for Fake obstacles, precision was 0.87, recall 0.94, and F1-score 0.91, while for Real obstacles, precision was 0.94, recall 0.88, and F1-score 0.91. The varied conditions in the orchard dataset contributed to improved model generalization. In contrast, the more uniform data in the campus dataset led to a higher misclassification rate for Fake obstacles. Expanding the training set to include more diverse environments is recommended to improve performance further.
When tested on the combined dataset, the orchard-trained model outperformed the campus-trained model by 2% for Fake obstacles (precision: 0.91 vs. 0.87, recall: 0.94 vs. 0.90, F1-score: 0.94 vs. 0.89) and by 4% for Real obstacles (precision: 0.95 vs. 0.92, recall: 0.96 vs. 0.91, F1-score: 0.96 vs. 0.92). This indicates the benefit of training on a more diverse dataset. The orchard-trained model excelled under challenging conditions such as varying lighting and occlusions, resulting in fewer false positives and negatives during low-light scenarios. This analysis reinforces the model’s suitability for real-time agricultural monitoring across varying illumination conditions. This ensures safer and faster autonomous navigation in dynamic environments like orchards. Furthermore, the computational efficiency analysis discussed in
Section 3.2.4 shows that while the orchard-trained model requires more computational resources, it significantly enhances both accuracy and generalization. Future research should focus on optimizing these resource needs to improve real-time performance on low-power devices. Additionally, the evaluation of the open test set [48] illustrated the model’s capacity to generalize across new obstacle categories, including tractors and overlapping objects such as tree trunks and support poles, underscoring its robustness and potential applicability in various agricultural settings.
4.2. Comparison with State-of-the-Art Models
A comparative analysis of the orchard-combined trained model against VGG16, ResNet50, MobileNetV3, DenseNet121, EfficientNetB0, and InceptionV3 shows that the proposed model outperforms these state-of-the-art architectures. It achieves an accuracy of 0.90 and excels in Fake obstacle detection with a precision of 0.91, recall of 0.94, and F1-score of 0.94. For Real obstacle detection, the model achieves a precision of 0.95, recall of 0.96, and F1-score of 0.96, which is crucial for minimizing false alarms in autonomous navigation. These results highlight the proposed model’s superior adaptability across diverse environments, reinforcing its suitability for real-time obstacle detection in agricultural robotics.
Furthermore, the orchard-combined model achieved 2.31 FPS, outperforming DenseNet121 (1.92 FPS) and EfficientNetB0 (2.01 FPS), demonstrating its efficiency for real-time agricultural applications. It also maintained reasonable memory usage (3191.65 MB), making it suitable for lightweight autonomous platforms. While MobileNetV3 offers a slightly higher FPS, it comes with lower accuracy. The orchard-combined model effectively balances accuracy and efficiency, reinforcing its suitability for real-time deployment in agricultural robotics.
4.3. Key Strengths and Limitations of the Proposed Method
The method presented in this study offers real-time obstacle detection and classification into Real or Fake obstacles. It demonstrates several key strengths, including high accuracy, strong generalization, and a competitive advantage over existing models. These qualities make it well-suited for deployment on autonomous robots, enabling faster navigation in agricultural settings while minimizing delays and errors. Additionally, the open test set evaluation revealed that the proposed model can recognize previously unseen obstacles, such as tractors, across various agricultural scenarios, suggesting strong generalization capabilities and robustness. The model achieved 92.0% accuracy on the campus dataset and 95.0% in the orchard, confirming its reliability across diverse agricultural environments. The comparative analysis under different lighting conditions further validates its adaptability, particularly in orchard environments with low-light scenarios. The model’s inference speed of 2.31 FPS surpasses several well-known architectures, including InceptionV3, DenseNet121, and MobileNetV3, striking an optimal balance between accuracy and efficiency for real-time deployment. However, occasional misclassifications indicate that additional training data or architecture refinement could further enhance its performance. Addressing this limitation would improve its applicability for open-set recognition tasks in dynamic agricultural environments.
Despite these strengths, the approach has certain limitations. The training phase exhibited fluctuations in validation loss, indicating potential optimization challenges and occasional instability. While the complex orchard dataset contributed to enhanced generalization, the more straightforward campus dataset may have restricted the model’s adaptability to less complex environments. Addressing these issues by fine-tuning hyperparameters and introducing additional diverse training data could further improve robustness. The dataset was collected under varying lighting conditions—early morning, afternoon, and evening—which adds to its robustness. However, it does not take into account factors such as temperature and adverse weather conditions (e.g., rain and fog). Expanding the dataset to include these elements could significantly improve the model’s applicability in real-world situations. Additionally, handling occluded or overlapping objects remains a challenge, and future refinements could focus on improving object recognition in dense foliage or cluttered environments. Misclassifications between Real and Fake obstacles in such scenarios may lead to operational inefficiencies. For example, falsely identifying benign elements like overhanging branches as Real obstacles can trigger unnecessary avoidance behavior, while failing to detect actual obstacles could pose safety risks. Addressing this edge case remains essential for ensuring both smooth navigation and reliable decision-making in densely vegetated environments. Future efforts will focus on optimizing the model for lightweight embedded systems, thereby enhancing its utility across diverse agricultural scenarios.
4.4. Practical Implications
This research underscores the significance of advanced CNN architectures for autonomous obstacle classification. Integrating Ghost Modules and SE Blocks enhances computational efficiency, making the model ideal for real-world deployment on autonomous robots. Additionally, this study highlights the practical advantages of autonomous robots in agriculture, showcasing high precision and recall for obstacle detection, which is crucial for minimizing disruptions and collisions.
For instance, during orchard harvesting, the model effectively distinguishes between Real obstacles—physical objects that must be avoided, such as tree trunks, humans, or machinery—and “Fake” obstacles—visual artifacts or benign elements like overhanging branches or tall weeds—thereby optimizing autonomous navigation and reducing operational downtime. Furthermore, the model’s robust performance under varying lighting conditions makes it suitable for low-light environments, which are common in agricultural operations. Its strong generalization capability suggests potential applications in automated pesticide spraying, fruit harvesting, and field monitoring, further advancing precision agriculture. To explore multi-modal data fusion, our future work will utilize depth data from the Intel RealSense D455 camera for close-range obstacle distance detection experiments. This approach is economically efficient compared to LiDAR systems. Additionally, we plan to integrate LiDAR and camera fusion for long-range scenarios to further enhance classification accuracy and robustness.
Beyond technical performance, the proposed system has broader implications for sustainable agriculture. Automating obstacle detection and navigation can significantly reduce reliance on manual labor during time-intensive tasks like harvesting and monitoring, ultimately lowering labor costs and improving operational efficiency in orchards. Furthermore, since the system operates entirely on visual data captured in open environments, it minimizes concerns around privacy or ethical data use, making it a responsible choice for deploying AI in agricultural fields.
5. Conclusions
This study contributes to autonomous robotics by developing a CNN model that accurately classifies Real and Fake obstacles, enhancing navigation safety. The system effectively differentiates between Real and Fake obstacles, enabling faster autonomous decision-making and seamless navigation. Its onboard framework optimizes computational efficiency, ensuring reliable deployment in agricultural environments. Building on our previous work with YOLOv8-based detection, this research introduces an efficient architecture tailored for real-time applications.
The findings demonstrate that incorporating advanced CNN components and hyperparameter optimization significantly improves performance. In real-time evaluations, the model achieved high accuracy (92.0% on campus, 95.0% in the orchard) across different environments, confirming its reliability for real-world deployment. The comparative analysis under different lighting conditions validates the model’s adaptability, particularly in orchard environments with common low-light scenarios. The proposed model effectively addresses the challenges of varying illumination, ensuring reliable performance in real-time agricultural applications. The evaluation of an open test set further confirmed its robustness and adaptability for real-world deployment.
The orchard-combined model achieved 2.31 FPS, demonstrating superior inference speed compared to all other comparative models, including InceptionV3 (2.18 FPS), DenseNet121 (1.92 FPS), VGG16 (1.61 FPS), ResNet50 (2.07 FPS), MobileNetV3 (1.91 FPS), and EfficientNetB0 (2.01 FPS). While lightweight models like MobileNetV3 offer slightly lower FPS, the orchard-combined model strikes an optimal balance between accuracy and efficiency, making it a strong candidate for real-time agricultural robotics.
Moreover, the analysis of computational efficiency (
Section 3.2.4) underscores the significance of inference speed and memory consumption in selecting models for deployment on low-power devices. By ensuring that the system remains efficient in practical agricultural settings, the orchard-combined model stands out for its ability to meet the demands of real-time applications. Future work will focus on enhancing the model’s ability to handle occluded or overlapping objects by incorporating more diverse training data and refining the architecture. In particular, incorporating weather variability, such as rain, fog, and temperature changes, will be essential for achieving reliable operation in unpredictable field conditions, further improving the model’s adaptability in real-world agricultural settings. Future improvements will also optimize inference speed and evaluate performance in additional agricultural environments. The implications for sustainable agriculture are notable, particularly in improving the efficiency and safety of agricultural robots for tasks such as fruit harvesting, weeding, spraying, planting, and field monitoring. While minor misclassifications occurred in dense foliage, future improvements will focus on enhancing robustness. This research advances autonomous navigation, providing a practical solution for real-time obstacle classification in agricultural environments.