1. Introduction
Atomization reduces the waste of water and nutrient solutions and effectively prevents the spread of fungal diseases [1,2]. In the process of atomization cultivation, the environmental factors in the space surrounding the plant's root zone can be artificially controlled [3,4,5]. Moreover, aeroponic crops grow more rapidly [6,7,8]. However, different buds grow at different rates, with some taking longer to break dormancy, leading to inconsistent development and delays in root formation. Therefore, it is necessary to identify and distinguish mulberry cuttings in an aeroponic system and spray foliar fertilizer to promote the growth of the slow-growing ones. Selectively spraying foliar fertilizer on slow-growing cuttings uses less fertilizer than spraying all cuttings, shortens the operating time of the robotic arm, and lowers production costs. In large-scale mulberry aeroponics, manually identifying the growth status of cuttings is not only time-consuming but also costly. With the development of science and technology, deep learning and machine vision are gradually being applied to object recognition. Accurately identifying slow-growing individuals in crops using these technologies is of great significance for agricultural production, as it allows for the targeted spraying of nutrient solutions to promote growth [9,10,11]. Precise fertilization, achieved by accurately identifying slow-growing crops and spraying them, is crucial for the development of precision agriculture and intelligent agriculture [12,13,14,15].
Parallel robotic arms offer several advantages, including higher stiffness, precision, and load-bearing capacity, which make them particularly well suited to complex agricultural environments [16,17,18,19]. Consequently, we selected a parallel robotic arm as the primary actuator. As demonstrated by Lu S et al. [20], the minimum-jerk trajectory planning approach further enhances parallel robotic arms: it ensures smooth and precise motion, reduces vibrations, and improves overall performance and reliability.
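For context, minimum-jerk point-to-point planning is commonly realized as a quintic polynomial in normalized time. The sketch below is a minimal illustration of that classic profile, not the exact planner of Lu S et al. [20]; the function name and the 1.5 s example move are our own assumptions.

```python
import numpy as np

def minimum_jerk(p0, pf, T, n=100):
    """Minimum-jerk point-to-point trajectory (quintic polynomial).

    p(t) = p0 + (pf - p0) * (10*s^3 - 15*s^4 + 6*s^5), s = t/T,
    which zeroes velocity and acceleration at both endpoints and
    minimizes the integrated squared jerk.
    """
    t = np.linspace(0.0, T, n)
    s = t / T
    blend = 10 * s**3 - 15 * s**4 + 6 * s**5
    p = p0 + (pf - p0) * blend[:, None]  # (n, 3) Cartesian waypoints
    return t, p

# Example: move the spray nozzle 0.2 m along X in 1.5 s.
t, p = minimum_jerk(np.array([0.0, 0.0, 0.3]),
                    np.array([0.2, 0.0, 0.3]), T=1.5)
```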
In recent years, to tackle the challenges of intelligent recognition and detection in complex environments, researchers across the globe have increasingly turned to deep learning methods [21,22,23]. Xu et al. [24] enhanced YOLOv5 by integrating the Mish activation function, employing DIoU_Loss to accelerate bounding box regression, and incorporating the Squeeze Excitation module. These modifications resulted in a grading precision of 90.6% and a real-time processing speed of 59.63 FPS, significantly boosting both the precision and detection efficiency for apple grading tasks. Ji W et al. [25] improved the YOLOv5s model by adding ODConv and GSConv convolutions, along with a VoVGSCSP lightweight backbone. This allowed for simultaneous apple surface defect detection and fruit stalk identification, focusing on side information from multi-view images; their model achieved 98.2% precision in defect detection while processing at 30 FPS. Ji W et al. [26] proposed a ShuffleNetv2-based apple object detection model, integrating an adaptive spatial feature fusion (ASFF) module into the PANet network. The model attained 96.76% average precision, 95.62% precision, 93.75% recall, and a 0.95 F1 score, with a detection speed of 65 FPS. Zhu W et al. [27] introduced the Transformer Encoder and CenterLoss into an improved model to establish an accurate and efficient disease recognition model. Liu S et al. [28] designed a tomato flower pollination feature recognition method based on a deep learning fully open flower recognition model and a binocular template matching three-dimensional information recognition method. Zhang Z et al. [29] constructed an all-weather lightweight tea crown shoot detection model (TS-YOLO) by replacing the feature extraction network of YOLOv4 and the standard convolutions of the whole network with the lightweight MobilenetV3 network and depth-separable convolutions, among other improvements. The improved model was 11.78 M in size, 18.30% of the size of YOLOv4, and its detection speed was improved by 11.68 FPS. Zhang F et al. [30] proposed the Feature Enhancement Network Block (FENB) based on the YOLOv4-Tiny model. They designed the FENB using the CSPNet structure with a hybrid attention mechanism and constructed a Feature Enhancement Network (FEN) on top of the FENB to enhance the feature extraction capability and improve the detection accuracy of YOLOv4-Tiny. To address the insufficient dataset of nighttime images and the poor detail restoration and color distortion of existing CycleGAN models, Wu F et al. [31] proposed an enhanced CycleGAN method integrating style transfer and small-sample detection. By introducing a ResNeXtBlocks generator and optimizing the upsampling module and the hyperparameter strategy, the FID score was reduced by 29.7%, and the precision, recall, and other metrics were improved by 13.34–56.52% compared to the YOLOv7 detection framework.
Most existing recognition methods rely primarily on monocular and binocular image recognition techniques [32,33,34], with limited research on trinocular and multi-camera recognition methods. Additionally, few studies address the identification of fertilizer need, the localization of crops, and the targeted spraying of liquid fertilizer for both fast- and slow-growing crops under identical conditions. To enhance recognition precision, we conducted a comparative analysis of the detection capabilities of vision systems with varying numbers of cameras, specifically for mulberry tree cuttings. Our investigation indicated that the trinocular vision system effectively reduced occlusion between mulberry tree cuttings, thereby enhancing the detection accuracy and overall performance.
The YOLOv8 model has matured considerably in practical applications, with significant performance improvements observed in its enhanced versions. For instance, Hemamalini et al. [35] increased the model's average precision for plant thermal canopy detection to 99.2% by integrating the compact YOLOv8-C detection technology with the Fast Segment Anything Model (FastSAM) method, thereby greatly enhancing overall performance. Similarly, Xu J et al. [36] built upon the YOLOv8 base model by incorporating the Large Separable Kernel Attention (LSKA) mechanism into SPPF and replacing YOLOv8's Neck with an optimized Slimneck module to develop the SLPD-YOLOv8 model. This improved model achieved an accuracy of 94.8% in recognizing the number of stress cracks in corn seeds, significantly boosting the model's detection capabilities.
In this study, the primary challenge addressed was the inefficient detection and control of robotic arms in complex agricultural environments, particularly for tasks such as foliar fertilizer spraying on mulberry branches. Traditional detection models and robotic control systems often struggle with occlusions, dense foliage, and the need for precise spatial data. To tackle these challenges, we developed an intelligent mulberry foliar fertilizer spraying system that leverages advanced detection and control methodologies. Specifically, we optimized the YOLOv8n model by introducing the Asymptotic Feature Pyramid Network (AFPN) in the Neck part, fusing the C2f module with MSBlock, replacing the CIoU loss function with XIoU, and integrating the DynamicATSS module. These enhancements significantly improved the detection ability of the YOLOv8n model. In addition, we introduced a multi-camera hybrid data fusion approach to capture spatial diversity, leveraging artificial neural networks (ANNs) to merge and analyze 3D positional data. This method effectively compensated for occlusions and improved the reconstruction precision in dense foliage. Thus, the main objectives of this study were (1) to develop an intelligent mulberry foliar fertilizer spraying system that supports the Internet of Things (IoT) and promotes the growth of slow-growing mulberry cuttings by accurately identifying them for foliar fertilizer spraying; (2) to evaluate different YOLO versions, including YOLOv8n and YOLOv10, to determine their effectiveness in complex agricultural environments; (3) to optimize the YOLOv8n model by introducing the AFPN in the Neck part, fusing the C2f module with MSBlock, replacing the CIoU loss function with XIoU, and integrating the DynamicATSS module, thereby significantly enhancing its detection ability; and (4) to create computational control frameworks for robotic manipulator systems using ANNs to improve the adaptability and precision of robotic arm control in dense foliage.
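To make the multi-camera fusion idea concrete, the following PyTorch sketch illustrates the general pattern of an ANN that regresses arm-frame coordinates from trinocular observations. The architecture, layer sizes, input encoding, and the name CoordFusionNet are our own illustrative assumptions, not the exact network used in this study.

```python
import torch
import torch.nn as nn

class CoordFusionNet(nn.Module):
    """Hypothetical MLP fusing trinocular detections into one 3D point.

    Input: (u, v) pixel coordinates of a cutting in each of the three
    camera views -> 6 features. Output: (X, Y, Z) in the arm's base frame.
    """
    def __init__(self, in_dim=6, hidden=64, out_dim=3):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(in_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, out_dim),
        )

    def forward(self, x):
        return self.net(x)

model = CoordFusionNet()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
loss_fn = nn.MSELoss()

# One training step on a dummy batch of 32 observations.
pixels = torch.randn(32, 6)    # stand-in for the three cameras' pixel coords
xyz_true = torch.randn(32, 3)  # stand-in for measured arm-frame coordinates
loss = loss_fn(model(pixels), xyz_true)
optimizer.zero_grad()
loss.backward()
optimizer.step()
```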
3. Results
The recognition metrics—Pr, Re, F1, and mAP—were used as performance indicators for assessing the growth state of mulberry cuttings. These metrics were applied to evaluate the effectiveness of the YOLOv8n object detection model across datasets derived from images captured by monocular, binocular, and trinocular cameras.
Figure 17 illustrates the recognition performance of the YOLOv8n model for images collected using monocular, binocular, and trinocular cameras.
Table 3 shows the precision, recall, and F1 score of YOLOv8n image recognition for different numbers of cameras.
Table 4 displays the corresponding mean average precision. The performance of the trinocular vision system was superior to that of the monocular and binocular vision systems: its precision was 15.51% and 8.96% higher than that of the monocular and binocular vision, respectively, and its mean average precision was 14.61% and 7.67% higher, respectively. The precision, recall, and mean average precision of trinocular recognition for the current dataset were 68.99%, 67.52%, and 68.21%, respectively, and the F1 score of trinocular recognition was 0.68, calculated using Equation (4). For binocular recognition, the Pr, Re, mAP, and F1 scores were 60.03%, 60.55%, 60.54%, and 0.60, respectively. In comparison, the Pr, Re, mAP, and F1 scores for monocular recognition were 53.47%, 55.21%, 53.60%, and 0.54, respectively. The trinocular recognition system captured the shape, size, surface details, and other information on the mulberry cuttings more comprehensively, whereas the monocular and binocular systems may not have been able to obtain all the key information due to limitations in viewing angles. Consequently, the trinocular recognition system performed better. Moreover, a comparison of the Pr, Re, mAP, and F1 scores shows that the image recognition performance of the trinocular system surpassed that of the monocular and binocular systems. Additionally, the precision of the recognition model generally increased with the size of the training set, indicating that the image acquisition effectiveness of the trinocular system was superior.
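Assuming Equation (4) is the standard harmonic-mean definition of F1, the reported scores can be reproduced directly from the Pr and Re values above; a minimal check:

```python
def f1_score(pr: float, re: float) -> float:
    """F1 = 2 * Pr * Re / (Pr + Re): the harmonic mean of precision and recall."""
    return 2 * pr * re / (pr + re)

print(round(f1_score(0.6899, 0.6752), 2))  # trinocular -> 0.68
print(round(f1_score(0.6003, 0.6055), 2))  # binocular  -> 0.60
print(round(f1_score(0.5347, 0.5521), 2))  # monocular  -> 0.54
```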
Below are the performance evaluation plots and tables showing the results obtained after training YOLOv8n, YOLOv10, Faster R-CNN, and the improved YOLOv8n (YOLOv8-improve) on the mulberry cutting dataset.
Figure 18 shows a performance comparison between the original YOLOv8 model and the YOLOv8-improve model.
Figure 19 shows the loss profile of the YOLOv8-improve model for the training and validation sets.
Combining the results shown in Figure 18 and Figure 19, although the mulberry cutting dataset used in this paper (3000 images) was small, the recall and loss curves were normal, and there was no overfitting.
Table 5 shows the precision, recall, and F1 score data for the different models, and Table 6 shows their mean average precision.
This study appraised the performance (Pr, Re, F1, and mAP) of various detection models, including the original YOLOv8n (Figure 18), the improved YOLOv8n (Figure 18), YOLOv10, and Faster R-CNN, on the mulberry cutting dataset. The original YOLOv8n model achieved recognition Pr, Re, mAP, and F1 scores of 86.08%, 86.83%, 88.44%, and 0.8645, respectively. In contrast, the improved YOLOv8n model achieved recognition Pr, Re, mAP, and F1 scores of 93.11%, 93.40%, 94.48%, and 0.93, respectively. The Asymptotic Feature Pyramid Network (AFPN) was introduced to replace the original FPN/PANet structure of YOLOv8n, reducing the semantic gap between different hierarchical features and enhancing the model's detection performance for small targets. Additionally, the MSBlock module was incorporated into the C2f module, improving the size and structure of the convolutional kernels and optimizing the feature fusion method; this boosted the model's performance when processing multi-scale information. Furthermore, the CIoU loss function of YOLOv8n was replaced with Focal_XIoU to address the imbalance between positive and negative samples and to improve the precision of bounding box regression. The DynamicATSS module was also incorporated into the label assignment strategy, enhancing the model's detection and generalization capabilities while reducing the discrepancy between classification and IoU scores. Comparing the recognition precision, recall, mean average precision, and F1 score makes it evident that the improved YOLOv8n model significantly outperformed the original YOLOv8n in recognizing the acquired images.
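The exact Focal_XIoU formulation is not reproduced here; the sketch below only illustrates the general focal-weighted IoU-regression pattern it follows (in the spirit of Focal-EIoU). The gamma value and the detached weighting factor are our own assumptions, not the authors' definition.

```python
import torch

def iou_xyxy(a: torch.Tensor, b: torch.Tensor, eps: float = 1e-7) -> torch.Tensor:
    """IoU for axis-aligned boxes given as (x1, y1, x2, y2)."""
    x1 = torch.max(a[..., 0], b[..., 0])
    y1 = torch.max(a[..., 1], b[..., 1])
    x2 = torch.min(a[..., 2], b[..., 2])
    y2 = torch.min(a[..., 3], b[..., 3])
    inter = (x2 - x1).clamp(min=0) * (y2 - y1).clamp(min=0)
    area_a = (a[..., 2] - a[..., 0]) * (a[..., 3] - a[..., 1])
    area_b = (b[..., 2] - b[..., 0]) * (b[..., 3] - b[..., 1])
    return inter / (area_a + area_b - inter + eps)

def focal_iou_loss(pred: torch.Tensor, target: torch.Tensor,
                   gamma: float = 0.5) -> torch.Tensor:
    """Focal-weighted IoU loss: the iou**gamma factor up-weights
    high-quality anchors so that abundant low-IoU samples dominate
    the gradient less (detached so it acts as a pure weight)."""
    iou = iou_xyxy(pred, target)
    return (iou.detach() ** gamma * (1.0 - iou)).mean()
```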
As illustrated in Table 5, the optimized YOLOv8n model exhibited a notable improvement in both Pr (93.11%) and Re (93.40%) over the original YOLOv8n, which had values of 86.08% and 86.83%, respectively, indicating a substantial enhancement in its overall detection capabilities. Additionally, YOLOv8n outperformed YOLOv10 in terms of recall, although the latter showed a marginally better precision score. This trade-off between precision and recall suggests that a model's suitability can vary depending on whether minimizing false positives or maximizing true detections is more critical for the application. On the other hand, Faster R-CNN performed less effectively, with a Pr of 55.10% and an Re of 75.50%, signaling that it might not be optimal for the current task and could benefit from further refinement. These findings highlight the importance of choosing models based on specific detection objectives and the balance between precision and recall. Additionally, Table 6 outlines the mAP values of each model for the mulberry cutting dataset. The mAP of YOLOv8n was 88.43%, while the improved YOLOv8n (YOLOv8-improve) achieved a higher value of 94.48%, indicating better overall detection performance. Our results, based on the improved YOLOv8n, outperform those of Wang et al. [41], who utilized a spatial channel decoupled downsampling approach within the YOLOv10-S framework, first enhancing the channels with pointwise convolution (PW) and then reducing the resolution through depth-wise convolution (DW), achieving a 0.7% improvement in average precision by minimizing information loss.
The R-squared (R²) value, mean squared error (MSE), and root mean squared error (RMSE) were used to evaluate the differences between the model's predicted values and the true values. These metrics for the X, Y, and Z coordinates are presented in Table 7. The model's test performance for predicting the robotic arm's coordinates was as follows: for the X coordinate, the test set achieved an R² of 99.90% and an RMSE of 0.006; for the Y coordinate, an R² of 99.90% and an RMSE of 0.006; and for the Z coordinate, an R² of 99.90% and an RMSE of 0.012. The analysis of these metrics indicated that the error between the predicted and actual coordinates was very small and nearly negligible. Therefore, the coordinate conversion for slow-growing mulberry cuttings was highly accurate, enabling the precise location of the cuttings.
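For reference, these are the standard regression definitions; a minimal per-axis evaluation sketch (the array names in the commented usage are assumptions) is as follows:

```python
import numpy as np

def regression_metrics(y_true, y_pred):
    """Return (R^2, MSE, RMSE) for one coordinate axis."""
    y_true, y_pred = np.asarray(y_true, float), np.asarray(y_pred, float)
    mse = np.mean((y_true - y_pred) ** 2)
    ss_res = np.sum((y_true - y_pred) ** 2)
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)
    return 1.0 - ss_res / ss_tot, mse, np.sqrt(mse)

# Hypothetical usage: evaluate each axis of the test set separately, as in Table 7.
# for axis, (t, p) in zip("XYZ", [(x_true, x_pred), (y_true, y_pred), (z_true, z_pred)]):
#     r2, mse, rmse = regression_metrics(t, p)
#     print(axis, r2, mse, rmse)
```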
The scatter plot (Figure 20) comparing the actual and predicted X, Y, and Z coordinates of the robotic arm offers valuable insights into the system's positional accuracy. By comparing the model's predicted positions against the true values obtained from the manipulator's sensors, we can evaluate the precision of the arm's movements along all three axes. A closer alignment between the predicted and actual coordinates indicates higher accuracy in the manipulator's movements, which directly impacts the spraying precision. When the predicted coordinates closely match the true coordinates, the spraying mechanism will operate with greater accuracy, leading to better coverage and more precise targeting of the intended areas. Thus, the correlation and any discrepancies in the scatter plot are critical for understanding the spraying accuracy and identifying potential areas where positional errors could affect the robotic arm's performance in spraying tasks.
4. Discussion
Liyang [42] designed an intelligent control system for water and fertilizer integration in a tomato greenhouse. Compared with that system, the intelligent spraying system designed in this paper can spray foliar fertilizer more accurately after target detection and localization, reducing fertilizer waste and lowering costs. The improved YOLOv8n model used in this paper is more efficient than the model of S Li [43] for intelligent water and fertilizer decision-making and control that integrates multiple data sources; moreover, the intelligent spraying system designed in this paper is less affected by environmental factors. Kim et al. [44] presented an intelligent spraying system based on the semantic segmentation of fruit trees in pear orchards; the system was trained with images categorized into five distinct classes, and the trained deep learning model achieved a precision of 83.79%. Compared with that system, the initial precision of the target detection model in our intelligent spraying system was 88.43%, and the precision of the improved model reached 94.48%, a significant improvement. In summary, the intelligent spraying system designed in this paper has certain advantages and competitiveness, but it also has certain shortcomings. The advantages of this study include the use of target detection technology to accurately identify slow-growing crop plants and an intelligent control system for precise, quantitative fertilization at specific locations. This approach reduces fertilizer waste and lowers production costs. Additionally, despite limited resources (the fixed computing power of a Raspberry Pi), the recognition model achieves a balance between speed and precision, thereby reducing costs. The remaining shortcomings are as follows:
- (1) The 20 s foliar fertilizer spraying duration applied after the model detects slow-growing mulberry cuttings is simply our own setting; the optimal spraying duration needs to be determined experimentally to improve the growth rate of the cuttings while reducing foliar fertilizer consumption.
- (2) The current 1 h spraying interval suits only the cuttings' current growth state; as the cuttings grow, their foliar fertilizer demand will increase, and continuing with the current interval and duration will lead to insufficient nutrients. Several experiments are needed to optimize the spraying interval and duration.
- (3) The camera's angle and height must be adjusted as the mulberry tree cuttings grow in order to better detect their growth.
In our experimental study, we observed that 25% of the mulberry cuttings exhibited slower growth rates. By employing our intelligent spraying system to target and fertilize only these underperforming cuttings, we achieved a notable reduction in the overall fertilizer usage. This targeted approach to spraying, as opposed to the conventional uniform application method, significantly decreased fertilizer consumption. The potential cost savings and environmental benefits of this targeted strategy warrant further investigation in future research endeavors. Beyond reducing material costs, our targeted fertilization method aligns with eco-friendly agricultural practices by minimizing waste and conserving valuable resources.
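As a first-order illustration (assuming equal spray volume per plant and neglecting pipeline and drift losses), targeting only the 25% of slow-growing cuttings would require roughly 0.25 of the fertilizer used in uniform spraying, i.e., a saving on the order of 75%; the actual saving under field conditions remains to be quantified experimentally.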
In the future, we will test the system in a greenhouse or in an external environment with the following research focuses:
- (1) The environmental adaptability of the system.
- (2) The effectiveness of the system in saving fertilizer.
- (3) The actual identification and localization capabilities of the system.
- (4) Improvements to the model made according to the actual situation.
Additionally, we recognize the critical role of sustainable energy solutions in enhancing the efficiency and autonomy of agricultural monitoring systems. As emphasized by Abidin et al. [45], optimizing energy harvesting for low-power sensors in wireless sensor networks is essential for the long-term sustainability of such systems. By integrating the sustainable energy solutions proposed in their research, we can further improve the efficiency and reduce the environmental impact of our intelligent spraying system.
Since our trinocular vision system has some technical limitations, to further improve system performance, we will draw on the advanced 3D imaging technology solutions proposed by Hu K et al. [46] and Li X et al. [47] and continue to optimize the system architecture by leveraging the capabilities of cutting-edge technology.
Looking ahead, we aim to refine the system’s limitations and minimize environmental influences to enhance its contribution to precision and smart agriculture. While the framework has been assessed in controlled settings, subsequent research should focus on deploying it in semi-controlled greenhouse environments over multiple growing seasons. This approach would offer valuable insights to optimize its adaptability and performance in a wider range of real-world scenarios.