Article

Research on Robot Control Technology of Tomato Plant Lowering in Greenhouses

College of Engineering, China Agricultural University, Beijing 100083, China
* Authors to whom correspondence should be addressed.
Agronomy 2024, 14(9), 1966; https://doi.org/10.3390/agronomy14091966
Submission received: 3 August 2024 / Revised: 25 August 2024 / Accepted: 27 August 2024 / Published: 30 August 2024
(This article belongs to the Section Precision and Digital Agriculture)

Abstract
Currently, tomato plant lowering is performed manually, which is both inefficient and costly, creating a need for automated solutions in greenhouse environments. This paper addresses this issue by presenting the design and development of a tomato-plant-lowering robot utilizing machine vision and deep learning techniques. The study includes the design of an end effector optimized for plant-lowering operations based on the physical characteristics of tomato vines and roller hooks; precise positioning of roller hooks achieved through kinematic analysis and a custom dataset; integration of the RepC3 module from RT-DETR with YOLOv5s for enhanced object detection and positioning; and real-time camera feed display through an integrated application. Performance evaluation through experimental tests shows improvements in recognition accuracy, positioning precision, and operational efficiency, although the robot’s success rate in leaf removal needs further enhancement. This research provides a solid foundation for future developments in plant-lowering robots and offers practical insights and technical guidance.

1. Introduction

China has the largest tomato-growing area and production in the world, with an area of up to 11,572,000 m² and an annual production exceeding 85.36 million tons [1,2]. Greenhouse-grown tomatoes make up 70.29% of the total tomato production, with an area of 5.82 million m² and an annual output exceeding 60 million tons [3].
The agriculture industry is facing unprecedented pressure due to the continuous growth of the global population and increasing food demand. To boost agricultural productivity, reduce labor costs, and ensure crop quality, agricultural automation and robotics have become crucial technologies [4,5,6,7,8,9,10]. In fruit and vegetable cultivation, plant supporting, lowering, and separating are essential agronomic practices that effectively improve crop growth environments, promote plant health, and increase fruit yields.
In greenhouse tomato cultivation, plant-lowering operations primarily rely on manual labor, which is labor-intensive and time-consuming. Additionally, staff must operate from climbing vehicles at elevated positions, which poses certain risks.
To address these challenges, there is an urgent need for efficient, labor-saving automated equipment to mechanize and automate the support, lowering and separation of vegetable plants.
In greenhouses, vine crops like cucumbers and long-season tomatoes can grow to over 10 m in height. Due to greenhouse height limitations, plants must be lowered when vines reach a certain height to increase crop yield [11]. Plant-lowering involves winding continuously growing stems downward and arranging them neatly to provide space for new growth and facilitate plant management [12]. Plant-lowering operations also involve adjusting the spacing of support wires to increase row distance during growth, preventing lower leaves from blocking light, which affects photosynthesis and air circulation, thus reducing disease. During operations, row distance is decreased to clear pathways for manual work. In actual planting processes, plant-lowering and separation are performed manually, addressing each support wire individually, which is time-consuming, labor-intensive, and inefficient.
Even in highly mechanized countries such as the Netherlands and Japan, cucumber plant lowering remains unresolved [13,14], and tomato plant lowering faces the same difficulty. Hou et al. [15] combined circular belt drive winder shaft rotation with closed-loop steel wire two-way pull thinning and achieved automatic thinning through horizontal movement and in situ rotation of the winder shaft. The development of single-plant support systems has led to the creation of corresponding manufacturing equipment [16]. Lang et al. [17] designed a plant-lowering device for greenhouse vine crops, based on the structure and working principle of a general plant-lowering device for crops. This device is compact and user-friendly. Its driving mechanism includes a DC brushless motor, a torque and speed sensor, a planetary reducer, a fixed shaft support, and a main transmission shaft. As the transmission shaft rotates, the rope winder attached to it releases the support ropes, ensuring uniform plant lowering by maintaining consistent rope length for each plant.
Betti et al. [18] introduced YOLO-S, a lightweight and efficient YOLO model designed specifically for small object detection; it improves detection accuracy and speed through an optimized network structure and feature fusion. The recognition principle for plant-lowering robots is similar to that for picking robots, differing only in the recognition target. Suo et al. [19] collected and classified 1160 kiwifruit images according to picking strategies and occlusion conditions, and then trained and tested YOLOv3 and YOLOv4 models on them; their experimental results indicated that detailed annotation and classification of the dataset significantly improved the detection accuracy of the network models. Moosmann et al. [20] discussed applying the YOLO algorithm to small devices such as smart glasses, utilizing low-power processors for efficient real-time object detection, which has significant implications for detecting small objects such as roller hooks. Gomaa and Abdalrazik [21] proposed a reliable semi-automatic method that combines a modified version of the detection-based CNN You Only Look Once v4 (YOLOv4) with background subtraction to perform unsupervised object detection in surveillance videos. Chen et al. [22] proposed methods to enhance the small-object detection performance of YOLOv5s through improved data augmentation and feature fusion strategies, thereby increasing both accuracy and speed. To meet the accuracy, lightweight, and quick-response requirements of the plant-lowering process for greenhouse tomato vines, we developed a roller hook detection algorithm based on an improved YOLOv5s combined with a depth camera. Xu et al. [23] proposed a method for removing specular highlights from a single grayscale image: exploiting the similarity between the highlight image and the diffuse image, an attention-based submodule generates a “highlight intensity mask” that locates pixels containing specular highlights and aids a skip-connected autoencoder in removing them, while a pixel discriminator and Structural Similarity (SSIM) loss help retain detail in the output images.
Yuan et al. [24] proposed an end effector with a reconfigurable multi-link mechanism and roller-type fingertip, designed to grasp and spread flaky, deformable objects. Peng et al. [25] simulated manual picking actions, proposing a ‘rotary pull-up’ clamping and ripping method and designing the corresponding actuation structure. Xing et al. [26] addressed the time-consuming and labor-intensive process of removing abnormal rapeseed plants by designing a clamping manipulator that meets agronomic requirements. For picking cluster-shaped lychee fruit, the Zhou team [27] from South China Agricultural University developed a picking robot with an end effector consisting of an end holder and a rotating cutter head. The robot uses a collision-free motion planning algorithm to make picking safer and more convenient. Based on the aforementioned end effectors and the physical characteristics of the roller hook, we developed an integrated end effector suitable for greenhouse applications.
The advancement of multi-joint robotic arms provides a new direction for agricultural robot research. The Arima team developed a strawberry-picking robot equipped with a visual system to identify obstacles in the environment and perform path planning for the robotic arm [28,29]. In 2015, Dan Steere developed an apple-picking robot with a four-degree-of-freedom robotic arm and an air suction end effector, noted for its high efficiency, short picking time, and wide working range [30]. Dr. Tokunawa proposed a continuum manipulator with a flexible structure as a technical solution; this manipulator has been proven safe and offers wide reachability, but it has a low payload capacity [31]. The collaboration between the end effector and the multi-joint robotic arm significantly enhances picking efficiency.
This paper introduces a novel plant-lowering robot specifically designed for roller hooks, based on a continuous robot structure, and presents the design of a dual-claw end effector. Additionally, this research offers a machine-learning model for roller hook identification. Based on kinematic analysis of the robotic arm and hand–eye calibration, we have proposed a control algorithm for plant lowering and separating. Ultimately, an experimental environment was established for testing, evaluating the robot’s efficiency through experiments with different positions and quantities of roller hooks. Based on these results, we analyzed failure cases to optimize and improve the control algorithm further.
Compared with previous integrated plant-lowering devices, the plant-lowering robot studied in this paper has the following advantages: it is more flexible and can perform tomato plant lowering anywhere within a specified range; its camera detects the scene in real time; and the double-claw structure of the end effector makes the lowering process smooth, reducing damage to branches and leaves.

2. Materials and the Integrated Lowering-and-Separating End Effector

2.1. Greenhouse Environment and Hook Parameters

This study focuses on tomato trellis roller hooks and is conducted at the advanced greenhouse facilities of Beijing Hongfu Group. Cherry tomato plants are grown using a specialized method involving “coconut soil, nutrient solution, and hanging vines”. Plants are spaced about 600 mm apart and trained to grow at a 45° angle from the bottom left to top right. The vertical matrix height is around 800 mm, and adjacent planting ridges are spaced approximately 850 mm apart. A 680 mm-wide track is laid between the ridges to facilitate efficient movement and harvesting operations for robotic systems, as shown in Figure 1.
The packaging dimensions of the roller hooks are 280 × 180 × 66 mm, and they weigh 0.5 kg. The lifting rope can support a weight of 15–18 kg, with a line diameter of 3.0–4.0 mm. One complete rotation of the lifting rope measures 318 mm, although this length will gradually decrease with use. The rollers are manufactured from newly imported anti-aging materials to ensure durability and reliability. The accompanying brackets are made from hot-dip galvanized materials to enhance corrosion resistance in greenhouse environments, as shown in Figure 2.
Weighing the vines of the two tomato types grown in the greenhouse showed that cherry tomato vines ranged from 2 to 3 kg, while large tomato vines ranged from 1 to 2 kg (Table 1).
When different weights are hung on the roller hook, the lifting force required to clamp the locking arm varies. As shown in Figure 3, the required force increases as the weight of the suspended object increases.

2.2. Tomato-Plant-Lowering Requirements

The stages of tomato growth include Sowing to Germination (Week 1–2), Seedling Stage (Week 2–6), Transplanting to Final Growing Site (Week 6–8), Early Vegetative Growth (Week 8–12), Flowering and Fruit Set (Week 12–16), Fruit Development and Ripening (Week 16–30), and Late Season (Week 30+). During the Early Vegetative Growth and Flowering and Fruit Set stages, the vine continues to grow indefinitely. Different tomato varieties have varying growing periods, which influence the timing of vine lowering. Seasonal variations affect light intensity, which in turn influences the growth rate of tomato vines and the frequency of plant lowering. Depending on the variety and cultivation environment, tomato vines can extend several meters in length throughout the growing season. Excessively long vines can hinder light penetration and air circulation, making regular vine lowering essential. When the vine reaches over 3 m in length, it is necessary to lower the entire vine to ensure it receives adequate sunlight and facilitate the vine-lowering operation. Additionally, the horizontal movement causes the vines to wrap around both sides of the cultivation ridge as the stems bend. At the end of the vine-lowering process, the leaves on the lower parts of the vine are left and the mature fruits are harvested.

2.3. Design of the End Effector

To address the issues of vine positioning and separation, this paper proposes an operational method based on the midpoint position of the roller. The end effector approaches the target roller along a predefined path, controlled by data sent via serial communication from the host computer, thereby completing the plant-lowering operations.
The gripper performs the plant-lowering function by quickly clamping the lock arm and vertical support, allowing the rope to drop rapidly under gravity. The gripper releases after 1–2 s. However, this method cannot control the length of the unwound line. The sudden release causes the hook to collide violently with the roller block, leading to severe vibrations in the vine and causing the tomatoes to fall off. Therefore, an additional lower clamp is needed to grip the rope and stabilize it through a guide rail, ensuring stable plant-lowering operations. Figure 4 shows the overall assembly diagram of the end effector.
Figure 5a shows the three-dimensional structure of the upper gripper, which primarily consists of a servo motor, connecting parts, and the gripper itself. The overall structure resembles an offset crank-slider mechanism. The host sends signals to the servo motor, causing the servo horn to rotate. This rotation drives both the cylindrical connecting part and the arc-shaped connecting part, moving the gripper horizontally to grip the lock arm and vertical arm. The servo motor provides a maximum torque of 30 kg·cm, ensuring that the gripper’s horizontal clamping force is sufficient to lift the lock arm and facilitate the unwinding operation. A rubber pad is attached to the gripper to increase friction and prevent slipping.
Figure 5b shows the three-dimensional structure of the Lower Gripper, which primarily consists of an electric gripper and two gripping parts. Given the significant gripping force required to control the vine’s descent through friction, we selected the HITBOT Z-EFG-20P gripper. This gripper provides a total stroke of 20 mm and a gripping force of 30–80 N, meeting our requirements. The two gripper ends feature inward-facing inclined surfaces in the gripping direction. If these surfaces were horizontal, they would not securely grip the rope, potentially causing the vine to fall. The inclined surfaces enhance the grip on the rope. Additionally, the gripping surfaces are covered with a rubber pad to increase the friction coefficient with the rope, thereby improving the grip and preventing slipping.

2.4. The Workflow of the End Effector

The execution process of the end effector is shown in Figure 6. The key to using this end effector lies in accurately obtaining the pixel coordinates and depth values of the roller hook within the camera view. This information is determined by the robot’s visual system, which is closely related to the end effector; therefore, the vision algorithm proposed in Section 3 is optimized based on the vine-lowering scheme and end effector design presented in this section. After the robotic arm moves the end effector to the predetermined position and enters the preparatory state (Figure 6a), the lower clamp is tightened (Figure 6b), followed by the tightening of the upper clamp (Figure 6c). The guide rail then moves downward to start releasing the line (Figure 6d), after which the upper clamp is loosened (Figure 6e), and finally the lower clamp is loosened (Figure 6f), completing the vine-lowering operation.
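To make the sequence concrete, the sketch below shows how a host program might step the end effector through the six states of Figure 6 over a serial link to the STM32 controller described in Section 4.1. The port name, baud rate, single-byte command codes, and acknowledge byte are illustrative assumptions; the actual communication protocol is not specified in this paper.

```python
# Hypothetical host-side driver for the end-effector sequence in Figure 6.
import time
import serial  # pyserial

STEP_COMMANDS = [
    (b"\x01", "lower clamp tighten"),   # Figure 6b
    (b"\x02", "upper clamp tighten"),   # Figure 6c
    (b"\x03", "guide rail move down"),  # Figure 6d
    (b"\x04", "upper clamp release"),   # Figure 6e
    (b"\x05", "lower clamp release"),   # Figure 6f
]

def run_lowering_sequence(port="COM3", baud=115200):
    """Send the lowering sequence step by step and wait for each acknowledgement."""
    with serial.Serial(port, baud, timeout=1.0) as link:
        for code, name in STEP_COMMANDS:
            link.write(code)       # one assumed command byte per actuation step
            ack = link.read(1)     # assumed single acknowledge byte from the STM32
            print(f"{name}: {'ok' if ack else 'no ack'}")
            time.sleep(1.5)        # the rope is released for roughly 1-2 s (Section 2.3)

if __name__ == "__main__":
    run_lowering_sequence()
```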

3. Target Detection and 3D Location

3.1. Dataset Construction

Currently, there are few open-source datasets for agricultural hooks, particularly for roller hooks, and experimental data are nearly non-existent. In this study, experimental data were collected in both laboratory and actual greenhouse environments. The images are in JPEG format with resolutions of 1280 × 720 and 2K, and data collection was conducted under various lighting conditions to increase the dataset’s diversity.
The equipment used for target detection and calibration was the Intel RealSense D435i depth camera, which includes an RGB camera, infrared cameras, an infrared transmitter, and an IMU (depth resolution: 1280 × 720; RGB sensor resolution: 1920 × 1080). To enhance the model’s generalization ability—the capacity to predict new data—data augmentation techniques were employed. Supervised data augmentation schemes, which apply known image transformation rules to the dataset, were used to allow the model to learn more features and enrich the dataset. This, in turn, improves the model’s performance and stability. The data augmentation methods applied in this study include: (1) random cropping; (2) mirroring; (3) adding Gaussian noise; (4) converting to grayscale; and (5) rotating by 90°.
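The following sketch illustrates the five augmentation operations listed above with OpenCV and NumPy. The crop ratio, noise level, and file name are illustrative assumptions, and in practice the bounding-box labels must be transformed together with the images for cropping, mirroring, and rotation.

```python
# A minimal sketch of the five supervised augmentation methods described above.
import cv2
import numpy as np

def random_crop(img, ratio=0.9):
    h, w = img.shape[:2]
    ch, cw = int(h * ratio), int(w * ratio)
    y = np.random.randint(0, h - ch + 1)
    x = np.random.randint(0, w - cw + 1)
    return img[y:y + ch, x:x + cw]

def mirror(img):
    return cv2.flip(img, 1)                          # horizontal mirroring

def add_gaussian_noise(img, sigma=10.0):
    noise = np.random.normal(0.0, sigma, img.shape)
    return np.clip(img.astype(np.float32) + noise, 0, 255).astype(np.uint8)

def to_grayscale(img):
    gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
    return cv2.cvtColor(gray, cv2.COLOR_GRAY2BGR)    # keep three channels

def rotate_90(img):
    return cv2.rotate(img, cv2.ROTATE_90_CLOCKWISE)

img = cv2.imread("roller_hook.jpg")                  # assumed example image path
augmented = [f(img) for f in (random_crop, mirror, add_gaussian_noise,
                              to_grayscale, rotate_90)]
```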
Each collected image was manually screened and classified using LabelImg 7.19 software. During the annotation process, a rectangular box was used to outline the minimum bounding rectangle around each hook and roller, and the category attribute was set to “device”. After annotation, a TXT-format label file containing the coordinates and category of each rectangular box was saved automatically. Following the YOLO dataset format, a greenhouse roller hook dataset was created with a total of 1619 images, and the training-set-to-validation-set ratio was set to 7:3 (Figure 7); a split script along these lines is sketched below.
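As a concrete illustration of the 7:3 split, the sketch below divides a YOLO-style dataset into training and validation subsets. The directory layout, file extension, and random seed are assumptions for illustration.

```python
# Sketch of a 7:3 train/validation split for a YOLO-format dataset.
import random
import shutil
from pathlib import Path

def split_dataset(src="dataset", dst="dataset_split", train_ratio=0.7, seed=42):
    images = sorted(Path(src, "images").glob("*.jpg"))
    random.Random(seed).shuffle(images)
    n_train = int(len(images) * train_ratio)
    subsets = {"train": images[:n_train], "val": images[n_train:]}
    for subset, files in subsets.items():
        img_dir = Path(dst, "images", subset)
        lbl_dir = Path(dst, "labels", subset)
        img_dir.mkdir(parents=True, exist_ok=True)
        lbl_dir.mkdir(parents=True, exist_ok=True)
        for img in files:
            # each image has a same-named .txt label: "class x_center y_center w h"
            label = Path(src, "labels", img.stem + ".txt")
            shutil.copy(img, img_dir / img.name)
            shutil.copy(label, lbl_dir / label.name)

if __name__ == "__main__":
    split_dataset()
```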

3.2. Improved YOLOv5s Detection Model

The experimental running environment was as follows: CPU: i5-12400F with 16 GB RAM; GPU: GAINWARD RTX 3060 Ti with 6 GB VRAM; operating system: Windows 10; deep learning framework: PyTorch 1.8.0 with CUDA 11.1 and cuDNN 8.0.4. The software used was PyCharm 2024.1 Community Edition with Python 3.7.16. Considering the experimental hardware platform conditions and model detection accuracy, selecting the appropriate network depth and width is very important.
YOLOv5s is a lightweight object detection model based on the YOLO (You Only Look Once) architecture. It uses convolutional neural networks to detect objects in images in real time and effectively balances speed and performance. YOLOv5s mainly consists of a backbone, a neck, and a head. The backbone is the core part of YOLOv5, responsible for extracting features from the input image and converting the raw input into multiple feature maps to support subsequent object detection tasks.
Introducing the RepC3 module from RT-DETR into YOLOv5s aims to overcome the limitations of the YOLO series and other Transformer-based detectors. By integrating fusion technology and residual structures, the RepC3 module optimizes the model’s inference speed while maintaining high performance, making it particularly suitable for real-time object detection tasks. With a refined structural design, RepC3 can achieve high detection accuracy at a lower computational cost. The improved network structure is shown in Figure 8. Its main components include the Focus module, Conv module, C3 module, SPPF module, and RepC3 module.
The Focus structure’s key aspect is segmenting the image into smaller feature maps, as shown in Figure 8a. The Conv module is a basic module commonly used in convolutional neural networks, mainly composed of convolutional layers, BN (Batch Normalization) layers, and activation functions. Adding a BN layer after the convolutional layer normalizes the output, accelerates the training process, improves the model’s generalization ability, and reduces the model’s dependency on initialization. The activation function is a nonlinear function that introduces nonlinearity into the neural network, enabling it to adapt to different types of data distributions.
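For reference, a minimal PyTorch sketch of the Conv module described above (convolution, batch normalization, activation) is given below, following the Conv-BN-SiLU pattern used in YOLOv5; the channel counts and input size are illustrative.

```python
# Minimal Conv-BN-activation block, as described in the text.
import torch
import torch.nn as nn

class ConvBNAct(nn.Module):
    def __init__(self, c_in, c_out, k=3, s=1):
        super().__init__()
        self.conv = nn.Conv2d(c_in, c_out, k, s, padding=k // 2, bias=False)
        self.bn = nn.BatchNorm2d(c_out)   # normalizes outputs, speeds up training
        self.act = nn.SiLU()              # nonlinear activation

    def forward(self, x):
        return self.act(self.bn(self.conv(x)))

x = torch.randn(1, 3, 640, 640)
print(ConvBNAct(3, 32)(x).shape)          # torch.Size([1, 32, 640, 640])
```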
The C3 module, by incorporating residual structures and partial convolutions, reduces computational costs while maintaining strong feature representation capabilities. This helps improve the model’s detection accuracy while reducing computational complexity, making it more suitable for resource-constrained devices.
The SPPF (Spatial Pyramid Pooling—Fast) is an improved module in YOLOv5 designed to enhance feature extraction efficiency. It captures information from different spatial ranges by pooling feature maps at different scales. The SPPF module performs multi-scale pooling on the feature maps and concatenates the results, enhancing the model’s ability to detect objects at multiple scales while maintaining low computational complexity. Compared to traditional spatial pyramid pooling, SPPF optimizes the model’s inference speed and performance while maintaining efficiency.
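A simplified sketch of the SPPF idea follows: one max-pooling layer is applied repeatedly and the intermediate results are concatenated, which approximates pooling at several kernel sizes at a lower cost. Channel sizes are illustrative, and the 1 × 1 convolutions are shown without the BN/activation wrapper for brevity.

```python
# Simplified SPPF: serial max pooling with concatenation of intermediate results.
import torch
import torch.nn as nn

class SPPF(nn.Module):
    def __init__(self, c_in, c_out, k=5):
        super().__init__()
        c_mid = c_in // 2
        self.cv1 = nn.Conv2d(c_in, c_mid, 1)
        self.cv2 = nn.Conv2d(c_mid * 4, c_out, 1)
        self.pool = nn.MaxPool2d(kernel_size=k, stride=1, padding=k // 2)

    def forward(self, x):
        x = self.cv1(x)
        y1 = self.pool(x)
        y2 = self.pool(y1)          # equivalent to a larger pooling window
        y3 = self.pool(y2)          # and a still larger one
        return self.cv2(torch.cat([x, y1, y2, y3], dim=1))

print(SPPF(256, 256)(torch.randn(1, 256, 20, 20)).shape)  # torch.Size([1, 256, 20, 20])
```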
RepC3 is a module built on RepConv, which is a variant of the CSP (Cross Stage Partial) structure, commonly used in the bottleneck layer of neural networks. The principle of RepC3 is shown in Figure 9; RepC3 is a Reparameterization Convolution module that allows networks to use different structures during the training and inference phases. In the training phase, RepC3 can be represented as a standard convolutional layer, but in the inference phase, it can be reparametrized into a more efficient structure, which reduces the amount of computation and increases the speed of inference. This technique is particularly suitable for object detection models that require real-time processing, such as the YOLO family.
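The core of the re-parameterization idea can be illustrated with a minimal sketch: during training the block runs parallel 3 × 3 and 1 × 1 branches, and for inference the 1 × 1 kernel is zero-padded to 3 × 3 and folded into a single convolution. This is a simplified stand-in for RepConv/RepC3 (batch-normalization folding and the identity branch are omitted), not the exact module used in RT-DETR.

```python
# Simplified structural re-parameterization: two branches in training, one fused conv in inference.
import torch
import torch.nn as nn
import torch.nn.functional as F

class RepConvSketch(nn.Module):
    def __init__(self, channels):
        super().__init__()
        self.conv3 = nn.Conv2d(channels, channels, 3, padding=1)
        self.conv1 = nn.Conv2d(channels, channels, 1)
        self.fused = None                              # set by fuse()

    def forward(self, x):
        if self.fused is not None:                     # inference path: single conv
            return self.fused(x)
        return self.conv3(x) + self.conv1(x)           # training path: parallel branches

    def fuse(self):
        """Merge the 1x1 branch into the 3x3 branch to form one convolution."""
        w = self.conv3.weight.data + F.pad(self.conv1.weight.data, [1, 1, 1, 1])
        b = self.conv3.bias.data + self.conv1.bias.data
        self.fused = nn.Conv2d(self.conv3.in_channels, self.conv3.out_channels, 3, padding=1)
        self.fused.weight.data.copy_(w)
        self.fused.bias.data.copy_(b)

block = RepConvSketch(16)
x = torch.randn(1, 16, 64, 64)
y_train = block(x)
block.fuse()
print(torch.allclose(y_train, block(x), atol=1e-5))    # True: same output, fewer ops
```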
Based on the collected dataset and model, the network was trained with specific parameters, and the results are shown in Figure 10. Figure 10a presents the confusion matrix, which summarizes prediction results for classification problems. The confusion matrix displays the counts of correct and incorrect predictions broken down by class, highlighting where the classification model tends to make errors. This detailed analysis helps to understand not only how many errors the model makes but also what types of errors they are, overcoming the limitations of relying solely on classification accuracy. The horizontal axis represents the true labels, while the vertical axis represents the predicted labels; each cell indicates the proportion of samples with the corresponding true and predicted label pair. The probability of correctly classifying the device category is 0.995, which rounds to 1.
Figure 10b shows the relationship between the F1 score and confidence thresholds. The F1 score is a critical metric for classification problems, representing the harmonic mean of precision and recall. The curve approaching 1 indicates that the model performs well on the training dataset. Figure 10c displays the precision–confidence threshold curve. As confidence increases, classification precision also improves, though some categories with lower confidence may be missed. Figure 10d shows the precision–recall (PR) curve, a common tool for evaluating multi-class classification performance. As illustrated, higher precision is associated with lower recall. The goal is to detect all categories while maintaining high precision, so the curve should approach the (1,1) point, indicating that the area under the mAP curve should be as close to 1 as possible.
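Since precision, recall, and F1 drive the curves in Figure 10, a small worked example of their definitions is given below; the true-positive, false-positive, and false-negative counts are illustrative, not taken from the experiment.

```python
# Precision, recall, and F1 (harmonic mean of precision and recall) from raw counts.
def precision_recall_f1(tp, fp, fn):
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1

print(precision_recall_f1(tp=199, fp=1, fn=3))  # approx. (0.995, 0.985, 0.990)
```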

4. Construction of the Plant-Lowering Robot System

4.1. Robot Hardware System Building

As shown in Figure 11, this robotic system is specifically designed for tomato plant lowering and comprises several modular components. These components include a sensing module with an Intel RealSense D435 RGB-D camera, a robotic arm module featuring a six-DOF industrial AUBO i5 robot arm, a control box, and an operation module equipped with an STM32F407 development board, motor controllers, servo motors, rails, and end effectors. The core controller of the system is the STM32F407, which receives data from the host via serial communication. The STM32F407 then outputs signals through its pins and utilizes RS-485 communication to control the servo motors and rails, executing operations in a predefined sequence to complete the plant-lowering process.

4.2. Kinematics of Robot Arm

The Denavit–Hartenberg parameters (D-H parameters) are four parameters associated with a specific convention for attaching the reference coordinate system to the linkage of the spatial motion chain or robotic manipulator arm. Using the improved D-H modeling method for kinematic modeling, the specific steps are as follows:
  • Coordinate systems are established to determine the positions between the links of the robot arm. The $Z_i$ axis coincides with the axis of joint $i$. If the $Z_i$ axis and the $Z_{i+1}$ axis intersect, their intersection point serves as the origin of the coordinate system; if they do not intersect, the origin is the intersection of the common perpendicular of the two axes with the $Z_i$ axis. The $X_i$ axis is perpendicular to both the $Z_i$ axis and the $Z_{i+1}$ axis; if the $Z_i$ axis and the $Z_{i+1}$ axis do not intersect, the $X_i$ axis points from the $Z_i$ axis to the $Z_{i+1}$ axis. Once the $Z_i$ axis and $X_i$ axis are determined, the $Y_i$ axis can be established according to the right-hand rule. These steps allow for the determination of the link coordinate systems of a robot arm. To simplify the transformation between coordinate systems, the third and fourth coordinate systems have been offset, as shown in Figure 12.
  • The coordinate system between two adjacent links is transformed through translation and rotation. The implementation steps of the improved D-H modeling method are as follows: (1) rotate the coordinate system $\{X_{i-1}, Y_{i-1}, Z_{i-1}\}$ around the $X_{i-1}$ axis so that the $Z_{i-1}$ axis is parallel to the $Z_i$ axis; (2) translate the coordinate system along the $X_{i-1}$ axis until the $Z_{i-1}$ axis coincides with the $Z_i$ axis; (3) rotate the coordinate system around the $Z_i$ axis so that the $X_{i-1}$ axis is parallel to the $X_i$ axis; and (4) translate the coordinate system along the $Z_i$ axis so that the $X_{i-1}$ axis coincides with the $X_i$ axis. After the coordinate systems were aligned, we established the D-H parameters (Table 2).
The formula for calculating the transformation matrix between adjacent link coordinate systems is presented in Equation (1):

$$
T_i^{i-1} = R(X_{i-1}, \alpha_{i-1})\, T(X_{i-1}, a_{i-1})\, R(Z_i, \theta_i)\, T(Z_i, d_i) =
\begin{bmatrix}
\cos\theta_i & -\sin\theta_i & 0 & a_{i-1} \\
\sin\theta_i \cos\alpha_{i-1} & \cos\theta_i \cos\alpha_{i-1} & -\sin\alpha_{i-1} & -d_i \sin\alpha_{i-1} \\
\sin\theta_i \sin\alpha_{i-1} & \cos\theta_i \sin\alpha_{i-1} & \cos\alpha_{i-1} & d_i \cos\alpha_{i-1} \\
0 & 0 & 0 & 1
\end{bmatrix}
\tag{1}
$$

where $T_i^{i-1}$ is the transformation matrix from link $i-1$ to link $i$, $\alpha_{i-1}$ and $a_{i-1}$ are the link twist and link length, $d_i$ is the link offset, and $\theta_i$ is the joint angle.
  • The transformation matrices between adjacent links, $T_1^0, T_2^1, T_3^2, T_4^3, T_5^4, T_6^5$, are obtained using the D-H parameter table.
  • The transformation matrix between any two links is derived by successively multiplying the transformation matrices of adjacent links:

$$
T_m^n = T_{n+1}^{n}\, T_{n+2}^{n+1}\, T_{n+3}^{n+2} \cdots T_m^{m-1}, \quad n < m .
$$
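A numerical sketch of Equation (1) and the chained product is given below, assuming the modified D-H convention and taking the parameter rows from Table 2 (lengths in millimetres, angles in radians); it is meant only to show how the forward kinematics are assembled, not as the authors' implementation.

```python
# Forward kinematics sketch for the 6-DOF arm using modified D-H parameters (Table 2).
import numpy as np

def dh_transform(a_prev, alpha_prev, d, theta):
    """Homogeneous transform from link i-1 to link i, Equation (1)."""
    ca, sa = np.cos(alpha_prev), np.sin(alpha_prev)
    ct, st = np.cos(theta), np.sin(theta)
    return np.array([
        [ct,      -st,      0.0,  a_prev],
        [st * ca,  ct * ca, -sa, -d * sa],
        [st * sa,  ct * sa,  ca,  d * ca],
        [0.0,      0.0,      0.0, 1.0],
    ])

PI = np.pi
# (a_{i-1} [mm], alpha_{i-1} [rad], d_i [mm], theta offset [rad]) per Table 2
DH_TABLE = [
    (0.0,   0.0,    98.5,  PI),
    (0.0,   PI / 2, 121.5, PI / 2),
    (408.0, PI,     0.0,   0.0),
    (376.0, PI,     0.0,   PI / 2),
    (0.0,   PI / 2, 102.5, 0.0),
    (0.0,   PI / 2, 94.0,  0.0),
]

def forward_kinematics(joint_angles):
    """Pose of the flange frame in the base frame for six joint angles (rad)."""
    T = np.eye(4)
    for (a, alpha, d, offset), q in zip(DH_TABLE, joint_angles):
        T = T @ dh_transform(a, alpha, d, offset + q)
    return T

print(forward_kinematics([0.0] * 6).round(2))
```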

4.3. Hand–Eye System Calibration

Accurate identification and grasping of objects by the robotic arm require precise calibration steps. This process consists of two parts: camera calibration and hand–eye calibration. Camera calibration aims to determine the camera’s intrinsic parameters and distortion coefficients, whereas hand–eye calibration establishes the mapping relationship between the camera and the robotic arm base.
This process involves handling transformations among four coordinate systems: the pixel coordinate system, image coordinate system, camera coordinate system, and world coordinate system, as illustrated in Figure 13, which also presents the corresponding transformation matrices.
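As an illustration of the camera-coordinate step, the sketch below back-projects the centre pixel of a detection box, together with its depth value, into a 3-D point in the camera coordinate system; the intrinsic parameters fx, fy, cx, cy come from camera calibration, and the numeric values used here are illustrative.

```python
# Pixel (u, v) plus depth z -> 3-D point in the camera coordinate system.
import numpy as np

def deproject(u, v, z, fx, fy, cx, cy):
    x = (u - cx) * z / fx
    y = (v - cy) * z / fy
    return np.array([x, y, z])

point_cam = deproject(u=640, v=360, z=0.45, fx=910.0, fy=910.0, cx=640.0, cy=360.0)
print(point_cam)   # a point about 0.45 m in front of the optical centre
```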
Hand–eye calibration primarily aims to establish the relationship between the camera coordinate system and the robot arm base coordinate system. By calculating the transformation matrix between these coordinate systems, the 3D coordinates of a target in the camera coordinate system can be converted to the robot arm base coordinate system. Depending on the camera’s position, there are two methods: “eyes in hand” and “eyes to hand,” as shown in Figure 14.
In this paper, the camera is configured in the “eyes in hand” setup. As shown in Figure 14a, during the calibration process the calibration board is placed in a fixed position and the robot changes its pose, while the camera captures images of the calibration board from these different poses. Here H denotes a homogeneous transformation matrix. $H_{\mathrm{tool}}^{\mathrm{base}}$ is the transformation matrix from the tool coordinate system at the robot end effector to the robot base coordinate system, $H_{\mathrm{tool}}^{\mathrm{cam}}$ is the transformation matrix from the tool coordinate system at the robot end effector to the camera coordinate system, $H_{\mathrm{cal}}^{\mathrm{cam}}$ is the transformation matrix from the calibration board coordinate system to the camera coordinate system, and $H_{\mathrm{cal}}^{\mathrm{base}}$ is the transformation matrix from the calibration board coordinate system to the robot base coordinate system.
Closed-loop relationship:

$$
H_{\mathrm{cam}}^{\mathrm{tool}} = H_{\mathrm{base}}^{\mathrm{tool}} \times H_{\mathrm{cal}}^{\mathrm{base}} \times H_{\mathrm{cam}}^{\mathrm{cal}}
$$
Each time the robot moves to a new pose, a closed-loop relationship is established as described above. Since H cal base is invariant, it can be eliminated by selecting any two relationships obtained from different poses. Here, we choose two consecutive poses.
Eliminating $H_{\mathrm{cal}}^{\mathrm{base}}$ using two consecutive poses gives

$$
H_{\mathrm{tool}_0}^{\mathrm{base}} \times H_{\mathrm{cam}_0}^{\mathrm{tool}} \times H_{\mathrm{cal}_0}^{\mathrm{cam}} = H_{\mathrm{tool}_1}^{\mathrm{base}} \times H_{\mathrm{cam}_1}^{\mathrm{tool}} \times H_{\mathrm{cal}_1}^{\mathrm{cam}}
$$

Because the camera is rigidly mounted on the tool, $H_{\mathrm{cam}_0}^{\mathrm{tool}} = H_{\mathrm{cam}_1}^{\mathrm{tool}} = H_{\mathrm{cam}}^{\mathrm{tool}}$, so

$$
H_{\mathrm{tool}_0}^{\mathrm{base}} \times H_{\mathrm{cam}}^{\mathrm{tool}} \times H_{\mathrm{cal}_0}^{\mathrm{cam}} = H_{\mathrm{tool}_1}^{\mathrm{base}} \times H_{\mathrm{cam}}^{\mathrm{tool}} \times H_{\mathrm{cal}_1}^{\mathrm{cam}}
$$
Multiplying both sides on the left by $\left(H_{\mathrm{tool}_1}^{\mathrm{base}}\right)^{-1}$ and on the right by $\left(H_{\mathrm{cal}_0}^{\mathrm{cam}}\right)^{-1}$ yields

$$
\left(H_{\mathrm{tool}_1}^{\mathrm{base}}\right)^{-1} \times H_{\mathrm{tool}_0}^{\mathrm{base}} \times H_{\mathrm{cam}}^{\mathrm{tool}} = H_{\mathrm{cam}}^{\mathrm{tool}} \times H_{\mathrm{cal}_1}^{\mathrm{cam}} \times \left(H_{\mathrm{cal}_0}^{\mathrm{cam}}\right)^{-1}
$$
This can be written in the standard hand–eye form

$$
A X = X B
$$

where

$$
A = \left(H_{\mathrm{tool}_1}^{\mathrm{base}}\right)^{-1} \times H_{\mathrm{tool}_0}^{\mathrm{base}}, \qquad
B = H_{\mathrm{cal}_1}^{\mathrm{cam}} \times \left(H_{\mathrm{cal}_0}^{\mathrm{cam}}\right)^{-1}, \qquad
X = H_{\mathrm{cam}}^{\mathrm{tool}}
$$
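The paper does not state which solver is used for this AX = XB problem; OpenCV's built-in hand-eye routine is one readily available option, sketched below. The rotation and translation lists would be filled from the 20 calibration captures (tool poses read from the robot controller, board poses from chessboard detection).

```python
# Solving AX = XB with OpenCV's hand-eye calibration (Tsai's method shown).
import cv2
import numpy as np

def solve_hand_eye(R_gripper2base, t_gripper2base, R_target2cam, t_target2cam):
    """Return the 4x4 camera-to-tool transform X from per-pose rotations/translations."""
    R, t = cv2.calibrateHandEye(R_gripper2base, t_gripper2base,
                                R_target2cam, t_target2cam,
                                method=cv2.CALIB_HAND_EYE_TSAI)
    X = np.eye(4)
    X[:3, :3] = R
    X[:3, 3] = t.ravel()
    return X
```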
As shown in Figure 15a, when the calibration board is stationary, the camera captures 20 sets of simulated calibration data. This demonstrates the collection of calibration data from various positions and perspectives of the robotic arm. Similarly, as depicted in Figure 15b, with the camera fixed, simulated calibration data from the calibration board at 20 different depth positions illustrate the collection of images under various depths and orientations.
The homogeneous transformation matrix is obtained by combining the rotation matrix and the translation matrix. The rotation matrix and translation matrix obtained from the hand–eye calibration of the plant-lowering robot are presented in Table 3.
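For illustration, the sketch below assembles the 4 × 4 homogeneous hand–eye matrix from the Table 3 values (the translation is assumed to be in millimetres) and uses it to map a camera-frame point into the tool frame; the test point itself is arbitrary.

```python
# Compose the homogeneous hand-eye matrix from Table 3 and apply it to a point.
import numpy as np

R = np.array([[-0.2590, -0.9652,  0.0358],
              [ 0.8881, -0.2525, -0.3841],
              [ 0.3797, -0.0677,  0.9226]])
t = np.array([76.5719, 25.7065, 296.1457])        # translation, assumed mm

H_cam2tool = np.eye(4)
H_cam2tool[:3, :3] = R
H_cam2tool[:3, 3] = t

p_cam = np.array([0.0, 0.0, 450.0, 1.0])          # a point 450 mm ahead of the camera
p_tool = H_cam2tool @ p_cam                        # the same point in the tool frame
print(p_tool[:3])
```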

5. Experiment of Automatic Plant-Lowering

5.1. Plant-Lowering Robot Application

An application for the plant-lowering robot has been developed, as shown in Figure 16.
Figure 16a shows the interface when the application is first launched. Clicking the “Initialize” button moves the robotic arm to a designated initial position. Clicking the “Move Out” button activates the camera, as shown in Figure 16b. The camera identifies the device and outputs the coordinates of the center point of the nearest roller hook detection box. These coordinates are transmitted to the robotic arm, which then moves accordingly. Clicking the “Stop Action” button stops the robotic arm, and the camera freezes on the last frame. Clicking the “Exit” button closes the application, completing the entire operation process.
The working process of the robotic arm is illustrated in Figure 17. Figure 17a shows the initial posture of the robotic arm; clicking the “Initialize” button in the application moves the arm to its preset initial position. Figure 17b shows the pre-lowering posture: after the camera acquires the coordinates of the center point of the roller hook detection box, the robotic arm moves to the pre-lowering position through a series of coordinate transformations. Figure 17c shows the robotic arm’s position when it begins the lowering action.

5.2. Experimental Analysis of Visual Recognition

Examples of the visual detection results are shown in Figure 18.
To verify the effectiveness of the vision module, we conducted multiple recognition experiments varying the number of hooks and the distance from the camera, recording the visual recognition success rate. These experiments focused on identifying roller hooks and positioning center points. The results from detection experiments with varying distances and numbers of roller hooks are shown in Table 4.
The table shows that the quantity of roller hooks has only a small impact on visual detection; the slight decline with more hooks arises mainly because, when multiple roller hooks appear in the picture, some are viewed from the side rather than head-on, causing recognition errors. These errors can be minimized by enhancing the dataset. Distance, by contrast, significantly affects visual detection, with optimal detection occurring at 0.4–0.5 m; beyond this range, accuracy declines. In the working environment, the effective distance is therefore between 0.4 and 0.5 m.

5.3. Experimental Analysis of End Effector

To verify the end effector module, we conducted experiments on the upper and lower clamps to test their clamping effectiveness. The roller hook lock arm was divided evenly into six clamping points, as shown in Figure 19. The upper clamping jaws clamped each of the six points in turn, and we observed whether they could secure the lock arm and release the self-locking state of the roller hook. The results are presented in Table 5.
The experimental results indicate that the success rate of clamping increases as the clamping point approaches the upper region. At point A, the success rate is 90%, meeting the clamping requirements; at point F, it drops to 30%, which is insufficient. Therefore, in the vine-lowering experiment, the clamping point should be set above point D to meet the required standards. The success rate between points A and D is 20–30 percentage points higher than between points E and F. The corresponding clamping experiment for the lower claw is shown in Figure 20 and Table 6.
The experimental results show that, for the upper jaw, the success rate increases as the clamping point approaches the upper arm of the roller hook, whereas for the lower jaw the success rate increases as the rope nears the inside of the gripper parts. Both factors must therefore be considered when determining the working attitude and position of the end effector in the overall experiment. For the lower jaw, the success rate at points D–F is roughly 70 percentage points higher than at points A–C, so the rope should be clamped between points D and F during the experiment.

5.4. Experimental Analysis of Lowering Plants

The various modules were combined to test the complete operation of the plant-lowering robot, and the success rate was recorded. During operation, the RGB camera in the visual recognition system first captures image information of the target roller hook. It then obtains the target’s pixel coordinates in the image coordinate system and converts them into position coordinates in the robot arm base coordinate system through hand–eye calibration. The end effector is then moved to the corresponding position by directing the robot arm’s movement through the controller. Hand–eye calibration must be completed before the experiment. Roller hooks were placed at different positions within the robotic arm’s working range, and the experimental results are shown in Table 7.
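The sketch below summarizes this operation loop in code form. The helper names (detect_roller_hook, get_depth, move_arm_to, run_lowering_sequence) are hypothetical placeholders for the detection model, depth camera, robot-arm interface, and end-effector controller; only the coordinate chain from pixel to robot-base frame reflects the procedure described above.

```python
# High-level sketch of one plant-lowering cycle (hypothetical helper functions).
import numpy as np

def pixel_to_base(u, v, z, K, H_cam2tool, H_tool2base):
    """Pixel + depth -> camera frame -> tool frame -> robot-base frame."""
    fx, fy, cx, cy = K[0, 0], K[1, 1], K[0, 2], K[1, 2]
    p_cam = np.array([(u - cx) * z / fx, (v - cy) * z / fy, z, 1.0])
    return (H_tool2base @ H_cam2tool @ p_cam)[:3]

def lower_one_plant(frame, depth, K, H_cam2tool, H_tool2base):
    u, v = detect_roller_hook(frame)      # centre of the YOLOv5 detection box
    z = get_depth(depth, u, v)            # depth value at that pixel
    target = pixel_to_base(u, v, z, K, H_cam2tool, H_tool2base)
    move_arm_to(target)                   # position the end effector at the hook
    run_lowering_sequence()               # six-step sequence of Figure 6
```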
During the experiments, failure cases were carefully observed and analyzed. Issues such as “Image not recognized” and “No depth value obtained” prevented the end effector from being lowered properly; this occurred because the roller hook’s pure white color affected depth perception and recognition accuracy under varying lighting conditions. “Did not raise the lock arm” and “The claw is not clamping the rope” were due to the robotic arm not reaching the designated position or disconnections in the STM32 wiring, preventing the end effector from moving correctly; “The claw is not clamping the rope” was the most frequent failure. Additionally, the sway of neighboring hooks caused by the completed lowering of the first roller hook significantly disrupted camera data acquisition, leading to failures.
The full-machine experiments showed that the robot achieved an average success rate of 60% when handling a single roller hook and 55% when handling two roller hooks. When the roller hook appears at different positions on the camera screen, the plant-lowering success rate is 10–20 percentage points higher for hooks in the middle position than for those on the left or right. Therefore, the plant-lowering effect is optimal when a single roller hook is positioned in the center of the camera screen. Given the lack of existing robots for this task on the market, future efforts can focus on improving accuracy based on these findings.

6. Conclusions

Currently, specialized plant-lowering robots are lacking. This paper presents the design and successful testing of a tomato-plant-lowering robot using machine vision recognition and deep learning to improve lowering and vine-separating efficiency. We studied the physical characteristics of roller hooks and designed the end effector accordingly. A dedicated roller hook dataset was created, and the YOLOv5s detection model, combined with a depth camera, was used to output the 3D coordinates of the target hooks, accurately completing the object detection task.
After completing camera and hand–eye calibration, we collected images of the calibration plate from different angles and distances. These images were input into the MATLAB calibration toolbox to obtain the camera’s distortion coefficients and intrinsic parameter matrix, completing the camera calibration. By analyzing the relationship between the two fixed coordinate systems and using the least squares method for fitting, we obtained the hand–eye calibration matrix, successfully converting from the pixel coordinate system to the robot arm base coordinate system.
Analysis of simulation results and failure cases indicates that most failures are due to visual issues, with a few related to the end effector’s structure. Using the midpoint of the YOLOv5 detection box as the target point also introduces significant errors, and the pure white color of the roller hooks greatly affects image recognition and depth value acquisition. However, this study only conducted simulation experiments and did not include real experiments under different lighting conditions. In future work, we will develop a vision system suitable for a broader range of application scenarios, find more appropriate methods for determining the target point, improve the recognition of white objects, and develop supporting devices that reduce recognition interference such as light-source occlusion, so as to further improve the accuracy of the target detection system and the success rate of the lowering system. In addition, building on our study of agronomic requirements and the physical characteristics of the roller hook, we will continue to improve the mechanical structure and matching algorithms of the end effector to achieve better plant-lowering results; this is the goal of our next research.

Author Contributions

Conceptualization, S.X.; Methodology, S.X.; Software, S.X.; Validation, Z.X.; Formal analysis, H.Q. and X.A.; Investigation, S.X. and Z.X.; Resources, S.X.; Data curation, S.X.; Writing—original draft, S.X.; Writing—review & editing, B.Z.; Visualization, S.X.; Supervision, T.Y. and W.L.; Project administration, T.Y. and W.L.; Funding acquisition, T.Y. and W.L. All authors have read and agreed to the published version of the manuscript.

Funding

This work was supported by the National Key Research and Development Program of China (NK202315020107).

Data Availability Statement

Data are contained within the article.

Acknowledgments

The authors would like to thank Hongfu Agricultural Tomato Production Park. The greenhouses were provided by the Hongfu Agricultural Tomato Production Park site in Beijing, China.

Conflicts of Interest

The authors declare no conflicts of interest.

References

  1. Xu, C.; Xiong, Z.; Jiang, X.P.; Deng, M.; Huang, G.C. Design and Research of the Cluster Tomato Picking Robot. Mod. Agric. Equip. 2021, 42, 15–23. [Google Scholar]
  2. Li, Y.L.; Wen, X.G. Analysis on the difference of greenhouse tomato production between China and the Netherlands. Appl. Eng. Technol. 2018, 38, 10–14. [Google Scholar]
  3. Wang, Z.H.; Xun, Y.; Wang, Y.K.; Yang, Q.H. Review of smart robots for fruit and vegetable picking in agriculture. Int. J. Agric. Biol. Eng. 2022, 15, 33–54. [Google Scholar]
  4. Wang, M.; Wang, B.; Zhang, R.; Wu, Z.; Xiao, X. Flexible Vis/NIR wireless sensing system for banana monitoring. Food Qual. Saf. 2023, 7, fyad025. [Google Scholar] [CrossRef]
  5. Emmi, L.; Fernández, R.; Gonzalez-De-Santos, P. An Efficient Guiding Manager for Ground Mobile Robots in Agriculture. Robotics 2023, 13, 6. [Google Scholar] [CrossRef]
  6. Cejudo, J.G.; Andrés, F.E.; Lujak, M.; Casamayor, C.C.; Fernandez, A.; López, L.H. Towards Agrirobot Digital Twins: Agri-RO5—A Multi-Agent Architecture for Dynamic Fleet Simulation. Electronics 2024, 13, 80. [Google Scholar] [CrossRef]
  7. Mail, M.F.; Maja, J.M.; Marshall, M.; Cutulle, M.; Miller, G.; Barnes, E. Agricultural Harvesting Robot Concept Design and System Components: A Review. AgriEngineering 2023, 5, 777–800. [Google Scholar] [CrossRef]
  8. D’acunto, F.; Marinello, F.; Pezzuolo, A. Rural Land Degradation Assessment through Remote Sensing: Current Technologies, Models, and Applications. Remote Sens. 2024, 16, 3059. [Google Scholar] [CrossRef]
  9. Otani, T.; Itoh, A.; Mizukami, H.; Murakami, M.; Yoshida, S.; Terae, K.; Tanaka, T.; Masaya, K.; Aotake, S.; Funabashi, M.; et al. Agricultural Robot under Solar Panels for Sowing, Pruning, and Harvesting in a Synecoculture Environment. Agriculture 2023, 13, 18. [Google Scholar] [CrossRef]
  10. Kumar, S.; Mohan, S.; Skitova, V. Designing and Implementing a Versatile Agricultural Robot: A Vehicle Manipulator System for Efficient Multitasking in Farming Operations. Machines 2023, 11, 776. [Google Scholar] [CrossRef]
  11. Zhou, H.Y.; Wang, X.; Au, W.; Kang, H.W.; Chen, C. Intelligent robots for fruit harvesting: Recent developments and future challenges. Precis. Agric. 2022, 23, 1856–1907. [Google Scholar] [CrossRef]
  12. Xu, Y.; Liu, Y.; Li, W.; Zhang, M.; Xiao, J. Effect of different reducing vines methods on agronomic traits of greenhouse cucumber. J. Chang. Veg. 2017, 10, 48–50. [Google Scholar]
  13. Vermeulen, C.J.; Hubers, C.; Vries, L.D.; Brazier, F. What horticulture and space exploration can learn from each other: The mission to mars initiative in The Netherlands. Acta Astronaut. 2020, 177, 421–424. [Google Scholar] [CrossRef]
  14. Yamanaka, R.; Kawashima, H. Development of cooling techniques for small-scale protected horticulture in mountainous areas in Japan. Jpn. Agric. Res. Q. 2021, 55, 117–125. [Google Scholar] [CrossRef]
  15. Hou, Y.; Li, K.; Wang, C.H.; Li, S. Development and application of automatic vine falling and thinning device for greenhouse fruits and vegetables. J. Agric. Eng. Technol. 2019, 41, 50–53. [Google Scholar]
  16. Zhang, Y.; Shang, X.; Yang, S. Design of equipment control system for making winder based on PLC. J. Chin. Agric. Mech. 2016, 37, 95–98. [Google Scholar]
  17. Lang, X.H.; Shi, Y.L.; Huang, X.P.; Li, T.H.; Wang, D.W.; Chen, M.D. Design of integral vine-falling device for solar greenhouse. J. Chin. Agric. Mech. 2023, 44, 78–84. [Google Scholar]
  18. Betti, A.; Tucci, M. YOLO-S: A Lightweight and Accurate YOLO-like Network for Small Target Detection in Aerial Imagery. Sensors 2023, 23, 1865. [Google Scholar] [CrossRef]
  19. Suo, R.; Gao, F.F.; Zhou, Z.X.; Fu, L.S.; Song, Z.Z.; Dhupia, J.; Li, R.; Cui, Y.J. Improved multi-classes kiwifruit detection in orchard to avoid collisions during robotic picking. Comput. Electron. Agric. 2021, 182, 106052. [Google Scholar] [CrossRef]
  20. Moosmann, J.; Bonazzi, P.; Li, Y.W.; Bian, S.; Mayer, P.; Benini, L.; Magno, M. Ultra-Efficient On-Device Object Detection on AI-Integrated Smart Glasses with Tinyissimo YOLO. arXiv 2023, arXiv:abs/2311.01057. [Google Scholar]
  21. Gomaa, A.; Abdalrazik, A. Novel Deep Learning Domain Adaptation Approach for Object Detection Using Semi-Self Building Dataset and Modified YOLOv4. World Electr. Veh. J. 2024, 15, 255. [Google Scholar] [CrossRef]
  22. Chen, H.; Yang, W.Q.; Wang, W.; Liu, Z.C. YOLO-TUF: An Improved YOLOv5 Model for Small Object Detection. Commun. Comput. Inf. Sci. 2023, 2058, 471–484. [Google Scholar]
  23. Xu, H.; Li, Q.; Chen, J. Highlight Removal from A Single Grayscale Image Using Attentive GAN. Appl. Artif. Intell. 2022, 36, 1988441. [Google Scholar] [CrossRef]
  24. Yuan, H.; Ren, G.C.; Su, X.Y.; Tian, W. A versatile end effector for grabbing and spreading of flaky deformable object manipulation. Mech. Sci. 2023, 14, 111–123. [Google Scholar]
  25. Xue, P.; Li, Q.; Fu, G.D. Design and Control Simulation Analysis of Tender Tea Bud Picking Manipulator. Appl. Sci. 2024, 14, 928. [Google Scholar] [CrossRef]
  26. Xing, Q.S.; Ding, S.M.; Xue, X.Y.; Cui, L.F.; Le, F.X.; Fu, J. Design and Testing of a Clamping Manipulator for Removing Abnormal Plants in Rape Breeding. Appl. Sci. 2023, 13, 9723. [Google Scholar] [CrossRef]
  27. Feng, Q.C.; Zou, W.S.; Fan, P.F.; Zhang, C.F.; Wang, X. Design and test of robotic harvesting system for cherry tomato. Int. J. Agric. Biol. Eng. 2018, 11, 96–100. [Google Scholar] [CrossRef]
  28. Arima, S.; Kondo, N.; Yagi, Y.; Monta, M.; Yoshida, Y. Harvesting robot for strawberry grown on table top culture. Part 1. Harvesting robot using 5 DOF manipulator. J. Soc. High Technol. Agric. 2001, 13, 159–166. [Google Scholar] [CrossRef]
  29. Arima, S.; Monta, M.; Namba, K.; Yoshida, Y.; Kondo, N. Harvesting robot for strawberry grown on table top culture (Part 2) harvesting robot with a suspended manipulator under cultivation bed. Shokubutsu Kojo Gakkaishi 2003, 15, 162–168. [Google Scholar] [CrossRef]
  30. Bontsema, J. Picking robot for peppers (interview met Jan Bontsema). Wagening. World 2011, 6. Available online: https://edepot.wur.nl/432261 (accessed on 26 August 2024).
  31. Takaaki, T.; Koichi, O.; Akinori, H. 1 segment continuum manipulator for automatic harvesting robot: Prototype and modeling. In Proceedings of the 2017 IEEE International Conference on Mechatronics and Automation, Takamatsu, Japan, 6–9 August 2017. [Google Scholar]
Figure 1. Greenhouse growing environment for cherry tomatoes.
Figure 2. Tomato trellis roller hooks for greenhouse.
Figure 3. Force Required to Open Roller Hooks at Different Weights.
Figure 4. Three-dimensional diagram of the end effector.
Figure 5. Gripper part. (a) Upper gripper; (b) lower gripper.
Figure 6. Operating process of the end effector. (a) Preparation; (b) the lower clamp tightens; (c) the upper clamp tightens; (d) the guide rail moves down; (e) the upper clamp releases; (f) the lower clamp releases.
Figure 7. Dataset annotation.
Figure 8. The network structure of YOLOv5. (a) The Focus structure slices the image into smaller feature maps; (b) the SPPF module.
Figure 9. Principles of RepC3.
Figure 10. Performance evaluation. (a) Confusion matrix; (b) F1–confidence curve; (c) precision–confidence curve; (d) PR curve.
Figure 11. Tomato-plant-lowering robot system.
Figure 12. The connecting rod coordinate system.
Figure 13. Coordinate transformation.
Figure 14. Setting position of the camera. (a) Eyes in hand; (b) eyes to hand.
Figure 15. Hand–eye calibration. (a) Visualization of robotic arm calibration positions; (b) visualization of camera calibration data.
Figure 16. Plant-lowering robot application. (a) Initial interface; (b) working interface.
Figure 17. Robotic arm posture: (a) initial posture; (b) pre-plant-lowering posture; (c) plant-lowering posture.
Figure 18. Roller hook visual inspection results.
Figure 19. Six clamping points on the roller hook lock arm.
Figure 20. Six clamping points on the gripper parts.
Table 1. The weight of different types of tomato vines.
Number | Species | Weight (kg)
1 | Cherry tomatoes | 2.135
2 | Cherry tomatoes | 2.910
3 | Cherry tomatoes | 2.485
4 | Cherry tomatoes | 2.535
5 | Cherry tomatoes | 2.890
6 | Cherry tomatoes | 3.495
7 | Large tomatoes | 1.265
8 | Large tomatoes | 1.235
9 | Large tomatoes | 2.275
10 | Large tomatoes | 2.265
11 | Large tomatoes | 1.365
12 | Large tomatoes | 1.374
Table 2. AUBO i5 D-H parameter table.
Link i | Length of Connecting Rod a (mm) | Torsion Angle α (rad) | Setover of Link d (mm) | Joint Angle θ (rad)
1 | 0 | 0 | 98.5 | π
2 | 0 | π/2 | 121.5 | π/2
3 | 408 | π | 0 | 0
4 | 376 | π | 0 | π/2
5 | 0 | π/2 | 102.5 | 0
6 | 0 | π/2 | 94 | 0
Table 3. Rotation matrix and translation matrix for hand–eye calibration.
Rotation Matrix | Translation Matrix
−0.2590  −0.9652   0.0358 | 76.5719
 0.8881  −0.2525  −0.3841 | 25.7065
 0.3797  −0.0677   0.9226 | 296.1457
Table 4. Visual inspection results for different numbers of roller hooks at different distances.
Number of Hooks | Number of Experiments | Distance from the Camera (m) | Success Rate (%)
1 | 40 | 0.3–0.4 | 60
1 | 40 | 0.4–0.5 | 95
1 | 40 | 0.5–0.6 | 80
1 | 40 | 0.6–0.7 | 70
2 | 40 | 0.3–0.4 | 60
2 | 40 | 0.4–0.5 | 90
2 | 40 | 0.5–0.6 | 80
2 | 40 | 0.6–0.7 | 65
3 | 40 | 0.3–0.4 | 55
3 | 40 | 0.4–0.5 | 85
3 | 40 | 0.5–0.6 | 70
3 | 40 | 0.6–0.7 | 55
Table 5. Clamping experiment with the upper claw.
Gripper Point | Number of Roller Hooks | Success Rate of Upper Claw (%)
A | 30 | 90
B | 30 | 80
C | 30 | 80
D | 30 | 70
E | 30 | 60
F | 30 | 30
Table 6. Clamping experiment with the lower claw.
Gripper Point | Number of Roller Hooks | Success Rate of Lower Claw (%)
A | 30 | 0
B | 30 | 0
C | 30 | 30
D | 30 | 70
E | 30 | 90
F | 30 | 95
Table 7. Plant-lowering experimental data statistics.
Number of Hooks | Number of Experiments | Position in the Camera | Time (s) | Success Rate (%)
1 | 30 | Left | 34.21 | 60
1 | 30 | Mid | 30.04 | 70
1 | 30 | Right | 33.54 | 50
2 | 30 | Left | 72.78 | 55
2 | 30 | Mid | 70.56 | 60
2 | 30 | Right | 71.35 | 50