1. Introduction
Cucumber, which ranks among the top ten vegetables worldwide, is a highly significant and commercially valuable crop. Cucumbers are nutritious, containing significant amounts of vitamin B2, vitamin C, carotene, and other beneficial compounds, and they are also known for their beauty and skincare benefits [1]. According to data from the Food and Agriculture Organization of the United Nations (FAO), the global cucumber cultivation area in 2020 was 2,261,000 hectares, with a total production of 91,258,000 tonnes. China consistently ranks first globally in both cucumber cultivation area and yield, making cucumber cultivation a crucial industry for adjusting the industrial structure, generating farmer income, and promoting rural economic growth. Nevertheless, with such an extensive cultivation area and high output, labor accounts for more than 50% of overall production costs, which underlines the practical importance of research on precise operation by agricultural robots [2,3]. The accurate and rapid identification of fruit targets is a fundamental focus of agricultural robotics research. However, the complexity of the natural growing environment, near-color backgrounds, occlusion, and variations in illumination pose significant challenges to recognizing fruit targets [4].
The rise of information technology has led to the growing use of computer vision for complex target identification. Extracting fruit by color is relatively simple as long as there is a noticeable color difference between the target fruit and its background. Color features have also been used in the harder case where the fruit and the background are similar in color, as with green apples, unripe green tomatoes, and cucumbers. Vitzrabin [5] first partitioned the target image into roughly equal rectangular sub-images. They computed thresholds for each color component of the RGB image to obtain regions of uniform illumination. Subsequently, they transformed the image into a three-dimensional space using the normalized difference index (NDI) and determined three distinct thresholds for each NDI dimension in every sub-image. This approach detected red bell peppers with a high success rate, even under significant variations in illumination. Moghimi [6] segmented green peppers by merging the H1 and S1 components of the HSI color model in the smoothing region with the excess green operator EXG (EXG = 2G - R - B). Zhang [7] combined the excess green operator EXG, the normalized g component, and the H1 and S1 components, drawing on both the RGB and HSV color spaces. By integrating a support vector machine with threshold classification, they achieved recognition of green apples. Li [8] substituted the green component in the RGB color space with the hue component (H2) of the HSV color space to create a composite image (RBH). This modification increased the contrast between the green tomato and its surroundings. The researchers then applied threshold segmentation to obtain an initial segmentation of the green tomato.
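As a rough illustration of the color operators mentioned above, the following sketch computes the excess green value EXG = 2G - R - B per pixel and builds an RBH-style triple by replacing the green channel with the HSV hue. It is not taken from any of the cited works: the function names, the (R, B, H) channel order, the [0, 1] value range, and the threshold of 0.1 are all assumptions chosen for the example.

```python
import colorsys

def excess_green(r, g, b):
    """EXG = 2G - R - B for one pixel; channels are floats in [0, 1]."""
    return 2 * g - r - b

def rbh_pixel(r, g, b):
    """Replace the green channel with the HSV hue, giving an (R, B, H) triple."""
    hue, _saturation, _value = colorsys.rgb_to_hsv(r, g, b)
    return (r, b, hue)

def exg_mask(image, threshold=0.1):
    """Mark pixels whose EXG exceeds a threshold (0.1 is an arbitrary example value)."""
    return [[excess_green(*pixel) > threshold for pixel in row] for row in image]

# Tiny 2x2 example image: two greenish pixels, one brown, one near-white.
image = [
    [(0.2, 0.7, 0.2), (0.5, 0.4, 0.3)],
    [(0.9, 0.9, 0.9), (0.1, 0.5, 0.1)],
]
mask = exg_mask(image)
composite = [[rbh_pixel(*pixel) for pixel in row] for row in image]
```

In this toy image only the two greenish pixels pass the EXG threshold. In practice a threshold would have to be tuned to the crop and lighting, which is the limitation that motivates the shape-based methods discussed next.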
The color of fruits in natural settings is readily influenced by lighting conditions, resulting in a loss of color information. Shape features, another important characteristic of fruits, can be used to accurately segment and identify fruits, but they are more difficult to describe and extract. Researchers have used shape descriptors such as the histogram of oriented gradients (HOG), the deformable part model (DPM), and Fourier descriptors in target recognition systems. Bao [9] analyzed the morphology and growth stages of cucumbers and built a library of templates covering various sizes and bending angles. The team then used template matching to identify cucumbers.
To increase the amount of image information, researchers have introduced hyperspectral cameras, infrared cameras, and thermal imagers for image acquisition [10]. Okamoto [11] used a hyperspectral camera with 60 spectral bands ranging from 369 nm to 1042 nm to capture images of green citrus. Through spectral analysis, they selected specific bands with distinctive features. They then applied thresholding to remove bright and dark backgrounds in the images. Finally, by integrating all the bands, they established a linear discriminant to segment green leaves and green citrus. Wendel [12] employed a handheld hyperspectral camera to capture images of mangoes in an orchard. They used two methods, PLS (partial least squares) and a simple CNN (convolutional neural network), for mango recognition and ripeness estimation, respectively. Although a hyperspectral camera captures extensive spectral information, it is complicated to operate, places demanding requirements on the operating conditions, and makes image acquisition time-consuming.
Three-dimensional image features are also employed as supplementary attributes for fruit recognition. Both Tian [13] and Liu [14] used depth images of fruits and vegetables to aid recognition. The fruit’s location and center point were identified from the rotation center (vortex) of the gradient vector field of the depth image, mapped to 2D space. They subsequently used shape fitting to recognize targets, with a particular focus on round fruits such as oranges and apples. Barnea [15] combined various 3D image features, including 3D surface normal features, 3D plane symmetry features, and elliptical surface highlight features, to detect bell peppers. Kusumam [16] employed a Kinect V2 camera to capture RGBD images of ripe broccoli in the field. They then used the VFH (viewpoint feature histogram) descriptor to describe the target and an SVM classifier to classify the feature vectors and locate the heads of ripe broccoli. Nyarko [17] introduced the CTI (convex template instance) descriptor to represent the 3D shape of fruits for identifying convex fruits such as tomatoes and apples. A KNN (K-nearest neighbor) classifier was then applied to the resulting descriptors to recognize the fruits.
Typically, fruit recognition research concentrates on a single category of image features to identify the target. Nevertheless, many studies have combined several types of visual features for fruit identification. Tao [18] introduced an algorithm for extracting texture features based on color-related local binary patterns. By integrating color and texture features and using a nearest neighbor classifier, the researchers achieved fruit and vegetable classification, improving recognition accuracy by at least 5%.
Promising results have been achieved in applying deep learning to fruit recognition in complex scenes [19,20,21,22]. Gené-Mola [23] used a Kinect camera to capture fruit images, obtaining RGB images, depth images, and reflected signal intensity maps. These were merged into a 5-channel image and used as input to the Faster R-CNN algorithm to identify Red Fuji apples. Koirala [24] improved the YOLO V3 backbone network by combining the few layers and high speed of the compact YOLO V2 network with the accurate residual structure of the original YOLO V3 network. This modification improved both detection speed and accuracy.
Agricultural mobile robots currently need algorithms that are simple, energy-efficient, affordable, and easy to operate if they are to be widely adopted, while still identifying fruit and vegetable targets precisely and efficiently [25]. Deep learning algorithms outperform traditional vision algorithms at recognizing targets in complex environments, but they need extensive training data, and insufficient data limits their performance. Moreover, training on a large dataset consumes more energy and demands significant computational resources and time, leading to high hardware requirements and increased costs. Deep learning models are often considered “black boxes” because of their complex internal workings and decision-making processes. Additionally, these models may overfit the training data, leading to poor performance on new or unseen data, and they are susceptible to noise or minor variations in the input data.
This paper employs conventional vision algorithms for target recognition based on color and shape features. These algorithms are straightforward to implement and have minimal computational and feature extraction requirements, low hardware costs, high interpretability, and good recognition accuracy with limited data. Additionally, they are energy-efficient, making them well suited for agricultural mobile robots.
2. System Architecture
This study integrates the color characteristics and shape characteristics of the target in order to achieve target recognition. The method follows the precise steps outlined below:
First, an industrial camera is used to capture target images on site in the greenhouse, and a large number of cucumber fruit samples are selected. To simplify the extraction of cucumber shape features, the samples are photographed on white A4 paper to obtain sample images.
Next, the image data are preprocessed. Image preprocessing techniques are applied to all sample images to extract shape features, and normalized Fourier shape descriptors are then used for shape description and reconstruction. This process eliminates shape information that varies significantly between samples, yielding the template contour, which represents the standard shape of a cucumber fruit. Template images of the cucumber are then obtained by computing this contour at multiple scales and angles. For recognition, color enhancement is applied to all target images, and the target is segmented using a threshold in the HSV color space so that color features reduce interference. Edge information is extracted from the segmented image with an edge detection algorithm and is then matched against the template images using multi-template shape matching.
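The shape description and reconstruction step can be sketched as follows. This is a minimal illustration of one common way to normalize Fourier descriptors, not the exact formulation used in this paper, which this section does not specify. The function names, the `order` parameter, and the elliptical test contour are assumptions made for the example, and the contour is assumed to be sampled counter-clockwise.

```python
import cmath
import math

def dft(points):
    """Discrete Fourier transform of a closed contour given as complex points x + 1j*y."""
    n = len(points)
    return [
        sum(z * cmath.exp(-2j * math.pi * u * k / n) for k, z in enumerate(points)) / n
        for u in range(n)
    ]

def normalized_descriptor(points, order=4):
    """Magnitudes of low-frequency coefficients divided by |Z(1)|.

    Dropping Z(0) removes translation, dividing by |Z(1)| removes scale, and
    taking magnitudes removes rotation and the choice of starting point.
    Assumes a counter-clockwise contour with more than 2 * order points.
    """
    coeffs = dft(points)
    scale = abs(coeffs[1])
    freqs = [u for u in range(-order, order + 1) if u not in (0, 1)]
    return [abs(coeffs[u]) / scale for u in freqs]  # negative u indexes from the end

def reconstruct(points, order=4):
    """Smoothed contour rebuilt from the coefficients with |u| <= order."""
    n = len(points)
    coeffs = dft(points)
    return [
        sum(coeffs[u] * cmath.exp(2j * math.pi * u * k / n) for u in range(-order, order + 1))
        for k in range(n)
    ]

# Example: an elongated ellipse, then the same shape scaled, rotated, shifted,
# and traversed from a different starting point.
n = 64
ellipse = [complex(3 * math.cos(2 * math.pi * k / n), math.sin(2 * math.pi * k / n)) for k in range(n)]
moved = [2.5 * cmath.exp(0.7j) * z + complex(10, -4) for z in ellipse[5:] + ellipse[:5]]

d_original = normalized_descriptor(ellipse)
d_moved = normalized_descriptor(moved)
smooth = reconstruct(ellipse, order=1)
```

The two descriptors agree up to floating-point error, which is the property that allows contours from differently sized and oriented samples to be compared. Keeping only the low-frequency coefficients in `reconstruct` gives a smoothed outline, which is one plausible way to obtain a standard template contour from noisy sample boundaries.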
The target recognition experiments are divided into two parts. First, computer simulation experiments on multi-scale target recognition are conducted on the acquired target images using various traditional vision algorithms. The recognition accuracy and real-time performance of the different algorithms are compared, highlighting the advantages of the algorithm proposed in this paper. Subsequently, the proposed algorithm is deployed on hardware for real-time target recognition experiments in greenhouse environments, further demonstrating its advantages and applicability.
Figure 1 illustrates the precise procedure.
6. Conclusions
This work addresses the difficulty of precisely and efficiently recognizing targets for agricultural harvesting robots. This difficulty arises mainly from backgrounds with similar colors, complicated growing environments, and irregular lighting conditions, all of which make it harder to separate fruit targets from their surroundings. A novel target recognition method that uses color and shape information is developed to precisely identify fruit targets, with cucumber chosen as the research subject. The method exploits the distinctive shape of the fruit, which differs markedly from that of the stem and leaves. The study collects a large number of fruit target image templates and reconstructs their boundaries with the normalized Fourier descriptor to produce the standard template contour. Additionally, to accommodate the nonuniform growth of cucumbers, templates at multiple scales and rotation angles are computed to improve matching precision. Multi-scale template matching produces a large number of templates, which increases the time required for matching. To improve the real-time performance of matching, the image is preprocessed with a color segmentation algorithm based on the HSV color space. This step effectively reduces the area to be matched, improving the operational efficiency of the method.
Recognition experiments are conducted on 200 images obtained from the field, and the method is compared with four classic recognition methods. The method presented in this research demonstrates high accuracy and efficiency in target recognition, particularly in greenhouse environments. The method is straightforward, the required hardware is cost-effective and adaptable, and it can be applied effectively in agricultural robots, contributing to the field of agricultural robot vision.
Future research should prioritize improving the algorithm by porting it to hardware devices and refining it there, as well as extending its applicability to practical fruit recognition.