1. Introduction
Intelligent agriculture has attracted increasing attention around the world, and fruit harvesting robots are being rapidly developed because of their enormous potential to improve efficiency. The first critical step for a harvesting robot is detecting fruits autonomously. However, it is very difficult to develop a vision system for fruit detection that is as capable as human vision, owing to factors such as uneven illumination, unstructured field environments, occlusion, and other unpredictable conditions [1].
Intensive efforts have been made in vision system research for harvesting robots. Bulanon et al. [
2] proposed a color-based segmentation method for apple recognition using the luminance and red color difference in the YCbCr model. Mao et al. [
3] used the Drg-Drb color index to segment apples from their surroundings. The L*a*b* color space was employed to extract ripe tomatoes [
4]. These methods use only color features for fruit detection and heavily rely on the effectiveness of the color space used. However, it is difficult to select the best color model for color image segmentation in real cases [
5]. Furthermore, relying only on color features discards much of the other visual information in the image, which has been shown to be very effective for object recognition [6].
Kurtulmus et al. [
7] proposed a green citrus detection method for use in natural outdoor conditions by combining Circular Gabor Texture features and Eigen Fruit. They reported a 75.3% accuracy. This method uses several fixed thresholds for detection. A method using feature image fusion was utilized for tomato recognition [
8]. The a*-component image from the L*a*b* color space and the I-component image from the YIQ color space were fused by wavelet transformation, and then an algorithm based on an adaptive threshold was used to implement the detection.
Researchers have attempted to use various sensors for fruit detection to overcome the problems of illumination variation and occlusion [
9,
10,
11,
12]. To locate cherries on a tree, Tanigaki et al. [
10] used red and infrared laser scanning sensors, which avoided the influence of sunlight. Thermal and visible images were fused to improve the detection of oranges by Bulanon et al. [
11]. Xiang et al. [
12] employed a binocular stereo vision system for tomato recognition, and 87.9% of the tomatoes were recognized correctly. These techniques usually provide better results than conventional methods based on RGB color images, mainly because objects with similar reflectance in the visible band may respond differently in non-visible bands. Nevertheless, the high cost of such sensors makes these methods difficult to commercialize.
More and more researchers are using machine learning in computer vision tasks, including fruit detection [
1]. Ji et al. [
13] proposed a classification algorithm based on an SVM for apple recognition, and the success rate of recognition reached 89%. An AdaBoost ensemble classifier was combined with Haar-like features and employed for tomato detection in greenhouse scenes [
14]. A color analysis method was used to reduce false detections. Tomato fruits were detected using image analysis and decision tree models, and 80% of the tomatoes were detected [
15]. Kurtulmus et al. [
16] conducted comparison experiments for peach detection in natural illumination with different classifiers including several statistical classifiers, a neural network, and an SVM classifier, which were combined with three image scanning methods. An SVM classifier and a bag-of-words model were used for pepper detection [
17].
The Histograms of Oriented Gradients (HOG) descriptor was proposed for pedestrian detection [
18]. HOG features outperformed other features for pedestrian detection. Motivated by HOG features and machine learning methods, the goal of this study is to develop an approach for detecting mature tomatoes in regular color images by combining an SVM classifier [
19] and the HOG features. This study extends previous work [
20]. Firstly, all the datasets are preprocessed through an illumination enhancement method. Then, the HOG features extracted from the training sets are used to train the SVM. In the detection stage, a coarse-to-fine scanning method is proposed to detect tomatoes in the entire image with different resolutions. Next, a False Color Removal (FCR) method is used to eliminate the false positive results. Finally, the Non-Maximum Suppression (NMS) method is applied to merge the overlapped detections.
The remainder of this paper is organized as follows.
Section 2 presents the theoretical background.
Section 3 describes the proposed tomato detection methods.
Section 4 discusses the experimental results, and
Section 5 presents the conclusions.
3. Materials and Methods
3.1. Image Acquisition and Preprocessing
To develop and evaluate the proposed algorithm, images of tomatoes in a greenhouse were acquired in late December 2017 and April 2019 in Vegetable High-tech Demonstration Park, Shouguang, China. A total of 247 images were captured using a color digital camera (Sony DSC-W170) with a resolution of 3648 × 2056 pixels. The photographs were taken at distances of 500–1000 mm, which corresponds to the intended operating distance of the harvesting robot. As shown in
Figure 3, the growing conditions of the tomatoes vary and include separated tomatoes; multiple overlapped tomatoes; and tomatoes occluded by leaves, stems, or other non-tomato objects. To speed up the image processing, all of the images were resized to 360 × 202 pixels using a bicubic interpolation algorithm. The dataset has been made publicly available [
23].
An illumination enhancement method was used to reduce the effect of uneven illumination. The image was first converted from RGB space to Hue-Saturation-Intensity (HSI) space. The I component was then extracted, and a natural logarithm function was applied to each of its pixels. Next, the Contrast Limited Adaptive Histogram Equalization (CLAHE) method [
24] was applied to the transformed I component. Finally, the H, S, and processed I layers were combined to obtain the final enhanced image. This procedure was performed on all the images as a preprocessing step before training the classifier. An example of image enhancement is shown in
Figure 4.
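For illustration, a minimal sketch of this preprocessing step is given below. It assumes OpenCV is used and approximates the HSI intensity channel with the V channel of HSV, since OpenCV has no direct HSI conversion; the CLAHE parameters and the input file name are illustrative assumptions, not the authors' exact settings.

```python
# Hedged sketch of the illumination enhancement step (assumptions noted above).
import cv2
import numpy as np

def enhance_illumination(bgr):
    # Approximate HSI with HSV: use the V channel as the intensity (I) layer.
    hsv = cv2.cvtColor(bgr, cv2.COLOR_BGR2HSV)
    h, s, v = cv2.split(hsv)
    # Apply a natural-logarithm compression to every intensity pixel.
    v_log = np.log1p(v.astype(np.float32))
    v_log = cv2.normalize(v_log, None, 0, 255, cv2.NORM_MINMAX).astype(np.uint8)
    # Contrast Limited Adaptive Histogram Equalization (CLAHE).
    clahe = cv2.createCLAHE(clipLimit=2.0, tileGridSize=(8, 8))
    v_eq = clahe.apply(v_log)
    # Recombine the channels into the enhanced image.
    return cv2.cvtColor(cv2.merge([h, s, v_eq]), cv2.COLOR_HSV2BGR)

bgr = cv2.imread("tomato_scene.jpg")  # hypothetical input image
bgr = cv2.resize(bgr, (360, 202), interpolation=cv2.INTER_CUBIC)
enhanced = enhance_illumination(bgr)
```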
3.2. The Dataset
A total of 247 images were used for the experiments. To train the SVM classifier, 100 images were randomly selected from the captured images, 72 images were used for the validation set, and the remaining 75 images were used for testing. From the training images, 207 tomato samples and 621 background samples were manually cropped to construct a training set. The training samples were augmented with random rotations of 0°–360°, which doubled the size of the training set (1656 samples in all). All of the cropped samples were resized to 64 × 64 pixels to unify their size. The tomato samples contained a margin of about 5 pixels on all sides. The background samples were randomly cropped to contain leaves, stems, strings, and other objects, and all the samples were labeled: 1 for the tomatoes and −1 for the backgrounds. Some examples from the datasets are shown in
Figure 5.
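A brief sketch of how such samples could be prepared is shown below. The 64 × 64 target size and the 0°–360° rotation range follow the text; the border handling and the choice of one rotated copy per sample are illustrative assumptions.

```python
# Sketch of sample preparation: resize a cropped patch to 64 x 64 and add one
# randomly rotated copy, which doubles the number of samples as described.
import cv2
import numpy as np

def prepare_sample(patch, size=64):
    patch = cv2.resize(patch, (size, size), interpolation=cv2.INTER_CUBIC)
    angle = np.random.uniform(0.0, 360.0)
    m = cv2.getRotationMatrix2D((size / 2.0, size / 2.0), angle, 1.0)
    rotated = cv2.warpAffine(patch, m, (size, size), borderMode=cv2.BORDER_REFLECT)
    return [patch, rotated]
```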
3.3. Overview of the Detection Algorithm
Figure 6 and
Figure 7 show a systematic view and flowchart of the developed algorithm. The process can be summarized in the following steps (a minimal code sketch of the training steps (1) and (2) is given after the list):
- (1)
Extracting the HOG features of the training samples
- (2)
Training an SVM classifier using the extracted features and corresponding labels
- (3)
Extracting the Region-of-Interest (ROI) on the test image using a pretrained Naive Bayes classifier
- (4)
Sliding a sub-window on the ROI of the image with different resolutions using an image pyramid
- (5)
Extracting the HOG features of each sub-window
- (6)
Recognizing tomatoes with the pretrained SVM classifier
- (7)
Performing FCR to remove any false positive detections
- (8)
Merging the detection results using the NMS method
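The sketch below illustrates steps (1) and (2): computing HOG descriptors from the 64 × 64 training patches and fitting an SVM. The HOG parameters and the linear kernel are common defaults and are assumptions here, not necessarily the settings used by the authors.

```python
# Hedged sketch of HOG feature extraction and SVM training (steps 1 and 2).
import cv2
import numpy as np
from skimage.feature import hog
from sklearn.svm import SVC

def hog_descriptor(patch_bgr):
    gray = cv2.cvtColor(patch_bgr, cv2.COLOR_BGR2GRAY)
    return hog(gray, orientations=9, pixels_per_cell=(8, 8),
               cells_per_block=(2, 2), block_norm="L2-Hys")

def train_svm(patches, labels):
    """patches: 64 x 64 BGR crops; labels: 1 = tomato, -1 = background."""
    features = np.array([hog_descriptor(p) for p in patches])
    clf = SVC(kernel="linear")  # assumed kernel choice
    clf.fit(features, labels)
    return clf
```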
3.4. Image Scanning Method
After training the SVM classifier using the training set, a coarse-to-fine detection framework is used to detect tomatoes. The pseudo code and detailed detection process are described in Algorithm 1.
All the pixels are first classified as belonging to tomatoes or to the background using a Naïve Bayes (NB) classifier trained on color features. Since mature tomatoes are red, three color transformations are performed to distinguish the fruits from the background. After classification, a binary image is obtained, in which white pixels represent potential tomatoes and black pixels represent the background.
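The following sketch illustrates this pixel-level color classification with a Gaussian Naive Bayes model. The three color transformations used in the paper are given as equations in the original and are not reproduced here; the red-emphasizing features below (R−G, R−B, 2R−G−B) are placeholders chosen only because mature tomatoes are red.

```python
# Illustrative pixel-level color segmentation with a Gaussian Naive Bayes model.
import numpy as np
from sklearn.naive_bayes import GaussianNB

def pixel_features(rgb):
    r, g, b = [rgb[..., i].astype(np.float32) for i in range(3)]
    # Placeholder transformations; substitute the paper's actual features.
    return np.stack([r - g, r - b, 2 * r - g - b], axis=-1)

def segment(rgb, nb_model):
    """nb_model: a GaussianNB fitted on pixel features, labels 1/0."""
    feats = pixel_features(rgb).reshape(-1, 3)
    labels = nb_model.predict(feats).reshape(rgb.shape[:2])
    return (labels == 1).astype(np.uint8) * 255  # white = potential tomato
```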
Algorithm 1: The pseudo code of the scanning method.
Next, morphological processing is applied to the binary image, and the Region-of-Interest (ROI) is extracted. A sliding window is then moved over the ROI with a fixed step. At each step, the sub-window is fed to the pretrained SVM classifier and classified as a tomato or not a tomato. If the sub-window is classified as a tomato, FCR is used to perform further classification. After the sliding window has covered the whole ROI, the image is downscaled by a fixed scaling factor and the same sliding process is repeated until a defined minimum size is reached. The sliding window size is chosen according to the size of the tomatoes in the images. The sliding step is set to 16 pixels, and the pyramid stops once the scaled image falls below the defined minimum size. The image scaling factor is 1.1, which downscales the image by about 10% at each level. A sketch map of the sliding window and image pyramid is shown in Figure 8.
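As a rough illustration of this coarse-to-fine scan, the sketch below combines an image pyramid with a sliding window over the ROI mask. The scale factor of 1.1 and the 16-pixel step follow the text; the 64 × 64 window, the minimum image size, and the ROI-coverage check are assumptions.

```python
# Sketch of the coarse-to-fine scanning loop (scale 1.1 and step 16 per the
# text; window size, minimum size, and ROI coverage threshold are assumed).
import cv2

def pyramid(image, scale=1.1, min_size=(64, 64)):
    while image.shape[0] >= min_size[0] and image.shape[1] >= min_size[1]:
        yield image
        new_size = (int(image.shape[1] / scale), int(image.shape[0] / scale))
        image = cv2.resize(image, new_size, interpolation=cv2.INTER_CUBIC)

def sliding_windows(image, roi_mask, step=16, win=64):
    for y in range(0, image.shape[0] - win + 1, step):
        for x in range(0, image.shape[1] - win + 1, step):
            # Skip windows that barely overlap the ROI (binary mask, 0/255).
            if roi_mask[y:y + win, x:x + win].mean() < 25:
                continue
            yield x, y, image[y:y + win, x:x + win]
```

Each yielded sub-window would then be described by its HOG features and passed to the SVM; windows accepted by the SVM go on to the FCR check described in the next subsection.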
3.5. False Color Removal
All sub-windows of the image can be classified using the SVM classifier. However, some false positive detections remain after classification, so a false positive elimination method is needed. Color features play an important role in fruit detection, especially when the fruits differ in color from the background. A False Color Removal (FCR) method is therefore proposed to eliminate false detections. The sub-window image is binarized using a color feature derived as shown below, and the ratio of white pixels to all pixels in the sub-window is calculated. If the ratio exceeds a threshold of 0.3, the sub-window is classified as a tomato; otherwise, it is classified as background.
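A minimal sketch of this check is given below, assuming the weight vector w and bias b of the color feature derived in the remainder of this section are already available.

```python
# False Color Removal sketch: binarize the candidate window with the linear
# color feature (w, b) and keep it only if more than 30% of its pixels are white.
import numpy as np

def passes_fcr(window_rgb, w, b, ratio_threshold=0.3):
    pixels = window_rgb.reshape(-1, 3).astype(np.float32)
    white = (pixels @ np.asarray(w, dtype=np.float32) + b) > 0  # threshold of 0
    return white.mean() > ratio_threshold
```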
The cost function minimization [19] was applied as follows to obtain the color feature for binarization. A total of 897 samples including tomatoes and background were chosen as the training set. The R, G, and B components of the RGB color model were extracted, and the mean value of each component over all the pixels of each sample was calculated to represent that sample. The tomato samples were labeled as 1, and the background samples were labeled as −1. Motivated by Cortes [19], a separating plane, given in Equation (2), is needed to separate tomatoes and background in the R-G-B coordinates:

$$\mathbf{w}^{T}\mathbf{x} + b = 0,$$

where $\mathbf{x} = (R, G, B)^{T}$ is the feature vector, and $\mathbf{w}$ and $b$ are the weight vector and bias of the separating plane, respectively. The plane is derived by minimizing the cost function $L$ in Equation (3):

$$L = \frac{1}{2}\left\| \mathbf{w} \right\|^{2} + C\sum_{i = 1}^{M}\xi_{i}, \qquad \text{s.t. } y_{i}\left( \mathbf{w}^{T}\mathbf{x}_{i} + b \right) \geq 1 - \xi_{i},\ \xi_{i} \geq 0,$$

where $\mathbf{x}_{i}$ and $y_{i}$ are the feature vector and label of sample $i$, respectively, $M$ is the number of samples, $C$ is a penalty parameter, and $\xi_{i}$ is the slack variable of sample $i$, which is used to handle outliers.
The color feature derived for sub-window binarization is $\mathbf{w}^{T}\mathbf{x} + b$, and the threshold is 0: pixels whose feature value exceeds 0 are set to white.
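For illustration, the separating plane could be obtained by fitting a linear SVM on the mean R, G, B values of the labelled samples, as sketched below; the library choice and the penalty parameter C are assumptions.

```python
# Sketch of deriving the color-feature plane (w, b) of Equations (2)-(3) by
# fitting a linear SVM on per-sample mean R, G, B values (C = 1.0 assumed).
import numpy as np
from sklearn.svm import LinearSVC

def fit_color_plane(samples_rgb, labels):
    """samples_rgb: list of HxWx3 crops; labels: 1 = tomato, -1 = background."""
    means = np.array([s.reshape(-1, 3).mean(axis=0) for s in samples_rgb])
    svc = LinearSVC(C=1.0)
    svc.fit(means, labels)
    w = svc.coef_.ravel()         # weight vector of the separating plane
    b = float(svc.intercept_[0])  # bias of the separating plane
    return w, b
```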
3.6. Experimental Setup
In this study, all experiments of the developed algorithm were performed in Python 3.5 on an Intel® Core™ i5-4590 CPU @ 3.30 GHz. Several experiments were conducted to validate the performance of the developed method. The datasets used in the experiments are listed in
Table 1. Some examples of the results in each step are shown in
Section 4.2,
Section 4.3,
Section 4.4 and
Section 4.5. Three indexes were used to evaluate the performance of the proposed algorithm and recently developed algorithms: recall, precision, and F1 score, which are defined by Equations (4)–(6):

$$\text{Recall} = \frac{TP}{TP + FN},$$

$$\text{Precision} = \frac{TP}{TP + FP},$$

$$F_{1} = \frac{2 \times \text{Precision} \times \text{Recall}}{\text{Precision} + \text{Recall}},$$

where $TP$, $FP$, and $FN$ are the numbers of true positive, false positive, and false negative detections, respectively.
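As a small convenience, the three indexes can be computed from the detection counts as follows; the function and variable names are illustrative.

```python
# Recall, precision, and F1 score from true positive, false positive, and
# false negative counts, mirroring Equations (4)-(6).
def evaluate(tp, fp, fn):
    recall = tp / (tp + fn)
    precision = tp / (tp + fp)
    f1 = 2 * precision * recall / (precision + recall)
    return recall, precision, f1
```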
5. Conclusions and Future Work
An algorithm was proposed to overcome the difficulties that harvesting robots face in fruit detection. The method used color images captured by a regular color camera. Compared with single-feature detection methods, the proposed method used a combination of features for fruit detection, including shape, texture, and color information. This approach can reduce the influence of illumination and occlusion factors. HOG descriptors were adopted in this work. An SVM classifier was used to implement the classification task. In the scanning stage, a coarse-to-fine framework was applied, and then, an FCR method was used to eliminate the false positives. Lastly, NMS was used to obtain the final results.
Several experiments were conducted to evaluate the efficiency of the proposed method. A total of 510 samples were used to validate the classification performance of the SVM classifier; the recall was 96.85%, and the precision was 98.40%. These results show that the classifier using only HOG features can distinguish tomatoes from the background very well. In detection, the correct identification rate was 90.67% in sunny conditions and 89.33% in shaded conditions. These similar rates indicate that the proposed method can reduce the influence of varying illumination in the greenhouse environment. The correct identification rate was 100% for separated tomatoes, 91.14% for overlapped tomatoes, and 84.00% for occluded tomatoes, while a reasonable false positive rate was maintained. Missed detections occurred mainly when more than 50% of a tomato's area was blocked by other tomatoes or background objects; when less than 50% was blocked, most tomatoes were detected correctly. Compared with other methods, the proposed method gave the best results. As a reference, the average processing time for one image was about 0.95 s.
However, the proposed method still has some limitations. The accuracy is not satisfactory for overlapped and occluded tomatoes, especially when the blocked area exceeds 50%. Another limitation is that the experiments were carried out at the harvesting stage, so most of the tomatoes in the experiments were well ripened and fully red. The authors believe that detecting tomatoes at other stages, including the green and breaking-red stages, is also needed for the harvesting robot. Our future research will focus on further improving the detection accuracy and on extending the method to tomatoes at other ripeness stages. Transfer learning [
29,
30] can also be applied with an extension of the datasets in the future.