Article

Tender Leaf Identification for Early-Spring Green Tea Based on Semi-Supervised Learning and Image Processing

College of Mechanical and Electronic Engineering, Nanjing Forestry University, Nanjing 210037, China
* Author to whom correspondence should be addressed.
Agronomy 2022, 12(8), 1958; https://doi.org/10.3390/agronomy12081958
Submission received: 24 July 2022 / Revised: 8 August 2022 / Accepted: 18 August 2022 / Published: 19 August 2022

Abstract
Tea is one of the most common beverages in the world. Automated machinery suitable for plucking high-quality green tea is necessary for tea plantations, and the identification of tender leaves is one of its key techniques. In this paper, we propose a method that combines semi-supervised learning and image processing to identify tender leaves. The R, G, and B components of tender leaf and background pixels were trained and tested in both two-dimensional and three-dimensional space, using the gradient-descent method and the Adam algorithm, respectively, to optimize the objective function. The results show that the average accuracy of tender leaf identification is 92.62% and the average misjudgment rate is 18.86%. Our experiments show that early-spring green tea tender leaves can be identified effectively by the semi-supervised model, which is highly versatile and adaptable and reduces the dependence of deep learning on large numbers of labeled samples.

1. Introduction

Tea is one of the most common beverages in the world [1]. In China, green tea accounts for the largest production, export, and consumption volumes. Tea can be harvested in spring, summer, and autumn. Finished tea made from tender leaves plucked in early spring (Figure 1) has the best quality and the highest economic value [2]. At present, such tender leaves are still plucked manually worldwide. However, hand-plucking tea is laborious and time-consuming. Therefore, automated machinery that is efficient and suitable for plucking high-quality green tea is necessary for tea plantations, and the intelligent identification of tender leaves is one of its key techniques.
Research on image processing for green tea has already been carried out, mainly for the identification of tender leaves. Wang et al. [3] segmented tea shoots in images based on color and region growing, extracted the shoot edges using edge detection, and performed three-dimensional modeling of the shoots. Tang et al. [4] proposed a texture-extraction method combining a non-overlapping-window local binary pattern (LBP) and a gray-level co-occurrence matrix (GLCM) for the classification of tea leaves. Chen et al. [5] used machine vision to extract color features to recognize tea shoots under natural conditions. Zhang et al. [6] used an improved BG algorithm, the median filter, and the Otsu algorithm for image processing, and applied the Bayesian discriminant principle to model the harvesting status of fresh tea leaves. Mukhopadhyay et al. [7] proposed a novel approach to detecting tea leaf diseases based on image processing.
In recent years, machine learning has shown excellent performance in identifying objects against complex backgrounds [8,9]. Verbraken et al. [10] investigated the predictive power of a number of Bayesian network algorithms, ranging from the Naive Bayes classifier to general Bayesian network classifiers. Yamamoto et al. [11] proposed a method using image analysis and machine learning to detect and count immature and ripe tomato fruits on the plant. Tripathi et al. [12] found that the most popular approaches for identifying and detecting diseases in agricultural products are machine learning, image processing, and classification-based methods. Zhao et al. [13] used image processing and an AdaBoost classifier to identify tomatoes, with an average pixel value (APV)-based color analysis approach to improve detection accuracy. Kim et al. [14] used image processing and deep learning to automatically classify vegetable grades, applying SVM, CNN, and VGGNet to sort cucumbers into three classes. Hu et al. [15] used the support vector machine (SVM) method to segment disease spots in tea leaf images by extracting color and texture features, and trained a VGG16 deep learning model to identify tea diseases. Nyalala et al. [16] acquired depth images of tomatoes from different directions to extract features and developed five regression models to predict the quality and volume of cherry tomatoes from 2D and 3D image features. Sun et al. [17] proposed a new algorithm combining SLIC (simple linear iterative clustering) with an SVM to improve the extraction of tea leaf disease saliency maps against complex backgrounds. Xie et al. [18] proposed a well-designed network structure for single-image super-resolution based on deep learning, combining the merits of convolutional sparse coding (CSC) and deep convolutional neural networks (CNN). Fan et al. [19] proposed a deep learning architecture based on convolutional neural networks and a cost-effective computer-vision module to detect defective apples. Osako et al. [20] evaluated litchi fruit shapes using elliptic Fourier descriptors and fine-tuned a pre-trained VGG16 into a cultivar discrimination model. Duong et al. [21] adopted EfficientNet and MixNet, two families of deep neural networks, to build an expert system that accurately and swiftly identifies fruits.
Deep convolutional neural networks (DCNNs) have succeeded in various applications [22,23]. For example, recent studies have shown that deep learning can be used to diagnose coronavirus disease [24], help predict seizure recurrence [25], perform high-accuracy three-dimensional optical measurement [26], predict the activity of potential drug molecules [27], analyze particle accelerator data [28,29], detect defects in industrial wood veneer [30], efficiently classify green plum defects [31], detect and discriminate weeds growing in turfgrass [32], and reconstruct brain circuits [33].
However, the above image-processing methods for tender leaf identification are suited to single species under controlled conditions. The background of a tea plantation is complex and involves uncontrollable factors, including light intensity, tea variety, and reflective leaf areas. Deep learning relies on large training datasets, and because the tender leaves in an image are small and numerous, annotation is challenging. In addition, deep learning requires a large amount of computation and high-performance computing equipment. Therefore, these methods are not well suited to this task. In this paper, we propose a method that combines semi-supervised learning and image processing to identify early-spring tender leaves across different varieties, regions, and light intensities; the method is highly versatile and adaptable.

2. Materials and Methods

2.1. Image Acquisition

Aiming to identify the tender leaves of high-quality early-spring green tea, this study selected two varieties of green tea: Longjing tea and Yuhua tea. Images were acquired multiple times during April 2019, March 2020, and April 2022 using a digital camera (EOS Rebel T7i, Canon, Tokyo, Japan), in tea plantations in Nanjing, Jiangsu, China (32°06′ N, 118°85′ E), Changzhou, Jiangsu, China (31°78′ N, 119°42′ E), and Hangzhou, Zhejiang, China (30°42′ N, 120°29′ E). Both the training and testing samples were drawn from Longjing and Yuhua tea, including images taken under strong and weak light conditions.

2.2. Training and Testing

2.2.1. Data Acquisition

Each original image could be divided into two parts: tender leaves and background. There were a total of 453 images of Yuhua tea and Longjing tea, each containing multiple tender leaves. The background mainly included old leaves, reflective leaf areas, and leaf shadow areas. Before categorizing the tender leaves and backgrounds, the images were cropped to 600 × 400 pixels to reduce identification time. In each image, we randomly selected a 5 × 5 pixel tender leaf area (the green box in Figure 2) and a 5 × 5 pixel background area (the black box in Figure 2).
The images were saved in RGB format by default. Each 5 × 5 area consists of 25 pixels, and the 906 sampled areas (two per image) were converted into data in .csv format. This gave 22,650 pixels in total, half tender leaf pixels and half background pixels. A total of 11,325 tender leaf and background pixels were selected as the training dataset, and the remaining 11,325 pixels were used as the testing dataset. Tender leaves were labeled 0 and backgrounds were labeled 1, which converted the identification of tender leaves into a binary classification of tender leaves versus background.
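As an illustration of this data-preparation step, the sketch below flattens one 5 × 5 tender leaf patch and one 5 × 5 background patch into labeled pixel rows and writes them to a .csv file. The file name and box coordinates are hypothetical placeholders, not the authors' actual samples.

```python
import csv
import numpy as np
from PIL import Image

def patch_to_rows(image_path, top_left, label, size=5):
    """Flatten one size x size RGB patch into rows of (R, G, B, label).
    Labels follow the paper's convention: tender leaf = 0, background = 1."""
    img = np.asarray(Image.open(image_path).convert("RGB"))
    r, c = top_left
    patch = img[r:r + size, c:c + size].reshape(-1, 3)  # 25 pixels x (R, G, B)
    return [[int(p[0]), int(p[1]), int(p[2]), label] for p in patch]

# Hypothetical image and box positions, for illustration only
rows = patch_to_rows("tea_001.jpg", (120, 240), label=0)   # green box: tender leaf
rows += patch_to_rows("tea_001.jpg", (300, 60), label=1)   # black box: background

with open("pixels.csv", "w", newline="") as f:
    writer = csv.writer(f)
    writer.writerow(["R", "G", "B", "label"])
    writer.writerows(rows)
```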

2.2.2. Training and Testing in Two-Dimensional Space

This paper was based on TensorFlow GPU version 2.1 installed under Anaconda; the GPU was an Nvidia GeForce GTX 1060 with compute capability 6.1. The Nvidia runtime libraries corresponding to TensorFlow 2.1 were CUDA 10.1 and the matching cuDNN for CUDA 10.1.
In two-dimensional space, the three components R, G, and B were combined in pairs (R-G, R-B, and G-B) to find an objective function that segments tender leaves from the background.
The loss function and the optimization algorithm for the objective function are two important concepts in semi-supervised learning. The loss function summarizes, in the average sense, the optimal rule from limited training data, using a finite sample to approximate the globally optimal probability distribution; its role is to measure the degree of this approximation. An excellent loss function therefore not only defines a clear quantitative index for the problem, but also speeds up training optimization.
The goal of semi-supervised learning is a small generalization error, which means the loss should become smaller and smaller. Taking the two components of R-G as an example, the objective function is defined as $z = w_0 + w_1 R + w_2 G = 0$. If $z < 0$, the pixel composed of the R-G components belongs to the tender leaf part; if $z > 0$, it belongs to the background part (consistent with the decision rule on $y$ below and with the labels 0 and 1 defined above). The sigmoid cross-entropy loss function was used in this article. The sigmoid function has a value range of (0, 1), as shown in Equation (1):
$$y = \frac{1}{1 + e^{-z}} \quad (1)$$
The average cross-entropy loss function is:

$$\mathrm{Loss} = -\frac{1}{n}\sum_{i=1}^{n}\left[ y_i \ln \hat{y}_i + (1 - y_i)\ln(1 - \hat{y}_i) \right] \quad (2)$$

In the above equation, $i$ is the index of the sample; $n$ is the total number of samples; $y_i$ is the label of the $i$-th sample (0 or 1); and $\hat{y}_i$ is the predicted probability of the $i$-th sample.
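As a quick numerical check, Equations (1) and (2) can be written directly in NumPy; this is an illustrative sketch, not code from the study:

```python
import numpy as np

def sigmoid(z):
    """Equation (1): maps the decision value z into (0, 1)."""
    return 1.0 / (1.0 + np.exp(-z))

def average_cross_entropy(y, y_hat, eps=1e-12):
    """Equation (2): average binary cross-entropy over n samples."""
    y_hat = np.clip(y_hat, eps, 1.0 - eps)  # guard against log(0)
    return -np.mean(y * np.log(y_hat) + (1.0 - y) * np.log(1.0 - y_hat))
```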
The binary classification problem can be transformed into a binary logistic regression problem: $y = \sigma(z) = \sigma(w_0 + w_1 R + w_2 G)$. If $y < 0.5$, the pixel is part of a tender leaf; if $y > 0.5$, it is background, as shown in Figure 3.
On the other hand, optimizing the objective function is also one of the keys to semi-supervised learning. In research and practical applications, in addition to the most commonly used stochastic gradient-descent method [34], the momentum optimization algorithm [35] and the Adam algorithm [36] have been proposed. These strategies not only speed up convergence, but also reduce the influence of hyperparameters (such as the learning rate) on the solution process and simplify training. The gradient-descent algorithm was used in this section, as follows:
Step 1: Load and process the data, extract the attributes and labels (tender leaves default to 0 and backgrounds to 1), and visualize the samples. Normalization during data processing puts all attributes in the same range and order of magnitude, allowing faster convergence to the optimal solution and improving the accuracy of the learner. Linear normalization maps all data to [0, 1], as shown in Equation (3):
$$x^{*} = \frac{x - x_{\min}}{x_{\max} - x_{\min}} \quad (3)$$

Here, $x$ is the sample value, $x_{\min}$ is the minimum value in the sample, $x_{\max}$ is the maximum value in the sample, and $x^{*}$ is the scaled sample value.
Step 2: Set the hyperparameters and the initial values of the model parameters: learning rate $\eta = 0.2$ and number of training iterations $k = 1000$. The initial value $W = (w_0, w_1, w_2)$ is drawn with the random number module of the NumPy library, np.random.randn($d_0, d_1, \cdots, d_n$), which generates an array from the standard normal distribution.
Step 3: Train the model and update the model parameters. The update rule is the gradient-descent algorithm:

$$W_{k+1} = W_k - \eta \times \frac{\partial \mathrm{Loss}(W)}{\partial W} \quad (4)$$

Here, $k$ is the iteration number and $\partial \mathrm{Loss}(W) / \partial W$ is the derivative of the loss with respect to $W$, i.e., the gradient.
Step 4: Visualize. Draw the loss curve, the accuracy curve, and the decision boundary. Accuracy is defined as:

$$\mathrm{accuracy} = \frac{\text{number of correctly classified samples}}{\text{total number of samples}} \quad (5)$$
Step 5: Use the testing dataset to load the data, use the model, and obtain the output of the results.
Similarly, the relationships for the R-B and G-B component pairs can be solved as described above; a minimal end-to-end sketch of the procedure follows.
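The sketch below walks through steps 1–5 for the R-G case in NumPy. The file pixels.csv is the hypothetical file from Section 2.2.1, the closed-form gradient $X^{\top}(\hat{y} - y)/n$ follows from Equations (1), (2), and (4), and the hyperparameters match step 2; this illustrates the procedure rather than reproducing the authors' TensorFlow code.

```python
import numpy as np

# Step 1: load the data; pixels.csv is the hypothetical file sketched in
# Section 2.2.1, with columns R, G, B, label (tender leaf = 0, background = 1).
data = np.loadtxt("pixels.csv", delimiter=",", skiprows=1)
X, y = data[:, :2], data[:, 3]              # use the R and G components only

# Linear normalization, Equation (3): map each attribute to [0, 1]
X = (X - X.min(axis=0)) / (X.max(axis=0) - X.min(axis=0))
X = np.hstack([np.ones((len(X), 1)), X])    # prepend a bias column for w0

# Step 2: hyperparameters and random initial weights
eta, iterations = 0.2, 1000
W = np.random.randn(3)                      # W = (w0, w1, w2)

# Step 3: gradient descent, Equation (4); for the sigmoid cross-entropy loss
# of Equation (2), the gradient is dLoss/dW = X^T (y_hat - y) / n
for k in range(iterations):
    y_hat = 1.0 / (1.0 + np.exp(-X @ W))    # Equation (1)
    W -= eta * X.T @ (y_hat - y) / len(y)

# Steps 4-5: evaluate; accuracy as in Equation (5)
y_hat = 1.0 / (1.0 + np.exp(-X @ W))
accuracy = np.mean((y_hat > 0.5) == (y == 1))
print("W =", W, "accuracy =", accuracy)
```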

2.2.3. Training and Testing in Three-Dimensional Space

In three-dimensional space, the three components R, G, and B were combined to find the hyperplane $Z = w_0 + w_1 R + w_2 G + w_3 B = 0$ dividing tender leaves from the background, using the loss function shown in Equation (2). As the gradient-descent algorithm in Section 2.2.2 shows, the choice of learning rate plays a key role in the effectiveness of the algorithm. If the learning rate is too large, the loss may oscillate and fail to converge; if it is too small, the weight parameter $W$ moves very slowly and convergence becomes very time-consuming. The stochastic gradient-descent algorithm is simple and converges quickly, but it easily falls into local optima and has difficulty reaching the global optimum, for two reasons: the same learning rate is applied to every parameter update, and the learning rate must be set entirely by hand. Therefore, in three-dimensional space, the Adam optimization algorithm was used, which uses the iteration number to correct the estimated gradient mean and gradient mean square. It adjusts the learning rate automatically, so its prediction of gradient changes can be more accurate.
The binary classification problem was transformed into a ternary logistic regression problem, $y = \sigma(z) = \sigma(w_0 + w_1 R + w_2 G + w_3 B)$, as shown in Figure 4.
The specific steps were similar to those in Section 2.2.2; because the Adam optimization algorithm was used to optimize the objective function, steps 2 and 3 were adjusted as follows:
Step 2: Set the hyperparameters and the initial values of the model parameters: learning rate $\alpha = 0.09$, iteration-stopping threshold $= 0.0001$, number of iterations $t = 4000$, and default values $\beta_1 = 0.9$, $\beta_2 = 0.999$, $\varepsilon = 10^{-8}$. The initial values of $m_0$ and $v_0$ are set to 0, and the initial value of $W = (w_0, w_1, w_2, w_3)$ is again drawn with np.random.randn($d_0, d_1, \cdots, d_n$).
Step 3: Train the model and update the model parameters.
$$g_t = \frac{\partial \mathrm{Loss}(W_{t-1})}{\partial W_{t-1}} \quad \text{(gradient of the loss at the weight parameter } W_{t-1}\text{)}$$
$$m_t = \beta_1 m_{t-1} + (1 - \beta_1) g_t \quad \text{(estimate of the gradient mean)}$$
$$v_t = \beta_2 v_{t-1} + (1 - \beta_2) g_t^2 \quad \text{(estimate of the gradient mean square; } g_t^2 \text{ is the element-wise square of } g_t\text{)}$$
$$\hat{m}_t = \frac{m_t}{1 - \beta_1^t} \quad \text{(bias correction of the gradient mean; } \beta_1^t \text{ is } \beta_1 \text{ to the power of } t\text{)}$$
$$\hat{v}_t = \frac{v_t}{1 - \beta_2^t} \quad \text{(bias correction of the gradient mean square)}$$
$$W_t = W_{t-1} - \alpha \frac{\hat{m}_t}{\sqrt{\hat{v}_t} + \varepsilon} \quad \text{(update of the weight parameters)}$$
The other steps are the same as those in two-dimensional space, so we do not elaborate on them here; a compact sketch of the Adam update is given below.
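The following NumPy sketch implements the Adam update of steps 2–3. The helper grad_fn is an assumed stand-in for the gradient of the loss in Equation (2) (as computed in the two-dimensional sketch above); the hyperparameters follow step 2.

```python
import numpy as np

def adam_train(grad_fn, W, alpha=0.09, beta1=0.9, beta2=0.999,
               eps=1e-8, iterations=4000, tol=1e-4):
    """Sketch of the Adam update in steps 2-3; grad_fn(W) returns dLoss/dW."""
    m = np.zeros_like(W)                      # m0 = 0: gradient-mean estimate
    v = np.zeros_like(W)                      # v0 = 0: mean-square estimate
    for t in range(1, iterations + 1):
        g = grad_fn(W)                        # g_t
        m = beta1 * m + (1 - beta1) * g       # biased first-moment estimate
        v = beta2 * v + (1 - beta2) * g**2    # biased second-moment estimate
        m_hat = m / (1 - beta1**t)            # bias-corrected gradient mean
        v_hat = v / (1 - beta2**t)            # bias-corrected mean square
        step = alpha * m_hat / (np.sqrt(v_hat) + eps)
        W = W - step                          # update the weight parameters
        if np.max(np.abs(step)) < tol:        # stop-iteration threshold (step 2)
            break
    return W
```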

2.3. Image Processing

In the plucking process, according to the plucking standards for high-quality green tea, only tender leaves that meet the plucking requirements need to be retained; small tender leaves can be temporarily ignored. In addition, considering external factors such as light intensity, the color of old leaves and reflective leaf areas may resemble that of tender leaves. It is therefore necessary to filter out the small unopened buds and uninteresting areas in the image produced by the above model, so we used a combination of median filtering and area filtering to denoise in this step. Finally, a circumscribed rectangle was used to mark each connected region. Identification precision measures the accuracy of tender leaf identification, through the above model and filtering algorithm, in tea images taken under arbitrary conditions. It was determined by the ratio of the identified tender leaf bounding boxes $p_t$ to the actual tender leaf bounding boxes $a_t$, calculated using Equation (6):
$$\text{identification precision} = \frac{p_t}{a_t} \quad (6)$$
The misjudgment rate measures how much background is falsely identified as tender leaf in tea images taken under arbitrary conditions, through the above model and filtering algorithm. The false-positive rate was determined by the ratio of the falsely predicted new-shoot bounding boxes $f_t$ to the actual shoot bounding boxes $a_t$, calculated using Equation (7):
$$\text{false positive rate} = \frac{f_t}{a_t} \quad (7)$$
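A sketch of this post-processing stage using OpenCV is shown below. The pixel-wise classifier uses the hyperplane learned in three-dimensional space (Section 3.2); the assumption that the model operates on raw 0–255 RGB values and the min_area threshold are illustrative choices, not taken from the paper.

```python
import cv2
import numpy as np

def find_tender_leaves(img_bgr,
                       w=(272.85028, -33.024113, 12.006566, 23.403887),
                       min_area=200):
    """Classify pixels with the learned hyperplane Z = w0 + w1*R + w2*G + w3*B,
    then median-filter, area-filter, and box the remaining connected regions.
    Assumes raw 0-255 channel values; min_area is an illustrative threshold."""
    b, g, r = cv2.split(img_bgr.astype(np.float32))   # OpenCV stores BGR
    z = w[0] + w[1] * r + w[2] * g + w[3] * b
    mask = np.where(z < 0, 255, 0).astype(np.uint8)   # z < 0 -> tender leaf (label 0)

    mask = cv2.medianBlur(mask, 5)                    # median filtering
    n, labels, stats, _ = cv2.connectedComponentsWithStats(mask)
    boxes = []
    for i in range(1, n):                             # component 0 is the background
        x, y, bw, bh, area = stats[i]
        if area >= min_area:                          # area filtering
            boxes.append((x, y, bw, bh))              # circumscribed rectangle
    return boxes
```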

3. Results and Discussion

3.1. Visualization and Objective Function in Two-Dimensional Space

Based on semi-supervised learning and the stochastic gradient-descent algorithm, the two components R-G were processed first (Figure 5), which shows the distribution of the R-G components of the tender leaf and background parts in the training and testing datasets. The green dots are tender leaves and the black dots are background. To converge to the optimal solution faster and improve the accuracy of the learner, the original data were linearly normalized and mapped to [0, 1], so the value range of both coordinates is [0, 1]. As the model parameters were updated, the accuracy and loss of the R-G logistic regression continued to improve (Table 1): as the number of iterations increased, the training accuracy reached 100% (testing accuracy 99%), the training loss approached 0, and the testing loss approached approximately 0.12. The objective function of the R-G component in the training and testing datasets is shown as a red line in Figure 5. The segmentation of the R-G component was successful in both datasets: there was only one wrongly segmented point, two points lay on the red line, and the others were segmented accurately, as shown in Figure 5b. The output model parameters were $w_0 = 4.1126776$, $w_1 = -5.5131683$, and $w_2 = -4.7532825$, giving the objective function $z = w_0 + w_1 R + w_2 G = 4.1126776 - 5.5131683R - 4.7532825G = 0$. The formula shows that both the R and G components are negatively correlated with the objective function, with the R component carrying the larger weight.
Next, the two components R-B were processed (Figure 6), which shows the distribution of the R-B components of the tender leaf and background parts in the training and testing datasets. As above, both coordinates range over [0, 1]. The accuracy and loss of the R-B logistic regression are shown in Table 2: the accuracy reached 100% and the loss approached 0 as the number of iterations increased. The objective function of the R-B component in the training and testing datasets is shown as a red line in Figure 6. The segmentation of the R-B component was perfect in both datasets, with no wrongly segmented points. The final output model parameters were $w_0 = 0.23365301$, $w_1 = -5.9260917$, and $w_2 = 4.9247804$, giving the objective function $z = w_0 + w_1 R + w_2 B = 0.23365301 - 5.9260917R + 4.9247804B = 0$. The formula shows that the R component is negatively correlated and the B component positively correlated with the objective function, with the R component carrying the larger weight.
Finally, the two components G-B were processed (Figure 7), which shows the G-B components of the tender leaf and background parts in the training and testing datasets. As above, both coordinates range over [0, 1]. The accuracy and loss of the G-B logistic regression are shown in Table 3: the accuracy reached 100% and the loss approached 0 as the number of iterations increased. The objective function of the G-B component in the training and testing datasets is shown as a red line in Figure 7. The segmentation of the G-B component was perfect in both datasets, with no wrongly segmented points. The final output model parameters were $w_0 = 0.18182087$, $w_1 = -5.9490047$, and $w_2 = 4.8426056$, giving the objective function $z = w_0 + w_1 G + w_2 B = 0.18182087 - 5.9490047G + 4.8426056B = 0$. The formula shows that the G component is negatively correlated and the B component positively correlated with the objective function, with the G component carrying the larger weight.
The three components were thus combined in pairs and processed in two-dimensional space. Compared with R-G, the R-B and G-B pairs gave better segmentation results. In R-G, both components correlate with the objective function in the same direction, whereas in R-B and G-B the two components correlate in opposite directions, which may explain the difference in results. Overall, the segmentation results were very good. We then used a similar method to examine the segmentation as a whole in three-dimensional space.

3.2. Visualization and Objective Function in Three-Dimensional Space

In three-dimensional space, based on semi-supervised learning and the Adam algorithm, the three components R-G-B were processed together (Figure 8), which shows the distribution of the R-G-B components of the tender leaf and background parts in the training and testing datasets. Intuitively, there was little overlap between the two parts: the background mainly occupied the middle and upper-left area, and the tender leaves mainly occupied the lower-right area. The accuracy and loss of the R-G-B logistic regression as the model parameters were updated are shown in Table 4: the training accuracy approached 96% (testing accuracy 94.6%), the training loss approached 0.10, and the testing loss approached 0.19. The decision boundary of the R-G-B component in the training and testing datasets is shown as a red hyperplane in Figure 8. The segmentation result on the testing dataset was better than on the training dataset. The final output model parameters were $w_0 = 272.85028$, $w_1 = -33.024113$, $w_2 = 12.006566$, and $w_3 = 23.403887$, giving the objective function $Z = w_0 + w_1 R + w_2 G + w_3 B = 272.85028 - 33.024113R + 12.006566G + 23.403887B = 0$. In this three-dimensional model, the largest weights were, in order, R, B, and G, with R standing out.

3.3. Identification Result on Tender Leaves

Based on the model with decision boundary $Z = 272.85028 - 33.024113R + 12.006566G + 23.403887B = 0$, the sigmoid function was used as the classifier, and the images of Longjing and Yuhua tea described in Section 2.1, taken at different times under different light intensities, were processed.
Sample images of the predicted and actual tender leaf bounding boxes generated by the model are shown in Figure 9. Figure 9a shows an image of a Longjing tea tree taken on a late-March morning under strong light: compared with the actual tender leaves, the predicted tender leaves were identified accurately and their integrity was preserved. Figure 9c shows an image of a Yuhua tea tree taken on an early-April afternoon, also under strong light. Despite the inclined camera angle, the tender leaves of the parts to be plucked were completely predicted, and the strongly lit, reflective leaves were assigned to the background. As these samples show, strong light had little influence on the result, even against a complex background. Figure 9b shows a Longjing tea tree and Figure 9d a Yuhua tea tree, both taken under weak light. Some old leaf edges were similar in color to the tender leaves; nevertheless, the parts of the tender leaves to be plucked were accurately predicted overall and the model preserved the integrity of the tender leaves.
After the 453 images were processed by the model, the average identification accuracy was 92.62% and the average misjudgment rate was 18.86%. In summary, this model, which is highly versatile and adaptable, can identify early-spring green tea tender leaves of different varieties, from different regions, grown under different light intensities, and at different growth stages.

4. Conclusions

In this paper, semi-supervised learning is used, which greatly alleviates deep learning's need for large numbers of labeled samples. Because of the complexity and uncertainty of backgrounds in real scenes, fixed-threshold and classic adaptive-threshold methods segment poorly, so we proposed a method that combines semi-supervised learning and image processing to identify tender leaves. Based on the model's objective function $Z = 272.85028 - 33.024113R + 12.006566G + 23.403887B = 0$, the sigmoid function was used as the classifier. Early-spring green tea tender leaves of different varieties, from different regions, grown under different light intensities, and at different growth stages could be identified with an average accuracy of 92.62%. For tea tree images with relatively complex backgrounds, the model still identified tender leaves accurately, with an average misjudgment rate of 18.86%; the areas wrongly identified as tender leaves were mostly old leaf edges and old leaves similar in color to tender leaves. Overall, the proposed model is highly versatile and adaptable and is suitable for identifying early-spring green tea tender leaves across varieties, regions, light intensities, and growth stages.

Author Contributions

Conceptualization and methodology, J.Y.; software, J.Y.; validation and formal analysis, J.Y.; investigation, resources, and data, J.Y.; writing—original draft preparation, J.Y.; writing—review and editing, Y.C.; supervision, project administration, and funding acquisition, Y.C. All authors have read and agreed to the published version of the manuscript.

Funding

This work was supported by the National Natural Science Foundation of China (Grant No. 32072498), the Key Research and Development Program of Jiangsu Province (Grant No. BE2021016) and the Jiangsu Agricultural Science and Technology Innovation Fund (Grant No. CX(21)3184).

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

This study did not report any data.

Acknowledgments

We are thankful to the National Natural Science Foundation of China, the Key Research and Development Program of Jiangsu Province, and the Jiangsu Agricultural Science and Technology Innovation Fund.

Conflicts of Interest

The authors declare no conflict of interest.

References

  1. Graham, H.N. Green tea composition, consumption, and polyphenol chemistry. Prev. Med. 1992, 21, 334–350. [Google Scholar] [CrossRef]
  2. Liu, J.; Zhang, Q.; Liu, M.; Ma, L.; Shi, Y.; Ruan, J. Metabolomic Analyses Reveal Distinct Change of Metabolites and Quality of Green Tea during the Short Duration of a Single Spring Season. J. Agric. Food Chem. 2016, 64, 3302–3309. [Google Scholar] [CrossRef] [PubMed]
  3. Wang, J.; Zeng, X.; Liu, J. Three-Dimensional Modeling of Tea-Shoots Using Images and Models. Sensors 2011, 11, 3803–3815. [Google Scholar] [CrossRef] [PubMed]
  4. Tang, Z.; Su, Y.; Er, M.J.; Qi, F.; Zhang, L.; Zhou, J. A local binary pattern based texture descriptors for classification of tea leaves. Neurocomputing 2015, 168, 1011–1023. [Google Scholar] [CrossRef]
  5. Chen, J.; Chen, Y.; Jin, X.; Che, J.; Gao, F.; Li, N. Research on a Parallel Robot for Green Tea Flushes Plucking. In Proceedings of the 2015 International Conference on Education, Management, Information and Medicine, Shenyang, China, 24–26 April 2015; pp. 22–26. [Google Scholar]
  6. Zhang, L.; Zhang, H.; Chen, Y.; Dai, S.; Li, X.; Imou, K.; Liu, Z.; Li, M. Real-time monitoring of optimum timing for harvesting fresh tea leaves based on machine vision. Int. J. Agric. Biol. Eng. 2019, 12, 6–9. [Google Scholar] [CrossRef]
  7. Mukhopadhyay, S.; Paul, M.; Pal, R.; De, D. Tea leaf disease detection using multi-objective image segmentation. Multimed. Tools Appl. 2021, 80, 753–771. [Google Scholar] [CrossRef]
  8. De Marsico, M.; Petrosino, A.; Ricciardi, S. Iris recognition through machine learning techniques: A survey. Pattern Recognit. Lett. 2016, 82, 106–115. [Google Scholar] [CrossRef]
  9. Suto, J. Plant leaf recognition with shallow and deep learning: A comprehensive study. Intell. Data Anal. 2020, 24, 1311–1328. [Google Scholar] [CrossRef]
  10. Verbraken, T.; Verbeke, W.; Baesens, B. Profit optimizing customer churn prediction with Bayesian network classifiers. Intell. Data Anal. 2012, 18, 3–24. [Google Scholar] [CrossRef]
  11. Yamamoto, K.; Yoshioka, Y.; Ninomiya, S. Detection and counting of intact tomato fruits on tree using image analysis and machine learning methods. In Proceedings of the 5th International Conference, TAE 2013: Trends in Agricultural Engineering 2013, Prague, Czech Republic, 2–3 September 2013.
  12. Tripathi, M.K.; Maktedar, D.D. Recent machine learning based approaches for disease detection and classification of agricultural products. In Proceedings of the 2016 International Conference on Computing Communication Control and Automation (ICCUBEA), Pune, India, 12–13 August 2016; pp. 1–6. [Google Scholar]
  13. Zhao, Y.; Gong, L.; Zhou, B.; Huang, Y.; Liu, C. Detecting tomatoes in greenhouse scenes by combining AdaBoost classifier and colour analysis. Biosyst. Eng. 2016, 148, 127–137. [Google Scholar] [CrossRef]
  14. Kim, J.; Cho, W.; Na, M.; Chun, M. Development of Automatic Classification System of Vegetables by Image Processing and Deep Learning. J. Korean Data Anal. Soc. 2019, 21, 63–73. [Google Scholar] [CrossRef]
  15. Hu, G.; Wu, H.; Zhang, Y.; Wan, M. A low shot learning method for tea leaf’s disease identification. Comput. Electron. Agric. 2019, 163, 104852. [Google Scholar] [CrossRef]
  16. Nyalala, I.; Okinda, C.; Nyalala, L.; Makange, N.; Chao, Q.; Chao, L.; Yousaf, K.; Chen, K. Tomato volume and mass estimation using computer vision and machine learning algorithms: Cherry tomato model. J. Food Eng. 2019, 263, 288–298. [Google Scholar] [CrossRef]
  17. Sun, Y.; Jiang, Z.; Zhang, L.; Dong, W.; Rao, Y. SLIC_SVM based leaf diseases saliency map extraction of tea plant. Comput. Electron. Agric. 2019, 157, 102–109. [Google Scholar] [CrossRef]
  18. Xie, C.; Liu, Y.; Zeng, W.L.; Lu, X.B. An improved method for single image super-resolution based on deep learning. Signal Image Video Process. 2019, 13, 557–565. [Google Scholar] [CrossRef]
  19. Fan, S.; Li, J.; Zhang, Y.; Tian, X.; Wang, Q.; He, X.; Zhang, C.; Huang, W. On line detection of defective apples using computer vision system combined with deep learning methods. J. Food Eng. 2020, 286, 110102. [Google Scholar] [CrossRef]
  20. Osako, Y.; Yamane, H.; Lin, S.-Y.; Chen, P.-A.; Tao, R. Cultivar discrimination of litchi fruit images using deep learning. Sci. Hortic. 2020, 269, 109360. [Google Scholar] [CrossRef]
  21. Duong, L.T.; Nguyen, P.T.; Di Sipio, C.; Di Ruscio, D. Automated fruit recognition using EfficientNet and MixNet. Comput. Electron. Agric. 2020, 171, 105326. [Google Scholar] [CrossRef]
  22. LeCun, Y.; Bengio, Y.; Hinton, G. Deep learning. Nature 2015, 521, 436. [Google Scholar] [CrossRef]
  23. Ni, C.; Wang, D.; Vinson, R.; Holmes, M.; Tao, Y. Automatic inspection machine for maize kernels based on deep convolutional neural networks. Biosyst. Eng. 2019, 178, 131–144. [Google Scholar] [CrossRef]
  24. Saood, A.; Hatem, I. COVID-19 lung CT image segmentation using deep learning methods: U-Net versus SegNet. BMC Med. Imag. 2021, 21, 19. [Google Scholar] [CrossRef]
  25. Geng, D.V.; Alkhachroum, A.; Bicchi, M.A.M.; Jagid, J.R.; Cajigas, I.; Chen, Z.S. Deep learning for robust detection of interictal epileptiform discharges. J. Neural Eng. 2021, 18, 056015. [Google Scholar] [CrossRef]
  26. Yao, P.; Gai, S.; Chen, Y.; Chen, W.; Da, F. A multi-code 3D measurement technique based on deep learning. Opt. Lasers Eng. 2021, 143, 106623. [Google Scholar] [CrossRef]
  27. Ma, J.S.; Sheridan, R.P.; Liaw, A.; Dahl, G.E.; Svetnik, V. Deep Neural Nets as a Method for Quantitative Structure-Activity Relationships. J. Chem Inf. Model. 2015, 55, 263–274. [Google Scholar] [CrossRef]
  28. Ciodaro, T.; Deva, D.; De Seixas, J.; Damazio, D. Online particle detection with Neural Networks based on topological calorimetry information. In Proceedings of the 14th International Workshop on Advanced Computing and Analysis Techniques in Physics Research (ACAT), Brunel University, Uxbridge, UK, 5–9 September 2011; IOP Publishing: Bristol, UK, 2012; Volume 368. [Google Scholar]
  29. Azhari, M.; Abarda, A.; Ettaki, B.; Zerouaoui, J.; Dakkon, M. Higgs Boson Discovery using Machine Learning Methods with Pyspark. Procedia Comput. Sci. 2020, 170, 1141–1146. [Google Scholar] [CrossRef]
  30. Shi, J.; Li, Z.; Zhu, T.; Wang, D.; Ni, C. Defect detection of industry wood veneer based on NAS and multi-channel mask R-CNN. Sensors 2020, 20, 4398. [Google Scholar] [CrossRef]
  31. Zhou, H.; Zhuang, Z.; Liu, Y.; Liu, Y.; Zhang, X. Defect Classification of Green Plums Based on Deep Learning. Sensors 2020, 20, 6993. [Google Scholar] [CrossRef]
  32. Jin, X.; Bagavathiannan, M.; Maity, A.; Chen, Y.; Yu, J. Deep learning for detecting herbicide weed control spectrum in turfgrass. Plant Methods 2022, 18, 94. [Google Scholar] [CrossRef]
  33. Helmstaedter, M.; Briggman, K.L.; Turaga, S.C.; Jain, V.; Seung, H.S.; Denk, W. Connectomic reconstruction of the inner plexiform layer in the mouse retina. Nature 2013, 500, 168–174. [Google Scholar] [CrossRef]
  34. Bottou, L. Large-scale machine learning with stochastic gradient descent. In Proceedings of the COMPSTAT 2010, Paris, France, 22–27 August 2010; Springer: Berlin/Heidelberg, Germany, 2010; pp. 177–186. [Google Scholar]
  35. Liu, W.; Chen, L.; Chen, Y.; Zhang, W. Accelerating Federated Learning via Momentum Gradient Descent. IEEE Trans. Parallel Distrib. Syst. 2020, 31, 1754–1766. [Google Scholar] [CrossRef]
  36. Kingma, D.P.; Ba, J. Adam: A Method for Stochastic Optimization. 2014. Available online: https://arxiv.org/abs/1412.6980 (accessed on 30 January 2017).
Figure 1. Image of tea tree in tea plantation and schematic diagram of tender leaf.
Figure 2. Selection of tender leaf area (the green box) and background area (the black box).
Figure 3. Diagram of binary logistic regression.
Figure 4. Schematic diagram of ternary logistic regression.
Figure 5. Schematic diagram of R-G in logistic regression. (a) Schematic diagram of training dataset. (b) Schematic diagram of testing dataset.
Figure 6. Schematic diagram of R-B in logistic regression. (a) Schematic diagram of training dataset. (b) Schematic diagram of testing dataset.
Figure 7. Schematic diagram of G-B in logistic regression. (a) Schematic diagram of training dataset. (b) Schematic diagram of testing dataset.
Figure 8. Schematic diagram of R-G-B in logistic regression. (a) Schematic diagram of training dataset. (b) Schematic diagram of testing dataset.
Figure 9. The model-generated bounding box predictions and actual tender leaf bounding boxes in RGB format. (a,b) The prediction of Longjing tea tree under strong and weak light conditions. (c,d) The prediction of Yuhua tea tree under strong and weak light conditions.
Table 1. The accuracy and loss rate of the training and testing datasets in R-G logistic regression.

i       Training Accuracy   Training Loss   Testing Accuracy   Testing Loss
0       0.500000            0.525362        0.500000           0.573693
100     1.000000            0.230712        1.000000           0.326605
200     1.000000            0.141746        0.995000           0.244365
300     1.000000            0.101258        0.995000           0.203138
400     1.000000            0.078506        0.995000           0.178050
500     1.000000            0.064027        0.995000           0.160973
600     1.000000            0.054032        0.990000           0.148480
700     1.000000            0.046729        0.990000           0.138872
800     1.000000            0.041164        0.990000           0.131209
900     1.000000            0.036786        0.990000           0.124924
1000    1.000000            0.033252        0.990000           0.119654
Table 2. The accuracy and loss rate of the training and testing datasets in R-B logistic regression.

i       Training Accuracy   Training Loss   Testing Accuracy   Testing Loss
0       0.500000            0.597738        0.500000           0.635488
100     1.000000            0.172305        1.000000           0.234500
200     1.000000            0.096634        1.000000           0.150561
300     1.000000            0.066277        1.000000           0.113217
400     1.000000            0.050207        1.000000           0.091922
500     1.000000            0.040329        1.000000           0.078049
600     1.000000            0.033663        1.000000           0.068235
700     1.000000            0.028871        1.000000           0.060890
800     1.000000            0.025264        1.000000           0.055165
900     1.000000            0.022453        1.000000           0.050564
1000    1.000000            0.020201        1.000000           0.046776
Table 3. The accuracy and loss rate of the training and testing datasets in G-B logistic regression.

i       Training Accuracy   Training Loss   Testing Accuracy   Testing Loss
0       0.500000            0.593888        0.500000           0.627445
100     1.000000            0.169267        1.000000           0.225570
200     1.000000            0.094983        1.000000           0.143722
300     1.000000            0.065238        1.000000           0.107660
400     1.000000            0.049486        1.000000           0.087197
500     1.000000            0.039795        1.000000           0.073910
600     1.000000            0.033250        1.000000           0.064531
700     1.000000            0.028542        1.000000           0.057524
800     1.000000            0.024995        1.000000           0.052070
900     1.000000            0.022228        1.000000           0.047691
1000    1.000000            0.020011        1.000000           0.044090
Table 4. The accuracy rate and loss rate of the training and testing datasets in R-G-B logistic regression.

i       Training Accuracy   Training Loss   Testing Accuracy   Testing Loss
0       0.510000            0.590339        0.832500           0.556631
500     0.936500            0.159910        0.941000           0.203834
1000    0.940000            0.138236        0.942500           0.201895
1500    0.946000            0.127746        0.943500           0.201970
2000    0.949000            0.119383        0.943500           0.199923
2500    0.953000            0.112069        0.944000           0.196742
3000    0.954500            0.105656        0.944000           0.193673
3500    0.957000            0.100039        0.944500           0.191125
4000    0.959000            0.095104        0.946000           0.189094
Publisher’s Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.
