Self-Supervised Clustering for Leaf Disease Identification

Monowar, Muhammad Mostafa; Hamid, Md. Abdul; Kateb, Faris A.; Ohi, Abu Quwsar; Mridha, M. F.

doi:10.3390/agriculture12060814


by Muhammad Mostafa Monowar ¹,*, Md. Abdul Hamid ¹, Faris A. Kateb ¹, Abu Quwsar Ohi ² and M. F. Mridha ³

¹ Department of Information Technology, Faculty of Computing & Information Technology, King Abdulaziz University, Jeddah 21589, Saudi Arabia
² Department of Computer Science & Engineering, Bangladesh University of Business & Technology, Dhaka 1216, Bangladesh
³ Department of Computer Science, American International University-Bangladesh, Dhaka 1229, Bangladesh
* Author to whom correspondence should be addressed.

Agriculture 2022, 12(6), 814; https://doi.org/10.3390/agriculture12060814

Submission received: 12 May 2022 / Revised: 31 May 2022 / Accepted: 1 June 2022 / Published: 5 June 2022

(This article belongs to the Section Digital Agriculture)


Abstract: Plant diseases have long been one of the most serious threats to farmers. Although most plant diseases can be identified by observing leaves, doing so often requires human expertise. Recent improvements in computer vision have led to the introduction of disease classification systems that observe leaf images. Nevertheless, most disease classification systems are specific to particular diseases and plants, which limits their usability. The methods are also costly, as they require vast amounts of labeled data, and labeling can only be done by experts. This paper introduces a self-supervised leaf disease clustering system that can be used for classifying plant diseases. As self-supervision does not require labeled data, the proposed method can be inexpensive and can be implemented for most types of plants. The method implements a siamese deep convolutional neural network (DCNN) for generating clusterable embeddings from leaf images. The embedding network is trained using the AutoEmbedder approach with randomly augmented image pairs. The self-supervised embedding model training involves three different data pair linkage scenarios: can-link, cannot-link, and may-link pairs. The embeddings are further clustered using the k-means algorithm in the final classification stage. The experiment is conducted to individually classify diseases of eight different fruit leaves. The results indicate that the proposed leaf disease identification method performs better than the existing self-supervised clustering systems. The paper indicates that end-to-end siamese networks can outperform well-designed sequentially trained self-supervised methods.

Keywords: deep learning; clustering; self-supervised learning; convolutional neural network

1. Introduction

One of the most important aspects of precision agriculture is dealing with disease. Plants often suffer from numerous unrecognized diseases, which may reduce the overall yield and can even decrease production quality and quantity [1]. Therefore, early plant disease detection is necessary for increased production [2]. In the case of orchards, manually treating every plant often becomes unmanageable, inaccurate, and time-consuming. Hence, automated plant disease detection is usually desired.

Numerous plant diseases can be identified by observing the leaves [3]. However, recognizing such conditions from plant leaves often requires expertise. As deep learning methods currently perform excellently in computer vision tasks, numerous methods are being implemented to identify plant diseases from leaf images [4]. Moreover, the wide availability of smartphones encourages the use of image-based detection systems.

Despite the increasing demand for vision-based plant disease detection systems, the current detection systems are plant and disease specific [5,6,7]. Therefore, applying similar strategies to other plants and diseases is complicated. Although there exist some universal plant disease detection systems that are tested on most plants (e.g., [8,9]), such methods require labeled datasets for training. Labeling disease datasets requires expertise and can therefore be considered costly. Hence, the overall process of building an automated universal plant disease detection system can be expensive and challenging.

The challenge of data labeling can be addressed using unsupervised learning strategies. Currently, self-supervised learning is gaining popularity in the field of computer vision [10]. Self-supervised learning is a type of unsupervised learning that requires no labeled data. Therefore, it is applicable in numerous domains, as it can learn latent features from data and identify dissimilarities between them. Currently, contrastive loss [11] is a widely implemented self-supervised learning strategy that mostly deals with augmented image triplets. Such strategies generally use a siamese network with three parallel inputs: an anchor (actual image), a positive (augmented actual image), and a negative image (image of a different class) [12]. Similar to the triplet (three parallel input) strategies, pairwise strategies are also popular; these train with can-link constraints for similar image pairs and cannot-link constraints for dissimilar image pairs [13]. The success of siamese architectures in self-supervision has stimulated further research, leading to strategies that do not depend on cannot-link pairs. BYOL [14] can be trained only with augmented can-link pairs, which is possible due to the momentum encoder. SimSiam [15] can be trained similarly, with an architecture that requires a separate encoder and predictor network. Also, backpropagation in the SimSiam network is carried out through only one half of each training pair.

The ability of self-supervised learning has been observed in numerous domains: face recognition [16], X-ray anomaly detection [17], agricultural images [18], and so on. The main advantage of self-supervised methods is that they can reduce the development cost of an intelligent system by reducing the data labeling cost. Also, mixing self-supervision with supervised methods can improve a system's performance when extensive unlabeled data is available alongside a small amount of labeled data [19].

Self-supervision has also been applied to plant disease detection. Yang et al. [20] proposed a self-supervised visual categorization of tomato leaves for disease detection. The method consists of three networks: a location network, a feedback network, and a classification network. Initially, the location network proposes essential regions from the input image. Then, the feedback network evaluates the regions proposed by the location network with respect to the output class. Finally, the classification network predicts the final class based on the region proposals of the location network. Although the method is self-supervised, it is explicitly built for tomato disease detection, limiting its usability for other plant diseases.

CIKICS [9] introduced a cross-iterative clustering approach for self-supervised clustering of plant disease leaf images. The approach consists of multiple processing stages, starting with feature extraction using a CNN followed by t-SNE. Then the method iteratively breaks larger clusters into smaller ones. Later, the smaller clusters are used as a training set for a CNN model, and the classes for the largest cluster are predicted after training is complete. Based on the pseudo-labels, a siamese network is trained using pseudo-label similarity and dissimilarity to achieve a quality embedding projection. Finally, a classifier is used to identify the final class from the embeddings. The training process of CIKICS is not end-to-end. Therefore, the quality of performance depends on the processing of each step. CIKICS requires feature extraction using a pre-trained CNN model. Hence, the overall system's performance largely depends on the CNN model's feature extraction quality and pre-training dataset. Moreover, t-SNE may not faithfully preserve the structure of the input embeddings or maintain proper cluster properties. When a large dataset is used for training, the number of smaller clusters may increase, resulting in a drop in performance and an increase in training time. Therefore, CIKICS might not be the best option for an industry-grade system.

Yang et al. [20] introduced a self-supervised method for categorizing tomato diseases. Although the method reached a decent accuracy, it required an equal number of images for each disease class. Hence, the self-supervised architecture requires human specialization for data balancing. Moreover, the combination of three separate models is heavily parameterized, resulting in high computation.

AR-GAN [21] is a generative adversarial network built for an unsupervised image translation system to enhance the data augmentation policy for training deep learning models. The mentioned method is specifically designed for plant disease recognition. However, the model does not directly solve the unsupervised classification task.

Siamese network architecture is currently dominating the self-supervision domain. Training with such methods may require contrastive loss. Triplet networks require three parallel inputs: a key, a similar image to the key, and a dissimilar image to the key, whereas pairwise networks require two parallel inputs: a key and an image that can be either similar or dissimilar w.r.t. the key [22].

In contrast, recent advances indicate that siamese networks can be trained without contrastive learning (i.e., without the need for both similar and dissimilar pairs). In such a case, only similar image pairs are used for training. SimSiam [15] and BYOL [14] can be trained using similar image pairs. However, BYOL achieves comparatively better accuracy than SimSiam. Also, such methods use cosine distance instead of Euclidean distance (l2 norm). Therefore, the final embeddings generated by the models are not appropriately clusterable. Our proposed method is similar to contrastive learning and implements the AutoEmbedder [23] architecture. AutoEmbedder architectures can solve recognition and retrieval-related tasks with comparatively better accuracy.

This paper introduces a self-supervised method for plant disease classification from leaf data that can work for most plant disease recognition tasks without a labeled dataset. The proposed system is implemented using a pairwise loss structure. The proposed method addresses the challenges found in the existing leaf disease identification systems. Table 1 compares the current leaf disease identification systems with the proposed one. Our proposed method overcomes the training challenges, as it is an end-to-end approach and does not require pre-trained weights. The model is also stable in the case of unbalanced datasets. However, the performance of the leaf disease detection approach is comparatively lower than that of supervised learning. Yet, the performance of our self-supervised approach is better than that of the existing self-supervised approaches. The overall contribution of the paper includes:

We propose a self-supervised leaf disease clustering method that can be used to identify leaf diseases for any set of leaf images.
We introduce mix-link and strong can-link pairs to improve the self-supervised classification process.
We compare the proposed method with recognized self-supervised learning strategies and validate that our proposed method works better as a leaf disease clustering and identification system.

The rest of the paper is organized as follows: Section 2 explains the proposed strategy. Section 3 presents the experiments conducted to evaluate the model. Finally, Section 4 concludes the paper.

2. Method

The proposed leaf disease clustering method is implemented using AutoEmbedder [23] strategy with improvements in linking and training criteria. Initially, the model requires training on an unlabelled dataset. After completing the training phase, the model would produce clusterable embeddings. Finally, the generated embeddings can be further clustered using the k-means algorithm and can be used for classification. Figure 1 illustrates the training strategy of the embedding model.

In Section 2.1, the general AutoEmbedder approach is discussed. Section 2.2 introduces mix-link pairs, which establish a new cluster linkage policy. Section 2.3 introduces strong can-link pairs, which are high-confidence can-link pairs found by searching with the embedding model. Section 2.4 describes the use of moving average weights, which improves the stability of the model. Section 2.5 explains the architecture of the embedding model. Section 2.6 discusses the pseudocode of the training strategy along with the augmentation approaches.

2.1. AutoEmbedder Architecture

In general, pairwise contrastive loss architectures are trained using can-link and cannot-link pairs. Can-link pairs contain similar image pairs, whereas cannot-link pairs contain dissimilar image pairs. The can-link pairs are generated by augmenting a base image from the dataset. In contrast, cannot-link pairs are generated by randomly selecting two images. For a given dataset with an arbitrary number of classes, the probability of two randomly selected images belonging to the same class is treated as negligible [24].

AutoEmbedder architecture requires a similar number of can-link and cannot-link image pairs for a given training batch. Given a pair of training data ($x$ and $x'$), the AutoEmbedder architecture ($S(\cdot, \cdot)$) can be trained as follows:

$$S(x, x') = \mathrm{ReLU}\left(\lVert E_{\phi}(x) - E_{\phi}(x') \rVert, \alpha\right) \in \mathbb{R}^{+}_{\leq \alpha} \qquad (1)$$

The $\mathrm{ReLU}(\cdot, \cdot)$ function used in Equation (1) is a thresholded ReLU function, such that,

$$\mathrm{ReLU}(x, \alpha) = \begin{cases} x & \text{if } 0 \leq x < \alpha \\ \alpha & \text{if } x \geq \alpha \end{cases} \qquad (2)$$

The objective of the model is to generate similar embeddings for a given can-link data pair. Embedding pairs are considered similar if their Euclidean distance is zero. Similarly, the model is instructed to generate a Euclidean distance of at least $\alpha$ between the embeddings of cannot-link pairs. Equation (2) instructs the AutoEmbedder model to limit the similarity and dissimilarity distance within $[0, \alpha]$.
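Equations (1) and (2) can be illustrated with a minimal Python sketch. The function names are hypothetical, embeddings are plain lists of floats, and the squared-error loss in `pair_loss` is an assumption made for illustration, since the loss function is not stated in this section.

```python
import math

def thresholded_relu(x: float, alpha: float) -> float:
    """Equation (2): keep x when 0 <= x < alpha, clip to alpha otherwise."""
    return min(max(x, 0.0), alpha)

def pair_distance(e0, e1, alpha):
    """Equation (1): clipped Euclidean distance between two embedding vectors."""
    return thresholded_relu(math.dist(e0, e1), alpha)

def pair_loss(e0, e1, target, alpha):
    """Assumed training signal: squared error between the clipped distance
    and the target distance (0 for can-link, alpha for cannot-link)."""
    return (pair_distance(e0, e1, alpha) - target) ** 2
```

With a small α, every pair whose embeddings are at least α apart produces the same output, so a cannot-link pair stops contributing error once its embeddings are separated by α or more.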

2.2. Mix-Link Pairs

In addition to the can-link and cannot-link pairs, a mix-link image pair is introduced in this system that assigns a continuous linkage value residing in $(0, \alpha)$. Mix-link introduces partially linked data pairs through special augmentations. Mix-link pairs are generated using the MixUp [25] and CutMix [26] augmentation strategies. Both strategies mix two separate images and produce a new image with a mixture of the labels. Given a pair of images $x$ and $x'$, the mixed image can be calculated as $\lambda x + (1 - \lambda) x'$. Here $\lambda$ is the proportion of the first image in the mixture relative to the second image. Similarly, the target distance for the mix-link pair is measured as $\lambda \cdot 0 + (1 - \lambda) \alpha = (1 - \lambda) \alpha$, that is, the can-link target 0 and the cannot-link target $\alpha$ weighted by the mixing proportions. This value expresses the magnitude of impurity/mixture concerning the first image that the model has observed in the given image pair. Figure 2 illustrates an example of a mix-link pair. Unlike the can-link and cannot-link image pairs, mix-link pairs instruct the embedding model to generate continuous outputs. Therefore, the model tends to extract more feature similarity, which improves its robustness, as validated in Section 3.
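The mix-link construction can be sketched as follows. This is a simplified illustration, not the authors' implementation: images are small grayscale grids stored as lists of rows, the function names are hypothetical, and the target distance is assumed to be $(1 - \lambda)\alpha$, i.e., the share of the mixed image that does not come from the first image, scaled by $\alpha$.

```python
def mixup(img_a, img_b, lam):
    """MixUp: blend two equally sized grayscale images pixel by pixel."""
    return [[lam * a + (1.0 - lam) * b for a, b in zip(row_a, row_b)]
            for row_a, row_b in zip(img_a, img_b)]

def cutmix(img_a, img_b, top, left, height, width):
    """CutMix: paste a box from img_b into a copy of img_a.

    Returns the mixed image and lam, the fraction of pixels that still come
    from img_a. The box is assumed to start inside the image.
    """
    mixed = [row[:] for row in img_a]
    rows, cols = len(img_a), len(img_a[0])
    bottom, right = min(top + height, rows), min(left + width, cols)
    for r in range(top, bottom):
        mixed[r][left:right] = img_b[r][left:right]
    lam = 1.0 - ((bottom - top) * (right - left)) / (rows * cols)
    return mixed, lam

def mix_link_target(lam, alpha):
    """Assumed target distance to the first image: (1 - lam) * alpha."""
    return (1.0 - lam) * alpha
```

For example, pasting a 2 × 2 box into a 4 × 4 image leaves λ = 0.75 of the first image, so with α = 100 the assumed target distance is 25.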

2.3. Strong Can-Link Pairs

The can-link image pairs are generated by augmenting a single image from the dataset. This is true for most of the present siamese-network architectures [14,15]. Although augmentation alters the image, the resulting representation lacks the feature diversity found among different images of the same class. Therefore, strong augmentation strategies are required to train models properly.

Hence, to introduce versatile feature similarity, an image similarity search is conducted. For each data embedding, the five closest data embeddings are fetched and assumed to belong to the same data class. These assumed same-class neighbors are used for generating the can-link pairs to train the self-supervised method. Figure 3 illustrates a scenario of strong can-link pairs. Each data point is connected with five data points. Therefore, a linkage network is formed among the embedding points. Training the model with such a linkage network would create a cluster center point for similar data, toward which the neighboring embedding points would converge.
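The neighbor search described above can be sketched with a brute-force lookup. The helper names and the list-of-lists embedding format are assumptions for illustration; a practical system would more likely use a vectorized or approximate nearest-neighbor search.

```python
import math
import random

def nearest_neighbors(index, embeddings, k=5):
    """Indices of the k embeddings closest (Euclidean) to embeddings[index]."""
    anchor = embeddings[index]
    others = [i for i in range(len(embeddings)) if i != index]
    return sorted(others, key=lambda i: math.dist(anchor, embeddings[i]))[:k]

def pick_strong_can_link(index, embeddings, k=5, rng=random):
    """Randomly pick one of the k nearest neighbors as the can-link partner."""
    return rng.choice(nearest_neighbors(index, embeddings, k))
```

Because the neighbors are proposed by the embedding model itself, the quality of these pairs depends on how well the model has already been trained.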

2.4. Moving Average Weights

The self-supervision method bootstraps for optimal cluster representation. Instead of introducing new model weights in every epoch, a moving average of the model weights is used for smoother learning. For a given model weight ($W$) at timestep $t$, the moving average can be calculated as follows:

$$W_{t} = \gamma^{1} W_{t-1} + \gamma^{2} W_{t-2} + \dots + \gamma^{n} W_{t-n} \qquad (3)$$

Here $\gamma$ is a decay rate controlling how much previous information is remembered. Moving average weights provide smooth learning of the model and avoid abrupt changes caused by inaccurate data pairs. Moving averages are currently being used in siamese networks for self-supervised learning and have been validated to give better results [14].
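Equation (3) can be written out literally as a short Python function. The function name and the flat-list weight format are assumptions for illustration; deep learning frameworks typically maintain such averages incrementally rather than from a stored history.

```python
def moving_average_weights(history, gamma):
    """Weighted sum of past weights as in Equation (3).

    history[0] is W_{t-1}, history[1] is W_{t-2}, and so on; each entry is a
    flat list of floats of equal length.
    """
    averaged = [0.0] * len(history[0])
    for k, weights in enumerate(history, start=1):
        for i, w in enumerate(weights):
            averaged[i] += (gamma ** k) * w
    return averaged
```

Note that the coefficients $\gamma^{1}, \dots, \gamma^{n}$ in Equation (3) as printed do not sum to one, so this literal form scales the weights down; exponential moving averages are commonly normalized, and whether the original implementation does so is not stated here.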

2.5. Embedding Model

For our proposed architecture, the embedding model is implemented using the ResNet [27] architecture due to its simplicity and scalability. ResNet implements identity mapping connections, which are an improved version of residual units. The embedding model inherits the top convolution layers of ResNet, and the linear classification layers are excluded from the ResNet model. PReLU [28] activation is used inside the model to avoid dead neurons. Figure 4 illustrates a basic flow diagram of the embedding model. The output is passed through a batch normalization layer, similar to SimSiam [15]. Adding batch normalization at the end of the output gives better stability in cluster purity and learning. An l2 weight regularization of $5 \times 10^{-4}$ is used while training. Moving average weights are implemented with a decay rate of 0.4. Adam [29] with a learning rate of 0.03 is used as the optimizer.

2.6. Leaf Disease Clustering Algorithm

The overall algorithm emphasizes building a data pipeline for training the model. Listing 1 illustrates Python-like pseudocode for training the self-supervised model. The can-link, mix-link (labeled may-link in Listing 1), and cannot-link pairs are kept equal in number for each batch, as it is validated that an equal distribution boosts the performance of the AutoEmbedder architecture [23].

Strong can-link pairs are selected for half of the can-link pairs. Therefore, the model can learn the data distribution from augmented image pairs and can also learn strong similarities of features while training.

The augment() function of Listing 1 performs augmentation on a given image. The selection of augmentation strategies can alter the performance of self-supervised strategies. Our approach implements some basic augmentation strategies. Moreover, color-shifting augmentations are avoided (e.g., channel shifting, random negative, etc.). Random brightness and contrast are used to simulate diverse lighting. Random rotation and horizontal/vertical flips are utilized to show leaf images from different directions. Random zoom of at most 30% of the image is implemented to focus on different positions of leaf images. Gaussian noise is also mixed with the leaf images to distort some of their features.
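A reduced version of such an augmentation function might look as follows. This is only a sketch under stated assumptions: images are grayscale grids with values in [0, 1] stored as lists of rows, the jitter ranges and noise level are hypothetical, and random rotation and zoom are omitted to keep the example dependency-free.

```python
import random

def augment(img, rng=random):
    """Random flips, brightness/contrast jitter, and Gaussian noise.

    Returns a new image; the input is not modified. No color shifting is
    applied, in line with the text above.
    """
    out = [row[:] for row in img]
    if rng.random() < 0.5:
        out = out[::-1]                       # vertical flip
    if rng.random() < 0.5:
        out = [row[::-1] for row in out]      # horizontal flip
    brightness = rng.uniform(-0.2, 0.2)       # assumed jitter range
    contrast = rng.uniform(0.8, 1.2)          # assumed jitter range
    mean = sum(map(sum, out)) / (len(out) * len(out[0]))
    out = [[(p - mean) * contrast + mean + brightness + rng.gauss(0.0, 0.02)
            for p in row] for row in out]
    # Clip back to the valid intensity range
    return [[min(max(p, 0.0), 1.0) for p in row] for row in out]
```

Passing a seeded `random.Random` instance as `rng` makes the augmentation reproducible, which can help when debugging a training pipeline.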

Listing 1. Python style training pseudocode.

# model: embedding model
# state: a state variable to cycle through the three link-pair types
# alpha: a constant distance metric for non-similar image pairs
# train_batch: a list of data pairs for minibatch training
# optimizer: moving average optimizer for training
# x0, x1: randomly augmented images
# _x: a temporary randomly selected image from dataset
# y: target metric output of the model

state = -1
optimizer = moving_average(adam(learning_rate=0.01))
# Compile the embedding model with the optimizer
model.compile(optimizer)

# Load a minibatch 'batch' with N samples
for batch in dataset:
    # Start an empty list of pairs once per minibatch
    train_batch = []

    # For each data 'x' in minibatch
    for x in batch:
        # Change state for each data
        state = (state + 1) % 3

        # Generate can-link pair
        if state == 0:
            x0 = augment(x)  # A randomly augmented version
            # Randomly decide whether to pick a closer pair to augment
            if rand_int() % 2 == 0:
                x1 = augment(fetch_closer_node(x, 5))  # Randomly augment a close selected image
            else:
                x1 = augment(x)  # Another randomly augmented version
            y = 0  # Zero output distance
            train_batch.append((x0, x1, y))

        # Generate may-link pair
        elif state == 1:
            x0 = augment(x)  # A randomly augmented version
            _x = random_pick(dataset)  # Randomly pick a data from dataset
            # Randomly perform either cutmix or mixup augmentation
            x1, y = random_perform(cutmix, mixup, (x, _x, alpha))  # Output distance inclusively between 0 and alpha
            train_batch.append((x0, x1, y))

        # Generate cannot-link pair
        else:
            x0 = augment(x)  # A randomly augmented version
            _x = random_pick(dataset)  # Randomly pick a data from dataset
            x1 = augment(_x)  # Another randomly augmented version
            y = alpha  # Alpha output distance
            train_batch.append((x0, x1, y))

    # Fit the model on the modified training batch
    model.fit(train_batch)

3. Experiment

3.1. Evaluation Metrics

Three metrics are used for measuring the clustering performance. The metrics are discussed below:

Cluster Accuracy (CACC): Cluster accuracy (CACC) is one of the most widely used clustering metrics for measuring cluster purity [30]. CACC is computed by finding the best one-to-one matching of clusters and ground-truth labels [23].
Adjusted Rand Index (ARI): The adjusted Rand index is calculated using a contingency table [23]. ARI is at most 1, is close to 0 for random cluster assignments, and can be negative for assignments worse than chance; a higher value represents better clustering performance.
Accuracy (ACC): Accuracy is measured by adding a linear/dense layer with a softmax activation function to the embedding model. The resulting model is trained on the labeled dataset in a supervised manner while keeping the embedding weights fixed. Such an accuracy measure is popularly used in current self-supervised models [13,14,15]. A higher score represents better performance.

CACC and ARI are evaluation metrics specifically for identifying clustering accuracy and cluster purity. For computing CACC and ARI, the clusters are determined using the k-means algorithm. For validation, the k-means algorithm is implemented without kernel functions, and it is calculated using the Euclidean distance metric. Hence, CACC and ARI reflect linearly separable clustering performance. In contrast, ACC provides a performance evaluation based on non-linearly separable data points.
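The two clustering metrics can be sketched in plain Python as follows. This is an illustrative implementation, not the evaluation code used in the experiments: the brute-force matching in `cluster_accuracy` is only practical for a small number of clusters (the Hungarian algorithm is the usual choice for larger problems), and the function names are hypothetical.

```python
from collections import Counter
from itertools import permutations
from math import comb

def cluster_accuracy(labels, clusters):
    """CACC: accuracy under the best one-to-one cluster-to-label matching."""
    label_ids = sorted(set(labels))
    cluster_ids = sorted(set(clusters))
    pairs = Counter(zip(clusters, labels))
    # Pad with None so that surplus clusters can stay unmatched
    padded = label_ids + [None] * max(0, len(cluster_ids) - len(label_ids))
    best = max(sum(pairs[(c, l)] for c, l in zip(cluster_ids, perm))
               for perm in permutations(padded, len(cluster_ids)))
    return best / len(labels)

def adjusted_rand_index(labels, clusters):
    """ARI computed from the label/cluster contingency table."""
    n = len(labels)
    sum_cells = sum(comb(v, 2) for v in Counter(zip(labels, clusters)).values())
    sum_rows = sum(comb(v, 2) for v in Counter(labels).values())
    sum_cols = sum(comb(v, 2) for v in Counter(clusters).values())
    expected = sum_rows * sum_cols / comb(n, 2)
    max_index = (sum_rows + sum_cols) / 2
    if max_index == expected:  # degenerate case, e.g., a single class and cluster
        return 1.0
    return (sum_cells - expected) / (max_index - expected)
```

Both metrics ignore the arbitrary numbering of clusters: labels `[0, 0, 1, 1]` with clusters `[1, 1, 0, 0]` score 1.0 on each, whereas clusters `[0, 1, 0, 1]` give a CACC of 0.5 and a negative ARI.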

3.2. Datasets

PlantVillage [31] and Citrus Disease Dataset (CDD) [32] are used for benchmarking the proposed method. In the experiment, leaf images of seven different plants are used from the PlantVillage dataset. We focus on identifying diseases for single plant leaves. Hence, the experiments were conducted on a subset of the dataset. Table 2 exhibits the statistics of the datasets. Figure 5 further illustrates leaf images used for training.

3.3. Evaluation Baselines

Some of the popular self-supervised methods are used for benchmarking. BYOL [14] and SimSiam [15] are popular siamese networks for self-supervised learning. BYOL and SimSiam are trained only with can-link constraints. CIKICS [9] is built for clustering leaf images for disease classification. For all the models, the optimal parameters are used for training.

3.4. Comparison

For comparison, the different components of the leaf disease clustering algorithm are reviewed first. Figure 6 illustrates a graphical comparison of the proposed model while excluding different improvements from it. Figure 6A shows the training metrics (CACC and ARI) of the proposed method. Comparatively, Figure 6B exhibits the training evaluation of the proposed model while excluding the moving average (MA) weight update. The MA weight update provides smoothness to the training and slightly improves the model performance. Moving averaging prevents losing previous weights of the learned model. Therefore, the model achieves stability in weight updates in each batch, helping the model remember previous information.

Figure 6C further exhibits a training graph of the proposed model excluding the mix-link constraint. The resulting metrics indicate that the model struggles to learn sufficient cluster information in the first 50 epochs. Consequently, the performance of the model does not improve adequately. The model has to rely on basic can/cannot-link pairs, which limits the feature complexity of the given data.

Figure 6D further illustrates a scenario without the strong can-link pairs. The model swiftly improves its accuracy in the first 50 epochs of the training, as it does not rely on its own inaccurate data link pair. However, the peak accuracy of the model falls behind compared to the proposed model.

The proposed training phase of the model contains erroneous data linkages from the following possible linkages: (a) cannot-link pairs, (b) mix-link pairs, and (c) strong can-link pairs. The dissimilar image pairs for both cannot-link and mix-link pairs are picked randomly from the dataset. In contrast, the strong can-link pairs are suggested by the model itself. Hence, the total number of erroneous pairs is not constant throughout the training. Figure 7 illustrates a visualization of the erroneous pair ratio while conducting training on the PlantVillage dataset. The cumulative ratio of erroneous pairs stays below 30% over the model training. The cannot-link and mix-link pairs show a constant number of errors due to the randomized selection of data while training. Comparatively, the ratio of erroneous strong can-link pairs starts at 0.10 and gradually decreases as the model learns the latent features from the dataset. As the number of correct linkage pairs is greater than the number of erroneous linkage pairs, the model continues to learn the latent features and maintains stability.
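The roughly constant error rate of the randomly drawn pairs can be estimated from the class distribution alone. The sketch below assumes images are drawn uniformly at random with replacement; the function name and the class counts in the example are hypothetical and are not taken from the datasets used in the paper.

```python
def same_class_probability(class_counts):
    """Probability that two images drawn uniformly at random (with
    replacement) belong to the same class, i.e., that a randomly formed
    cannot-link pair is actually a same-class pair."""
    total = sum(class_counts)
    return sum((c / total) ** 2 for c in class_counts)
```

For k balanced classes this evaluates to 1/k, for instance 0.25 for four equally sized classes, which is why the error from random selection depends on the dataset rather than on the training progress.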

Table 3 exhibits a comparison of the proposed architecture with different self-supervised learning strategies. The current self-supervised strategies (BYOL and SimSiam) tend to perform better when non-linear layers (dense layers) are added at the end of the embedding model. The extended layers work as a classification function requiring supervised training. Moreover, such classifier layers (dense layer with softmax activation) have nonlinearity, which can be fit by backpropagation. However, clustering strategies do not adapt well to such non-linearly separable data, while functioning satisfactorily with linearly separable data. Therefore, in the above comparison, architectures except our proposed one receive low cluster-based scores (CACC and ARI) while performing better on non-linear classification tasks (ACC). The proposed method works better for both linear and non-linear classification tasks because its clusters can be separated using linear and non-linear functions. Comparatively, CIKICS achieves better clustering performance than other self-supervised classifiers. However, CIKICS does not reach the best clustering result, and its performance degrades with an increasing number of classes. CIKICS generates an increased number of small cluster regions when the number of classes is raised. Therefore, the performance of CIKICS drops significantly with the high number of small clusters used in training.

The proposed method adequately classifies multiple diseases from leaf images. Figure 8 illustrates a comparison of the accuracy metrics while considering the number of disease classes in the dataset. The model maintains stability while increasing the number of classes in the dataset. However, a large increase in the number of classes may decrease the model performance, similar to most other classification/clustering algorithms.

Compared to other methods, the proposed strategy achieves better CACC and ARI scores. As CACC and ARI scores indicate the quality of clusters, it can be validated that the proposed method performs better in clustering leaf disease data. Also, the ACC score remains competitive for most classes. Increasing the number of classes does not significantly affect the model’s performance. Hence, the benchmarks validate that the proposed method performs better.

4. Conclusions

The paper introduces a self-supervised leaf image clustering method appropriate for leaf disease identification. The approach implements the AutoEmbedder framework, which can generate clusterable embeddings. Further, the paper proposes mix-link and strong can-link pair strategies that improve the framework's stability while training without data labels. The proposed method also includes moving average weights, which are newly introduced in the AutoEmbedder framework. The benchmark validates that the mix-link pairs, strong can-link pairs, and moving average weights improve the model performance. The proposed method also outperforms the existing self-supervised methods and leaf clustering methods. The proposed self-supervised method for leaf disease identification is class-independent and can be applied to any unlabeled plant disease dataset. Furthermore, the approach reduces the production costs of leaf disease identification systems, as it does not rely on data labeling. Hence, the contribution of the paper would reduce the current limitations of self-supervised learning and leaf disease identification. In conclusion, prospective researchers should focus on improving self-supervised models to perform better than supervised methods.

Author Contributions

Conceptualization, M.M.M., M.A.H., A.Q.O. and M.F.M.; Formal analysis, F.A.K., A.Q.O. and M.F.M.; Funding acquisition, M.M.M. and F.A.K.; Investigation, M.A.H., A.Q.O. and M.F.M.; Methodology, F.A.K. and A.Q.O.; Project administration, M.M.M.; Resources, F.A.K.; Supervision, M.M.M. and M.F.M.; Validation, M.A.H. and F.A.K.; Writing—original draft, A.Q.O.; Writing—review & editing, M.M.M. and M.A.H. All authors have read and agreed to the published version of the manuscript.

Funding

The Deanship of Scientific Research (DSR) at King Abdulaziz University, Jeddah, Saudi Arabia has funded this project, under grant no. (KEP-7-611-42).

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

Data sharing not applicable.

Conflicts of Interest

The authors declare no conflict of interest.

Abbreviations

The following abbreviations are used in this manuscript:

DCNN	Deep Convolutional Neural Network
GAN	Generative Adversarial Network

References

1. Martinelli, F.; Scalenghe, R.; Davino, S.; Panno, S.; Scuderi, G.; Ruisi, P.; Villa, P.; Stroppiana, D.; Boschetti, M.; Goulart, L.R.; et al. Advanced methods of plant disease detection. A review. Agron. Sustain. Dev. 2015, 35, 1–25.
2. Rumpf, T.; Mahlein, A.K.; Steiner, U.; Oerke, E.C.; Dehne, H.W.; Plümer, L. Early detection and classification of plant diseases with support vector machines based on hyperspectral reflectance. Comput. Electron. Agric. 2010, 74, 91–99.
3. Sankaran, S.; Mishra, A.; Ehsani, R.; Davis, C. A review of advanced techniques for detecting plant diseases. Comput. Electron. Agric. 2010, 72, 1–13.
4. Barbedo, J.G. Factors influencing the use of deep learning for plant disease recognition. Biosyst. Eng. 2018, 172, 84–91.
5. Wu, Q.; Chen, Y.; Meng, J. DCGAN-based data augmentation for tomato leaf disease identification. IEEE Access 2020, 8, 98716–98728.
6. Ma, J.; Du, K.; Zheng, F.; Zhang, L.; Gong, Z.; Sun, Z. A recognition method for cucumber diseases using leaf symptom images based on deep convolutional neural network. Comput. Electron. Agric. 2018, 154, 18–24.
7. Amara, J.; Bouaziz, B.; Algergawy, A. A deep learning-based approach for banana leaf diseases classification. In Proceedings of the Datenbanksysteme für Business, Technologie und Web (BTW 2017)-Workshopband, Stuttgart, Germany, 6–10 March 2017.
8. Cruz, A.C.; Luvisi, A.; De Bellis, L.; Ampatzidis, Y. Vision-based plant disease detection system using transfer and deep learning. In Proceedings of the 2017 ASABE Annual International Meeting, Spokane, WA, USA, 16–19 July 2017; p. 1.
9. Fang, U.; Li, J.; Lu, X.; Gao, L.; Ali, M.; Xiang, Y. Self-supervised cross-iterative clustering for unlabeled plant disease images. Neurocomputing 2021, 456, 36–48.
10. Jing, L.; Tian, Y. Self-supervised visual feature learning with deep neural networks: A survey. IEEE Trans. Pattern Anal. Mach. Intell. 2020, 43, 4037–4058.
11. Jaiswal, A.; Babu, A.R.; Zadeh, M.Z.; Banerjee, D.; Makedon, F. A survey on contrastive self-supervised learning. Technologies 2020, 9, 2.
12. Ohi, A.Q.; Mridha, M.; Hamid, M.A.; Monowar, M.M. Deep speaker recognition: Process, progress, and challenges. IEEE Access 2021, 9, 89619–89643.
13. Chen, T.; Kornblith, S.; Norouzi, M.; Hinton, G. A simple framework for contrastive learning of visual representations. In Proceedings of the International Conference on Machine Learning, Virtual, 13–18 July 2020; pp. 1597–1607.
14. Grill, J.B.; Strub, F.; Altché, F.; Tallec, C.; Richemond, P.; Buchatskaya, E.; Doersch, C.; Avila Pires, B.; Guo, Z.; Gheshlaghi Azar, M.; et al. Bootstrap your own latent-a new approach to self-supervised learning. Adv. Neural Inf. Process. Syst. 2020, 33, 21271–21284.
15. Chen, X.; He, K. Exploring simple siamese representation learning. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Nashville, TN, USA, 19–25 June 2021; pp. 15750–15758.
16. Arachchilage, S.W.; Izquierdo, E. SSDL: Self-supervised domain learning for improved face recognition. In Proceedings of the 2020 25th International Conference on Pattern Recognition (ICPR), Milan, Italy, 10–15 January 2021; pp. 8117–8124.
17. Bozorgtabar, B.; Mahapatra, D.; Vray, G.; Thiran, J.P. SALAD: Self-supervised aggregation learning for anomaly detection on X-rays. In Proceedings of the International Conference on Medical Image Computing and Computer-Assisted Intervention, Lima, Peru, 4–8 October 2020; pp. 468–478.
18. Güldenring, R.; Nalpantidis, L. Self-supervised contrastive learning on agricultural images. Comput. Electron. Agric. 2021, 191, 106510.
19. Tomasev, N.; Bica, I.; McWilliams, B.; Buesing, L.; Pascanu, R.; Blundell, C.; Mitrovic, J. Pushing the limits of self-supervised ResNets: Can we outperform supervised learning without labels on ImageNet? arXiv 2022, arXiv:2201.05119.
20. Yang, G.; Chen, G.; He, Y.; Yan, Z.; Guo, Y.; Ding, J. Self-supervised collaborative multi-network for fine-grained visual categorization of tomato diseases. IEEE Access 2020, 8, 211912–211923.
21. Nazki, H.; Yoon, S.; Fuentes, A.; Park, D.S. Unsupervised image translation using adversarial networks for improved plant disease recognition. Comput. Electron. Agric. 2020, 168, 105117.
22. Hoffer, E.; Ailon, N. Deep metric learning using triplet network. In Proceedings of the International Workshop on Similarity-Based Pattern Recognition, Copenhagen, Denmark, 12–14 October 2015; pp. 84–92.
23. Ohi, A.Q.; Mridha, M.F.; Safir, F.B.; Hamid, M.A.; Monowar, M.M. Autoembedder: A semi-supervised DNN embedding system for clustering. Knowl.-Based Syst. 2020, 204, 106190.
24. Monowar, M.M.; Hamid, M.A.; Ohi, A.Q.; Alassafi, M.O.; Mridha, M. AutoRet: A Self-Supervised Spatial Recurrent Network for Content-Based Image Retrieval. Sensors 2022, 22, 2188.
25. Zhang, H.; Cisse, M.; Dauphin, Y.N.; Lopez-Paz, D. mixup: Beyond empirical risk minimization. arXiv 2017, arXiv:1710.09412.
26. Yun, S.; Han, D.; Oh, S.J.; Chun, S.; Choe, J.; Yoo, Y. Cutmix: Regularization strategy to train strong classifiers with localizable features. In Proceedings of the IEEE/CVF International Conference on Computer Vision, Seoul, Korea, 27 October–2 November 2019; pp. 6023–6032.
27. He, K.; Zhang, X.; Ren, S.; Sun, J. Identity mappings in deep residual networks. In Proceedings of the European Conference on Computer Vision, Amsterdam, The Netherlands, 11–14 October 2016; pp. 630–645.
28. He, K.; Zhang, X.; Ren, S.; Sun, J. Delving deep into rectifiers: Surpassing human-level performance on imagenet classification. In Proceedings of the IEEE International Conference on Computer Vision, Santiago, Chile, 7–13 December 2015; pp. 1026–1034.
29. Kingma, D.P.; Ba, J. Adam: A method for stochastic optimization. arXiv 2014, arXiv:1412.6980.
30. Mridha, M.F.; Ohi, A.Q.; Monowar, M.M.; Hamid, M.; Islam, M.; Watanobe, Y. U-vectors: Generating clusterable speaker embedding from unlabeled data. Appl. Sci. 2021, 11, 10079.
31. Hughes, D.; Salathé, M. An open access repository of images on plant health to enable the development of mobile disease diagnostics. arXiv 2015, arXiv:1511.08060.
32. Rauf, H.T.; Saleem, B.A.; Lali, M.I.U.; Khan, M.A.; Sharif, M.; Bukhari, S.A.C. A citrus fruits and leaves dataset for detection and classification of citrus diseases through machine learning. Data Brief 2019, 26, 104340.

Figure 1. The AutoEmbedder architecture receives the three possible linkage constraints in equal proportion while training. The model aims to predict the correct distance for each input image pair. By learning similarity and dissimilarity, the model learns class-discriminative features and generates clusterable embeddings.

Figure 2. Mix-link pairs are generated by combining a single image with a mixture of MixUp [25] or CutMix [26] augmented images. With λ being the ratio of the first image in the augmented image, the distance for the mix-link pairs is calculated by λα + (1 − λ)α.

Figure 3. The brown and grey points indicate the embeddings of two different classes. The strong can-link pairs are generated by calculating the nearest neighbors of each data point. Strong can-link pairs introduce diverse features to the embedding model and can increase cluster density. Owing to strong can-link pairs, the closer embedding points (brown/grey) move towards a dense region, forming individual clusters.
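
The nearest-neighbor step described in the Figure 3 caption can be illustrated with a minimal sketch. It assumes Euclidean distance between embeddings and a fixed neighbor count `k`; the function name `strong_can_link_pairs` and both of these choices are hypothetical and are not taken from the paper's implementation.

```python
from math import dist


def strong_can_link_pairs(embeddings, k=1):
    """Pair each point with its k nearest neighbors (Euclidean distance).

    Illustrative assumption only: the distance measure and the value of k
    used by the authors are not stated in this section.
    """
    pairs = []
    for i, anchor in enumerate(embeddings):
        # Rank every other point by its distance to the anchor point.
        others = sorted(
            (j for j in range(len(embeddings)) if j != i),
            key=lambda j: dist(anchor, embeddings[j]),
        )
        # Keep the k closest points as candidate strong can-link partners.
        pairs.extend((i, j) for j in others[:k])
    return pairs


# Toy 2-D embeddings forming two well-separated groups.
points = [(0.0, 0.0), (0.1, 0.0), (5.0, 5.0), (5.2, 5.0)]
pairs = strong_can_link_pairs(points, k=1)
print(pairs)
```

In this toy input, each point is paired only with the other member of its own group, which is the behavior the caption attributes to strong can-link pairs when the embedding already separates the classes reasonably well.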

Figure 4. The figure illustrates the building blocks of the embedding model.

Figure 5. Some leaf images from the PlantVillage dataset. The plant and disease names are given for each leaf. The visual features of diseased images are mostly similar, which makes the decision process challenging.

Figure 6. The figure illustrates the CACC (solid line) and ARI (dashed line) of the model while training. The graphs show the training benchmark for the proposed model (A), without moving average (MA) weights (B), without mix-links (C), and without strong can-link pairs (D). The horizontal and vertical axes represent epochs and metric performance, respectively.

Figure 7. The figure illustrates the erroneous data ratio passed to the model while training. The horizontal and vertical axes represent epochs and error ratio, respectively.

Figure 8. Comparison of the performance of the proposed model with respect to the number of classes. (A–C) show benchmark results based on the CACC, ARI, and ACC metrics, respectively. The horizontal axis represents the number of classes, and the vertical axis represents the scores of the corresponding metric.

Table 1. Comparison of different methods.

Method	Advantages	Disadvantages	Classes
Yang et al. [20]	Fine-grain feature extraction from leaf images. Can identify unusual leaf keypoints.	Requires an equal number of disease classes. Dataset labels must be known. Requires pre-trained model. Training is end-to-end.	Tomato
CIKICS [9]	Training does not require any labeled data. Can work on most leaf datasets.	Training is not end-to-end. Memory and computation intensive training. May generate an immense number of clusters on large leaf datasets.	Apple, Grape, Peach, Strawberry, Citrus
AR-GAN [21]	Unsupervised image translation for improved plant disease recognition. Can enhance the quantity of small datasets by generating synthetic leaf images.	The classification method requires supervised training.	Tomato
Ours	The training is end-to-end. Does not require any labeled data. Does not require any pre-trained model.	Accuracy is marginally lower than that of supervised methods.	Apple, Corn, Grape, Peach, Pepper, Potato, Tomato, Citrus

Table 2. Quantitative analysis of the datasets.

Dataset	Plant	# of Disease Classes	# of Images
PlantVillage [31]	Apple	4	3171
	Corn	4	3852
	Grape	4	4062
	Peach	2	2657
	Pepper	2	2475
	Potato	3	2152
	Tomato	10	18,160
CDD [32]	Citrus	5	609

Table 3. Comparison of different models. The best results are marked in bold.

Plant	# of Classes	Model	CACC	ARI	ACC
Apple	4	BYOL [14]	0.273	0.082	0.887
		SimSiam [15]	0.212	0.081	0.877
		CIKICS [9]	0.782	0.662	0.837
		Ours	0.852	0.781	0.884
Corn	4	BYOL [14]	0.207	0.078	0.845
		SimSiam [15]	0.219	0.093	0.793
		CIKICS [9]	0.793	0.671	0.846
		Ours	0.853	0.761	0.889
Grape	4	BYOL [14]	0.292	0.109	0.889
		SimSiam [15]	0.218	0.074	0.873
		CIKICS [9]	0.801	0.674	0.857
		Ours	0.843	0.774	0.887
Peach	2	BYOL [14]	0.271	0.127	0.875
		SimSiam [15]	0.258	0.104	0.871
		CIKICS [9]	0.827	0.694	0.874
		Ours	0.848	0.721	0.906
Pepper	2	BYOL [14]	0.263	0.162	0.893
		SimSiam [15]	0.249	0.114	0.888
		CIKICS [9]	0.813	0.684	0.867
		Ours	0.887	0.781	0.926
Potato	3	BYOL [14]	0.244	0.109	0.883
		SimSiam [15]	0.213	0.071	0.861
		CIKICS [9]	0.791	0.661	0.825
		Ours	0.843	0.791	0.899
Tomato	10	BYOL [14]	0.141	0.058	0.879
		SimSiam [15]	0.136	0.047	0.818
		CIKICS [9]	0.474	0.318	0.535
		Ours	0.725	0.747	0.856
Citrus	5	BYOL [14]	0.157	0.087	0.889
		SimSiam [15]	0.139	0.076	0.875
		CIKICS [9]	0.582	0.369	0.641
		Ours	0.851	0.766	0.893
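
For readers unfamiliar with the ARI column in Table 3, the following is a minimal sketch of the standard adjusted Rand index formula, written with the Python standard library only. It is an illustration of the metric's definition, not the evaluation code used for the table; the function name `adjusted_rand_index` is hypothetical.

```python
from collections import Counter
from math import comb


def adjusted_rand_index(labels_true, labels_pred):
    """Standard adjusted Rand index between two labelings of the same items."""
    n = len(labels_true)
    if n < 2:
        return 1.0  # Convention chosen here: a trivial labeling counts as perfect.
    # Count pairs of items that fall in the same contingency cell, class, or cluster.
    cells = Counter(zip(labels_true, labels_pred))
    same_cell = sum(comb(c, 2) for c in cells.values())
    same_true = sum(comb(c, 2) for c in Counter(labels_true).values())
    same_pred = sum(comb(c, 2) for c in Counter(labels_pred).values())
    # Expected agreement under random labeling, and the maximum possible agreement.
    expected = same_true * same_pred / comb(n, 2)
    maximum = (same_true + same_pred) / 2
    if maximum == expected:
        return 1.0  # Degenerate case, e.g. every item in a single cluster.
    return (same_cell - expected) / (maximum - expected)


# Cluster IDs are arbitrary, so a relabeled perfect clustering still scores 1.0.
perfect = adjusted_rand_index([0, 0, 1, 1], [1, 1, 0, 0])
mismatch = adjusted_rand_index([0, 0, 1, 1], [0, 1, 0, 1])
print(perfect, mismatch)
```

Because the index is corrected for chance, it is near zero for random assignments and can be negative, which helps explain why the ARI values in Table 3 are lower than the corresponding accuracy-style scores.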

Publisher’s Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.

© 2022 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
