Article

An Improved Approach to Detection of Rice Leaf Disease with GAN-Based Data Augmentation Pipeline

1 School of Automation and Electrical Engineering, Beihang University, Beijing 100191, China
2 School of Electronics and Information Engineering, Beihang University, Beijing 100191, China
* Author to whom correspondence should be addressed.
Appl. Sci. 2023, 13(3), 1346; https://doi.org/10.3390/app13031346
Submission received: 27 November 2022 / Revised: 15 December 2022 / Accepted: 29 December 2022 / Published: 19 January 2023
(This article belongs to the Section Computing and Artificial Intelligence)

Abstract

The lack of large, balanced datasets in the agricultural field is a glaring problem for researchers and developers who design and train optimal deep learning models. This paper shows that synthetic data augmentation outperforms standard methods on object detection models and can be crucially important when datasets are small or imbalanced. The purpose of this study was to synthesize rice leaf disease data using a Style-based Generative Adversarial Network with Adaptive Discriminator Augmentation (SG2-ADA) and the variance of the Laplacian filter to improve the performance of the Faster Region-Based Convolutional Neural Network (faster-RCNN) and the Single Shot Detector (SSD) in detecting the major diseases affecting rice. We collected a small, unbalanced set of raw rice leaf disease images grouped into four diseases, namely bacterial blight (BB), tungro (TG), brown spot (BS), and rice blast (RB), with 1584, 1308, 1440, and 1600 images, respectively. We then trained StyleGAN2-ADA for 250 epochs while using the variance of the Laplacian filter to discard blurry and poorly generated images. The synthesized images were used to augment the faster-RCNN and SSD models for detecting rice leaf diseases. The StyleGAN2-ADA model achieved a Fréchet Inception Distance (FID) score of 26.67, a Kernel Inception Distance (KID) score of 0.08, a Precision of 0.49, and a Recall of 0.14. In addition, we attained a mean average precision (mAP) of 0.93 and 0.91 for faster-RCNN and SSD, respectively. The learning curves of loss over 250 epochs reach 0.03 and 0.04 for Faster-RCNN and SSD, respectively. In comparison to standard data augmentation, we achieved t-test p-values of 9.1 × 10⁻⁴ and 8.3 × 10⁻⁵. Hence, the improvement from the proposed data augmentation pipeline for faster-RCNN and SSD in detecting rice leaf diseases is statistically significant. Our data augmentation approach is helpful to researchers and developers faced with small, imbalanced datasets and can also be adopted in other fields facing the same problems.

1. Introduction

Rice is an important cereal grain that has largely contributed to global food security over the last half-century [1,2]. Recent climate change, rapid population growth, rampant degradation of the ecosystem, pests, and rice diseases threaten global food security [3]. Rice leaf diseases have one of the most devastating effects on rice production: roughly 37% of global rice production is lost to diseases during the growth and harvest periods [4]. However, this impact can be reduced or prevented if the diseases are correctly detected, because rice leaf disease detection provides a visual indication of the need for precise treatment before the diseases spread further. The absence of a precise disease detection system poses a serious threat to agriculture, so accurate detection of rice leaf disease is a crucial requisite for high yield and food security [5,6]. Existing methods for rice leaf disease detection require agricultural experts to visually examine and diagnose the disease, which is prone to human error, time-consuming, labor-intensive, in short supply, and costly, and becomes even more challenging in large fields, where visual detection limits the spatial estimation of disease spread needed for control. Recent advances in computer vision (CV) have made it possible to detect plant leaf disease using machine learning (ML) models.
Over the past decade, numerous studies have used various CV algorithms to detect rice leaf diseases, such as image processing, support vector machines (SVM), image classification, object detection, and pattern recognition, with varying degrees of success [7,8,9,10,11,12,13,14]. Joshi and Jadhav [15] proposed a system for detecting rice leaf diseases using k-Nearest Neighbor and Minimum Distance classifiers, while Jiang et al. [10] used an SVM and deep learning (DL) to detect rice leaf disease. A more advanced approach is the use of object recognition and detection algorithms. Kiratiratanapruk et al. [8] used Faster Region-Based Convolutional Neural Network (faster-RCNN), RetinaNet, You Only Look Once (YOLOv3), and Mask Region-Based Convolutional Neural Network (Mask-RCNN) detection models on 6330 training images, with accuracies ranging from 36.11% to 75.92%. In addition, Chen et al. [14] used a transfer learning (TL) approach to detect and classify various rice leaf diseases using DenseNet and Inception models with 500 images. However, these existing methods could be optimized to achieve better efficacy in detecting crop disease. It is quite evident that most researchers train deep neural networks on relatively small datasets, which hurts model generalization and leads to poor detection in real scenarios.
Moreover, DL models require considerable data to perform at an optimum rate; otherwise, they overfit and generalize poorly [16]. In addition, labeling a considerably large amount of plant leaf disease data is tedious and time-consuming. The absence of a large, diverse dataset is a significant setback for developing a model fit to detect rice leaf disease. Previous studies have used standard augmentation methods to create random new samples and increase the size of the image dataset [17], while others used adaptive methods to generate new samples [18]. However, these methods artificially expand the same training data by modifying positions and colors without introducing new, "unique" data to the model. Hence, the model has already seen these data, only in a different position or color state, which contributes little to model generalization [19]. To enhance the diversity of the dataset and to solve the problem of a small dataset, new data can be created using a Generative Adversarial Network (GAN). A GAN is a generative ML framework that uses two neural networks, a generator and a discriminator [20]. The generator produces a high-dimensional perceptual object from a latent space, while the discriminator solves a classification task by distinguishing real objects from the input dataset from fake ones produced by the generator. Intuitively, training sets up an adversarial game between the two players, proceeds in three steps per round, and iterates for as many rounds as required. Recently, the StyleGAN2 Adaptive Discriminator Augmentation (SG2-ADA) architecture has outperformed most GAN methods and can generate high-quality objects with fewer training data and lower computational cost by implementing ADA and TL, respectively [21]. Furthermore, class imbalance is prevalent in plant leaf disease data [22], i.e., samples of one pathology may be far more widespread than those of another, which often results in biased models, and there is a lack of studies that utilize SG2-ADA to generate quality new data for rice leaf disease detection systems.
In this study, we propose a pipeline for synthetic augmentation of a rice leaf disease dataset using SG2-ADA to increase the efficacy of deep neural networks in detecting rice leaf disease; we adopted the standard faster-RCNN and the Single Shot Detector (SSD) as our deep convolutional neural networks. While GAN augmentation takes more time and resources than standard augmentation techniques, our work shows that the GAN technique can improve the efficacy of rice leaf disease detection models more than the standard approach, which is central to high-stakes agricultural decision-making. Furthermore, although several studies have examined the efficacy of GAN-based data augmentation, few contrast the effectiveness of the GAN approach against the standard augmentation approach. We aim to fill this research gap by exploring the problems of small datasets and disease class imbalance and by evaluating the performance of standard and GAN-based data augmentation methods. To the best of our knowledge, this is the first work that uses SG2-ADA to synthetically augment a rice leaf disease dataset in order to enhance the performance of rice leaf disease detection models.

2. Related Works and Challenges

The objective of this section is to outline the research progress and developmental trends of generative models and some of their key issues, while also illustrating the various DL models that have adopted GANs to generate augmented images and the significance of the augmentation process.
Since the introduction of the GAN architecture by Goodfellow et al. [23], many related architectures have been proposed, with some impressive results. The Deep Convolutional Generative Adversarial Network (DCGAN) and the Conditional Generative Adversarial Network (cGAN) can be regarded as direct extensions [24], while more advanced GANs include the Wasserstein Generative Adversarial Network (WGAN), Big Generative Adversarial Network (BigGAN), Cycle-Consistent Generative Adversarial Network (CycleGAN), Style-Based Generative Adversarial Network (SG), and SG2-ADA [25]. The DCGAN extends the foundational GAN with convolutional neural network (CNN) layers, which stabilizes training but introduces mode collapse, whereby the model produces a single type or small set of outputs [26]. In addition, DCGANs suffer from vanishing gradients, i.e., the generator fails to learn due to information starvation, which results in a poor generator and an overly strong discriminator [27]. WGAN resolved the issues of mode collapse, vanishing gradients, and training stability but prolonged the training sessions and, at times, produced poor output [28]. BigGAN offers more stability during training and better results than WGAN but requires larger data samples and more time [29,30]. The SG architecture improves on the traditional GAN architecture and training process by redesigning generator normalization, modifying progressive growing, and regularizing the generator [31]. It provides an advanced method for high-resolution image synthesis, but it sometimes generates uneven parts inside an image and requires a considerable amount of quality data [32]. SG2 removes the unnatural blob-like artifacts produced by SG and stabilizes high-resolution training; however, small datasets still lead to discriminator overfitting [33]. SG2-ADA addresses this and produces excellent results even with a small dataset [34].
In addition, SG2-ADA has been adopted to augment training datasets in various DL applications, such as generating images for facial recognition, medical image synthesis for brain tumors, synthetic datasets for liver lesion classification, images of manufacturing components, and datasets for credit card fraud prediction [35,36,37,38,39]. Recently, studies have aimed at automating plant leaf disease detection with DL methods, with varying degrees of success, using different model architectures and pre-trained models such as Fast-RCNN, Faster-RCNN, Region-based Convolutional Neural Networks (R-CNN), Histogram of Oriented Gradients (HOG), YOLO, Region-based Fully Convolutional Network (R-FCN), Single Shot Detector (SSD), and Spatial Pyramid Pooling (SPP-net) [40]. However, a common problem in plant leaf disease detection is the lack of sufficient and diverse training data, which often degrades model performance and causes overfitting, because DL models require a considerable amount of data to perform well [41].
Standard augmentation methods artificially expand the same training data by modifying positions and colors, whereas GANs create a new set of diverse data. Hence, the former contributes little to model generalization because it does not introduce new data to the object detection model but only modifies positions and colors [19,42]; this distinction proves significant for improving the accuracy and generalization of DL models. Lastly, Table 1 [43] summarizes the major limitations and advantages of the standard augmentation method.

3. Methodology and Tools

This section discusses the research scheme, the research method, GANs, SG2-ADA, its advantages, and its performance.

3.1. Research Scheme

The approach is a semi-supervised data augmentation pipeline that leverages the ability of generative adversarial networks to generate high-quality synthetic images, utilizing the SG2-ADA model. The best-quality synthetic images are then filtered using image processing techniques and combined with the original raw data. This amplifies the dataset and addresses class imbalance and the limited dataset size. Subsequently, the combined annotated dataset is fed into a CNN for training an object detection model, and the results are evaluated using the mean average precision metric. Lastly, we compared the resulting analysis of the GAN-based augmentation with the standard augmentation technique.

3.2. Research Method

The proposed GAN-based augmentation pipeline trains an SG2-ADA model on a limited dataset to generate high-quality synthetic images. The goal is to reproduce the statistical patterns and properties of the original dataset through SG2-ADA by modeling its probability distribution over the data space. After the generative model is trained to convergence, it generates new data with the same characteristics as the original data using truncation factors of ψ = 0.25, ψ = 0.50, ψ = 0.75, and ψ = 1.0; lower truncation values shrink the sampling region of the latent space toward its mean, while higher values increase the diversity of the generated images. Some of the generated images come with defects, such as blurriness and noise distortion, which could significantly degrade the performance of the CNN models when applied [44]. We therefore applied an image processing technique to measure the amount of blurriness in the generated images and to filter out blurry and other poorly generated images using the variance of the Laplacian method (see Equations (1)–(5)); a minimal code sketch follows the equations below. The method is implemented by convolving a single image channel with a 3 × 3 kernel and taking the variance of the response. If the variance is below 100, the image is considered blurry; otherwise, it is not (see Figure 1 and Figure 2).
The Laplacian is given as:
$$\nabla^2 f = \frac{\partial^2 f}{\partial x^2} + \frac{\partial^2 f}{\partial y^2}$$
The partial derivative with respect to x:
$$\frac{\partial^2 f}{\partial x^2} = f(x+1, y) + f(x-1, y) - 2f(x, y)$$
The partial derivative with respect to y:
$$\frac{\partial^2 f}{\partial y^2} = f(x, y+1) + f(x, y-1) - 2f(x, y)$$
Combining the partial derivatives with respect to x and y:
$$\nabla^2 f = \left[f(x+1, y) + f(x-1, y) + f(x, y+1) + f(x, y-1)\right] - 4f(x, y)$$
This corresponds to a 3 × 3 filter over the (x, y) neighborhood:
$$\begin{bmatrix} (x-1, y-1) & (x, y-1) & (x+1, y-1) \\ (x-1, y) & (x, y) & (x+1, y) \\ (x-1, y+1) & (x, y+1) & (x+1, y+1) \end{bmatrix} \;\Rightarrow\; \begin{bmatrix} 0 & 1 & 0 \\ 1 & -4 & 1 \\ 0 & 1 & 0 \end{bmatrix}$$
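The following is a minimal Python sketch of the blur check described above, assuming NumPy and SciPy are available; the threshold of 100 follows the text, while the function and variable names are illustrative rather than from the original code.

```python
import numpy as np
from scipy.signal import convolve2d

# 3x3 Laplacian kernel from Equation (5)
LAPLACIAN_KERNEL = np.array([[0, 1, 0],
                             [1, -4, 1],
                             [0, 1, 0]], dtype=np.float64)

def variance_of_laplacian(gray_image: np.ndarray) -> float:
    """Convolve a single-channel image with the Laplacian kernel
    and return the variance of the response."""
    response = convolve2d(gray_image.astype(np.float64),
                          LAPLACIAN_KERNEL, mode="same", boundary="symm")
    return float(response.var())

def is_blurry(gray_image: np.ndarray, threshold: float = 100.0) -> bool:
    """An image is considered blurry when the variance of its
    Laplacian response falls below the threshold used in the text."""
    return variance_of_laplacian(gray_image) < threshold
```

In practice, the same measurement can be obtained with OpenCV via `cv2.Laplacian(gray, cv2.CV_64F).var()`; both compute the variance of the Laplacian response.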
Then, the filtered generated images are combined with the existing raw images. Mathematically, Equation (6) expresses this as the union of the existing and generated synthetic data:
$$A \cup B = \{x : x \in A \ \text{or} \ x \in B\}$$
where
  • A = {x : x is an existing raw image}
  • B = {x : x is a generated synthetic image}
Lastly, the combined, balanced data are passed to the Faster-RCNN or SSD model for training with the proposed augmentation pipeline, while the standard augmentation technique is also applied to the faster-RCNN and SSD models (see Figure 3). The results of the two methods are then compared based on average precision and loss metrics.
Algorithm 1 below shows a step-by-step implementation of the proposed augmentation pipeline; our method follows a simple optimization framework that involves generating synthetic images to solve the problem of a small and imbalanced dataset, and a Python sketch of the algorithm follows the listing. We start by declaring the inputs: $A_i \in \mathbb{R}^{n \times m \times 3}$ are the original images, $B_i \in \mathbb{R}^{n \times m \times 3}$ are the synthetic images produced by a pre-trained SG2-ADA generator $G(\cdot)$, $l_f$ is the Laplacian filter, and $D_{total}$ and $C_{aug}$ are initialized as empty sets. The function $AUG(B_i)$ then detects and discards blurry images by measuring the variance $\sigma(B_{i+1})^2$ of the filter response. Lastly, the highest-quality synthetic images are annotated and labeled accordingly and used to augment the dataset, solving the problem of a small, imbalanced dataset while improving the efficacy of deep learning models for detecting rice leaf disease.
Algorithm 1 Step-by-step implementation of the proposed augmentation pipeline
  • Start
  • $A_i \in \mathbb{R}^{n \times m \times 3}$: raw input images
  • $B_i \in \mathbb{R}^{n \times m \times 3}$: synthetic images from the pre-trained SG2-ADA generator $G(A_i)$
  • $l_f$: Laplacian filter $\begin{bmatrix} 0 & 1 & 0 \\ 1 & -4 & 1 \\ 0 & 1 & 0 \end{bmatrix}$
  • $D_{total}$: initialize empty set
  • input: $A_i = \{a_1, a_2, a_3, \ldots, a_n\}$, $B_i = \{b_1, b_2, b_3, \ldots, b_n\}$, $G(\cdot)$, $l_f$
  • function $AUG(B_i)$
  •   $C_{aug}$: initialize empty set
  •   for $B_i \in B$ do  ⇒ iterate over all the synthetic images
  •     $B_{i+1} = \mathrm{convolution}(B_i, l_f)$
  •     if $\sigma(B_{i+1})^2 < 100$ then
  •       discard ← True  ⇒ blurry image
  •     if not discarded then
  •       $C_{aug} \leftarrow C_{aug} \cup B_i$  ⇒ good-quality synthetic images
  •   end for
  •   return $C_{aug}$
  • $D_{total} \leftarrow A_i \cup AUG(B_i)$
  • $D_{train}, D_{test}, D_{eval} \leftarrow$ split and label $D_{total}$
  • $CNN_{Model}(D_{train}, D_{test}, D_{eval})$
  • score ← evaluate $CNN_{Model}$
  • End
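As a reading aid, below is a minimal Python sketch of Algorithm 1, assuming the `is_blurry` helper from the earlier listing and illustrative directory names; it is not the authors' released code.

```python
import random
from pathlib import Path
import numpy as np
from PIL import Image

def augment_dataset(raw_dir: str, synthetic_dir: str, blur_threshold: float = 100.0):
    """Algorithm 1: keep only sharp synthetic images and merge them with the raw set."""
    raw_images = list(Path(raw_dir).glob("*.jpg"))                # A_i
    synthetic_images = list(Path(synthetic_dir).glob("*.jpg"))    # B_i

    c_aug = []  # good-quality synthetic images
    for path in synthetic_images:
        gray = np.array(Image.open(path).convert("L"))
        if not is_blurry(gray, blur_threshold):                   # variance of Laplacian >= 100
            c_aug.append(path)

    d_total = raw_images + c_aug                                  # D_total = A ∪ AUG(B)
    random.shuffle(d_total)

    # 70/15/15 split, matching the split reported in Section 5.3
    n = len(d_total)
    d_train = d_total[: int(0.7 * n)]
    d_eval = d_total[int(0.7 * n): int(0.85 * n)]
    d_test = d_total[int(0.85 * n):]
    return d_train, d_eval, d_test
```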

3.3. GAN

A GAN is a generative model capable of creating high-dimensional, perceptually plausible objects similar in characteristics to its training data [45]. It employs two neural networks, a generator and a discriminator, pitting them against each other to create new objects. The generator produces a high-dimensional perceptual object, while the discriminator distinguishes real images from the dataset from fake ones created by the generator. Training is set up as an adversarial game between the two players; it proceeds in three steps per round and iterates over as many rounds as required [46]. The phases are explained below, while Figure 4 graphically illustrates the whole process.
First phase: the generator draws a latent noise vector of relatively small dimension from a simple distribution, such as a multi-dimensional uniform distribution on [0, 1] or [−1, 1]; this vector is the input to the generator, which produces a sample that is then passed to the discriminator. The network back-propagates the loss through the output of the discriminator, but because this is a generator training step, the discriminator weights are held fixed (no updates). The network computes the derivatives with respect to the inputs, and the generator weights are updated so that image generation improves. Figure 4 below illustrates the process.
Intuitively, noise ( z ) is passed to a Generator neural network ( G ) that provides an output of G ( z ) with a cost function (see Equation (7)).
$$\frac{1}{m} \sum_{i=1}^{m} \log\left(1 - D\left(G\left(z^{(i)}\right)\right)\right)$$
Second phase: the generator draws a number of noise vectors from the latent space and produces a corresponding number of synthetic samples. The samples are then passed to the discriminator, which distinguishes real from fake with the same notion of classification loss, except that, rather than trying to fool the discriminator, a high loss now indicates that the discriminator is doing a bad job of telling real from fake. Stochastic gradient descent is applied, the network is back-propagated through the discriminator, and its weights are updated in the direction that better distinguishes real from fake; the step then stops without updating the generator. Figure 5 below illustrates the process.
Intuitively, the discriminator is a classifier trained to output D(G(z)) for fake inputs (labeled P = 0) and D(x) for real inputs (labeled P = 1), with the cost function in Equation (8).
$$\frac{1}{m} \sum_{i=1}^{m} \left[\log D\left(x^{(i)}\right) + \log\left(1 - D\left(G\left(z^{(i)}\right)\right)\right)\right]$$
Third phase: samples are drawn from the real images and passed to the discriminator to make sure the discriminator can tell that real images are real, rather than only that fake images are fake; this phase marks the end of one training round. The phases are repeated until the generator reaches a point where the discriminator can no longer tell real from fake, at which point training stops. That means the generator can produce high-dimensional, perceptually plausible objects. The final objective is given in Equation (9). See Figure 6 and Figure 7 for the GAN and the intuitive GAN process, respectively.
$$\min_G \max_D V(D, G) = \mathbb{E}_{x \sim p_{data}(x)}\left[\log D(x)\right] + \mathbb{E}_{z \sim p_z(z)}\left[\log\left(1 - D(G(z))\right)\right]$$
Algorithm 2 below summarizes the minibatch stochastic gradient descent training of generative adversarial nets as proposed by Goodfellow et al. [23]; a minimal PyTorch sketch follows the listing. The number of steps applied to the discriminator, k, is a hyper-parameter; we use k = 1.
Algorithm 2 Minibatch stochastic gradient descent training of GANs (Goodfellow et al. [23])
  • for number of training iterations do
  •   for k steps do
  •     sample a minibatch of m noise samples $\{z^{(1)}, \ldots, z^{(m)}\}$ from the noise prior $p_g(z)$
  •     sample a minibatch of m examples $\{x^{(1)}, \ldots, x^{(m)}\}$ from the data-generating distribution $p_{data}(x)$
  •     update the discriminator by ascending its stochastic gradient:
  •       $\nabla_{\theta_d} \frac{1}{m} \sum_{i=1}^{m} \left[\log D\left(x^{(i)}\right) + \log\left(1 - D\left(G\left(z^{(i)}\right)\right)\right)\right]$
  •   end for
  •   sample a minibatch of m noise samples $\{z^{(1)}, \ldots, z^{(m)}\}$ from the noise prior $p_g(z)$
  •   update the generator by descending its stochastic gradient:
  •       $\nabla_{\theta_g} \frac{1}{m} \sum_{i=1}^{m} \log\left(1 - D\left(G\left(z^{(i)}\right)\right)\right)$
  • end for
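For illustration only, a minimal PyTorch sketch of this training loop (with k = 1 and the non-saturating form of the generator loss) is shown below; the generator and discriminator modules, latent size, and data loader are placeholders, not the SG2-ADA implementation used in this study.

```python
import torch
import torch.nn as nn

def train_gan(generator, discriminator, data_loader, latent_dim=128,
              epochs=10, lr=2e-4, device="cuda"):
    """Alternate discriminator and generator updates (Algorithm 2 with k = 1)."""
    bce = nn.BCELoss()
    opt_d = torch.optim.Adam(discriminator.parameters(), lr=lr, betas=(0.5, 0.999))
    opt_g = torch.optim.Adam(generator.parameters(), lr=lr, betas=(0.5, 0.999))

    for _ in range(epochs):
        for real, _ in data_loader:
            real = real.to(device)
            m = real.size(0)
            real_labels = torch.ones(m, 1, device=device)
            fake_labels = torch.zeros(m, 1, device=device)

            # Discriminator step: push D(x) toward 1 and D(G(z)) toward 0
            z = torch.randn(m, latent_dim, device=device)
            fake = generator(z).detach()
            loss_d = bce(discriminator(real), real_labels) + \
                     bce(discriminator(fake), fake_labels)
            opt_d.zero_grad()
            loss_d.backward()
            opt_d.step()

            # Generator step: try to fool the discriminator (non-saturating loss)
            z = torch.randn(m, latent_dim, device=device)
            loss_g = bce(discriminator(generator(z)), real_labels)
            opt_g.zero_grad()
            loss_g.backward()
            opt_g.step()
```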

3.4. Standard Data Augmentation

In computer vision, standard data augmentation is used as a tool to artificially expand the training dataset by modifying the existing training sets through several transformations. It is applied when existing datasets are small or imbalanced, to prevent model overfitting or to improve model performance. We carried out the following transformations on the datasets, which together form our standard augmentation set (a minimal example follows the list).
  • Geometric transformation on the existing datasets: random crop, flip, rotate;
  • Color transformations: blur, sharpen and noise;
  • Random erasing [47];
  • Mixing images [48].
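As a hedged illustration only, the following torchvision-based pipeline approximates these standard transformations; mixup is omitted because it combines pairs of images and labels and is usually applied inside the training loop, and the exact parameters used in this study are not specified in the text.

```python
import torchvision.transforms as T

# Approximate standard augmentation: geometric and color transforms plus random erasing [47]
standard_augmentation = T.Compose([
    T.RandomResizedCrop(256),                                       # random crop
    T.RandomHorizontalFlip(),                                       # flip
    T.RandomRotation(degrees=30),                                   # rotate
    T.GaussianBlur(kernel_size=3),                                  # blur
    T.RandomAdjustSharpness(sharpness_factor=2),                    # sharpen
    T.ColorJitter(brightness=0.2, contrast=0.2, saturation=0.2),    # color changes
    T.ToTensor(),
    T.RandomErasing(p=0.5),                                         # random erasing
])
```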

3.5. SG2-ADA and Its Performances

The SG2-ADA is a state-of-the-art generative model that can generate high-quality synthetic images from fewer training data, in contrast to other types of GAN that require a large amount of data to perform well, though a larger dataset can still significantly improve its performance [21]. The model was designed to work with a limited dataset using an innovative data augmentation technique called ADA, which is similar to the standard augmentation method except that the augmentation process is adaptive rather than fixed. Fixed augmentation can cause the generated images to acquire properties of the augmentation parameters; for instance, cropping the training images can result in cropped generated images [21]. ADA works from an improved balanced consistency regularization (bCR), applying the augmentations shown to the discriminator in a controlled and adaptive fashion based on the level of overfitting [21]. This makes SG2-ADA perform well in the absence of unlimited data and train faster without compromising performance. Figure 8 visually illustrates the SG2-ADA architecture.

4. Detection Algorithm and Its Applicability

Object detection algorithms allow a computer system to "see" its environment by detecting instances of objects belonging to certain classes in digital images or videos. These algorithms fall into two main types: one-stage and two-stage detectors. The former combines all processing in a single pass by predicting bounding boxes around objects, making it faster and structurally simple, e.g., SSD, YOLO, and RetinaNet; the latter divides the task into two stages, first making region proposals for objects using deep features and then classifying objects with bounding box regression for each object, e.g., Faster-RCNN, Mask-RCNN, and granulated-RCNN. This makes two-stage detectors slower and structurally more complex, but more accurate. Object detection algorithms have been applied successfully across various domains, such as autonomous driving, medical feature detection in healthcare, pedestrian detection, theft detection, performance assessment in sports, farm automation, and plant disease detection [49,50,51,52,53,54,55].

4.1. Implementing Algorithm and Its Procedures

Our study utilized both one-staged and two-staged object detection algorithms, namely, SSD and faster-RCNN, respectively.

4.1.1. SSD Model

We used an input image of size 300 × 300 pixels for implementing the SSD model. We used a VGG16 network as its backbone without its fully connected (FC) layers. Six auxiliary convolutional layers with distinct kernel sizes were added to help detect our target objects at multiple scales. The convolutional layers progressively decrease the feature map sizes while increasing the depth (see Figure 9).
Our feature maps are of sizes 38 × 38, 19 × 19, 10 × 10, 5 × 5, 3 × 3, and 1 × 1, and both the locations and the class scores are computed with 3 × 3 convolution filters. The loss function combines the regression (localization) loss and the classification (confidence score) loss (see Equations (10)–(12)); a small sketch of the localization term follows the definitions below.
$$L(x, c, l, g) = \frac{1}{N}\left(L_{conf}(x, c) + \alpha L_{loc}(x, l, g)\right)$$
$$L_{loc}(x, l, g) = \sum_{i \in Pos}^{N} \sum_{m \in \{cx, cy, w, h\}} x_{ij}^{k}\, \mathrm{smooth}_{L1}\left(l_i^m - \hat{g}_j^m\right)$$
$$L_{conf}(x, c) = -\sum_{i \in Pos}^{N} x_{ij}^{p} \log\left(\hat{c}_i^p\right) - \sum_{i \in Neg} \log\left(\hat{c}_i^0\right)$$
where
  • $\hat{c}_i^p = \dfrac{\exp(c_i^p)}{\sum_p \exp(c_i^p)}$
  • $N$ = number of positive matches
  • $\alpha$ = weight for the localization loss
  • $l$ = predicted bounding box
  • $g$ = ground-truth bounding box
  • $\hat{c}_i^p$ = softmax-activated class score for default box $i$ and category $p$
  • $x_{ij}^p$ = matching indicator between default box $i$ and ground-truth box $j$ of category $p$
  • $x_{ij}^k$ = matching indicator between default box $i$ and ground-truth box $j$ of category $k$
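To make the localization term concrete, here is a minimal NumPy sketch of the smooth L1 function and the per-box localization loss; box matching and the encoding of the offsets are omitted, and the variable names are illustrative.

```python
import numpy as np

def smooth_l1(x: np.ndarray) -> np.ndarray:
    """Smooth L1: 0.5 * x^2 when |x| < 1, |x| - 0.5 otherwise (Equation (11))."""
    abs_x = np.abs(x)
    return np.where(abs_x < 1.0, 0.5 * x ** 2, abs_x - 0.5)

def localization_loss(pred_offsets: np.ndarray, gt_offsets: np.ndarray) -> float:
    """Sum of smooth L1 over the (cx, cy, w, h) offsets of matched default boxes.

    pred_offsets, gt_offsets: arrays of shape (num_matched_boxes, 4).
    """
    return float(smooth_l1(pred_offsets - gt_offsets).sum())
```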

4.1.2. Faster RCNN

Faster-RCNN is a state-of-the-art object detection algorithm that consists of a Region Proposal Network (RPN) and Fast-RCNN. The RPN generates region proposals that are fed to Fast-RCNN to detect classes of objects (see Figure 10). Our study used input images of 256 × 256 pixels, which generate roughly 2000 anchors per image, i.e., (256/16) × (256/16) × 9, with 3 aspect ratios (1:1, 1:2, and 2:1). We then applied Non-Maximum Suppression (NMS) based on the image classification and an IoU threshold of 0.7. Positive labels are assigned to an anchor with the highest IoU with a ground-truth box, or with an IoU overlap higher than 0.7 with any ground truth. The loss function for the RPN is:
$$L\left(\{p_i\}, \{t_i\}\right) = \frac{1}{N_{cls}} \sum_i L_{cls}\left(p_i, p_i^*\right) + \lambda \frac{1}{N_{reg}} \sum_i p_i^* \, L_{reg}\left(t_i, t_i^*\right)$$
where
  • $p_i$ = predicted probability that anchor $i$ contains an object of a class
  • $p_i^*$ = ground-truth label indicating whether anchor $i$ contains an object of a class
  • $t_i$ = coordinates of the predicted anchor box
  • $t_i^*$ = ground-truth coordinates associated with the bounding box
  • $L_{cls}$ = classification loss
  • $L_{reg}$ = regression loss
  • $N_{cls}$ = normalization parameter (mini-batch size)
  • $N_{reg}$ = normalization parameter for the regression term
  • $\lambda = 10$, so that the two loss terms are approximately equally weighted
Figure 10. Faster-RCNN model (Cha et al. [57]).
We then implemented the second stage of Faster-RCNN, the object detection network. Zeiler and Fergus (ZF) Net [58] was used as the backbone; it applies an RoI pooling layer to convert the region proposals generated by the RPN (first stage) into fixed-size feature maps of size 7 × 7 × D (where D = 256 for ZF). The fixed-size feature maps are then passed to two fully connected layers, where they are flattened and sent as outputs to two distinct tasks. The first layer predicts the region proposal class using a softmax with N + 1 output parameters, while the second layer determines the bounding box location of the object in the given image using bounding box regression with 4 × N output parameters.

4.2. Performance Measurement

We adopted the de facto standard metrics for evaluating the performance of object detection and generative models, such as mean average precision, loss values, Fréchet Inception Distance (FID), Precision and Recall (P&R), and Kernel Inception Distance (KID). The following explains these metrics in detail.

4.2.1. Mean Average Precision (mAP)

To evaluate the performance of the proposed augmentation pipeline and the standard method on the object detection models, we used the mean average precision (mAP) metric (see Equation (14)); a small numeric sketch follows the equation. The evaluation was carried out for the faster-RCNN and SSD object detection models on two dataset groups: the proposed GAN-based augmentation datasets and the standard augmentation datasets. We set the intersection over union (IoU) threshold to 0.7, which means that only detections with IoU greater than or equal to this value are considered positive. In addition, we also evaluated the classification loss of our models. A higher mAP indicates a better detection model.
$$mAP = \frac{1}{n} \sum_{i=1}^{n} AP_i \quad \text{for } n \text{ classes}$$
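As a simple illustration of Equation (14), assuming the per-class average precisions have already been computed (the values below are placeholders, not results from this study):

```python
def mean_average_precision(per_class_ap: dict) -> float:
    """mAP = arithmetic mean of the per-class average precisions (Equation (14))."""
    return sum(per_class_ap.values()) / len(per_class_ap)

# Hypothetical per-class AP values for the four disease classes
example_ap = {"BB": 0.90, "TG": 0.92, "BS": 0.94, "RB": 0.95}
print(mean_average_precision(example_ap))  # 0.9275
```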

4.2.2. Fréchet Inception Distance (FID)

The generated synthetic images are close to the real images and can be difficult to distinguish by human visual perception. FID is one of the most recommended and widely used metrics for quantifying the similarity between real and synthetic images. It calculates the Wasserstein-2 distance between the real and synthetic image distributions in the feature space of an Inception-v3 network; a small computational sketch follows Equation (15). A lower FID value means higher-quality and more diverse images.
$$FID = \left\|\mu_r - \mu_g\right\|^2 + \mathrm{Tr}\left(\Sigma_r + \Sigma_g - 2\left(\Sigma_r \Sigma_g\right)^{\frac{1}{2}}\right)$$
where $\mu_r$ and $\mu_g$ represent the means of the real and synthetic image features, respectively, and $\Sigma_r$ and $\Sigma_g$ represent the covariance matrices of the real and synthetic image features, respectively.
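The following is a minimal sketch of Equation (15) given pre-extracted Inception-v3 features for the real and synthetic sets; feature extraction itself is omitted, and the array names are illustrative.

```python
import numpy as np
from scipy.linalg import sqrtm

def frechet_inception_distance(real_feats: np.ndarray, fake_feats: np.ndarray) -> float:
    """FID = ||mu_r - mu_g||^2 + Tr(S_r + S_g - 2 (S_r S_g)^(1/2)), Equation (15).

    real_feats, fake_feats: arrays of shape (num_images, feature_dim),
    e.g. 2048-dimensional Inception-v3 pooling features.
    """
    mu_r, mu_g = real_feats.mean(axis=0), fake_feats.mean(axis=0)
    sigma_r = np.cov(real_feats, rowvar=False)
    sigma_g = np.cov(fake_feats, rowvar=False)

    covmean = sqrtm(sigma_r @ sigma_g)
    if np.iscomplexobj(covmean):      # numerical noise can produce tiny imaginary parts
        covmean = covmean.real

    diff = mu_r - mu_g
    return float(diff @ diff + np.trace(sigma_r + sigma_g - 2.0 * covmean))
```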

4.2.3. Kernel Inception Distance (KID)

This metric was proposed by Binkowski et al. [59] as an alternative to FID, because FID has no unbiased estimator, is computationally heavy, and performs poorly on small datasets. The metric measures image generation quality by determining the difference between the generated image distribution and the training distribution in the representation space of an Inception-v3 network pre-trained on ImageNet.

4.2.4. Precision & Recall (P&R)

Meanwhile, Sajjadi et al. [60] used a different approach and proposed a quality evaluation method that utilizes two different factors: Precision and Recall. While Precision shows how close the generated image is to the real image, Recall measures the differences in the distribution of the generated and real images.

5. Experimental Results and Comparative Analysis

This section describes the dataset in detail, the preprocessing of the data, the experimental scheme, and the comparative analysis of our results.

5.1. Experimental Datasets Description

Our study used the Rice Leaf Disease Image Samples from Mendeley Data [61]; it is an open-source dataset with 5932 images of rice leaf disease. The images are grouped into four rice leaf diseases: bacterial blight (BB), tungro (TG), brown spot (BS), and rice blast (RB), with 1584, 1308, 1440, and 1600 images, respectively. The raw images come in different sizes, ranging from 150 to 300 pixels.

5.2. Experimental Scheme and Process

The proposed approach was implemented on a Linux-based system with an Intel Core i7 8700K, 2 NVIDIA Titan Xp 12 GB GPUs, and 32 GB of RAM. The PyTorch 1.10 framework was adopted for implementing StyleGAN2-ADA, while TensorFlow 2.7 was adopted for implementing the Faster-RCNN and SSD models. We trained the faster-RCNN and SSD models from scratch separately, using two instances of the datasets to detect four different pathologies of rice leaf disease and to examine the model performances. We implemented two types of data augmentation (GAN-based augmentation and standard augmentation). We adopted the binary cross-entropy loss and trained for 200 epochs for better performance. Lastly, we updated the hyper-parameters every 50 epochs, except for the batch size.
  • Learning rate: {1 × 10⁻², 3 × 10⁻³}
  • Batch size: {1}
  • Optimizer: {Adagrad, Adam}

5.3. Image Data Preprocessing

This study carried out two image preprocessing steps, one for training the GAN and one for the object detection models. For the GAN, we complied with the input requirements of the SG2-ADA model by resizing the raw images to a uniform 256 × 256 pixels; we used the Python Imaging Library (PIL) to implement the resizing [63], and a small resizing sketch is shown below. For the object detection models, we labeled the datasets and annotated them with bounding boxes using LabelImg [62]. We then converted the dataset to TFRecord format and divided it into 70% for training, 15% for evaluation, and 15% for testing the performance of the models.
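A minimal resizing sketch using PIL is shown below; the directory names are illustrative, while the target size of 256 × 256 follows the text.

```python
from pathlib import Path
from PIL import Image

def resize_images(src_dir: str, dst_dir: str, size=(256, 256)) -> None:
    """Resize all raw images to the uniform 256 x 256 scale required by SG2-ADA."""
    out = Path(dst_dir)
    out.mkdir(parents=True, exist_ok=True)
    for path in Path(src_dir).glob("*.jpg"):
        img = Image.open(path).convert("RGB")
        img.resize(size, Image.LANCZOS).save(out / path.name)
```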

5.4. Synthetic Data Generation

Our experiment with the proposed GAN-based augmentation pipeline used an SG2-ADA model and shows that high-quality, refined rice leaf disease images can be generated from the few available training images and used directly for data augmentation to improve the Faster-RCNN and SSD models for rice disease detection. See Figure 11 for real images and Figure 12 for synthetic images.
The visual quality of the synthetic images generated by the proposed pipeline is good, and a non-expert would find it difficult to distinguish the real from the fake images.
The model generated 50,001 images across the 4 classes of rice leaf disease (BB, TG, BS, and RB), though 46% of the synthetic images were blurry or distorted and were detected and filtered out using the variance of the Laplacian method. After image preprocessing, we retained 26,694 synthetic images. Table 2 shows the evaluation results of the network using the FID, KID, and P&R metrics. The network achieved an FID score of 26.67, a KID score of 0.08, a Precision of 0.49, and a Recall of 0.14 after training for 144 h. The FID value was also observed to decrease during the course of training.

5.5. Comparison of Mean Average Precisions (mAP)

Table 3 shows the comparison of mAP scores between the standard augmentation method and the GAN-based augmentation method. On Faster-RCNN, the GAN-based augmentation method improves the model performance by 0.09 mAP on the validation datasets. On the SSD model, the GAN-based augmentation method improves the model performance by 0.1 mAP.
Furthermore, the results show that RB achieved the best average precision (AP) under the GAN-based augmentation pipeline, with scores of 0.95 for Faster-RCNN and 0.93 for the SSD model, while TG achieved the best result under the standard augmentation method, with AP scores of 0.89 for Faster-RCNN and 0.85 for the SSD model. BB achieved the lowest AP under the GAN-based method, with 0.90 for Faster-RCNN and 0.89 for the SSD model, and also the lowest AP under the standard augmentation method, with 0.78 for Faster-RCNN and 0.77 for the SSD model.

5.6. Comparison of Learning Curves

In Figure 13, we compared training and validation learning curves alongside a standard augmentation and GAN-based augmentation on the Faster-RCNN model. We observed that GAN-based augmentation improves the model efficacy. In the final steps, we observed a 0.18 difference between training losses of standard augmentation and the GAN-based augmentation model. We also see a 0.204 difference between the validation losses of standard augmentation and the GAN-based augmentation model. Each curve shows the averages across five runs under an augmentation type. This gives a clear sense of variability in these curves and the significance of the differences between the two types of augmentation methods.
In Figure 14, we compared training and validation learning curves alongside a standard augmentation and GAN-based augmentation on the SSD model. We observed that GAN-based augmentation improves the model efficacy. In the final steps, we observed a 0.25 difference between training losses of standard augmentation and the GAN-based augmentation model. We also see a 0.22 difference between the validation losses of standard augmentation and the GAN-based augmentation model. Each curve shows the averages across five runs under an augmentation type. This gives a clear sense of variability in these curves and the significance of the differences between the two types of augmentation methods.

5.7. K-Fold Validation

We carried out k-fold validation, splitting the entire dataset into k folds with k = 5; at each iteration, one block is held out for validation while the other blocks are used to train the network. The final score is computed by averaging the scores across the k folds; a small sketch of the split is given below. Figure 15 shows the k-fold dataset iterations, groups, and classes.
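For illustration, a minimal sketch of the 5-fold split with scikit-learn is shown below; `train_and_score` stands in for the detector training and evaluation routine and is not from the original code.

```python
import numpy as np
from sklearn.model_selection import KFold

def cross_validate(samples: np.ndarray, labels: np.ndarray, train_and_score, k: int = 5):
    """Average the evaluation score across k folds (k = 5 in this study)."""
    kfold = KFold(n_splits=k, shuffle=True, random_state=0)
    scores = []
    for train_idx, val_idx in kfold.split(samples):
        score = train_and_score(samples[train_idx], labels[train_idx],
                                samples[val_idx], labels[val_idx])
        scores.append(score)
    return float(np.mean(scores))
```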
In addition, we compared the object classification accuracies of faster-RCNN and SSD with the GAN-based and the standard method on five datasets (k = 5). The last row is the total average accuracy over k = 5. In the majority of the datasets, the accuracy of the proposed augmentation pipeline is higher than that of the standard augmentation on all k-folds. The classifier with our proposed method yields the best average prediction accuracy of 91.83%; the other results are 83.78%, 78.92%, 84.47%, 88.71%, 77.93%, and 83.37% (see Table 4).

5.8. Comparison of Losses with Boxplots

The boxplot in Figure 16 visualizes the difference in the minimum validation and training losses across all our samples. We achieved further comparative insight regarding standard augmentation and GAN-based augmentation methods across the training and validation losses using boxplots.
The study hypothesis is that the minimum validation and training losses differ significantly between the proposed augmentation pipeline and the standard augmentation method in classifying rice leaf diseases. At a significance level of α ≤ 0.05 (see Table 5), our analysis confirms this hypothesis: the minimum validation and training losses are significantly higher for the standard augmentation than for the proposed augmentation pipeline across the two object detection models. Hence, our results imply that the proposed augmentation pipeline effectively and significantly improves the efficacy of object detection models faced with a small or imbalanced dataset.

6. Concluding Remarks

Recent advances in DL methods have made it possible to efficiently detect rice leaf disease automatically, although a large, diverse, and balanced dataset is required to achieve an optimum result. Currently, few balanced rice leaf disease datasets are openly available compared to the millions of other datasets freely available. Previous studies have used standard augmentation methods to create new samples and increase the size of the image dataset by modifying positions (rotation, etc.) and colors (brightness, etc.) without introducing "new", diverse data to the model, which contributes little to model efficacy and generalization. We proposed a data augmentation pipeline using the SG2-ADA state-of-the-art generative model and the variance of the Laplacian method to generate and filter high-quality "new" synthetic rice leaf disease images and improve the performance of CNN-based object detection models. We compared the performance of the proposed augmentation pipeline against the standard augmentation method on one-stage and two-stage object detection models, SSD and Faster-RCNN, respectively. Based on our results, we conclude that: (1) the proposed augmentation pipeline is efficient in producing high-quality synthetic images of rice leaf disease with few training data; (2) the visual quality of the synthetic images is very close to that of real images and can be difficult to distinguish visually; (3) the proposed augmentation pipeline can be an effective tool for amplifying, diversifying, and correcting the imbalances in plant disease datasets; (4) the proposed augmentation pipeline yields better efficacy than the standard augmentation method; (5) the Faster-RCNN and SSD models prove to be effective in detecting various classes of rice leaf disease; and (6) this work combines several concepts, approaches, techniques, and components, such as data augmentation, object detection models, rice leaf disease detection, SG2-ADA, the Laplacian filter, and the Faster-RCNN and SSD models. Hence, generative-model-based augmentation is a promising area of study for improving model performance and solving generalization issues when data are limited or imbalanced, although it comes with time and resource costs. This is especially true in areas such as agriculture and medicine.

Author Contributions

Conceptualization, Y.H.; formal analysis, Y.H.; Funding acquisition, Y.H.; Investigation, Y.H.; Methodology, Y.H.; Software, Y.H.; Visualization, Y.H.; Writing (original draft), Y.H.; Validation, H.Y; project administration, S.Q.; supervision, S.Q.; Writing (review & editing), S.Q.; data curation, M.J.M.K.; resources, M.J.M.K. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

Data generated during the study are subject to a data sharing mandate and available in a public repository that does not issue datasets with DOIs. Rice Leaf Disease Data that support the findings of this study have been deposited in a GitHub repository [https://github.com/yunusa2k2/GANLapRice] accessed on 15 October 2022.

Conflicts of Interest

The authors declare no conflict of interest.

References

  1. Behnassi, M.; Baig, M.B.; Sraïri, M.T.; Alsheikh, A.A.; Risheh, A.W.A.A. Food Security and Climate-Climate Smart Food Systems—An Introduction. In Food Security and Climate-Smart Food Systems; Springer: Cham, Switzerland, 2022; pp. 1–13. [Google Scholar]
  2. Falsafi, P.; Baig, M.B.; Reed, M.R.; Behnassi, M. The Nexus of Climate Change, Food Security, and Agricultural Extension in Islamic Republic of Iran. In Food Security and Climate-Smart Food Systems; Springer: Cham, Switzerland, 2022; pp. 241–261. [Google Scholar]
  3. Yuen, K.W.; Hanh, T.T.; Quynh, V.D.; Switzer, A.D.; Teng, P.; Lee, J.S.H. Interacting effects of land-use change and natural hazards on rice agriculture in the Mekong and Red River deltas in Vietnam. Nat. Hazards Earth Syst. Sci. 2021, 21, 1473–1493. [Google Scholar] [CrossRef]
  4. Sekiya, N.; Nakajima, T.; Oizumi, N.; Kurosawa, C.; Tibanyendela, N.; Peter, M.A.; Natsuaki, K.T. Agronomic practices preventing local outbreaks of rice yellow mottle virus disease revealed by spatial autoregressive analysis. Agron. Sustain. Dev. 2022, 42, 1–15. [Google Scholar] [CrossRef]
  5. Bari, B.S.; Islam, M.N.; Rashid, M.; Hasan, M.J.; Razman, M.A.M.; Musa, R.M.; Majeed, A.P.A. A real-time ap-proach of diagnosing rice leaf disease using deep learning-based faster R-CNN framework. PeerJ Comput. Sci. 2021, 7, e432. [Google Scholar] [CrossRef] [PubMed]
  6. Wang, Y.; Wang, H.; Peng, Z. Rice diseases detection and classification using attention based neural network and bayesian optimization. Expert Syst. Appl. 2021, 178, 114770. [Google Scholar] [CrossRef]
  7. Deng, R.; Tao, M.; Xing, H.; Yang, X.; Liu, C.; Liao, K.; Qi, L. Automatic diagnosis of rice diseases using deep learning. Front. Plant Sci. 2021, 12, 1691. [Google Scholar] [CrossRef]
  8. Kiratiratanapruk, K.; Temniranrat, P.; Kitvimonrat, A.; Sinthupinyo, W.; Patarapuwadol, S. Using deep learn-ing techniques to detect rice diseases from images of rice fields. In International Conference on Industrial, Engi-neering and Other Applications of Applied Intelligent Systems; Springer: Cham, Switzerland, 2020; pp. 225–237. [Google Scholar]
  9. Lu, Y.; Yi, S.; Zeng, N.; Liu, Y.; Zhang, Y. Identification of rice diseases using deep convolutional neural net-works. Neurocomputing 2017, 267, 378–384. [Google Scholar] [CrossRef]
  10. Jiang, F.; Lu, Y.; Chen, Y.; Cai, D.; Li, G. Image recognition of four rice leaf diseases based on deep learning and support vector machine. Comput. Electron. Agric. 2020, 179, 105824. [Google Scholar] [CrossRef]
  11. Bhattacharya, S.; Mukherjee, A.; Phadikar, S. A deep learning approach for the classification of rice leaf diseas-es. In Intelligence Enabled Research; Springer: Singapore, 2020; pp. 61–69. [Google Scholar]
  12. Mathulaprangsan, S.; Lanthong, K.; Jetpipattanapong, D.; Sateanpattanakul, S.; Patarapuwadol, S. Rice diseases recognition using effective deep learning models. In Proceedings of the Joint International Conference on Digi-tal Arts, Media and Technology with ECTI Northern Section Conference on Electrical, Electronics, Computer and Telecommunications Engineering (ECTI DAMT & NCON), Pattaya, Thailand, 11–14 March 2020; pp. 386–389. [Google Scholar]
  13. Rahman, C.R.; Arko, P.S.; Ali, M.E.; Khan, M.A.I.; Apon, S.H.; Nowrin, F.; Wasif, A. Identification and recogni-tion of rice diseases and pests using convolutional neural networks. Biosyst. Eng. 2020, 194, 112–120. [Google Scholar] [CrossRef] [Green Version]
  14. Chen, J.; Zhang, D.; Nanehkaran, Y.A.; Li, D. Detection of rice plant diseases based on deep transfer learning. J. Sci. Food Agric. 2020, 100, 3246–3256. [Google Scholar] [CrossRef]
  15. Joshi, A.A.; Jadhav, B.D. Monitoring and controlling rice diseases using Image processing techniques. In Proceedings of the International Conference on Computing, Analytics and Security Trends (CAST), Pune, India, 19–21 December 2016; pp. 471–476. [Google Scholar]
  16. Wang, S.; Qin, C.; Feng, Q.; Javadpour, F.; Rui, Z. A framework for predicting the production performance of unconventional resources using deep learning. Appl. Energy 2021, 295, 117016. [Google Scholar] [CrossRef]
  17. Richter, S.R.; Vineet, V.; Roth, S.; Koltun, V. Playing for data: Ground truth from computer games. In European Conference on Computer Vision; Springer: Cham, Switzerland, 2016; pp. 102–118. [Google Scholar]
  18. Xu, B.; Shu, X.; Song, Y. X-invariant Contrastive Augmentation and Representation Learning for Semi-Supervised Skeleton-Based Action Recognition. IEEE Trans. Image Process. 2022, 31, 3852–3867. [Google Scholar] [CrossRef] [PubMed]
  19. Kang, J.; Lee, S.; Kim, N.; Kwak, S. Style Neophile: Constantly Seeking Novel Styles for Domain Generalization. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, New Orleans, LA, USA, 19–24 June 2022; pp. 7130–7140. [Google Scholar]
  20. Wang, K.; Gou, C.; Duan, Y.; Lin, Y.; Zheng, X.; Wang, F.Y. Generative adversarial networks: Introduction and outlook. IEEE/CAA J. Autom. Sin. 2017, 4, 588–598. [Google Scholar] [CrossRef]
  21. Karras, T.; Aittala, M.; Hellsten, J.; Laine, S.; Lehtinen, J.; Aila, T. Training generative adversarial networks with limited data. Adv. Neural Inf. Process. Syst. 2020, 33, 12104–12114. [Google Scholar]
  22. Nazki, H.; Yoon, S.; Fuentes, A.; Park, D.S. Unsupervised image translation using adversarial networks for im-proved plant disease recognition. Comput. Electron. Agric. 2020, 168, 105117. [Google Scholar] [CrossRef]
  23. Goodfellow, I.; Pouget-Abadie, J.; Mirza, M.; Xu, B.; Warde-Farley, D.; Ozair, S.; Courville, A.; Bengio, Y. Genera-tive adversarial networks. Commun. ACM 2020, 63, 139–144. [Google Scholar] [CrossRef]
  24. Luo, J.; Huang, J.; Li, H. A case study of conditional deep convolutional generative adversarial networks in machine fault diagnosis. J. Intell. Manuf. 2021, 32, 407–425. [Google Scholar] [CrossRef]
  25. Park, S.W.; Ko, J.S.; Huh, J.H.; Kim, J.C. Review on generative adversarial networks: Focusing on computer vi-sion and its applications. Electronics 2021, 10, 1216. [Google Scholar] [CrossRef]
  26. Fang, W.; Zhang, F.; Sheng, V.S.; Ding, Y. A method for improving CNN-based image recognition using DCGAN. Comput. Mater. Contin. 2018, 57, 167–178. [Google Scholar] [CrossRef]
  27. Yang, J.; Li, T.; Liang, G.; He, W.; Zhao, Y. A simple recurrent unit model based intrusion detection system with DCGAN. IEEE Access 2019, 7, 83286–83296. [Google Scholar] [CrossRef]
  28. Arjovsky, M.; Chintala, S.; Bottou, L. Wasserstein generative adversarial networks. In Proceedings of the International Conference on Machine Learning, Sydney, NSW, Australia, 6–11 August 2017; pp. 214–223. [Google Scholar]
  29. Guo, T.; Xu, C.; Huang, J.; Wang, Y.; Shi, B.; Xu, C.; Tao, D. On positive-unlabeled classification in GAN. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Seattle, WA, USA, 13–19 June 2020; pp. 8385–8393. [Google Scholar]
  30. Song, J.; Ermon, S. Bridging the gap between f-gans and wasserstein gans. In Proceedings of the 37th International Conference on Machine Learning, Online, 13–18 July 2020; pp. 9078–9087. [Google Scholar]
  31. Karras, T.; Laine, S.; Aittala, M.; Hellsten, J.; Lehtinen, J.; Aila, T. Analyzing and improving the image quality of stylegan. In Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, Seattle, WA, USA, 13–19 June 2020; pp. 8110–8119. [Google Scholar]
  32. Zhao, S.; Liu, Z.; Lin, J.; Zhu, J.Y.; Han, S. Differentiable augmentation for data-efficient gan training. Adv. Neural Inf. Process. Syst. 2020, 33, 7559–7570. [Google Scholar]
  33. Viazovetskyi, Y.; Ivashkin, V.; Kashin, E. Stylegan2 distillation for feed-forward image manipulation. In European Conference on Computer Vision; Springer: Cham, Switzerland, 2020; pp. 170–186. [Google Scholar]
  34. Kumari, N.; Zhang, R.; Shechtman, E.; Zhu, J.-Y. Ensembling off-the-shelf models for gan training. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, New Orleans, LA, USA, 18–24 June 2022; 2022; pp. 10651–10662. [Google Scholar]
  35. Hermosilla, G.; Tapia, D.I.H.; Allende-Cid, H.; Castro, G.F.; Vera, E. Thermal Face Generation Using StyleGAN. IEEE Access 2021, 9, 80511–80523. [Google Scholar] [CrossRef]
  36. Štepec, D.; Skočaj, D. Image synthesis as a pretext for unsupervised histopathological diagnosis. In International Workshop on Simulation and Synthesis in Medical Imaging; Springer: Cham, Switzerland, 2020; pp. 174–183. [Google Scholar]
  37. Frid-Adar, M.; Diamant, I.; Klang, E.; Amitai, M.; Goldberger, J.; Greenspan, H. GAN-based synthetic medical image augmentation for increased CNN performance in liver lesion classification. Neurocomputing 2018, 321, 321–331. [Google Scholar] [CrossRef]
  38. Saiz, F.A.; Alfaro, G.; Barandiaran, I.; Graña, M. Generative Adversarial Networks to Improve the Robustness of Visual Defect Segmentation by Semantic Networks in Manufacturing Components. Appl. Sci. 2021, 11, 6368. [Google Scholar] [CrossRef]
  39. Jullum, M.; Løland, A.; Huseby, R.B.; Ånonsen, G.; Lorentzen, J. Detecting money laundering transactions with machine learning. J. Money Laund. Control. 2020, 23, 173–186. [Google Scholar] [CrossRef]
  40. Sheema, D.; Ramesh, K.; Renjith, P.N.; Lakshna, A. Comparative Study of Major Algorithms for Pest Detection in Maize Crop. In Proceedings of the International Conference on Intelligent Technologies, Hubbali, India, 25–27 June 2021; pp. 1–7. [Google Scholar]
  41. Ngugi, L.C.; Abelwahab, M.; Abo-Zahhad, M. Recent advances in image processing techniques for automated leaf pest and disease recognition–A review. Inf. Process. Agric. 2021, 8, 27–51. [Google Scholar] [CrossRef]
  42. Shu, X.; Xu, B.; Zhang, L.; Tang, J. Multi-Granularity Anchor-Contrastive Representation Learning for Semi-Supervised Skeleton-Based Action Recognition. IEEE Trans. Pattern Anal. Mach. Intell. 2022, 1, 1–18. [Google Scholar] [CrossRef] [PubMed]
  43. Nie, Y.; Zamzam, A.S.; Brandt, A. Resampling and data augmentation for short-term PV output predition based on an imbalanced sky images dataset using convolutional neural networks. Solar Energy 2021, 224, 341–354. [Google Scholar] [CrossRef]
  44. Hu, C.; Sapkota, B.B.; Thomasson, J.A.; Bagavathiannan, M.V. Influence of image quality and light consistency on the performance of convolutional neural networks for weed mapping. Remote Sens. 2021, 13, 2140. [Google Scholar] [CrossRef]
  45. Kurutach, T.; Tamar, A.; Yang, G.; Russell, S.J.; Abbeel, P. Learning plannable representations with causal in-fogan. Adv. Neural Inf. Process. Syst. 2018, 31, 8747–8758. [Google Scholar]
  46. Huang, X.; Li, Y.; Poursaeed, O.; Hopcroft, J.; Belongie, S. Stacked generative adversarial networks. In Proceedings of the IEEE conference on computer vision and pattern recognition, Honolulu, HI, USA, 21–26 July 2017; pp. 5077–5086. [Google Scholar]
  47. Zhong, Z.; Zheng, L.; Kang, G.; Li, S.; Yang, Y. Random erasing data augmentation. In Proceedings of the AAAI Conference on Artificial Intelligence, New York, NY, USA, 7–12 February 2020; Volume 34, pp. 13001–13008. [Google Scholar]
  48. Zhang, H.; Cisse, M.; Dauphin, Y.N.; Lopez-Paz, D. Mixup: Beyond empirical risk minimization. arXiv 2017, arXiv:1710.09412. [Google Scholar]
  49. Pandian, J.A.; Geetharamani, G.; Annette, B. Data augmentation on plant leaf disease image dataset using im-age manipulation and deep learning techniques. In Proceedings of the IEEE 9th International Conference on Advanced Computing (IACC), Tiruchirapalli, India, 13–14 December 2019; pp. 199–204. [Google Scholar]
  50. Hady, A.A.; Ghubaish, A.; Salman, T.; Unal, D.; Jain, R. Intrusion detection system for healthcare systems using medical and network data: A comparison study. IEEE Access 2020, 8, 106576–106584. [Google Scholar] [CrossRef]
  51. Mao, J.; Xiao, T.; Jiang, Y.; Cao, Z. What can help pedestrian detection? In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Honolulu, HI, USA, 21–26 July 2017; pp. 3127–3136. [Google Scholar]
  52. Punmiya, R.; Choe, S. Energy theft detection using gradient boosting theft detector with feature engineering-based preprocessing. IEEE Trans. Smart Grid 2019, 10, 2326–2329. [Google Scholar] [CrossRef]
  53. Chaabene, H.; Negra, Y.; Bouguezzi, R.; Capranica, L.; Franchini, E.; Prieske, O.; Granacher, U. Tests for the assessment of sport-specific performance in Olympic combat sports: A systematic review with practical recommendations. Front. Physiol. 2018, 9, 386. [Google Scholar] [CrossRef] [PubMed]
  54. Gikunda, P.K.; Jouandeau, N. State-of-the-art convolutional neural networks for smart farms: A review. In Intelligent Computing, Proceedings of 2019 Computing Conference, London, UK, 16–17 July 2019; Springer: Cham, Switzerland, 2019; pp. 763–775. [Google Scholar]
  55. Ferentinos, K.P. Deep learning models for plant disease detection and diagnosis. Comput. Electron. Agric. 2018, 145, 311–318. [Google Scholar] [CrossRef]
  56. Liu, W.; Anguelov, D.; Erhan, D.; Szegedy, C.; Reed, S.; Fu, C.Y.; Berg, A.C. SSD: Single shot multibox detector. In European Conference on Computer Vision; Springer: Cham, Switzerland, 2016; pp. 21–37. [Google Scholar]
  57. Cha, Y.J.; Choi, W.; Suh, G.; Mahmoudkhani, S.; Büyüköztürk, O. Autonomous structural visual inspection using region-based deep learning for detecting multiple damage types. Comput. Aided Civ. Infrastruct. Eng. 2018, 33, 731–747. [Google Scholar] [CrossRef]
  58. Zeiler, M.D.; Fergus, R. Visualizing and understanding convolutional networks. In European Conference on Computer Vision; Springer: Cham, Switzerland, 2014; pp. 818–833. [Google Scholar]
  59. Binkowski, M.; Simonyan, K.; Donahue, J.; Clark, A.; Dieleman, S.E.L.; Elsen, E.K.; Casagrande, N. High Fidelity Speech Synthesis With Adversarial Networks. U.S. Patent 17/032,578, 2021. [Google Scholar]
  60. Sajjadi, M.S.; Bachem, O.; Lucic, M.; Bousquet, O.; Gelly, S. Assessing generative models via precision and recall. Adv. Neural Inf. Process. Syst. 2018, 31, 5228–5237. [Google Scholar]
  61. Sethy, P.K.; Barpanda, N.K.; Rath, A.K.; Behera, S.K. Deep feature based rice leaf disease identification using support vector machine. Comput. Electron. Agric. 2020, 175, 105527. [Google Scholar] [CrossRef]
  62. Tzutalin. LabelImg. Git Code. 2015. Available online: https://github.com/tzutalin/labelImg (accessed on 22 March 2022).
  63. Clark, A. Pillow (PIL Fork) Documentation. 2015. Available online: https://buildmedia.readthedocs.org/media/pdf/pillow/latest/pillow.pdf (accessed on 25 March 2022).
Figure 1. Image detected as blurry.
Figure 2. Image detected as not blurry.
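As a point of reference for Figures 1 and 2, the blur screening used in the pipeline relies on the variance of the Laplacian filter. The sketch below is a minimal Python/OpenCV version of such a check; the threshold of 100 is an illustrative assumption, not the value used in this study.

```python
import cv2

def is_blurry(image_path: str, threshold: float = 100.0) -> bool:
    """Flag an image as blurry when the variance of its Laplacian falls below a threshold.

    The threshold of 100.0 is an illustrative assumption, not the value used in the paper.
    """
    image = cv2.imread(image_path)
    if image is None:
        raise FileNotFoundError(f"Could not read image: {image_path}")
    gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)
    # Variance of the Laplacian: low variance means few sharp edges, i.e., a blurry image.
    focus_measure = cv2.Laplacian(gray, cv2.CV_64F).var()
    return focus_measure < threshold

# Example usage: discard blurry or poorly generated images before augmentation.
# kept = [p for p in generated_paths if not is_blurry(p)]
```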
Figure 3. Visual illustration of the proposed GAN-based augmentation pipeline.
Figure 4. GAN Generator.
Figure 5. GAN Discriminator.
Figure 6. GAN process.
Figure 7. Intuitive illustration of the GAN training process.
Figure 8. SG2-ADA network architecture: (a) SG2 generator; (b) SG2 discriminator. SG2-ADA applies data augmentation after the input vector in both (a) and (b).
Figure 9. SSD Model (Liu et al. [56]).
Figure 11. Real images.
Figure 12. Synthetic images.
Figure 13. Training and validation losses between the proposed augmentation pipeline and the standard augmentation method on Faster-RCNN.
Figure 14. Training and validation losses between the proposed augmentation pipeline and the standard augmentation method on SSD.
Figure 15. K-fold validation.
Figure 16. Boxplot visualizing the differences in the minimum training and validation losses between the proposed augmentation pipeline and the standard augmentation method.
Table 1. Limitations and advantages of the standard method.

Limitations of Standard Data Augmentation | Advantages of Standard Data Augmentation
Standard data augmentation still inherits biases from the original datasets. | It reduces the cost of collecting or generating and labeling new datasets.
Finding an effective optimal standard data augmentation approach can be challenging. | It reduces data scarcity and overfitting whilst improving model accuracy.
Quality assurance for standard data augmentation is costly. | Helps with resolving class imbalance and creates variability in data models.
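To make the contrast with the GAN-based pipeline concrete, the following is a minimal sketch of the kind of standard geometric augmentation summarised in Table 1, using Pillow [63]. The helper name standard_augment and the specific transform choices are illustrative assumptions, not the study's actual augmentation code.

```python
from PIL import Image, ImageOps

def standard_augment(image_path: str):
    """Apply simple geometric transforms typical of standard data augmentation."""
    image = Image.open(image_path)
    return [
        image,
        ImageOps.mirror(image),         # horizontal flip
        ImageOps.flip(image),           # vertical flip
        image.rotate(90, expand=True),  # 90-degree rotation
    ]

# Example usage: expand a small set of disease images four-fold.
# augmented = [aug for path in image_paths for aug in standard_augment(path)]
```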
Table 2. Results of the SG2-ADA.

Model   | FID   | KID  | Precision | Recall
SG2-ADA | 26.67 | 0.08 | 0.49      | 0.14
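For reference when reading Table 2, the Fréchet Inception Distance is conventionally defined from the means and covariances of Inception features extracted from real (r) and generated (g) images; this is the standard formula, restated here for context rather than taken from the original text:

```latex
\mathrm{FID} = \lVert \mu_r - \mu_g \rVert_2^2
             + \operatorname{Tr}\!\left( \Sigma_r + \Sigma_g - 2\left( \Sigma_r \Sigma_g \right)^{1/2} \right)
```

Lower FID and KID values indicate closer agreement between the synthetic and real image distributions.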
Table 3. Results of the proposed GAN-based augmentation pipeline generative model. Per-disease cells list standard/GAN-based mAP.

CNN Model   | BB        | TG        | BS        | RB        | Standard Augmentation (mAP) | GAN-Based Augmentation (mAP)
Faster-RCNN | 0.78/0.90 | 0.89/0.93 | 0.79/0.92 | 0.88/0.95 | 0.84                        | 0.93
SSD         | 0.77/0.89 | 0.85/0.90 | 0.76/0.91 | 0.84/0.93 | 0.81                        | 0.91
Table 4. K-fold cross-validation classification accuracy (%) of the four model methods.

Dataset (k = 5) | GAN Faster-RCNN    | Standard Faster-RCNN | GAN SSD            | Standard SSD
                | Validation | Train | Validation | Train   | Validation | Train | Validation | Train
k-fold 0        | 85.25      | 91.78 | 81.56      | 84.21   | 84.32      | 92.81 | 79.21      | 82.95
k-fold 1        | 76.11      | 89.33 | 71.28      | 77.37   | 77.51      | 84.58 | 71.13      | 76.27
k-fold 2        | 87.21      | 94.08 | 82.93      | 86.18   | 85.98      | 86.47 | 80.91      | 82.78
k-fold 3        | 91.38      | 95.10 | 87.39      | 90.08   | 90.38      | 92.98 | 85.72      | 89.97
k-fold 4        | 78.94      | 88.85 | 71.43      | 83.59   | 79.17      | 86.71 | 72.67      | 84.88
Average         | 83.78      | 91.83 | 78.92      | 84.29   | 83.47      | 88.71 | 77.93      | 83.37
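A minimal sketch of how per-fold figures such as those in Table 4 can be produced with scikit-learn's KFold splitter is given below; the train_and_score callable and the array inputs are placeholders, not part of the original implementation.

```python
import numpy as np
from sklearn.model_selection import KFold

def cross_validate(images, labels, train_and_score, k: int = 5, seed: int = 42):
    """Run k-fold cross-validation and report per-fold and average accuracies.

    `train_and_score` is a placeholder callable that trains a detector on the
    training split and returns (train_accuracy, validation_accuracy) in percent.
    `images` and `labels` are assumed to be NumPy arrays.
    """
    kfold = KFold(n_splits=k, shuffle=True, random_state=seed)
    results = []
    for fold, (train_idx, val_idx) in enumerate(kfold.split(images)):
        train_acc, val_acc = train_and_score(
            images[train_idx], labels[train_idx],
            images[val_idx], labels[val_idx],
        )
        results.append((val_acc, train_acc))
        print(f"k-fold {fold}: validation={val_acc:.2f}%, train={train_acc:.2f}%")
    avg_val, avg_train = np.mean(results, axis=0)
    print(f"Average: validation={avg_val:.2f}%, train={avg_train:.2f}%")
    return results
```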
Table 5. Pairwise comparison between the minimum training and validation losses from the proposed augmentation pipeline and the standard augmentation method.

Augmentation          | Model       | Data Type  | t-Test (p-Value)
Proposed vs. Standard | Faster-RCNN | Training   | 4.8 × 10⁻³
Proposed vs. Standard | Faster-RCNN | Validation | 9.1 × 10⁻⁴
Proposed vs. Standard | SSD         | Training   | 3.6 × 10⁻⁶
Proposed vs. Standard | SSD         | Validation | 8.3 × 10⁻⁵
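The comparisons in Table 5 are pairwise t-tests over the minimum losses obtained under the two augmentation strategies; the sketch below shows one way such a test can be run with SciPy. The per-fold loss values are illustrative placeholders rather than the study's measurements.

```python
from scipy import stats

# Placeholder per-fold minimum validation losses (illustrative values only,
# not the losses reported in the paper).
proposed_losses = [0.031, 0.028, 0.035, 0.029, 0.033]
standard_losses = [0.052, 0.047, 0.058, 0.049, 0.055]

# Paired (dependent-samples) t-test: each fold yields one loss per method.
t_stat, p_value = stats.ttest_rel(proposed_losses, standard_losses)
print(f"t = {t_stat:.3f}, p = {p_value:.2e}")

# A p-value below 0.05 indicates a statistically significant difference
# between the two augmentation strategies.
```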
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
