1. Introduction
Low temperature, poor lighting, high humidity and other complex greenhouse conditions frequently cause cucumber diseases. Moreover, most diseases spread rapidly. Thus, accurate and rapid identification of diseases in the early stage of infection has great practical significance. Traditional methods rely on naked-eye observation [1,2] and pathologic analysis, including the microscopic observation of pathogen morphology as well as molecular, serological, and microbiological diagnostic techniques [3]. Because of its poor real-time performance and the high level of expertise it demands, pathologic analysis is rarely used in practical production [4]. Naked-eye observation, in turn, lacks unified measurement criteria and is influenced by the observer’s subjective judgment and empirical knowledge, which often results in misdiagnosis. Moreover, owing to the limited resolution of the human eye, it is almost impossible to distinguish diseases by eye alone, especially at the early stage of infection.
With the rapid development of computer vision and artificial intelligence, the visual-image processing technique (based on three wavelength bands: 475, 520 and 650 nm) has been successfully exploited for disease diagnosis [5,6,7,8]. The earliest studies date back to the mid-1980s. In 1985, Yasuoka et al. [9] studied infrared images of crop leaves polluted by noxious gas, and plant disease diagnosis by analyzing images of diseased leaves has been pursued ever since. Based on the optical filtering and spectroscopic characteristics of healthy and diseased leaves, Sasaki et al. [10] established identification parameters using a genetic algorithm and studied the automatic diagnosis of cucumber anthracnose. El-Helly et al. [11] developed an image processing system that automatically detects disease spots and differentiates cucumber downy mildew from powdery mildew using an artificial neural network. Geng et al. [12] analyzed the mean distribution of the Cb and Cr channels in YCbCr space and effectively separated the information pertaining to cucumber downy mildew by constructing an algorithm combining the Cb and Cr channels. Peng et al. [13] extracted color and texture features of cucumber leaves and established a linear discriminant model for cucumber downy mildew and anthracnose. To reduce the computational cost and improve identification performance, Zhang et al. [14] segmented diseased leaves by K-means clustering, extracted shape and color features from the lesions, and realized the diagnosis using sparse representation classifiers. Their success suggests that the visual-image processing technique has great potential in plant disease diagnosis. However, its effectiveness depends on obvious symptoms; in other words, these methods work well only when disease spots clearly convey information on color, shape, texture, etc. At the early infection stage, symptoms are often inconspicuous, and visual-image-based methods struggle in such situations.
Unlike the common methods above, the hyperspectral imaging (HSI) technique obtains both spatial and spectral information of plants over a large range of the light spectrum, and it has shown significant potential and advantages for identifying plant diseases [2,3,15]. After infection, changes in plant tissues occur earlier than visible symptoms and are reflected in the tissues’ response to electromagnetic radiation. Accordingly, the HSI technique can be used to detect diseases based on variations in reflectance spectra, even when symptoms are inconspicuous. Over the past few years, many HSI-based methods and systems have been developed, which can be roughly divided into two categories: feature-extraction-based methods and effective-waveband-based methods. The latter category also includes methods built on reflectance indices obtained by combining effective wavebands. One of the best-known reflectance indices is the photochemical reflectance index (PRI), introduced by Gamon et al. [16], which can reveal stress-induced changes in photosynthesis [17]. Although reflectance indices simplify the analysis of the reflectance spectrum, they can be affected by many factors, such as illumination, atmosphere, soil background and location [17,18]. Compared with reflectance indices, which have fixed calculation formulas, feature-extraction-based methods have the advantage that one can design appropriate algorithms to extract features that are, to a certain degree, invariant to interferences such as illumination variations and atmospheric noise. Below, we introduce some effective feature-extraction-based methods. Ma et al. [19] proposed an identification method for Fusarium head blight by applying continuous wavelet analysis to the reflectance spectra of wheat ears. Chai et al. [20] proposed rapid identification of cucumber diseases based on HSI and distance discriminant analysis. Barbedo et al. [21] presented an automatic method to detect Fusarium head blight in wheat kernels by performing mathematical morphological operations and spectral band manipulations on hyperspectral data. Based on the HSI technique, Cen et al. [22] detected chilling injury in cucumbers by combining three feature-extraction methods with two traditional classification methods and achieved an overall accuracy of 90.5%. Zhu et al. [23] employed machine learning classifiers and variable selection methods to investigate the potential of pre-symptomatic identification of tobacco disease. Although the HSI technique is capable of detecting diseases at a much earlier infection stage, the vast majority of current studies still concentrate on cases with obvious lesions.
Consequently, this paper aims to establish an early identification method for cucumber diseases based on the HSI technique. An analysis of the reflectance spectra of diseased and normal leaves shows that the spectral curves of different diseases are similar in appearance and shape to a certain degree; moreover, the regions covered by the spectral curves of different diseases are almost coincident. It is therefore very difficult to distinguish diseases in the original hyperspectral data space. Moreover, hyperspectral data are generally of high dimensionality, and processing them directly may incur high computational and time costs. To address these problems, this study attempts to train a discriminative projection that transforms the spectral curves into a low-dimensional space in which the similarity of spectral curves of the same disease is enhanced while that of different diseases is weakened. However, even if this goal is achieved, the projection does not necessarily benefit the ultimate diagnosis, because the training procedure is completely independent of the subsequent diagnosis. To address this issue, we establish a connection between them by using the decision rule of the collaborative representation classifier (CRC) [24] to steer the training procedure. Since the label and spatial-distribution information of the data is usually of great importance for discrimination [25], we additionally design graph constraints to steer the training procedure. In summary, this paper presents a graph-constraint- and CRC-steered discriminative projection learning method (CRC-DP) and applies it to the early identification of cucumber diseases.
2. Materials and Methods
2.1. Acquiring the Hyperspectral Data
‘Lufeng’, a widely cultivated cucumber variety with strong growth vigor and resistance to diseases such as downy mildew, powdery mildew and fusarium wilt, was used for the experiments. A total of 55 healthy 36-day-old cucumber plants with similar growth status and three leaves each were selected. Among these, 25 plants were randomly selected and inoculated with the cucumber anthracnose pathogen, another 25 plants were inoculated with cucumber Corynespora cassiicola, and these 50 plants formed the inoculation group; the remaining 5 healthy plants formed the healthy control group. The strains were purchased from the Agricultural Culture Collection of China. Inoculation was conducted by manually making a small cut on the leaf with a sharp knife and then covering the cut with a small mycelial block. Two leaves were inoculated on each plant. After inoculation, plants of the different groups were placed in separate artificial climate boxes for cultivation. The artificial environment was controlled at a relative humidity of 90% and temperatures of 28 °C (day) and 24 °C (night). The illumination and darkness durations were set to 16 h and 8 h, respectively, and LED lights with an illuminance of 22,000 lx provided illumination during cultivation.
About 24 h after inoculation, hyperspectral images of the 100 inoculated leaves in the inoculation group and of two normal leaves of each plant in the healthy control group were acquired every 24 h using a push-broom HSI system, GaiaSorter (Dualix Spectral Imaging, Chengdu, China). Image collection stopped after 12 days, yielding 1320 hyperspectral images, each containing one leaf. The HSI system comprised two hyperspectral imaging units (visible and near-infrared), a horizontal motorized translation stage (HSIA-T1000), image acquisition software (SpecView), and a uniform illumination source (HSIA-LS-T-H) composed of 8 halogen lamps with adjustable light intensity covering 350–2500 nm. In this paper, only the visible hyperspectral imaging unit was used to collect raw hyperspectral images, each consisting of 256 spectral bands with 1394 × 1024 pixels covering wavelengths from 391 to 1044 nm at a spectral resolution of 2.8 nm. Leaves with lesions occupying less than 20% of the leaf area were selected for the experiments.
Affected by the measurement environment, the status of the experimental devices, the skill level of the operators and other factors, the collected hyperspectral images often contained noise and disturbing information. To alleviate these adverse effects, a correction was performed using the following formula:

$$R_{c}=\frac{R_{o}-R_{d}}{R_{w}-R_{d}} \tag{1}$$

where $R_{o}$ and $R_{c}$ respectively represent the hyperspectral image before and after correction; $R_{d}$ is a dark reflection image obtained when the halogen lamps are turned off and the camera lens is completely covered with its own non-reflective opaque black cap with 0% reflectance; $R_{w}$ is a white reflection image obtained by capturing the hyperspectral image of a Teflon white board with 99% reflectance. Afterwards, the spectral curves of the pixels within disease lesions were extracted for further analysis.
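To make the correction concrete, the following minimal Python sketch applies Equation (1) to a hyperspectral cube; the function name, array shapes and the small epsilon guarding against division by zero are illustrative assumptions rather than part of the original processing pipeline.

```python
import numpy as np

def calibrate(raw, dark, white):
    """Flat-field correction of a hyperspectral cube (rows x cols x bands)
    with dark and white reference images, following Equation (1)."""
    raw, dark, white = (np.asarray(a, dtype=float) for a in (raw, dark, white))
    return (raw - dark) / (white - dark + 1e-12)  # epsilon avoids division by zero

# Toy usage with a hypothetical 4 x 4 x 256 cube.
rng = np.random.default_rng(0)
dark = 0.02 * rng.random((4, 4, 256))
white = 0.9 + 0.05 * rng.random((4, 4, 256))
raw = dark + (white - dark) * rng.random((4, 4, 256))
corrected = calibrate(raw, dark, white)
print(corrected.min() >= 0.0, corrected.max() <= 1.0)
```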
2.2. Proposed CRC-DP Method
As stated in the introduction, identifying different diseases directly in the original hyperspectral data space is difficult. Thus, we aimed to locate a low-dimensional space in which the projected spectral curves of different diseases can be well separated. Here, for narrative convenience, we take each vectorized spectral curve as a sample and refer to ‘cucumber anthracnose disease’, ‘cucumber Corynespora cassiicola disease’ and ‘normal plant’ as the first, second and third class of disease, respectively. The CRC-DP method consists of two sequential procedures, which are respectively described in detail as follows.
2.2.1. Offline Training Stage
Suppose each class has enough training samples to span a subspace and any sample from this class lies on this subspace. Let $X=[x_{1},x_{2},\ldots,x_{k}]\in\mathbb{R}^{m\times k}$ represent all the training samples in the high-dimensional input space, where $x_{i}$ is the $i$-th training sample, $m$ is the dimensionality of the input space and $k$ is the number of training samples. The training samples are linearly converted to new ones in a low-dimensional space by $Y=P^{T}X$, where $P\in\mathbb{R}^{m\times d}$ ($d\ll m$) is the desired discriminative projection matrix. According to a modified collaborative representation model, each training sample in the low-dimensional space is encoded as a linear combination of the rest of the training samples by Equation (2):

$$\alpha_{i}=\arg\min_{\alpha_{i}}\left\|P^{T}x_{i}-P^{T}X\alpha_{i}\right\|_{2}^{2}+\lambda\left\|\alpha_{i}\right\|_{2}^{2},\quad\text{s.t. }\alpha_{i}(i)=0,\ \mathbf{1}^{T}\alpha_{i}=1 \tag{2}$$

where $\lambda>0$ is a regularization parameter; the collaborative representation coefficient vector $\alpha_{i}$ is a $k$-dimensional column vector whose $i$-th element is forced to zero; $\mathbf{1}$ is a column vector consisting of all ones. Obviously, Equation (2) can be considered a least-squares problem and thus has an analytical solution. Since the negative coefficients in $\alpha_{i}$ have no practical significance, they are further updated using Equation (3):

$$\alpha_{i}\leftarrow\max(\alpha_{i},0) \tag{3}$$

By doing this, a new non-negative coefficient vector is obtained.
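As an illustration of the coding step, the Python sketch below solves the regularized least-squares problem of Equation (2) with the $i$-th coefficient removed and the sum-to-one constraint enforced via a Lagrange-multiplier correction, then clips negative coefficients as in Equation (3); the function name `cr_code` and the toy data are assumptions of this sketch.

```python
import numpy as np

def cr_code(Y, i, lam=0.05):
    """Collaboratively represent the i-th column of Y (projected samples,
    one column per sample) by the remaining columns: ridge-regularized
    least squares with the i-th coefficient fixed to zero and a
    sum-to-one constraint, followed by clipping of negative coefficients."""
    k = Y.shape[1]
    idx = np.delete(np.arange(k), i)          # exclude the sample itself
    B, y = Y[:, idx], Y[:, i]
    A = B.T @ B + lam * np.eye(k - 1)
    a0 = np.linalg.solve(A, B.T @ y)          # unconstrained ridge solution
    Ainv1 = np.linalg.solve(A, np.ones(k - 1))
    a = a0 - Ainv1 * (a0.sum() - 1.0) / Ainv1.sum()   # enforce sum-to-one
    alpha = np.zeros(k)
    alpha[idx] = np.maximum(a, 0.0)           # Equation (3): drop negatives
    return alpha

# Toy usage: 5 samples already projected to a 3-dimensional space.
rng = np.random.default_rng(0)
Y = rng.normal(size=(3, 5))
print(cr_code(Y, i=2))
```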
Based on the decision rule of CRC, each sample from the $c$-th class should be well represented by the training samples from the $c$-th class. To this end, a same-class reconstruction residual is defined as:

$$R_{s}(P)=\sum_{i=1}^{k}\left\|P^{T}x_{i}-P^{T}X\delta_{c}(\alpha_{i})\right\|_{2}^{2},\quad c=l(x_{i}) \tag{4}$$

where $l(x_{i})$ denotes the class label of $x_{i}$ and $\delta_{c}(\alpha_{i})$ is a column vector obtained by preserving the entries of $\alpha_{i}$ associated with the $c$-th class and setting the rest to zeros. Beyond that, the training samples from the other classes should not be able to represent this sample well. To this end, we define a different-class reconstruction residual as:

$$R_{d}(P)=\sum_{i=1}^{k}\sum_{c\neq l(x_{i})}\left\|P^{T}x_{i}-P^{T}X\delta_{c}(\alpha_{i})\right\|_{2}^{2} \tag{5}$$

where $c\in\{1,2,\ldots,C\}$ and $C$ is the number of classes. The above two reconstruction residuals are named discriminative reconstruction fidelity terms. To meet the decision rule of CRC, the same-class reconstruction residual is forced to be as small as possible while the different-class reconstruction residual is forced to be as large as possible. The discriminative fidelity terms are powerful for both representation and classification but fail to take into account the spatial distribution and label information of the training samples, which are of great importance for classification.
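For clarity, the following sketch computes the two discriminative reconstruction fidelity terms of Equations (4) and (5) for a set of projected training samples; the function name and the array layout (one sample per column, coefficient vectors stacked as columns of `alphas`) are assumptions.

```python
import numpy as np

def fidelity_terms(X, P, alphas, labels):
    """Compute the same-class and different-class reconstruction residuals
    (Equations (4) and (5)).

    X      : (m, k) training samples, one column per sample
    P      : (m, d) projection matrix
    alphas : (k, k) matrix whose i-th column is the coefficient vector of x_i
    labels : (k,)  integer class labels
    """
    Y = P.T @ X                      # samples in the low-dimensional space
    labels = np.asarray(labels)
    R_same, R_diff = 0.0, 0.0
    for i in range(X.shape[1]):
        for c in np.unique(labels):
            delta = np.where(labels == c, alphas[:, i], 0.0)  # keep class-c entries
            resid = np.sum((Y[:, i] - Y @ delta) ** 2)
            if c == labels[i]:
                R_same += resid       # Equation (4)
            else:
                R_diff += resid       # Equation (5)
    return R_same, R_diff

# Toy usage: 6 samples, 4-dimensional input, projected to 2 dimensions.
rng = np.random.default_rng(0)
X = rng.normal(size=(4, 6))
P = rng.normal(size=(4, 2))
alphas = np.abs(rng.normal(size=(6, 6)))
np.fill_diagonal(alphas, 0.0)         # the i-th coefficient of alpha_i is zero
print(fidelity_terms(X, P, alphas, np.array([0, 0, 1, 1, 2, 2])))
```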
To solve the above problem, we introduce two novel graph constraints to associate the class labels with the spatial distributions of the training samples. First of all, a same-class graph $W_{s}$ and a different-class graph $W_{d}$ are respectively constructed as follows:

$$W_{s}(i,j)=\begin{cases}1, & l(x_{i})=l(x_{j})\\ 0, & \text{otherwise}\end{cases} \tag{6}$$

$$W_{d}(i,j)=\begin{cases}1, & l(x_{i})\neq l(x_{j})\\ 0, & \text{otherwise}\end{cases} \tag{7}$$

where $l(x_{i})$ denotes the class label of $x_{i}$ with $l(x_{i})\in\{1,2,\ldots,C\}$. $W_{s}$ reflects the relation of samples belonging to the same class while $W_{d}$ reflects the relation of samples belonging to different classes. To ensure that samples from different classes can be well separated, the CRC-DP method encourages that, in the low-dimensional space, two training samples from the same class should reside close to each other, and two training samples from different classes should be far away from each other. To this end, a same-class graph constraint and a different-class graph constraint are, respectively, mathematically formulated as:

$$G_{s}(P)=\sum_{i,j}\left\|P^{T}x_{i}-P^{T}x_{j}\right\|_{2}^{2}W_{s}(i,j)=2\,\mathrm{tr}\!\left(P^{T}XL_{s}X^{T}P\right) \tag{8}$$

$$G_{d}(P)=\sum_{i,j}\left\|P^{T}x_{i}-P^{T}x_{j}\right\|_{2}^{2}W_{d}(i,j)=2\,\mathrm{tr}\!\left(P^{T}XL_{d}X^{T}P\right) \tag{9}$$

where $L_{s}=D_{s}-W_{s}$ and $D_{s}$ is a diagonal matrix whose $i$-th entry is the summation of the $i$-th row of $W_{s}$; $L_{d}=D_{d}-W_{d}$ and $D_{d}$ is a diagonal matrix whose $i$-th entry is the summation of the $i$-th row of $W_{d}$. Differing from the local-graph constraint proposed by Zheng et al. [26], which preserves the local (neighborhood) structure of the data, the proposed graph constraints force the training samples from the same class to be more concentrated and avoid parameter selection. To enhance the discrimination, we need to minimize the same-class graph constraint and maximize the different-class graph constraint.
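Because the graph constraints reduce to quadratic forms in $P$, in practice one only needs the scatter matrices $XL_{s}X^{T}$ and $XL_{d}X^{T}$. A minimal sketch of their construction, assuming dense label-based graphs as in Equations (6) and (7), is given below; the function name is an assumption.

```python
import numpy as np

def graph_scatter_matrices(X, labels):
    """Build the same-class / different-class graphs (Equations (6)-(7))
    and return the scatter matrices X L_s X^T and X L_d X^T used by the
    graph constraints (Equations (8)-(9))."""
    labels = np.asarray(labels)
    same = (labels[:, None] == labels[None, :]).astype(float)   # W_s
    diff = 1.0 - same                                           # W_d
    L_s = np.diag(same.sum(axis=1)) - same                      # D_s - W_s
    L_d = np.diag(diff.sum(axis=1)) - diff                      # D_d - W_d
    return X @ L_s @ X.T, X @ L_d @ X.T

# Toy usage: 6 samples, 4-dimensional, 3 classes.
rng = np.random.default_rng(1)
X = rng.normal(size=(4, 6))
S_s, S_d = graph_scatter_matrices(X, [0, 0, 1, 1, 2, 2])
print(S_s.shape, S_d.shape)   # (4, 4) (4, 4)
```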
Finally, we incorporate the fidelity terms with the graph constraints and formulate the objective function as:

$$\max_{P}\ J(P)=\frac{R_{d}(P)+G_{d}(P)}{R_{s}(P)+G_{s}(P)}=\frac{\mathrm{tr}\!\left(P^{T}AP\right)}{\mathrm{tr}\!\left(P^{T}BP\right)} \tag{10}$$

where $A=\sum_{i=1}^{k}\sum_{c\neq l(x_{i})}\left(x_{i}-X\delta_{c}(\alpha_{i})\right)\left(x_{i}-X\delta_{c}(\alpha_{i})\right)^{T}+2XL_{d}X^{T}$ and $B=\sum_{i=1}^{k}\left(x_{i}-X\delta_{l(x_{i})}(\alpha_{i})\right)\left(x_{i}-X\delta_{l(x_{i})}(\alpha_{i})\right)^{T}+2XL_{s}X^{T}$. The optimal projection matrix $P$ can be determined by maximizing the objective function (Equation (10)). We impose the constraint $P^{T}BP=I$ on the objective function. By doing this, $P$ can be formed by the generalized eigenvectors of $(A,B)$ corresponding to the largest $d$ eigenvalues. However, the coefficient vectors $\alpha_{i}$ in $A$ and $B$ are unknown beforehand, so we solve $P$ and $\alpha_{i}$ in an iterative manner. $P$ is initialized using an $m\times d$ random matrix and each iteration mainly includes four steps: (a) project $X$ to the low-dimensional space using $P$; (b) solve the collaborative representation coefficients $\alpha_{i}$ using Equations (2) and (3); (c) compute $A$ and $B$; (d) obtain a new projection matrix $P$ by maximizing the objective function (10). Repeat the above steps until the difference of the objective function values between two iterations is smaller than a predefined value $\varepsilon$.
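Steps (c) and (d) of an iteration can be sketched as follows, assuming the reconstructed forms of $A$ and $B$ given above; the generalized eigenvalue problem is solved with SciPy's `eigh`, and a small ridge is added to $B$ purely for numerical stability (an implementation detail of this sketch, not of the original method).

```python
import numpy as np
from scipy.linalg import eigh

def update_projection(X, alphas, labels, d):
    """Build A and B of Equation (10) from the current coefficients and
    return the projection formed by the top-d generalized eigenvectors of (A, B)."""
    m, k = X.shape
    labels = np.asarray(labels)
    same = (labels[:, None] == labels[None, :]).astype(float)       # W_s
    diff = 1.0 - same                                               # W_d
    A = 2.0 * X @ (np.diag(diff.sum(1)) - diff) @ X.T               # graph part of A
    B = 2.0 * X @ (np.diag(same.sum(1)) - same) @ X.T               # graph part of B
    for i in range(k):                                              # fidelity parts
        for c in np.unique(labels):
            r = X[:, i] - X @ np.where(labels == c, alphas[:, i], 0.0)
            if c == labels[i]:
                B += np.outer(r, r)                                 # same-class residual
            else:
                A += np.outer(r, r)                                 # different-class residual
    w, V = eigh(A, B + 1e-8 * np.eye(m))      # generalized eigenvectors of (A, B)
    return V[:, np.argsort(w)[::-1][:d]]      # keep the d largest eigenvalues
```

In a full implementation, this routine would be called once per iteration using the coefficient vectors produced by the coding step sketched earlier.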
2.2.2. Online Identification Stage
Given a query sample $z$ whose identity (for disease identification, “identity” refers to the type of disease the query sample is infected with) is unknown beforehand, we determine it as follows. Firstly, $z$ is converted to a $d$-dimensional vector by $y=P^{T}z$. Then, we collaboratively represent $y$ as $P^{T}X\beta$ using all the training samples in the low-dimensional space, and the coefficient vector $\beta$ is obtained by solving a regularized least-squares problem:

$$\beta=\arg\min_{\beta}\left\|P^{T}z-P^{T}X\beta\right\|_{2}^{2}+\lambda\left\|\beta\right\|_{2}^{2} \tag{11}$$

The identity $c^{*}$ of $z$ can be determined by evaluating which class of training samples leads to the minimal reconstruction residual, as follows:

$$c^{*}=\arg\min_{c}\left\|P^{T}z-P^{T}X\delta_{c}(\beta)\right\|_{2} \tag{12}$$
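A compact sketch of this online stage is given below; it solves the ridge problem of Equation (11) in closed form and applies the decision rule of Equation (12). The function name `classify` and the argument layout are assumptions; `P` is the projection learned offline and `labels` holds the class of each training column.

```python
import numpy as np

def classify(z, X, P, labels, lam=0.05):
    """Project a query spectrum z, collaboratively represent it over all
    projected training samples (Equation (11)) and assign the class with
    the smallest reconstruction residual (Equation (12))."""
    Y, y = P.T @ X, P.T @ z
    k = X.shape[1]
    beta = np.linalg.solve(Y.T @ Y + lam * np.eye(k), Y.T @ y)   # ridge solution
    labels = np.asarray(labels)
    residuals = {}
    for c in np.unique(labels):
        delta = np.where(labels == c, beta, 0.0)                 # keep class-c coefficients
        residuals[c] = np.linalg.norm(y - Y @ delta)
    return min(residuals, key=residuals.get)
```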
To present our method more concisely, the overall framework and detailed steps of the CRC-DP method are summarized in Algorithm 1, and a flowchart of the CRC-DP method is plotted in Figure 1.
Algorithm 1. CRC-DP method.
Input: the query sample $z$, the training samples $X$, parameters $\lambda$ and $\varepsilon$.
Offline training stage:
1. Initialize $P$ using a random matrix.
If the difference of the objective function values between two iterations is larger than $\varepsilon$, repeat steps 2–5.
2. Project $X$ to the $d$-dimensional space by $Y=P^{T}X$.
3. Solve $\alpha_{i}$ using Equations (2) and (3).
4. Calculate $A$ and $B$.
5. Update $P$ using the generalized eigenvectors of $(A,B)$ corresponding to the largest $d$ eigenvalues.
Online identification stage:
1. Transform $z$ by $y=P^{T}z$.
2. Represent $y$ as $P^{T}X\beta$ and solve the coefficient vector $\beta$.
3. Determine the identity of $z$ by Formula (12).
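As a self-contained illustration, the sketch below strings the offline steps of Algorithm 1 together. It follows the reconstructed equations above, omits the sum-to-one constraint of Equation (2) for brevity, and adds a small ridge to the generalized eigenvalue problem for numerical stability, so it should be read as an approximate reference rather than the exact original implementation.

```python
import numpy as np
from scipy.linalg import eigh

def crc_dp_train(X, labels, d, lam=0.05, eps=1e-4, max_iter=50, seed=0):
    """Offline stage of Algorithm 1 (simplified sketch).
    X: (m, k) training samples, one column per sample; labels: (k,) class labels."""
    m, k = X.shape
    labels = np.asarray(labels)
    classes = np.unique(labels)
    same = (labels[:, None] == labels[None, :]).astype(float)
    diff = 1.0 - same
    Gd = 2.0 * X @ (np.diag(diff.sum(1)) - diff) @ X.T   # different-class graph scatter
    Gs = 2.0 * X @ (np.diag(same.sum(1)) - same) @ X.T   # same-class graph scatter
    P = np.random.default_rng(seed).normal(size=(m, d))  # step 1: random initialization
    prev_obj = None
    for _ in range(max_iter):
        Y = P.T @ X                                      # step 2: project the samples
        A, B = Gd.copy(), Gs.copy()
        for i in range(k):                               # step 3: coding (Eqs. (2)-(3), simplified)
            idx = np.delete(np.arange(k), i)
            Yi, y = Y[:, idx], Y[:, i]
            a = np.linalg.solve(Yi.T @ Yi + lam * np.eye(k - 1), Yi.T @ y)
            alpha = np.zeros(k)
            alpha[idx] = np.maximum(a, 0.0)
            for c in classes:                            # step 4: accumulate A and B
                r = X[:, i] - X @ np.where(labels == c, alpha, 0.0)
                if c == labels[i]:
                    B += np.outer(r, r)
                else:
                    A += np.outer(r, r)
        w, V = eigh(A, B + 1e-8 * np.eye(m))             # step 5: generalized eigenvectors
        P = V[:, np.argsort(w)[::-1][:d]]
        obj = np.trace(P.T @ A @ P) / np.trace(P.T @ B @ P)
        if prev_obj is not None and abs(obj - prev_obj) < eps:
            break                                        # convergence on the objective value
        prev_obj = obj
    return P

# Toy usage: 30 random 20-dimensional samples from 3 classes projected to 5 dimensions.
rng = np.random.default_rng(1)
X = rng.normal(size=(20, 30))
P = crc_dp_train(X, np.repeat([0, 1, 2], 10), d=5)
print(P.shape)   # (20, 5)
```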
2.3. Experiment Design and Setup
The proposed method consists of two parts: training a projection matrix to transform samples into a low-dimensional space and then identifying the disease using the modified CRC. The former, if used as a dimension reduction (DR) operation, can be applied to classification problems (disease identification is also a classification problem). As we know, DR should be beneficial for the subsequent classification; in other words, samples of different classes should separate better after DR. Thus, to verify whether DR using the CRC-DP method leads to better separation than other DR methods, we first applied different DR methods to two types of easily accessible unitless data (a manually created toy dataset and the wine dataset from UCI [27]) to project them into a low-dimensional space. Then, to evaluate our method's capability in the early diagnosis of plant disease, experiments were performed using the hyperspectral data collected in the early infection stage of cucumber anthracnose and cucumber Corynespora cassiicola. Unless otherwise stated, the training and testing sets were prepared as follows: 1000 hyperspectral curves were extracted from the lesions of each disease, of which half were randomly selected for training and the rest were used for testing; for normal leaves, 500 hyperspectral curves were extracted for training and another 500 for testing. Each hyperspectral curve was vectorized by stacking the reflectance values of the 391–1044 nm bands, so that each sample is a 256 × 1 column vector, which was then normalized to unit l2-norm. Thus, the training and testing sets each contain 1500 samples from three classes. Note that 'normal' was considered the third type of disease for narrative convenience. For comparison, we also assessed the performance of five other classifiers: support vector machine (SVM), K-nearest neighbor classifier (KNN), naive Bayes classifier (NB), random forest classifier (RF) and discriminant analysis classifier (DA).
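Before turning to the baseline classifiers, the sketch below illustrates this sample preparation (unit l2-normalization and a stratified 50/50 split); the arrays `curves` and `labels` are random placeholders standing in for the extracted lesion spectra, and scikit-learn is assumed to be available.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Placeholder data: one 256-band reflectance curve per row; labels 0 = anthracnose,
# 1 = Corynespora cassiicola, 2 = normal (1000 curves per class in total).
rng = np.random.default_rng(0)
curves = rng.random((3000, 256))
labels = np.repeat([0, 1, 2], 1000)

# Normalize every spectral curve to unit l2-norm before using it as a sample.
curves = curves / np.linalg.norm(curves, axis=1, keepdims=True)

# Stratified 50/50 split into training and testing sets (1500 samples each).
X_train, X_test, y_train, y_test = train_test_split(
    curves, labels, test_size=0.5, stratify=labels, random_state=0)
print(X_train.shape, X_test.shape)   # (1500, 256) (1500, 256)
```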
Here, we briefly introduce the principles of these five classifiers:
SVM seeks a separating hyperplane for the samples in a high-dimensional space. Its goal is to maximize the margin between the hyperplane and the support vectors, which can be solved by transforming the problem into a convex quadratic program.
The core idea of the KNN classifier is that if the majority of the K most similar training samples of a query sample belong to a certain category, then the query sample also belongs to that category. KNN does not require training.
The principle of NB is to calculate the posterior probability of each class for the query sample from the prior probabilities and class-conditional likelihoods, and the query sample is assigned to the class with the largest posterior probability.
RF repeatedly draws samples with replacement from the original training set to generate new training sets, each of which is used to train a decision tree; the resulting decision trees form a random forest. Given a query sample, each decision tree makes a decision and the final category is determined by voting.
Distance-based DA calculates the distance between the query sample and the mean of all the training samples of each class. The query sample is then classified into the class with the minimal distance.
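For reference, the hypothetical scikit-learn snippet below shows how such a comparison of the five baselines might be set up on stand-in data; `LinearDiscriminantAnalysis` is used as a substitute for the distance-based DA described above, and the hyperparameter values are illustrative only.

```python
from sklearn.svm import SVC
from sklearn.neighbors import KNeighborsClassifier
from sklearn.naive_bayes import GaussianNB
from sklearn.ensemble import RandomForestClassifier
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split

# Stand-in data with three classes; the CRC-DP projected features would replace these.
X, y = make_classification(n_samples=600, n_features=30, n_classes=3,
                           n_informative=10, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

baselines = {
    "SVM": SVC(),
    "KNN": KNeighborsClassifier(n_neighbors=5),
    "NB": GaussianNB(),
    "RF": RandomForestClassifier(n_estimators=100, random_state=0),
    "DA": LinearDiscriminantAnalysis(),
}
for name, clf in baselines.items():
    print(name, clf.fit(X_train, y_train).score(X_test, y_test))
```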
According to experimental experience, unless otherwise specified, the parameter $\lambda$ in the CRC-DP method is set to 0.05; the number of neighbors in KNN and the number of decision trees in RF take values between 1 and $n$ with intervals of 1 and 25, respectively, where $n$ represents the number of training samples per class. We report their best results here. All the experiments were carried out on a 2.1 GHz computer with 64 GB RAM.