1. Introduction
As one of the most important food crops for humans, potato is a significant source of carbohydrates, vitamins, and minerals, with an annual production of up to 370 million tons [1,2]. However, because of its complex growing environment, potato is susceptible to diseases during growth [3]. For example, blackleg and tuber soft rot are significant bacterial diseases of potato worldwide [3,4,5]. Their causal agents are plant-pathogenic bacteria of the genus Pectobacterium [3]. These bacteria produce enzymes that decay plant tissues [6], damaging roots, stems, and leaves and severely reducing yield and storability [7,8,9,10]. Such bacteria can remain latent during plant growth until conditions favorable for their development, reproduction, and infection prevail [11]. Once an outbreak occurs, it not only causes losses to agricultural production but also significantly affects human health and the ecological environment [12,13,14,15,16]. Therefore, timely and accurate detection and identification of potato diseases are essential to maintain crop yield and quality.
Traditional crop-disease detection relies on manual visual inspection and human empirical analysis, which cannot meet the need for rapid and accurate detection of potato diseases [17]. To identify potato diseases accurately and achieve disease control, management, and prevention, the most popular approach combines machine learning and image-classification methods with multiple imaging techniques [18,19,20,21]. However, traditional image-based classification methods cannot identify diseases that are difficult to discern in RGB images, because they consider only the visible image information and lack deeper data features [22].
Hyperspectral imaging has emerged as a crucial technique in recent years, providing valuable spectral and spatial information for potato disease detection and identification [23,24,25]. Combining hyperspectral imaging, preprocessing methods, and deep convolutional neural networks has proved effective in detecting potato late blight [26]. Other researchers have used multispectral imaging systems to monitor plant growth noninvasively [27], while the Cube-CNN-SVM (CCS) method has been shown to improve spectral image classification by extracting high-level features directly from raw data [28]. Previous studies have also shown that 3D-CNNs can achieve better classification accuracy than 2D-CNNs without preprocessing [29]. Multiscale wavelets, combined with the deep features extracted by a 3D-CNN, can generate super-resolution hyperspectral images from low-resolution ones [30]. However, plain 3D-CNNs tend to overfit and are costly to train, demanding more hardware resources and training time and generalizing poorly [31]. To address these issues, a combined 2D–3D approach can extract both spatial and spectral features, producing better fused features for hyperspectral image classification (HSIC) [32] while reducing the number of network parameters [33].
Computer vision in agriculture has become an alternative to manual inspection [34]. Polder et al. designed a hyperspectral line-scan device for virus damage detection in different potato varieties [35] and demonstrated that a deep learning approach improved the accuracy of real-world potato disease detection. Hyperspectral imaging is a valuable tool for disease detection in various crops at scales from tissue to canopy [36]. Atherton et al. [37,38] used hyperspectral remote sensing to detect disease in potato plants, but they used only spectral information rather than imaging sensors. Ray et al. used a point-spectrum approach without considering spatial information [39]. Hu et al. successfully detected late blight on potato leaves using hyperspectral imaging to improve disease recognition [40]. Griffel et al. [41] used an SVM to classify spectral features of potato plants infected with PVY, acquired with a handheld device, achieving recognition accuracy close to 90%. Kang et al. proposed a lightweight convolutional neural network model [42] that could identify potato leaves with three different diseases, reducing the number of parameters while improving accuracy. Shi et al. proposed a novel end-to-end deep learning model (CropdocNet) [43] for accurate, automated late blight diagnosis from UAV-based hyperspectral images, with an average accuracy of 98.09% on the test dataset. Gao et al. [44] extracted late blight lesions from unstructured field environments using high-resolution field-of-view images and deep learning algorithms, demonstrating that unbalanced weighting of the lesion and background classes can improve segmentation performance. Qi et al. [45] proposed a deep collaborative attention network (PLB-2D-3D-A) that combines a 2D convolutional neural network (2D-CNN) and a 3D-CNN for hyperspectral image classification, showing promising results for early detection of potato late blight with deep learning and proximal hyperspectral imaging. Chen et al. [46] proposed a weakly supervised learning approach that identifies potato plant diseases by extracting high-dimensional features through a hybrid attention mechanism.
Although potato disease detection technology has advanced significantly, several challenges still impede accurate and rapid identification. One obstacle is the variety of potato diseases, which often present similar symptoms and are therefore difficult to differentiate. The complexity of diseases, which can arise from a range of genetic and environmental factors, further exacerbates this issue. Moreover, while 3D convolutional neural networks are commonly used to process hyperspectral data, they have high hardware requirements, and the accuracy of 1D convolutional neural networks on hyperspectral data is often suboptimal. Finally, factors such as illumination, noise, distortion, and color changes present further challenges to disease detection, underscoring the need for greater algorithmic robustness and repeatability.
To address these issues, this paper proposes a novel network architecture that fuses 1D, 2D, and 3D convolutional neural networks [47] in a multidimensional manner. The network uses dilated convolution [48,49,50] for feature extraction, which avoids information loss and enlarges the receptive field compared with the conventional convolution–pooling layers of CNNs. Convolution in different dimensions takes full advantage of the spectral and spatial information of hyperspectral data, reducing network parameters and improving the model's generalization and classification accuracy. The contributions of this paper are the following:
(1) Because potato diseases can seriously harm human health, crop yield, and the economy, we use deep learning technology to provide a new solution for detecting potato diseases and ensuring healthy crops and products. (2) After analyzing the existing technologies for potato disease detection, we propose a multidimensional fusion Atrous-CNN architecture that addresses their insufficient accuracy, low disease recognition rate, high hardware resource consumption, and data loss. Testing the proposed model on multiple datasets confirmed that it has good detection capability and reduces hardware consumption, largely meeting the current needs of potato disease detection.
3. Analysis of Experimental Results
In the proposed method, we use dilated convolution layers instead of the conventional convolution–pooling operation to avoid the data loss that occurs during feature extraction. Standard and dilated convolutions are compared in
Figure 7. The experimental results show that dilated convolution layers improve the efficiency of feature extraction and enlarge the convolutional receptive field while preserving information integrity.
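The receptive-field effect of dilation can be illustrated with a minimal NumPy sketch (not the paper's implementation; the signal and kernel here are arbitrary): a 3-tap kernel with dilation rate 2 covers a span of 5 input samples without adding any parameters.

```python
import numpy as np

def dilated_conv1d(x, kernel, dilation=1):
    """Valid-mode 1-D convolution with a dilation (atrous) rate.

    dilation=1 reduces to a standard convolution; larger rates insert
    gaps between kernel taps, widening the receptive field for free.
    """
    k = len(kernel)
    span = (k - 1) * dilation + 1          # receptive field of one output
    out = np.zeros(len(x) - span + 1)
    for i in range(len(out)):
        for j in range(k):
            out[i] += x[i + j * dilation] * kernel[j]
    return out, span

x = np.arange(10, dtype=float)
kernel = np.ones(3)
y1, rf1 = dilated_conv1d(x, kernel, dilation=1)  # receptive field 3
y2, rf2 = dilated_conv1d(x, kernel, dilation=2)  # receptive field 5
```

With the same three weights, the dilated variant aggregates information from five input positions instead of three, which is the "larger field without pooling" effect the text describes.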
To validate the detection performance of the proposed algorithm, the traditional 3D-CNN, the multidimensional fusion CNN, and the multidimensional fusion Atrous-CNN were compared in training experiments. The total data volume is 262,144 pixels (512 × 512), with 209,715 samples (80%) in the training set and 52,429 samples (20%) in the validation set. The hardware environment is an Intel Xeon E5-2650 v4 processor, an NVIDIA Tesla V100-PCIE-16GB graphics card, and 256 GB of RAM.
Figure 8 shows the training process for hyperspectral disease detection on potato leaves using the three network models. The loss of the proposed multidimensional Atrous-CNN decreases faster and converges better than that of the other two models, and its prediction accuracy is also significantly higher. This method outperformed the other two models at both 100 and 500 training epochs.
Table 3 shows the comparative training results for classifying potato leaf hyperspectral image data with the three network models: 3D-CNN, multidimensional fusion CNN, and multidimensional fusion Atrous-CNN. At 100 epochs, the 3D-CNN trains more slowly than the multidimensional structures, feature extraction with dilated convolution yields higher prediction accuracy than the traditional convolution–pooling operation, and the proposed multidimensional fusion Atrous-CNN improves validation accuracy by 0.69% over the multidimensional fusion CNN while requiring significantly less training time than the 3D-CNN. At 500 epochs, the accuracy of all three models on the training set improved with more training. The training accuracy of the multidimensional fusion Atrous-CNN reached 99.78%, 0.6% higher than that of the 3D-CNN and 0.21% higher than that of the multidimensional fusion CNN at the same epoch. On the validation set, its accuracy improved by 0.15% over the 3D-CNN and by 0.45% over the multidimensional fusion CNN.
Table 4 shows the disease detection results of the three network models on the potato hyperspectral data, which comprise four pixel classes: normal leaf, diseased leaf, background, and whiteboard. The multidimensional fusion Atrous-CNN achieved the highest prediction accuracy for all four classes, exceeding 99.7% in every class. For diseased leaf pixels, its accuracy improved by 7.09% over the 3D-CNN and by 1.7% over the multidimensional fusion CNN, confirming the high effectiveness of the multidimensional fusion Atrous-CNN for recognizing diseased leaves. Over all pixels, its recognition accuracy improved by 0.51% over the 3D-CNN and by 0.94% over the multidimensional fusion CNN.
To evaluate the model's performance independently of the dataset partition, this study uses k-fold cross-validation (k = 5) to split the hyperspectral data of the diseased pixels five times; each split yields 50,508 training samples and 12,627 test samples. The data division is shown in
Figure 9. Each of the five splits was trained with a 1D-CNN, an SVM, a gradient-boosting model, and a multinomial naive Bayes classifier. The evaluation results are shown in
Table 5 and
Figure 10. The average cross-validation accuracy of the 1D-CNN was 0.3401 higher than that of the multinomial naive Bayes model, 0.0276 higher than that of the gradient-boosting model, and 0.047 higher than that of the SVM.
The proposed multidimensional fusion Atrous-CNN fuses 3D convolution with a 2D Atrous-CNN and a 1D Atrous-CNN. Compared with processing hyperspectral features entirely through 3D convolutions, this reduces the number of trainable network parameters while preserving the model's ability to extract the spatial features of hyperspectral data. Compared with traditional 1D and 2D convolution–pooling feature extraction, the dilated convolution operation loses no data information and significantly enlarges the receptive field of the convolution computation, which safeguards the model's feature-extraction capability. In terms of leaf spectral classification performance, the proposed algorithm classifies the hyperspectral data of potato leaves more accurately than the other two deep learning models, and during training the loss of the multidimensional fusion Atrous-CNN falls faster and converges better.
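The parameter saving from replacing full 3-D kernels with lower-dimensional branches can be shown with a back-of-the-envelope count (generic kernel shapes chosen for illustration, not the paper's actual layer configuration):

```python
def conv3d_params(c_in, c_out, k_spatial, k_spectral):
    # One 3-D kernel spans k_spatial x k_spatial x k_spectral jointly.
    return c_in * c_out * k_spatial * k_spatial * k_spectral

def conv2d_plus_1d_params(c_in, c_out, k_spatial, k_spectral):
    # A 2-D spatial kernel plus a separate 1-D spectral kernel.
    return c_in * c_out * (k_spatial * k_spatial + k_spectral)

# Example: 16 output channels, 3x3 spatial kernel, 7-band spectral kernel
p3d = conv3d_params(1, 16, 3, 7)          # 1 * 16 * 3*3*7 = 1008 weights
p2d1d = conv2d_plus_1d_params(1, 16, 3, 7)  # 1 * 16 * (9 + 7) = 256 weights
```

Splitting the joint spatial-spectral kernel into separate spatial and spectral factors cuts the weight count roughly fourfold in this example, which is the kind of reduction the fused architecture exploits.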
The 1D-CNN uses convolutional operations for feature extraction and can identify deeper feature information in hyperspectral data more effectively than traditional machine learning methods. During training, the difference between the predicted and true labels defines the loss function, gradient descent minimizes that loss, and the optimal model is obtained through continued training. The five tests of the k-fold cross-validation show that the potato disease identification model trained with the 1D-CNN is more accurate than the three machine learning algorithms. This indicates that a deep learning network using convolutional operations extracts features from hyperspectral data more effectively, and performs better in spectral classification, than traditional machine learning methods based on polynomial and kernel techniques.
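The training principle described here, a loss built from the gap between predicted and true labels and minimized by gradient descent, can be sketched with a toy softmax classifier in NumPy (random stand-in data and a single linear layer, not the 1D-CNN itself):

```python
import numpy as np

def softmax(z):
    z = z - z.max(axis=1, keepdims=True)   # numerical stability
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

rng = np.random.default_rng(0)
X = rng.normal(size=(64, 8))       # 64 toy "spectra", 8 features each
y = rng.integers(0, 3, size=64)    # 3 hypothetical disease classes
W = np.zeros((8, 3))               # linear classifier weights

def cross_entropy(W):
    p = softmax(X @ W)
    return -np.log(p[np.arange(64), y]).mean()

initial = cross_entropy(W)         # = ln(3) when W is all zeros
for _ in range(200):
    p = softmax(X @ W)
    p[np.arange(64), y] -= 1.0     # gradient of cross-entropy w.r.t. logits
    W -= 0.1 * (X.T @ p) / 64      # gradient descent step
final = cross_entropy(W)
```

After 200 descent steps the training loss drops below its starting value of ln(3), which is the minimization loop the text describes in miniature.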
After k-fold cross-validation, this study re-divided the dataset. The training set contains 47,352 samples: 11,782 spectra of anthracnose leaves, 25,402 spectra of leaf blight leaves, and 10,168 spectra of early blight leaves. The test set contains 15,784 samples: 3957 spectra of anthracnose leaves, 8595 spectra of leaf blight leaves, and 3232 spectra of early blight leaves.
Figure 11 shows the confusion matrix of the 1D-CNN's predictions for the three diseases; the marked positions give the numbers of samples whose disease category was correctly identified from the spectral information of the diseased leaves.
Table 6 shows the classification accuracy and recall of the three diseases calculated from the confusion matrix. On the training set, the accuracy and recall of all three diseases with the 1D-CNN exceed 0.99. On the test set, the recognition accuracy of all three diseases exceeds 0.98, reaching 0.9987 for anthracnose; the recall of all three diseases exceeds 0.97, and that of anthracnose and leaf blight exceeds 0.99. In summary, identifying potato plant diseases with a 1D-CNN and hyperspectral imaging technology is feasible.
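Per-class precision and recall follow directly from a confusion matrix. The sketch below uses an illustrative matrix (the off-diagonal counts are invented; only the row totals match the test-set class sizes of 3957, 8595, and 3232 reported above):

```python
import numpy as np

# Rows: true class, columns: predicted class,
# ordered (anthracnose, leaf blight, early blight). Entries are illustrative.
cm = np.array([[3940,   10,    7],
               [   5, 8560,   30],
               [   0,   45, 3187]])

precision = np.diag(cm) / cm.sum(axis=0)  # correct / predicted as this class
recall    = np.diag(cm) / cm.sum(axis=1)  # correct / actually in this class
```

The diagonal holds the correctly identified samples; dividing it by column sums gives precision and by row sums gives recall, the two quantities reported in Table 6.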
Figure 12 shows the detection results of the multidimensional fusion Atrous-CNN on potato leaves with the three diseases: anthracnose, leaf blight, and early blight. The results show that this method effectively extracts the feature information of hyperspectral data and achieves accurate detection of potato leaf diseases.
The two classification processes described above address, respectively, four classes (healthy leaf, diseased leaf, background, and whiteboard pixels) and the three diseases. The second classification builds on the first: the diseased leaf pixels identified in the first stage are labeled as a new object of study and classified again with the 1D-CNN using their hyperspectral data. Both stages use the hyperspectral data of the leaves. However, because the four pixel classes form spatially connected regions, the spectral information of neighboring pixels must be considered, so the first classification enriches the network structure with both spatial and spectral data. The second classification ignores the influence of surrounding pixels and predicts the disease class solely from the rich spectral detail of the diseased leaf pixels.