**1. Introduction**

Grapes are one of the most popular fruits due to its unique taste, multiple vitamins, and nutrients. Grapes can be eaten fresh and processed into various products, for instance, juice and wine. Thus, there exists excellent commercial potential for the grape industry. During the grape growing season, fungicides, insecticides, and herbicides are often applied to cure the stresses of the diseases and pests [1,2]. The pesticide residue in grapes has increasingly aroused the attention of consumers. Certain intake of pesticide residue content may harm consumers' health [3,4].

Various methods have been developed for the detection of pesticide residue in fruits and vegetables [5]. Generally speaking, they can be divided into conventional and rapid detection methods. Traditional detection methods for detecting pesticide residues include gas chromatography (GC) and capillary electrophoresis (CE) [6], gas chromatographymass spectrometry (GC-MS) [7], high-performance liquid chromatography (HPLC) [8], supercritical fluid chromatography (SFC) [9], and so on. Rapid detection methods include the fast detection card method and enzyme inhibition rate method. These methods have high accuracy for the detection of pesticide residue. However, they are costly. Moreover, they require complex pre-processing and highly skilled operators.

**Citation:** Ye, W.; Yan, T.; Zhang, C.; Duan, L.; Chen, W.; Song, H.; Zhang, Y.; Xu, W.; Gao, P. Detection of Pesticide Residue Level in Grape Using Hyperspectral Imaging with Machine Learning. *Foods* **2022**, *11*, 1609. https://doi.org/10.3390/ foods11111609

Academic Editors: Samuli Urpelainen and Mourad Kharbach

Received: 9 May 2022 Accepted: 27 May 2022 Published: 30 May 2022

**Publisher's Note:** MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.

**Copyright:** © 2022 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https:// creativecommons.org/licenses/by/ 4.0/).

Hyperspectral imaging (HSI) is a technology that combines spectroscopy and conventional imaging to attain the spectral and spatial information from the research object [10]. HSI has been used effectively in the non-destructive quality detection of grapes, such as total soluble solids [11–13], total phenolic compounds [12], polyphenol contents [14], amino acids [11], and PH [11,12], etc. Moreover, there have been quantitative analyses, such as discriminating geographical origin [15], the year of harvest [15,16], and the maturation stage [17], etc.

Detection of pesticide residue in agricultural products combined with HSI technology has also been used widely, due to its advantage of rapid, non-destructive, and accurate quality detection. Sun et al. used HSI technology (431–962 nm) to quantitatively identify the pesticide mixtures on lettuce leaves [18]. Jia et al. detected apple surface pesticide residue based on HSI technology (865–1712 nm) [19]. Mohite et al. used hyperspectral sensing (350–1052 nm) to detect pesticide (Cyantraniliprole) residue on grapes with no, single, and double doses [20]. Ren used HSI technology (900–170 nm) to distinguish various concentrations of pesticide residues of dimethoate on the surface of spinach leaves [21]. Sun et al. identified pesticide residues in lettuce combining chemical molecular structure and NIR hyperspectral (870–1780 nm) [22]. Jiang et al. used NIR HIS (390–1050 nm) to predict the distribution of pesticide residues on mulberry leaves and visualize the results [23]. Studies have shown that HSI technology has been widely used in the nondestructive detection of pesticide residue in agricultural products. However, research on the pesticide residue in grapes is still rare, and a single spectral region was studied for most. Therefore, it is feasible and proposed to use hyperspectral imaging technology to detect different levels of pesticide residues in grapes here.

It is a great challenge to research massive and redundant data obtained by hyperspectral imaging systems (HIS) effectively, which prevents its application. Machine learning is exceptionally crucial for predicting features and analyzing spectral information. Recently, deep learning, as a new method of machine learning, has gained remarkable results for detecting and classifying the spectral and spatio-spectral signatures in HIS. Deep learning learns features deeply and automatically, and processes large volumes of data effectively [24–26]. Thus, it can construct a network containing many neurons efficiently and quickly, and it is applied widely in spectroscopy [27–30]. Yan et al. used HIS with deep learning to detect geographical origin of Radix Glycyrrhizae [31]. Jiang et al. used HIS with AlexNet-CNN deep learning network to detect postharvest pesticide residues [32]. Dreier et al. used CNN and ResNet with HSI to identify the bulk grain [33]. Gomes et al. used deep learning CNN to predict sugar and pH levels in grapes [34]. Deep learning has decent performance, but the process is obscure and difficult to understand. The contribution of wavelength is visualized to observe crucial wavelengths, which can explain the deep learning process well and analyze data effectively.

The purpose of the study was to use hyperspectral imaging technology combined with machine learning to identify the different pesticide residue levels in grapes. The specific goals were: (1) to explore the spectral differences among different pesticide residue levels of different varieties of grape; (2) to compare the performances of hyperspectral imaging at two different spectral regions for pesticide residue level identification; (3) to compare the performances of conventional machine learning methods (LR, SVM, and RF) and deep learning (CNN and ResNet); (4) and to explore the spectral features of different models which contribute more to the identification.

### **2. Materials and Methods**

#### *2.1. Samples Preparation*

The research was carried out in the laboratory and simulated the process of spraying pesticides. Three grape varieties were used in this study, including Munage, Cabernet Sauvignon (Cabernet), and Red grape. The fresh grapes of Munage were purchased from the Jinma Market near Shihezi University, and Cabernet and Red grape were collected from the experimental vineyard located in the School of Agriculture, Shihezi University, Xinjiang Uygur Autonomous Region (Xinjiang), China (73◦40 –96◦18 E, 34◦25 –48◦10 N). Each grape variety was randomly divided into four groups, corresponding to four different concentrations of pesticide residues (corresponding to four levels mentioned later). To increase the number of samples and comply with sampling inspection in the actual production, the bunch of the grape was cut smaller, considering the cluster of 3–6 berries as a sample, as shown in Figure 1. After cutting off grape bunches, 288 clusters of Cabernet, 411 clusters of Red grape, and 372 clusters of Munage were collected. In total, 1071 small clusters of grapes were used as input samples. The sample data were randomly divided into training, validation, and test sets with a ratio of 3:1:1. The specific sample size of clusters of the grape is shown in Table 1.

**Figure 1.** The flow chart of spraying pesticides and obtaining clusters of the grape.


**Table 1.** Number of samples after cutting intact grapes.

Level 1, Level 2, and Level 3 mean the pesticide mixtures with concentrations of 10%, 15%, and 50% prepared later, and Level 0 means distilled water.

In this study, Jiatu (25% trifloxystrobin, 50% tebuconazole), Xishuangke (56% cymoxanil, 14% cyazofamid), and Huiyin (80% procymidone) were prepared, and the details are shown in Table 2. According to relevant information and instructions, these pesticide mixtures do not react chemically but only enhance the effect. Pesticide mixtures were sprayed on the grapes to simulate the pesticide residue. Different pesticides were applied to evaluate their effects on the growth of the grape. One reason for choosing these three pesticides was wide use during the ripening period of the grapes, and the other was the recommendations and suggestions of the planter. Roughly speaking, Jiatu, Xishuangke, and Huiyin are common fungicides, and they have a certain inhibitory effect on the growth of fungi.


**Table 2.** Information about the pesticides used in the experiment.

There were two steps to making pesticides mixtures:

(1) Make standard pesticide mixtures. According to the instructions of each pesticide, three single-pesticide solutions (Jiatu, Huiying, and Xishaungke) were prepared with the proportion of 1:4000, 1:6000, and 1:24,000, respectively. Then, the three singlepesticide solutions were mixed together to make a 2 L pesticide mixture, as 100% standard pesticide mixtures.

(2) Make three pesticides mixtures. A beaker was used to dilute the 100% standard solution into three different pesticide mixtures. Concentrations of three pesticide mixtures were 10%, 15%, and 50% (respectively corresponding to Level 1, Level 2, and Level 3). Level 0 represented distilled water as a control group for comparing with others.

The corresponding concentration of the final configuration of each pesticide is shown in Table 3.



<sup>a</sup> means distilled water; b,c,d mean the pesticide mixtures with Level 1, 2, and 3, corresponding to concentrations of 10%, 15%, and 50%. The unit of concentrations is g/L.

With a spraying bottle, four groups of grapes were sprayed with Level 0, 1, 2, and 3 mixed pesticides, respectively. Then, the sprayed grapes were placed in a low-temperature and ventilated area for air drying for about 36 h [18,23,35–37]. When there was no more water on the grape surface, each intact bunch of grapes was cut, as shown in Figure 1.

#### *2.2. Hyperspectral Image Acquisition and Correction*

In this study, Vis-NIR and NIR HISs (SOC 710VP and SOC 710SWIR) were used in obtaining hyperspectral images. The SOC 710VP covers the spectral range of 376–1044 nm (128 bands), captures the image size of each waveband with 520 pixels × 696 pixels, and has an exposure time of 24 ms and a spectral resolution of 5 nm. The SOC 710SWIR covers the spectral range of 915–1699 nm (288 bands), captures the image size of each waveband with 512 pixels × 640 pixels, and has an exposure time of 34 ms and a spectral resolution of 2.7 nm. The distance from the sample to the imaging device was adjusted to 93.5 cm. Other information about the two HISs can be shown in Yan [31]. In the study, grapes of Level 0, Level 1, Level 2, and Level 3 were captured sequentially by the HIS. Each sample was fully photographed by shooting the front and back (randomly, one side was the front, and the other side was the back).

The raw hyperspectral images were corrected into the reflectance images by using a grayscale reference image. The correction was conducted by the following Equation (1):

$$I\_r = \frac{I\_{raw} - I\_{dark}}{I\_{white} - I\_{dark}} \tag{1}$$

*Ir* is the reflectance image, *Iraw* is the raw image, *Iwhite* is the entirely white reference image, and *Idark* is the entirely black reference image. The grayscale reference image was composed of 50% *Idark* and 50% *Iwhite*.

#### *2.3. Spectral Data Preprocessing and Extraction*

The segmentation between the grape and the background was necessary to obtain accurate spectral information. In this study, ENV 5.2 (ITT Visual Information Solutions, Boulder, CO, USA) was used to crop a hyperspectral image to various hyperspectral sub-images containing a sample of 3–6 single berries. The sample in each hyperspectral sub-image was defined as a region of interest (ROI), which is a mask formed by threshold segmentation of the 804 nm Vis-NIR hyperspectral sub-image and the 1092 nm NIR hyperspectral sub-image. Further, spectra information in the ROI of the hyperspectral sub-image was extracted by Matlab R 2018b (The Math Work, Natick, MA, USA). The average spectrum of ROI was calculated as the spectral value of the sample, as shown in Figure 2. The spectral value at the beginning and the end were removed to eliminate obvious noise. The reserved wavelength range of Vis-NIR spectra was 476–890 nm (80 bands), and that of NIR spectra was 970–1594 nm (230 bands). For Vis-NIR and NIR spectral value, Savitzky–Golay (SG) [38] smoothing filter (the polynomial order was 0, the kernel size was 3) was used to improve the smoothness of the spectra and reduce noise interference. Then, the Standard Normal Variate transform (SNV) [39] was applied to avoid the impact of surface scattering, solid particle size, and the optical path change of diffuse reflection spectra.

**Figure 2.** The flow chart of hyperspectral image data acquisition and data contact.

#### *2.4. Data Analysis Method*

#### 2.4.1. Principal Component Analysis (PCA)

Principal component analysis (PCA) is a commonly used statistical method. A group of variables related to each other can be transformed into uncorrelated and independent ones through orthogonal transformation [40,41]. The primary purpose is to reduce the number of variables, namely dimensionality reduction. It is a linear dimensionality reduction method. The transformed variable is called the principal component (PC), and the top PCs explain most of the information of the hyperspectral image. The PCA score scatter plots for qualitative analysis of grape pesticide residues could be formed.

#### 2.4.2. Support Vector Machine (SVM)

Support vector machine (SVM) is a supervised pattern recognition approach. SVM is a traditional classification method, and it is widely applied in classification conditions [42,43]. Moreover, SVM has excellent generalization ability, so it is widely used in spectroscopy. The kernel function is highly vital to the SVM model. In this paper, the tuning range of the kernel function was "poly, rbf, sigmoid". The kernel parameter g and penalty coefficient C were used to get optimal performance. A grid-search procedure was used to optimize *g* and *C*. The searching range of *g* and *C* were 10−<sup>5</sup> to 50 and 10−<sup>5</sup> to 50, respectively.

#### 2.4.3. Logistic Regression (LR)

Logistic regression (LR) is a generalized linear regression analysis model, and it is often used in data mining, automatic disease diagnosis [44], economic forecasting [45], and other fields [46]. Linear regression is a machine learning method used to solve binary classification (0 or 1) problems, which are used to estimate the possibility of something. Adding the sigmoid active function to linear regression, LR can then be used for multiple classifications and introduced non-linear elements [47]. In this study, the optimization range of the solver was in "newton-cg", "lbfg", ''liblinea", ''sag", and that of C was between 10−<sup>5</sup> and 105. The penalty was set to L2.

#### 2.4.4. Random Forest (RF)

RF is ensemble learning, which consists of the decision tree (DT) [48]. RF shows two important traits: random sampling of training data points when building trees, and random subsets of features considered when splitting nodes [49,50]. The last result of the decision is determined by the voting method, so it has strong robustness. Random forest can process high-dimensional data without feature selection. In our study, *n\_estimators* were between 100 and 1000, and *max\_depth* was between 4 and 8.

#### 2.4.5. Convolutional Neural Network (CNN)

A convolutional neural network (CNN) is a forward neural network. It usually consists of the following six layers: input layer, convolution layer, activation layer, pooling layer, fully connected layer, and output layer [31]. CNN has an excellent performance in classification. One advantage of CNN is local perception. CNN only perceives the local elements of the data and then merges local information in the higher-level network to obtain all the characterization information of the data. The second is weight sharing. By weight sharing, the number of weights of the network can be decreased, and the complexity of the network can be reduced [29]. A simple CNN architecture was designed for our study. The structure of the CNN is shown in Figure 3.

**Figure 3.** The proposed convolutional neural network (CNN) structure for the identification of pesticide residues in grapes.

In Figure 3, two main blocks were involved in the structure. The first block was the convolutional block (Conv Block), which consisted of three convolutional layers. Each convolutional layer was followed by a batch normalization layer (BN) and rectified linear unit (ReLU). In the end, an average pooling layer was added to alleviate the excessive sensitivity from the convolutional layer. In this process, one-dimensional (1D) spectral data were involved, and Conv1D was used, as shown in Figure 3. The second was a fully connected block (FC Block). The features extracted by the convolutional layer were learned through the fully connected layer. A linear layer was added, and BN and ReLU followed. The dropout was applied to alleviate overfitting. For the output layer, the network outputs the final result according to the probability of the four classification results. The input channels of the first three convolutional layers were 128, 64, and 32; the kernel sizes were 3, 3, and 5; the stride was 1, and the padding was 1. For the average pooling layer, kernel size was 2. The FC block included two fully connected layers, which consisted of 256 and 128 neurons, respectively. The dropout ratio was set as 0.5. Another linear layer was set for output at the end of the network. During the training process of CNN, the Adaptive Moment estimation (Adam) algorithm was used to optimize softmax cross-entropy. Weights were initialized using the Xavier algorithm.

#### 2.4.6. Residual Neural Network (ResNet)

With the deepening of the neural network, there would be problems of overfitting, gradient explosion, and network degradation, and ResNet could effectively handle those [51]. In this study, based on the ResNet18, the ResNet was applied to identify pesticide residual levels. Figure 4a shows the structure of ResNet. The ResNet consisted of one convolutional layer and two residual blocks, the last was average pooling. The output channel of the convolutional layer was 64, kernel size was 1 × 3, and stride and padding were 1. Then a batch normalization layer (BN) and rectified linear unit (ReLU) were added. The channels of 3 residual blocks were 64, 128, and 256, kernel size was 1 × 3, and stride and padding were 1. The average pooling was followed to extract features smoothly, the last was the linear layer.

**Figure 4.** The proposed residual neural network (ResNet) (**a**) and residual block (**b**) structures for the identification of pesticide residues in grapes.

#### *2.5. Saliency Map*

Saliency map is a visualization technique in order to gain better insights into the decision-making of a neural network. When a sample was predicted correctly, it would be added to compute the feature importance [52]. Scale information contributions within the network could be computed [53]. Once the sample label was correctly predicted, the corresponding weights of the elements would be obtained, which represents the contribution rate (importance) of the elements. A saliency map can visualize the contribution rate of each element to intuitively see which elements play important roles in the process of CNN-based sample identification. For hyperspectral data, a saliency map could effectively visualize the importance of the wavebands.

Given the hyperspectral data *X*<sup>0</sup> with the set of the test, which was built by the classification model CNN-based, the class score function *S*<sup>C</sup> (*X*0) was obtained for all the wavebands [53]. When the label of this sample was correctly classified, the weight *w* could be calculated by the followed Equation (2).

$$w = \text{abs}\left(\frac{\partial \mathcal{S}}{\partial X} \middle| X\_0\right) \tag{2}$$

where *w* means the absolute value of the derivative of the score value *S* concerning the spectral data *X*0.

In this study, test set data were used to compute the importance of all the wavelengths, when the sample label was predicted correctly.

#### *2.6. Software and Model Evaluation*

In this study, the areas of the samples were defined in ENVI 5.2 (ITT Visual Information Solutions, Boulder, CO, USA). The spectral data were extracted in Matlab R 2018b (The Math Work, Natick, MA, USA). The Python scripting language (version 3.8,64 bit) was applied for the numerical calculations. SVM, LR, and PLS-DA were conducted by using the machine learning library scikit learn (version 0.23.2). The 1D CNN model was built on the deep learning Pytorch framework (version 1.5.1). All data analysis procedures were implemented on a computer with a memory of 10 GB, a SSD of 238.35 GB, and a CPU of i5-7200 U.

The accuracy is used to illustrate the discrimination ability of classifier systems. The definition was the following:

$$Accuracy = \frac{TP}{All} \tag{3}$$

*TP* (true positive) means the number of the predicted result consistent with the actual label. *All* means the number of all samples. *Accuracy* is the index to evaluate the model.
