2.3.1. Hyperspectral Imagery Pre-Processing
The pre-processing of the UAV-acquired hyperspectral data mainly includes three parts: radiometric correction, imagery mosaicking, and spectral stitching. Radiometric correction was completed in MATLAB based on the center wavelength and half-width wavelength of the UHD185 [34]. Considering the limited hardware configuration of the ground data-processing station and the large memory footprint of the stitched full-band hyperspectral imagery, the data were first exported in JPG and CUE formats using the Cubert-PILOT software provided by Cubert. To mosaic the UHD185 imagery into a hyperspectral map, this study first used the Pansharpen algorithm to fuse each 1000 × 1000 pixel panchromatic image with its corresponding 50 × 50 pixel hyperspectral image. Second, image feature points were extracted and matched with the improved SIFT (scale-invariant feature transform) algorithm in PhotoScan software, and IDL (Interactive Data Language) programs were then used to extract and merge the hyperspectral sub-band imagery. Finally, geocoding of the hyperspectral imagery was completed in ArcGIS to form a hyperspectral map with a spatial resolution of 2.6 cm.
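The pan-sharpening step can be illustrated with a simple Brovey-style ratio fusion. This is only a sketch under assumptions: the actual Pansharpen algorithm used in the study may differ, and the array sizes below are shrunk from the 50 × 50 / 1000 × 1000 pairing so the example runs quickly.

```python
import numpy as np

def brovey_pansharpen(hs_cube, pan):
    """Illustrative Brovey-style fusion: upsample a low-resolution
    hyperspectral cube to the panchromatic grid, then rescale each
    pixel's spectrum by the ratio of the pan image to the band mean."""
    bands, h, w = hs_cube.shape
    ph, pw = pan.shape
    scale_h, scale_w = ph // h, pw // w
    # Nearest-neighbour upsampling to the panchromatic resolution.
    up = np.repeat(np.repeat(hs_cube, scale_h, axis=1), scale_w, axis=2)
    # Inject spatial detail: scale by pan intensity over band-mean intensity.
    intensity = up.mean(axis=0)
    ratio = pan / np.maximum(intensity, 1e-12)
    return up * ratio[None, :, :]

# Toy data standing in for the hyperspectral/panchromatic image pair.
hs = np.random.rand(125, 5, 5)     # 125 bands, low spatial resolution
pan = np.random.rand(100, 100)     # high-resolution panchromatic image
sharp = brovey_pansharpen(hs, pan)
print(sharp.shape)  # (125, 100, 100)
```

The fused cube keeps all 125 bands but inherits the panchromatic spatial grid, which is the goal of the fusion step described above.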
This study then evaluated the overall quality of both the spatial and spectral information of the pre-processed image, because both play important roles in the subsequent classification. In terms of spatial information, the pre-processed hyperspectral orthophoto in Figure 4 was used for the overall evaluation. As Figure 4 shows, the stitched hyperspectral imagery effectively conveys spatial geometric information: the geometric positions of ground objects were stitched accurately, and the whole image was clear, without stitching cracks.
To evaluate the spectral information of the pre-processed imagery, five typical ground objects were selected from both the mosaicked hyperspectral orthophoto and the original hyperspectral images. Their spectral information was extracted and then evaluated qualitatively and quantitatively. The qualitative analysis was carried out by plotting the spectral curves of the extracted data, and the results are shown in Figure 5. As Figure 5 shows, the spectral curves of the selected ground objects in the original hyperspectral images are close to those of the hyperspectral orthophoto after spectral stitching.
The quantitative analysis was carried out by computing the correlation of the extracted spectral data, and the results are shown in Figure 6. The correlation coefficient of the spectral curves before and after mosaicking reached 0.97, indicating that the mosaicked imagery effectively retained the original spectral feature information.
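The quantitative check amounts to a Pearson correlation between the two spectral curves of the same ground object. A minimal sketch with synthetic curves (the reflectance values below are made up for illustration; the paper's data yielded r = 0.97):

```python
import numpy as np

# Hypothetical spectral curves (reflectance across 125 bands) for the
# same ground object before and after mosaicking.
bands = np.linspace(450, 950, 125)            # nm, UHD185-like range
original = 0.3 + 0.2 * np.sin(bands / 120.0)  # synthetic "before" curve
stitched = original + np.random.normal(0, 0.005, original.size)

# Pearson correlation coefficient between the two curves, as used for
# the quantitative evaluation.
r = np.corrcoef(original, stitched)[0, 1]
print(round(r, 2))
```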
2.3.3. Data Classification Methods
Maximum likelihood estimation (MLE), support vector machine (SVM), and the improved ENVINet5 classification model based on the U-Net neural network structure were used to conduct fine classification of the obtained hyperspectral imagery. Overall accuracy and the Kappa coefficient were used to evaluate the classification results.
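Both evaluation metrics can be computed directly from a confusion matrix. The sketch below uses a hypothetical 3-class matrix purely for illustration:

```python
import numpy as np

def overall_accuracy_and_kappa(confusion):
    """Compute overall accuracy and the Kappa coefficient from a
    confusion matrix (rows = reference classes, columns = predicted)."""
    confusion = np.asarray(confusion, dtype=float)
    total = confusion.sum()
    observed = np.trace(confusion) / total            # overall accuracy
    # Chance agreement from the marginal row/column totals.
    expected = (confusion.sum(0) * confusion.sum(1)).sum() / total**2
    kappa = (observed - expected) / (1.0 - expected)
    return observed, kappa

# Hypothetical confusion matrix for illustration only.
cm = [[50, 2, 3],
      [4, 45, 1],
      [2, 3, 40]]
oa, kappa = overall_accuracy_and_kappa(cm)
print(round(oa, 3), round(kappa, 3))  # 0.9 0.849
```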
MLE is a classification algorithm based on the Gaussian (normal) distribution and is one of the most frequently used supervised classification methods. Its discriminant function is given in Equation (1):

g_i(x) = P(ω_i | x) = p(x | ω_i) P(ω_i) / p(x)   (1)

In Equation (1), g_i(x) is the probabilistic discriminant function of class ω_i; P(ω_i | x) is the conditional probability of category ω_i given x, which is expressed as the probability that vector x belongs to category ω_i; p(x | ω_i) is the conditional probability density of x under category ω_i; P(ω_i) is the prior probability of category ω_i; and p(x) is the probability density function of x.
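A minimal NumPy sketch of this discriminant, assuming Gaussian class-conditional densities estimated from labeled samples (the synthetic two-class data below stands in for real spectral samples; since p(x) is shared by all classes, the sketch compares log posteriors p(x | ω_i) P(ω_i) directly):

```python
import numpy as np

def train_mle(X, y):
    """Fit per-class Gaussian parameters (mean, covariance) and priors."""
    params = {}
    for c in np.unique(y):
        Xc = X[y == c]
        params[c] = (Xc.mean(axis=0), np.cov(Xc, rowvar=False),
                     len(Xc) / len(X))
    return params

def classify_mle(x, params):
    """Assign x to the class with the largest log p(x | w_i) + log P(w_i)."""
    best, best_score = None, -np.inf
    for c, (mu, cov, prior) in params.items():
        d = x - mu
        # Log of the multivariate Gaussian density (up to a shared
        # constant) plus the log prior.
        score = (-0.5 * d @ np.linalg.inv(cov) @ d
                 - 0.5 * np.log(np.linalg.det(cov))
                 + np.log(prior))
        if score > best_score:
            best, best_score = c, score
    return best

# Two well-separated synthetic "spectral" classes for illustration.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0, 1, (50, 4)), rng.normal(5, 1, (50, 4))])
y = np.array([0] * 50 + [1] * 50)
params = train_mle(X, y)
print(classify_mle(np.full(4, 5.1), params))  # 1
```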
SVM is a machine learning algorithm based on statistical learning theory. Compared with other traditional classifiers, it offers higher classification accuracy and better generalization ability. In addition, SVM can effectively mitigate the curse of dimensionality, making it especially suitable for classifying high-dimensional data with small samples [36]. It can be simplified as an optimal decision function in Equation (2):

f(x) = sgn( Σ_{i=1}^{n} α_i y_i K(x_i, x) + b )   (2)

In Equation (2), x_i is the i-th support vector, α_i is the corresponding Lagrange multiplier, y_i is the category label, which can be 1 or −1, K(·, ·) is the kernel function, b is the classification threshold, and n is the number of support vectors. In application, it is important to choose an appropriate kernel function for SVM to obtain better classification results. This study tested the linear, polynomial, RBF, and sigmoid kernels of SVM for the hyperspectral data classification. SVM with the linear kernel function had the highest overall accuracy and Kappa coefficient when the C value and gamma value were set to 5 and 0.5, respectively; it was therefore used in the following study. The classification results of the SVM classifiers with different kernel functions are shown in
Table A1. In addition, both MLE and SVM were implemented through ENVI and MATLAB in this study. First, the hyperspectral image was converted into a matrix format, in which each rectangular block represented a sample and each sample contained multiple spectral bands; each sample was labeled with its corresponding cultivar. Second, the spectral features of each sample were extracted, cross-validation was used to divide the samples into training and testing sets, and the classifiers were trained on the training set. Finally, the performance of the trained SVM and MLE classifiers was evaluated on the testing set, with overall accuracy and the Kappa coefficient used to assess the classification results.
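The SVM decision function of Equation (2) can be evaluated directly once the support vectors, multipliers, and threshold are known. The sketch below uses the linear kernel selected in this study, but the support vectors and coefficients are illustrative values, not the trained model:

```python
import numpy as np

def svm_decision(x, support_vectors, alphas, labels, b, kernel):
    """Evaluate f(x) = sgn(sum_i alpha_i * y_i * K(x_i, x) + b)."""
    s = sum(a * y * kernel(sv, x)
            for sv, a, y in zip(support_vectors, alphas, labels))
    return 1 if s + b >= 0 else -1

# Linear kernel, as selected in the study (the tuned C = 5 and
# gamma = 0.5 affect training, not this decision-function form).
linear = lambda u, v: float(np.dot(u, v))

# Hypothetical support vectors defining a simple boundary at x = 0.
svs = [np.array([1.0]), np.array([-1.0])]
alphas = [0.5, 0.5]
labels = [1, -1]
b = 0.0
print(svm_decision(np.array([2.0]), svs, alphas, labels, b, linear))  # 1
```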
The ENVINet5 classification model was developed by Harris Geospatial Solutions (now L3Harris Geospatial), the developer of ENVI, based on the TensorFlow deep learning framework. Since 2019, the ENVI deep learning module has been released in three versions: Deep Learning V1.0, Deep Learning V1.1 Tech Preview, and Deep Learning V1.1.2. Compared with the first two versions, Deep Learning V1.1.2 has enhanced parameter settings for stronger learning ability; for example, it adds Augmentation as a new training tool, which can not only expand the training data but also improve training and extraction accuracy. Deep Learning V1.1.2 was officially released in October 2020 and requires the latest ENVI 5.6. Thus, ENVI 5.6 and the ENVI Deep Learning 1.1.2 module were installed and activated from the ENVI official website to run the ENVINet5 classification model in this study. The basic architecture of the ENVINet5 neural network is based on an improved U-Net structure. U-Net is a classical semantic segmentation network built on fully convolutional networks (FCN). It mainly comprises convolution (conv), max pooling, up-convolution (up-conv), and ReLU activation functions, as shown in Figure 8. Its operating mechanism can be divided into two parts: the left half is the down-sampling path, which focuses on feature extraction, and the right half is the up-sampling path, in which the extracted feature information is mapped to the category image. To realize pixel-level classification and carry image context information into the higher-resolution feature maps, U-Net concatenates the feature maps from the down-sampling path with the corresponding up-sampling layers to compensate for the lost context information [37]. Thus, the most important improvement of U-Net lies in the up-sampling part [38].
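The encoder/decoder flow with skip connections can be sketched at the level of array shapes alone. This is a schematic, not the real network: max pooling and nearest-neighbour upsampling stand in for the conv/up-conv layers, and the stacking step shows where U-Net concatenates encoder features into the decoder:

```python
import numpy as np

def down(x):
    """2x2 max pooling (the 'max pool' step): halves spatial size."""
    h, w = x.shape
    return x.reshape(h // 2, 2, w // 2, 2).max(axis=(1, 3))

def up(x):
    """Nearest-neighbour up-sampling (standing in for 'up-conv')."""
    return np.repeat(np.repeat(x, 2, axis=0), 2, axis=1)

# Encoder: keep each level's feature map for the skip connection.
x0 = np.random.rand(16, 16)    # toy single-channel feature map
x1 = down(x0)                  # 8 x 8
x2 = down(x1)                  # 4 x 4 (bottleneck)

# Decoder: up-sample and concatenate the stored encoder features,
# restoring the context lost during down-sampling.
u1 = np.stack([up(x2), x1])                 # skip at the 8 x 8 level
u0 = np.stack([up(u1.max(axis=0)), x0])     # skip at the 16 x 16 level
print(x2.shape, u1.shape, u0.shape)
```

The concatenated arrays (`u1`, `u0`) are what the real network would pass through further convolutions before the final pixel-level classification.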
The advantages of the ENVINet5 classification model are that it is integrated with the powerful ENVI imagery processing platform and that training samples can be built, models initialized and trained, and deep neural networks run without writing any code. This is convenient for researchers and users who wish to apply remote sensing and classify remote sensing imagery without a deep learning background. The module used in this study mainly consists of four steps: creating training samples, creating the model, training the model, and image classification. The first three steps are mainly used for model training and parameter setting. For image slicing, manual slicing is not required in the ENVINet5 model; by inputting the images to be sliced (or representative parts of them) and selecting an appropriate slice size, the model automatically slices the input images. In addition, the ENVINet5 model adopts an inverse transform sampling method and adds a weight parameter when selecting a large number of slices. This not only reduces the number of iterations during training and improves training efficiency but also enables the trained model to better select slices containing target elements.
The ENVINet5 model is sensitive to its parameters during sample feature learning, similar to parameter tuning in other deep learning models, although its parameter adjustment is user-friendly. As the module currently supports multi-element target classification, this study focused mainly on parameter settings such as iteration and training quantity, fixed distance and fuzzy distance, slice sampling rate, and class weight and loss weight. Based on our computer hardware environment, a patch size of 256 × 256 was set. The UHD185 data contained 125 bands after image mosaicking and spectral stitching; if all 125 bands had been set as the number of bands at model initialization, the first convolutional kernel would have had 125 input layers, which would seriously slow the patch input rate. Thus, the number of bands after optimal dimension reduction was selected as the number of bands participating in training. For the iteration and training quantity parameters, the number of epochs was set to 25, the number of patches per epoch to 200, the number of patches per batch to 4, and the patch sampling rate to 16. As the resolution of the experimental image data was high, the boundaries of each target element could be delineated, so the fixed distance and fuzzy distance settings were not required.
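The automatic slicing step described above amounts to cutting the input image into fixed-size patches. A minimal sketch (the band count of 10 stands in for the reduced band set; the real module also handles label rasters and sampling weights):

```python
import numpy as np

def slice_patches(image, patch=256, stride=256):
    """Slice a (bands, H, W) image into fixed-size patches; a stride
    smaller than the patch size would produce overlapping slices."""
    _, h, w = image.shape
    return [image[:, r:r + patch, c:c + patch]
            for r in range(0, h - patch + 1, stride)
            for c in range(0, w - patch + 1, stride)]

# Toy cube: 10 bands on a 512 x 512 grid, sliced into 256 x 256 patches.
cube = np.zeros((10, 512, 512))
patches = slice_patches(cube)
print(len(patches), patches[0].shape)  # 4 (10, 256, 256)
```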
After model initialization, the model needed to be trained. Most parameters related to model iteration and training amount need to be set according to the computer hardware environment. Among all the parameters, the class weight and loss weight settings can influence model accuracy, so this study focused mainly on adjusting these two parameters to acquire satisfactory classification results. After training, the ENVINet5 model can perform the final classification. It should be noted that the dimension-reduced image to be classified should not be smaller than the patch size of the training model.
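The effect of a class weight on training can be illustrated with a weighted cross-entropy loss. This is an assumption-laden sketch: the exact loss formulation inside the ENVI Deep Learning module is not documented here, and the probabilities, targets, and weights below are invented for illustration:

```python
import numpy as np

def weighted_cross_entropy(probs, targets, class_weights):
    """Mean per-pixel cross-entropy, with each pixel's loss scaled by
    the weight of its true class (emphasising rare target classes)."""
    probs = np.clip(probs, 1e-12, 1.0)
    # probs is (classes, pixels); targets holds integer class ids.
    pix_loss = -np.log(probs[targets, np.arange(targets.size)])
    return np.mean(class_weights[targets] * pix_loss)

# Two classes, four "pixels"; class 1 (the target element) weighted 3x.
probs = np.array([[0.9, 0.2, 0.8, 0.4],
                  [0.1, 0.8, 0.2, 0.6]])
targets = np.array([0, 1, 0, 1])
weights = np.array([1.0, 3.0])
print(round(weighted_cross_entropy(probs, targets, weights), 3))  # 0.633
```

Raising the weight of a class increases its contribution to the loss, which pushes the optimizer to classify that class more carefully, at the cost of the others.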