1. Introduction
Grapes occupy an important position in global fruit production. According to the International Organisation of Vine and Wine (OIV), the vineyard area planted in the major grape-growing countries was about 7.33 million hm² in 2020, and China ranks third in the world with 783,000 hm² of grape-planted area [1,2]. With the expansion of planting scale and the rising production of specialty crops such as grapes, it is important to obtain the structural composition and planting area of the grape canopy quickly and accurately [3,4,5].
Remote sensing technology [6,7,8] provides strong support for obtaining spatial distribution information of land. Li et al. [9] monitored frost damage in wine grapes using satellite remote sensing, establishing a monitoring framework that integrates the spectral information of visible and infrared light. However, satellite remote sensing is costly and not suitable for farmers with small planting areas. For this reason, this paper adopts UAV remote sensing, with its low acquisition cost and high resolution [10,11], to study grape farmland information and address the low accuracy of identifying grape-planting areas in the field under complex environments. Han et al. [12] combined visible-light and UAV multispectral cameras to acquire images of lodged maize, extracted spectral information and texture features, screened potential eigenfactors, and constructed two logistic models based on the selected eigenfactors, which not only extracted the area of lodged maize but also predicted the probability of its occurrence. Valderrama-Landeros et al. [13] combined multispectral data with a normalized-vegetation-index algorithm to monitor mangrove forests in different states of health, which is important for the conservation of small mangrove forests. Lan et al. [14] applied UAV remote sensing to rice weed identification and achieved accurate discrimination of rice and weeds with a fully convolutional semantic segmentation method. These studies applied UAV remote sensing to agriculture with good results, which motivates this paper to apply UAV remote sensing to the identification of grape-planting structure and to address the low accuracy and difficult management of large-field grape-growing areas in complex environments [15,16,17].
Although multispectral UAVs have been applied to many crops [18,19], problems remain, especially in recognition accuracy, and applications to grapes are still few. Che'Ya et al. [20] used six bands (440, 560, 680, 710, 720, and 850 nm) of UAV multispectral images to distinguish weeds from crops. Ren et al. [21] first used the Relief-F method to obtain the weight factor of each band, then partitioned the entire band interval using a strategy based on inter-band correlation, and finally selected the band with the highest importance score in each subinterval to obtain the best bands of the image; however, their study focused mainly on the image bands themselves, with less consideration of image texture and vegetation indices. Shi et al. [22] used LiDAR and a multispectral camera to acquire data and extract texture features of temperate tree species, and showed that classification accuracy improved significantly when LiDAR 3D structural features were combined with texture features. Zhang et al. [23] analyzed 11 vegetation indices from multispectral orthophotos and, by comparing algorithms, concluded that vegetation-index information can be used for ground-cover classification and that combining multispectral bands with vegetation indices improves classification accuracy more than using spectral bands alone. Adding vegetation indices and optimal bands when recognizing targets can thus effectively enhance recognition accuracy [24,25,26]. Using integrated band-correlation features, Sun et al. [27] proposed a band-information-enhanced method for vineyard area recognition, which solved the problem of texture similarity among features in vineyard areas, but recognition accuracy remained low in areas with large differences in spectral information. To improve recognition accuracy, Kwan et al. [28] compared a deep-learning classification algorithm with nine traditional classification algorithms, concluded that SVM extracted vegetation more accurately than NDVI thresholding, and proposed adding a DSM to further improve vegetation extraction accuracy.
In summary, based on UAV remote sensing and DeepLabV3+, this paper extracts 24 texture features, 5 vegetation indices, and the spectral information of each band from field grape images, and comprehensively analyzes the weighting of each feature type's influence on the model [29,30,31,32] to select the most suitable scheme for extracting grape-planting structure. In addition, we improve the DeepLabV3+ deep semantic segmentation model to raise the extraction accuracy of grape-planting structures from UAV multispectral images: the input layer is restructured to accept multispectral images fused with a priori vegetation features of the grape field, and the activation function is modified to optimize the model [33,34,35]. This paper also explores the extraction accuracy of grape-field crops under different band combinations, providing a reference for future UAV multispectral applications in agriculture and ideas for agricultural informatization [36,37,38].
2. Materials and Methods
2.1. Study Area
The experimental site is located at Nanliang Farm, Xixia District, Yinchuan City, in the central Ningxia Hui Autonomous Region. The geographical coordinates are 106°9′30″–106°9′50″ E, 38°38′0″–38°38′10″ N, covering an area of about 13 km². The region has a distinctly temperate continental monsoon climate, with strong solar radiation, low rainfall, high daily evapotranspiration, and frequent dust storms. Winter is long and summer relatively short. The annual frost-free period is about 166 days, with the first frost usually in October and the last frost usually in April of the following year. Soil types in the study area are mainly sandy soils, sandy loam, and clay soils.
2.2. Image Acquisition and Dataset Construction
In this paper, a DJI Matrice 600 (M600) multirotor UAV carrying a multispectral camera was used to obtain spectral data, as shown in Figure 1. The camera covers three visible bands, 450 nm (blue), 555 nm (green), and 660 nm (red), and three red-edge/near-infrared bands, 710 nm (N1), 840 nm (N2), and 940 nm (N3). The UAV flight altitude was set at 75 m, with a forward overlap of 85%, a side overlap of 70%, and a ground sample distance of 4.68 cm/pixel.
The spectral remote sensing images of the six bands over the same test area were collected continuously in early September 2021. Camera performance was stable, and the UAV maintained a flight speed of about 8 m/s along a predetermined serpentine trajectory, with shooting points evenly distributed along the trajectory, as shown in Figure 2, ensuring that each test field was photographed within a short flight distance. The study area contains many trees; different sun elevations produce different shadows, and the crops themselves cast shadows, which change the texture and color characteristics of the crops and affect recognition. For this reason, the shooting time was set at 12:00 noon, and each flight lasted about 18 min. A total of 400–500 original multispectral remote sensing images were obtained for each test field.
2.3. Data Preprocessing
The test-area images were first radiometrically calibrated with a calibration plate and then checked, with unusable images rejected, in Pix4Dmapper (Pix4D, Lausanne, Switzerland). Because the images of the grape-growing areas had to be stitched together after shooting, problems such as overexposure and color distortion arose, which in turn would affect the accuracy of subsequent information extraction, as shown in Figure 3.
To avoid these problems, three histogram corrections (histogram equalization, histogram specification, and Esri correction) were applied to the stitched images. The results show that the Esri correction outperforms the other two methods: the corrected features have clearer boundaries and greater between-class variability, making them easier to distinguish.
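Histogram equalization, the first of the three corrections compared, can be sketched with NumPy alone; the function name and the single-band uint8 assumption below are illustrative, not from the paper.

```python
import numpy as np

def equalize_hist(img, nbins=256):
    """Histogram equalization for a single-band uint8 mosaic (NumPy only).

    Maps each grey level through the normalized cumulative histogram so
    that a low-contrast or over-exposed mosaic uses the full grey range.
    """
    hist, _ = np.histogram(img.ravel(), bins=nbins, range=(0, nbins))
    cdf = hist.cumsum().astype(np.float64)
    cdf /= cdf[-1]                                  # normalize to [0, 1]
    lut = np.round(cdf * (nbins - 1)).astype(np.uint8)
    return lut[img]                                 # apply the lookup table
```

Histogram specification works the same way, except the target lookup table is built from a well-exposed reference tile rather than from a uniform distribution.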
2.4. Factors Influencing the Selection of Characteristics of Grape Land
Extracting and analyzing the spectral information, texture features, and vegetation indices of the planting area, and screening the key feature parameters that can extract the vineyard structure, are essential for the information perception of the vineyard.
2.4.1. Best Band
When screening optimal band combinations for the large-field grape images, the preferred combination should contain a large amount of information while reducing the data dimensionality, yet still represent the original images efficiently. The basic spectral data of the large-field grape images were analyzed to obtain the standard deviation (SD) and mean of the grayscale of the six bands, as well as the correlation coefficient matrix between bands. SD is often used to measure the dispersion of image grayscale; a larger SD indicates a greater amount of effective information in the band.
To select the optimal bands from these basic spectral data, the optimum index factor (OIF) was introduced. The OIF jointly considers measures of information content, such as the SD and mean, and the correlation coefficients between bands, and can quickly and accurately describe the information quality of a band combination [39,40]. For a three-band combination it is calculated as

OIF = (SD1 + SD2 + SD3) / (|R12| + |R13| + |R23|),   (1)

where SDi is the grayscale standard deviation of band i and Rij is the correlation coefficient between bands i and j.
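The OIF, the sum of the three band standard deviations divided by the sum of the absolute pairwise correlation coefficients, can be evaluated exhaustively over every three-band combination; a NumPy sketch with illustrative function names:

```python
import itertools
import numpy as np

def oif(bands):
    """Optimum Index Factor of a 3-band combination.

    bands: list of three 2-D arrays (one per spectral band).
    Larger OIF = more variance (information) and less redundancy.
    """
    sds = [b.std() for b in bands]
    corrs = [abs(np.corrcoef(a.ravel(), b.ravel())[0, 1])
             for a, b in itertools.combinations(bands, 2)]
    return sum(sds) / sum(corrs)

def best_combination(all_bands):
    """Return the 3-band index combination with the highest OIF score."""
    combos = itertools.combinations(range(len(all_bands)), 3)
    return max(combos, key=lambda c: oif([all_bands[i] for i in c]))
```

For six bands this scores every three-band subset and returns the indices of the highest-scoring one, matching the exhaustive comparison reported in Section 3.1.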
2.4.2. Texture Characteristics
Texture is an arrangement property expressing periodic variation, or approximately repeated combinations, of certain regions in remote sensing images; it quantitatively portrays the degree of homogeneity and the internal detail of each category in the images. The large-field grape images acquired by the multispectral UAV provide grayscale information for the R, G, and B channels in the visible band, but the random noise between the R, G, and B channels is high and the information they contain overlaps. The HSI model can eliminate the influence of the intensity component in color images and significantly reduces the workload of image analysis and processing, which is conducive to the detection and analysis of color characteristics.
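The RGB-to-HSI conversion underlying this analysis can be sketched with the standard geometric formulas; the helper name and the [0, 1] input range are our assumptions.

```python
import numpy as np

def rgb_to_hsi(rgb):
    """Convert an (..., 3) RGB array in [0, 1] to the HSI model.

    I is the mean intensity, S the saturation (1 - min/I) and H the hue
    angle in radians. The H/S/I channels are far less correlated than
    R/G/B, which is why texture measures are computed on them.
    """
    r, g, b = rgb[..., 0], rgb[..., 1], rgb[..., 2]
    eps = 1e-9                                    # guard divisions by zero
    i = (r + g + b) / 3.0
    s = 1.0 - np.minimum(np.minimum(r, g), b) / (i + eps)
    num = 0.5 * ((r - g) + (r - b))
    den = np.sqrt((r - g) ** 2 + (r - b) * (g - b)) + eps
    theta = np.arccos(np.clip(num / den, -1.0, 1.0))
    h = np.where(b <= g, theta, 2 * np.pi - theta)  # hue angle
    return np.stack([h, s, i], axis=-1)
```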
The principal component analysis (PCA) [41] algorithm is used for feature dimensionality reduction. PCA is the classical algorithm for reducing the dimensionality of high-dimensional remote sensing data; its core idea is to find the optimal projection components under the criterion of maximum variance. The texture-feature data remain voluminous even after PCA; the five types of texture features screened by OIF and eliminated by probability statistics are shown in Table 1, from which three texture features with little information overlap and a large share of the component data were selected as the theoretical basis for the later classifier.
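A minimal NumPy version of the PCA projection described above (eigendecomposition of the covariance matrix, keeping the maximum-variance components); the function name is illustrative:

```python
import numpy as np

def pca_reduce(features, n_components=3):
    """Project texture features onto their top principal components.

    features: (n_samples, n_features) matrix of texture measures.
    Returns the centred data projected onto the eigenvectors of the
    covariance matrix with the largest eigenvalues.
    """
    X = features - features.mean(axis=0)          # centre each feature
    cov = np.cov(X, rowvar=False)
    eigvals, eigvecs = np.linalg.eigh(cov)        # ascending eigenvalues
    order = np.argsort(eigvals)[::-1][:n_components]
    return X @ eigvecs[:, order]                  # maximum-variance projection
```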
2.4.3. Vegetation Index
Visible-light vegetation indices for vegetation information extraction are constructed from the reflection and absorption characteristics of the crop canopy. However, in a large-field grape-growing area with rich crop types, large differences in vegetation cover, and a complex environment, it is difficult to ensure high classification accuracy with visible-light indices alone. Therefore, this paper uses the visible and near-infrared spectral information of the multispectral images to extract richer vegetation indices as the theoretical basis for subsequent model segmentation.
In this paper, four common vegetation indices, the Normalized Difference Vegetation Index (NDVI), Ratio Vegetation Index (RVI), Difference Vegetation Index (DVI), and Soil-Adjusted Vegetation Index (SAVI), were extracted [42], and the preferred indices were determined by comparison. The spectral information of the different crop types in the field grape images acquired by the UAV multispectral system was recorded as uncorrected digital number (DN) values. Before interpretation, the multispectral images were radiometrically calibrated, converting the recorded DN values into surface reflectance (SR), and the spectral data of the grape-growing areas were analyzed to obtain the spectral characteristic parameters of each category in the test area.
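The four indices can be computed directly from the calibrated surface-reflectance bands. A NumPy sketch, with SAVI's soil-adjustment factor L = 0.5 assumed (the paper does not state the value used):

```python
import numpy as np

def vegetation_indices(red, nir, L=0.5):
    """NDVI, RVI, DVI and SAVI from surface-reflectance arrays.

    red, nir: reflectance of the red and near-infrared bands.
    L is the SAVI soil-adjustment factor (0.5 is a common default).
    """
    eps = 1e-9                                  # avoid division by zero
    ndvi = (nir - red) / (nir + red + eps)
    rvi = nir / (red + eps)
    dvi = nir - red
    savi = (1 + L) * (nir - red) / (nir + red + L + eps)
    return {"NDVI": ndvi, "RVI": rvi, "DVI": dvi, "SAVI": savi}
```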
2.5. DeepLabV3+ Model and Improvements
2.5.1. DeepLabV3+ Model Building
DeepLabV3+ is an improvement on the DeepLabV3 deep learning model, which upsampled the output convolutional layer by a factor of 16. The network consists of an encoder and a decoder. The encoder extracts deep image features and is composed of a deep convolutional neural network (DCNN) and an atrous spatial pyramid pooling (ASPP) module: the DCNN extracts features from the input image, and the ASPP module refines the deep feature map by applying atrous (dilated) convolutions at four different sampling rates to extract multiscale information, mitigating the drawbacks of high-ratio upsampling. The decoder fuses shallow feature maps with the upsampled deep feature map, using the shallow features to recover location information lost in upsampling, and outputs the semantic segmentation prediction, as shown in Figure 4.
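The ASPP idea, parallel atrous convolutions at several rates whose outputs are concatenated and projected, can be sketched in PyTorch; the rates and channel sizes below are illustrative, not the model's actual configuration:

```python
import torch
import torch.nn as nn

class MiniASPP(nn.Module):
    """Sketch of ASPP: parallel atrous (dilated) convolutions.

    Each branch samples the feature map at a different rate, so the
    concatenated output mixes several receptive-field scales without
    further downsampling.
    """
    def __init__(self, in_ch, out_ch, rates=(1, 6, 12, 18)):
        super().__init__()
        self.branches = nn.ModuleList(
            nn.Conv2d(in_ch, out_ch, kernel_size=3, padding=r, dilation=r)
            for r in rates)
        # 1x1 projection back to out_ch channels after concatenation
        self.project = nn.Conv2d(out_ch * len(rates), out_ch, kernel_size=1)

    def forward(self, x):
        return self.project(torch.cat([b(x) for b in self.branches], dim=1))
```

With padding equal to the dilation rate, every branch preserves the spatial size, so the branches can be concatenated channel-wise.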
2.5.2. Model Training
The deep learning framework and server-related configurations used in this paper are shown in
Table 2.
The dataset was built by cutting the original images and labels of arbitrary size into 960 images of fixed size 256 × 256 pixels, and 5000 model input images of fixed size 256 × 256 were obtained through rotation, noise addition, mirroring, and other augmentations. The dataset was divided in a ratio of 8:2, yielding 4569 training images and 1142 test images. The model was trained with the momentum gradient descent algorithm under the mainstream PyTorch framework, with the main parameters set as follows: batch size 32, cross-entropy loss, momentum 0.9, learning-rate decay factor 0.1, and initial learning rate 0.0001.
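The fixed-size cropping step can be sketched as follows; the border handling (discarding partial tiles) is an assumption, since the paper does not specify its cropping details:

```python
import numpy as np

def tile_image(img, tile=256):
    """Cut an arbitrary-size image (H, W, C) into fixed 256 x 256 tiles.

    Tiles that would run past the border are discarded, matching the
    fixed-size input the segmentation model expects. A sketch; the
    paper's exact cropping/augmentation pipeline is not specified.
    """
    h, w = img.shape[:2]
    return [img[r:r + tile, c:c + tile]
            for r in range(0, h - tile + 1, tile)
            for c in range(0, w - tile + 1, tile)]
```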
2.5.3. Research on Information Extraction of Large-Field Grapes Based on Traditional Methods
The remote sensing images were processed with the above methods, and support vector machine (SVM), maximum likelihood (ML), random forest (RF), and ISODATA classifiers were constructed in ArcGIS software (Esri, Redlands, CA, USA). Information was extracted from the multispectral images of the large-field grapes, the optimal model was selected by comparing evaluation indexes such as overall accuracy (OA) and the kappa coefficient, and the result was compared with the improved DeepLabV3+ model.
2.5.4. Model Improvement
In this paper, we replace the loss function and change the input layer of the network, based on the DeepLabV3+ model, to obtain a better network for extracting large-field grape information. Firstly, the best bands selected from the different band combinations by the OIF algorithm are used as the input images for the model; secondly, the preferred vegetation index and texture features are incorporated into the model, the loss function is replaced, and the hyperparameters are tuned through repeated trials to optimize the network model.
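One common way to realize the input-layer change, widening the pretrained 3-channel first convolution so it accepts extra spectral channels, is sketched below; the mean-filter initialization of the new channels is our assumption, not necessarily the paper's scheme:

```python
import torch
import torch.nn as nn

def widen_first_conv(conv, in_channels=6):
    """Widen a pretrained 3-channel first conv to accept N spectral bands.

    Keeps the RGB filters and fills the extra NIR/index channels with the
    channel-mean of the RGB filters (a common initialization; the paper's
    exact scheme is not specified). Apply to e.g. the first conv of a
    torchvision DeepLabV3 backbone.
    """
    new = nn.Conv2d(in_channels, conv.out_channels,
                    kernel_size=conv.kernel_size, stride=conv.stride,
                    padding=conv.padding, bias=conv.bias is not None)
    with torch.no_grad():
        new.weight[:, :3] = conv.weight                      # keep RGB filters
        new.weight[:, 3:] = conv.weight.mean(dim=1, keepdim=True)
        if conv.bias is not None:
            new.bias.copy_(conv.bias)
    return new
```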
2.6. Evaluation Indicators
The classification accuracy of the scheme is evaluated based on the relative error of the confusion matrix and area.
2.6.1. Extraction of Test Set Area
The prediction results of the model on the validation set were imported into ArcGIS software and the grape-planting area was counted. The validation-set area obtained by combining field mapping and the photographed images was taken as the true area, and the relative error of the model's area extraction was calculated as

δ = |M − T| / T × 100%,   (2)

where M denotes the grape area predicted by the DeepLabV3+ model on the validation set, and the true value T denotes the grape acreage in the validation set obtained from field mapping.
2.6.2. Confusion Matrix
The confusion matrix is the most basic and intuitive way to measure the accuracy of a classification model. Many classification metrics are derived from it, such as OA, the kappa coefficient (kappa), precision, recall, and F1-score [43]; the mean intersection over union (MIoU) [44] and frequency-weighted IoU (FW-IoU) can also be computed from the confusion matrix. In this paper, these metrics are used to evaluate the semantic segmentation models.
(1) OA: The ratio of the number of correctly classified samples to the total number of validation samples for constructing the classifier.
(2) Kappa: The final judgment index based on producer accuracy (PA) and user accuracy (UA) and using the information of the whole error matrix, which can reflect the classification accuracy of the model comprehensively and accurately.
(3) FW-IOU: This is an improvement of MIoU, where each category is weighted according to its importance, which is derived from its frequency of occurrence.
(4) F1-score: This is the harmonic mean of the combined precision (P) and recall (R).
The formulas for these metrics are as follows. Assume that the dataset contains k + 1 classes, where class 0 denotes the background, and let p_ij denote the number of pixels that actually belong to class i but are predicted as class j. Then p_ii is the number of true-positive (TP) pixels of class i, and p_ji (j ≠ i) is the number of false-positive (FP) pixels. The mean intersection over union is

MIoU = [1 / (k + 1)] · Σ_{i=0}^{k} p_ii / (Σ_{j=0}^{k} p_ij + Σ_{j=0}^{k} p_ji − p_ii).

The closer the mean intersection ratio is to 1, the better the network segmentation; the closer it is to 0, the worse the network performs.
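All four matrix-derived metrics can be computed from a single confusion matrix. A NumPy sketch using the usual conventions (rows = true class, columns = predicted class); the function name is illustrative:

```python
import numpy as np

def segmentation_metrics(cm):
    """OA, kappa, MIoU and FW-IoU from a (k+1) x (k+1) confusion matrix.

    cm[i, j] = number of pixels of true class i predicted as class j.
    """
    cm = cm.astype(np.float64)
    total = cm.sum()
    oa = np.trace(cm) / total                        # overall accuracy
    pe = (cm.sum(axis=0) * cm.sum(axis=1)).sum() / total ** 2
    kappa = (oa - pe) / (1 - pe)                     # chance-corrected agreement
    iou = np.diag(cm) / (cm.sum(axis=0) + cm.sum(axis=1) - np.diag(cm))
    miou = iou.mean()                                # mean IoU over classes
    freq = cm.sum(axis=1) / total                    # class frequencies
    fwiou = (freq * iou).sum()                       # frequency-weighted IoU
    return {"OA": oa, "kappa": kappa, "MIoU": miou, "FW-IoU": fwiou}
```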
3. Results
3.1. Analysis of the Effect of Image Band Results
Statistical analysis of the study area gave the grayscale standard deviations of bands 1–6 (corresponding to B, G, R, N1, N2, and N3) as 12,568, 10,725, 6877, 14,508, 18,949, and 20,450, respectively. The standard deviation decreases from the blue band to the red band and then increases through the red-edge and near-infrared bands (N1, N2, and N3).
To take into account the standard deviation, inter-band correlation coefficients, mean values, and other information measures, the optimum index factor (OIF) of Equation (1) was computed in Python for each three-band combination. For the six-band multispectral images, the OIF values of 15 three-band combinations were calculated. The results are shown in Table 3; the band combinations 1, 2, 5 (B, G, N2) and 1, 2, 6 (B, G, N3) were finally selected as the best band combinations for the large-field grape images.
3.2. Characterization of Color and Texture
By studying the correlation coefficients and standard deviations of the different texture features, we lay the foundation for the subsequent extraction of the different crops in the farmland. Five categories of texture features with the greatest impact are selected as the basis for image classification, and the sample features of each category under the RGB and HSI color spaces are compared. There are obvious differences in the hue, saturation, and intensity of grapes, corn, greenhouses, etc. The results are shown in Figures 5 and 6.
Compared with the RGB color space, the internal texture and boundary features of the grapes are clearly defined in the HSI color space, and the shadows around the grapes caused by the sun elevation are eliminated. In the complex environment of the field grape test area, increasing the spatial resolution reduces the spectral differences within the same feature type to some extent, but the internal details of the images are significantly enhanced: the planting-plot boundaries of grapes, corn, and greenhouses are delineated more finely, and the canopy shape and geometric structure of the vegetation in the test area are characterized more clearly, which increases the accuracy and robustness of the texture features in the classification model.
The obtained results were then subjected to PCA feature dimensionality reduction, and the standard deviations and correlation coefficients among the components were obtained using ENVI software (Exelis Visual Information Solutions, Boulder, CO, USA), as shown in
Table 4.
The correlations between the components were screened through statistical analysis: component 1 has a low correlation with components 2 and 4 and a high correlation with components 3 and 5; the correlation between components 2 and 4 is around 0.8, which is high, while their correlations with components 3 and 5 are low; and component 3 is strongly correlated with component 5. The hue mean (H-Means), hue homogeneity (H-Hom), hue contrast (H-Con), hue low-pass (H-CLP), and saturation low-pass (S-CLP) were finally obtained as the five preferred texture classification features to support the improved algorithm and the accuracy of the data.
3.3. Vegetation Index Analysis
The spectral feature variability of plant canopy leaves was used to distinguish the planting information of different crops on multispectral images, and the four common vegetation indices of NDVI, RVI, DVI, and SAVI were extracted in ArcGIS software, as shown in
Figure 7 below.
As can be seen from Figure 7, NDVI can, to a certain extent, weaken external influences from the soil background, solar-elevation shadows, and atmospheric errors; it is sensitive to changes in the substratum of different crops, and its value ranges between −1 and 1. RVI is more sensitive to the growth state of vegetation, especially for crops with high coverage and good growth condition. SAVI and DVI are more sensitive to changes in the soil background but less sensitive to dense green vegetation. In summary, NDVI and RVI were chosen as the preferred vegetation indices for studying the characteristics of the vine-growing area, according to the actual conditions of the experimental site. The NDVI and RVI of the different features in the study area were extracted, as shown in Table 5.
From the vegetation-index maps, the NDVI values of maize are higher than those of grapes, owing to the higher reflectance of maize in the NIR band. From the statistical table of the ratio vegetation index of the features in the study area, the differences in RVI among grapes, maize, greenhouses, trees, and weeds make them easy to distinguish.
3.4. Research on Traditional Grape Land Information Extraction Methods
The field grape validation set was classified using SVM [45], ML [46], RF [47], and ISODATA, and the classification results are shown in Figure 8 and Table 6. The results show that field grape cultivation is complex, containing not only non-crop surfaces, such as barns and bare land, but also green vegetation, such as maize and weeds. The canopy density of the three types of green vegetation differs greatly, with maize the highest, grapes second, and weeds the smallest; uneven salinization in the study area also leads to large differences in grape growth.
For the UAV multispectral images of the field grape test area with complex agricultural information, the four common machine learning classification methods (SVM, ISODATA, ML, RF) were compared and their evaluation indexes calculated. As can be seen from the chart, the classification accuracy of SVM is better than that of the other three, with an OA of 76.03% and a kappa coefficient of 0.72, and the user accuracy for grape extraction reaches 97%. However, in terms of overall effect, all four traditional classification methods suffer from mixed pixels and cannot accurately extract grape information from large-field images.
3.5. Unimproved DeepLabV3+ Model in the Test Set Results
The unimproved DeepLabV3+ model exhibits mixed pixels and mismatched category edges at tile seams in the test set. The reason is that the overall remote sensing image is large and must be cropped into 256 × 256 tiles before being loaded into the model for training and prediction; when the unimproved model extracts the edge areas of each category poorly, the stitched test-set results show edge mismatches and similar artifacts. In addition, the extracted grape extent does not match the ground truth in some regions, and the error caused by shadows is not resolved, as shown in Figure 9.
4. Discussion
4.1. Effect of Spectral Information on Improving DeepLabV3+ Model
In this subsection, only the effect of the spectral information of different bands on the improved model is considered; images of the grape-growing areas with different band combinations are tested on the improved DeepLabV3+ model. Seven comparison experiments are set up: the visible RGB three-band set, the NIR three-band set, the RGB set with different NIR bands added, the full set of original bands, and the best transformed bands. The model prediction results are shown in Table 7, where B, G, and R denote the visible bands at 450, 555, and 660 nm, respectively, and N1, N2, and N3 denote the 710 nm, 840 nm, and 940 nm bands, respectively.
BGN3, RGB, RGB-N1, RGB-N2, RGB-N3, and RGB-N1N2N3 were compared experimentally, where BGN3 is the best band combination determined by the OIF algorithm and the band correlation coefficients. When a NIR band is added to the RGB bands, the classification accuracy of the DeepLabV3+ model improves to some extent. With the six-band RGB-N1N2N3 set, the overall accuracy and FW-IoU reach 79.09% and 76.79%, respectively; the OA of RGB-N1N2N3 on the validation set is higher than that of RGB, RGB-N1, RGB-N2, and RGB-N3. Meanwhile, the best combination obtained by data dimensionality reduction, BGN3 (1, 2, 6), achieves comparable accuracy with a much smaller data volume.
The specific extraction effect of the improved DeepLabV3+ model for grapes is shown in Table 8. The experimental comparison confirms that the band selection in the previous section is feasible and that the N3 band's spectral information has the greatest influence on the model. The experimental group BGN3 (1, 2, 6) gives the best grape extraction: the F1-score reaches 86.0% and the MIoU reaches the optimal value of 75.6%.
4.2. Effect of Texture, Vegetation Index, and DSM on Improved DeepLabV3+ Model
In this section, the mean of the first principal component in the HSI color space of the study-area image (H-Means) is used as the preferred texture feature of the large-field grape image, and NDVI is used as the preferred vegetation index. The DSM is then incorporated on this basis [43], and the improved DeepLabV3+ method is used to test the effects of texture, vegetation index, and DSM on the model's prediction results.
As shown in Table 9, when the first principal component H-Means in the HSI color space is added to the improved DeepLabV3+ with six-band spectral information, the overall accuracy (OA) improves by 5% and the FW-IoU by 5.88%; the texture features shared by the RGB and NIR bands are thus the biggest factor affecting the model's classification. Incorporating the vegetation index NDVI improves the OA by 0.49% and the FW-IoU by 0.19%; incorporating the DSM improves the OA by 2.9% and the FW-IoU by 0.37% on the validation set. The factors affecting the extraction accuracy of the field vine-planting structure are therefore, in order, texture feature H-Means > DSM > vegetation index NDVI.
Based on the above analysis, the final training set was imported into the improved DeepLabV3+ deep learning model for validation; its OA reached 87.48%, exceeding the best OA of 76.03% achieved by SVM among the traditional information extraction methods. The improved DeepLabV3+ model thus effectively improves the classification accuracy and robustness for field planting areas.
4.3. Test Set Area Extraction and Summary
The prediction results of the model on the validation set were imported into ArcGIS software, as shown in Figure 10, the grape-planting area was counted, and the relative error of the area extraction was calculated with Equation (2). The grape-planting area obtained from field mapping was T = 5475.08 m², the final predicted grape area was M = 5348.21 m², and the relative error of the extracted area was 1.9%, which again verifies the feasibility of the experimental scheme.
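The area count itself reduces to pixel counting times the ground area of one pixel; a sketch (the paper performs this step in ArcGIS rather than in code, and the function name is ours):

```python
import numpy as np

def predicted_area(mask, class_id, gsd_m=0.0468):
    """Planted area (m^2) of one class from the stitched prediction mask.

    Counts the pixels labelled class_id and multiplies by the ground
    area of one pixel; gsd_m is the 4.68 cm/pixel ground resolution
    quoted in Section 2.2.
    """
    return int((mask == class_id).sum()) * gsd_m ** 2
```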
In summary, compared with the traditional SVM, ML, RF, and ISODATA information extraction methods and with the unimproved DeepLabV3+ deep learning model, the DeepLabV3+ scheme based on spectral information + texture + vegetation index + DSM has clear advantages in all evaluation indexes. It can meet the accuracy requirements for recognizing large-field grape-planting areas and extracting the planting structure in complex situations, and can further support the management of missing plants in large-field grape areas.
5. Conclusions
(1) Aiming at the problems that traditional field grape information extraction methods are ineffective for field grapes in complex environments and that the DeepLabV3+ model confuses pixels and misclassifies some grapes as trees, an improved DeepLabV3+-based field grape information extraction scheme is proposed, providing an effective and feasible solution for interpreting the planting structure of Ningxia field grapes.
(2) The experimental results on the field grape dataset show that the best band combination for grape-growing areas is BGN3 (1, 2, 6); the main factor affecting the classification accuracy of large-field grape images is the texture feature shared by the RGB and NIR bands, and fusing the DSM into the model further improves classification accuracy. The DeepLabV3+ deep learning scheme based on spectral information + texture + vegetation index + DSM was finally determined. The OA of the improved scheme reaches 87.48%, 11.45 percentage points higher than the best traditional classification method (SVM), and the FW-IoU reaches the best accuracy of 83.23%. The scheme solves the mixed-pixel problem of the original model, improves the recognition accuracy of large-field grapes in complex environments, and achieves a relative error of the extracted area of 1.9%.
The improved DeepLabV3+ deep learning model based on UAV multispectral images proposed in this study solves the problem of collecting information on large-field grape plantation areas in complex environments and improves the recognition accuracy of grape plantation areas in such environments, meeting the requirements of, and laying the foundation for, the informatized management of grape-growing areas.