**UAV-Based Terrain Modeling under Vegetation in the Chinese Loess Plateau: A Deep Learning and Terrain Correction Ensemble Framework**

**Jiaming Na 1,2,3,4,†, Kaikai Xue 4,5,†, Liyang Xiong 1,2,4,6,†, Guoan Tang 1,2,4,6, Hu Ding 4, Josef Strobl 4,8 and Norbert Pfeifer 3**


Received: 8 September 2020; Accepted: 9 October 2020; Published: 12 October 2020

**Abstract:** Accurate topographic mapping is a critical task for various environmental applications because elevation affects hydrodynamics and vegetation distributions. UAV photogrammetry is popular in terrain modeling because of its lower cost compared to laser scanning. However, this method is restricted in vegetated areas with complex terrain, owing to reduced ground visibility and the lack of robust, automatic filtering algorithms. To solve this problem, this work proposed an ensemble method of deep learning and terrain correction. First, an image matching point cloud was generated by UAV photogrammetry. Second, vegetation points were identified with a U-net deep learning network. After that, the ground elevation was corrected by estimating the vegetation height to generate the digital terrain model (DTM). Two scenarios, namely, discrete and continuous vegetation areas, were considered. The vegetation points in discrete areas were directly removed and then interpolated, and terrain correction was applied to the points in continuous areas. Case studies were conducted in three different landforms in the Loess Plateau of China, and accuracy assessment indicated that the overall accuracy of vegetation detection was 95.0% and the MSE (mean square error) of the final DTM was 0.024 m.

**Keywords:** UAV photogrammetry; terrain modeling; vegetation removal; deep learning

#### **1. Introduction**

Accurate topographic mapping is essential for various environmental applications because elevation affects hydrodynamics and vegetation distributions [1–3]. Small elevation changes can alter sediment stability, nutrients, organic matter, tides, salinity, and vegetation growth, and might therefore cause substantial vegetation transitions in relatively flat wetlands [4–7]. Topography influences flow erosion and is thus a prerequisite for soil erosion studies, especially in the Loess Plateau of China [8,9]. The temporal dynamics of topography help in understanding the erosion process and contribute to conservation planning.

Various remote sensing techniques, such as RADAR [10–13], light detection and ranging (LiDAR) [14–16], and stereo photogrammetry [17–20], have been developed and applied to model terrain at various scales. However, accurate topographic mapping in the gully areas of the Loess Plateau of China remains challenging due to complicated hydrodynamics, ever-changing terrain, and dense vegetation cover. The widely used LiDAR provides the highest accuracy, with a mean terrain error within 0.10 m to 0.20 m [21–23]. However, terrestrial laser scanning is restricted in terrain with strong relief [24], and field measurement often fails in certain areas because the complex terrain limits visibility from the sensor perspective. Airborne laser scanning is also limited under poor weather conditions. Errors further increase under dense and tall vegetation and might reach a challenging 'dead zone' when the marsh vegetation height is close to or beyond 2 m [4]. Moreover, laser scanning is expensive and hard to implement in developing countries [25], and frequent deployment of LiDAR surveys in such scenarios is cost-prohibitive. Therefore, affordable methods for rapid and accurate measurement that do not rely on outdated historical data are needed.

State-of-the-art unmanned aerial vehicles (UAVs) provide a promising solution for general mapping applications. Remarkable progress has been achieved in light-weight sensor and UAV system development [26,27], data pre-processing [28], registration [29,30], and image matching [31–33]. UAV-based terrain modeling has the advantages of low cost, high spatial resolution, high portability, a flexible mapping schedule, rapid response to disturbances, and convenient multi-temporal monitoring [34]. The UAV has become a favourable surveying method in many areas with challenging mobility and accessibility. In particular, cameras are miniaturised and have low power consumption, making them ideal sensors for area-wise coverage from UAVs [35].

Despite various successful applications, challenges for UAV usage remain, especially in areas with dense vegetation. UAV terrain modeling is best suited to areas with sparse or no vegetation, such as sand dunes and beaches [36], coastal flat landscapes [37], and arid environments [38]. Establishing a satisfactory terrain model is hindered by difficulties in point-based ground filtering. Some successful work on automatic ground filtering has been conducted for digital terrain model construction [39,40], but its application remains 'pointless' in vegetated areas due to the difficulty of penetration and the lack of ground points [41]. Current developments in the UAV community provide no solution to these issues of terrain mapping in densely vegetated environments [42].

This study aimed to address the challenges of terrain mapping under vegetation cover by developing a UAV photogrammetry mapping solution that does not depend on historical data. The main objective was to propose an algorithmic framework that corrects terrain based on vegetation detection, using deep learning (DL). First, an image matching point cloud was generated by UAV photogrammetry. Second, vegetation points were identified with a U-net deep learning network. After that, the ground elevation was corrected by estimating the vegetation height to generate the digital terrain model (DTM). Two scenarios, namely, discrete and continuous vegetation areas, were considered. The vegetation points in discrete areas were directly removed and then interpolated, and terrain correction was applied to the points in continuous areas. Given that most photogrammetric UAV systems carry colour cameras, the applicability of the proposed method to photogrammetric UAV systems for terrain mapping in vegetated environments was also explored.

#### **2. Materials and Methods**

The proposed approach involved the following four steps: (1) UAV photogrammetry; (2) DL-based vegetation detection; (3) terrain correction; and (4) DTM generation. Accuracy assessment was conducted by comparing check points measured with a global navigation satellite system (GNSS) unit against the produced DTM elevations.

#### *2.1. Study Site*

Three study areas, namely, Xining (SA1), Wangjiamao (SA2), and Wucheng (SA3), located in Qinghai, Shaanxi, and Shanxi, respectively, were selected in the Loess Plateau of China (Figure 1); they represent loess hill and gully, loess hill, and loess valley areas, respectively. Among them, Wangjiamao and Wucheng cover complete catchments, and Xining covers a hillslope area. All three study areas have been covered with vegetation since the implementation of the 'Grain for Green' project (converting agricultural land to conservation area) in the late 1990s [43,44]. The vegetation status of the three study areas varied in type and spatial distribution. The vegetation in Xining was planted for ecological protection on formerly cultivated land, with an average interval distance of 2 m on the terraced slopes. In the Wucheng and Wangjiamao areas, the vegetation is more natural, but some cash crops, such as apples and jujubes (Chinese dates), were still planted, with a denser horizontal spacing of around 1 m on the slopes. The basic geographic information is listed in Table 1.

**Figure 1.** Study areas.



**Table 1.** Basic geographic information of the three study areas.

| | Xining (SA1) | Wangjiamao (SA2) | Wucheng (SA3) |
| --- | --- | --- | --- |
| Precipitation | 327 mm/y | 486 mm/y | ~450 mm/y |
| Vegetation | Weed | Shrub | Arbor |
| Main vegetation type | *Rhamnus erythroxylon, Artemisia* | *Haloxylon ammodendron, Ziziphus jujuba* | *Hippophae, Malus domestica* |
| Vegetation height | 0.5–2 m | 0.5–6 m | 0.5–6 m |

#### *2.2. Unmanned Aerial Vehicle (UAV) and Global Navigation Satellite System (GNSS) Field Data Collection*

Image matching point clouds from UAV photogrammetry were used as the inputs for terrain modeling. Optical aerial photographs were captured using a DJI Inspire 1 microdrone [45] mounted with a Zenmuse X5 digital camera system [46] (15 mm focal length, RGB colour, and 4096 × 2160 resolution); the platform had a battery time of approximately 18 min and could resist wind speeds of up to 10 m/s. Detailed flight information is shown in Table 2. The Pix4D Capture flight planner software was used to plan a round-trip flight line along the study areas and automatically collect images within the designed flight distance. All flights were completed between 10 a.m. and 2 p.m., to ensure that the image quality would not be influenced by shadows. Ground control points (GCPs) in WGS-84 were obtained with a Topcon HiperSR RTK GNSS unit [47] (10 mm horizontal and 15 mm vertical positioning accuracy) mounted on a tripod, to ensure horizontal and vertical accuracy. Bundle adjustment was implemented in the Pix4D Mapper software [48]. The point clouds were finally generated and interpolated into a gridded digital surface model (DSM).


**Table 2.** Unmanned aerial vehicle (UAV) flight information of three study areas.

Eight targets along the vegetation in Xining (SA1) were designated as check points (CPs) for the uncertainty assessment of the final terrain modeling results. These targets were 1-m-wide boards painted in black and white in a diagonal chessboard pattern.

#### *2.3. Deep Learning (DL)-Based Vegetation Detection*

Most DL networks connect simple layers for data distillation: input information passes through layers of filters that progressively increase the purity of the distilled representation to achieve the desired result [49]. The convolutional neural network (CNN) is one representative deep neural network structure; it is a feed-forward network commonly used for object recognition, target detection, semantic segmentation, and related problems [50,51]. A typical CNN architecture, U-Net [52], was implemented for vegetation detection because of its effectiveness and simplicity. U-net is trained by gradient descent: data are propagated forward through the network, and errors are back-propagated to correct the parameter weights and biases [53]. Certain layers of the existing U-Net structure were changed and adjusted for the specific terrain modeling tasks.

#### 2.3.1. Training Data Generation

DL is usually applied to datasets with large amounts of data, and convolutional neural networks are well suited to image data; the U-Net model therefore requires a large number of images as input data. Here, input patches were randomly cropped to ensure proper representation and eliminate the influence of manual selection: random coordinate points were drawn and expanded to the desired image size. The crop extent was calibrated so that the crop operation fully utilized the cell size and projected coordinate information.
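As an illustration of this cropping step, a minimal sketch follows; it is not the tool developed in this study, and it assumes the study area is already rasterized into a NumPy array with four channels (R, G, B, Z).

```python
import numpy as np

def random_crops(raster, n_samples=10000, size=128, rng=None):
    """Randomly crop fixed-size training patches from a (H, W, C) raster array."""
    rng = rng or np.random.default_rng()
    h, w = raster.shape[:2]
    patches = np.empty((n_samples, size, size, raster.shape[2]), dtype=raster.dtype)
    for k in range(n_samples):
        # Draw a random upper-left corner so the crop stays inside the raster.
        i = rng.integers(0, h - size + 1)
        j = rng.integers(0, w - size + 1)
        patches[k] = raster[i:i + size, j:j + size]
    return patches

# Example: 10,000 patches of 128 x 128 cells with 4 channels (R, G, B, Z).
raster = np.zeros((2000, 3000, 4), dtype=np.float32)  # placeholder array
samples = random_crops(raster, n_samples=10000, size=128)
```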

Data augmentation is the process of generating new training data based on the nature of the images, without actually collecting new samples. Convolution operations are translation-invariant, and similarity transformations such as rotation and scaling of the vegetation data do not change its information characteristics. Here, transformed data outside the sample area were provided to the model to ensure data diversity. Random similarity transformation, scale transformation, Gaussian blur, and image enhancement were performed on the cropped data, in which the rotation transformation matrix and the 2D Gaussian function were as follows (Equations (1) and (2)).

$$M = \begin{pmatrix} \cos \theta & -\sin \theta \\ \sin \theta & \cos \theta \end{pmatrix} \tag{1}$$

$$G(x, y) = \frac{1}{2\pi\sigma^2} e^{-(x^2 + y^2)/2\sigma^2} \tag{2}$$

where θ is the angle of rotation, and σ is the standard deviation of the Gaussian kernel.
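These augmentations can be sketched as follows with SciPy; the sampling ranges of the rotation angle θ and the kernel standard deviation σ are illustrative assumptions, and blurring only the spectral channels is a design choice of this sketch.

```python
import numpy as np
from scipy import ndimage

def augment(patch, rng=None):
    """Apply a random rotation (Eq. 1) and Gaussian blur (Eq. 2) to one patch."""
    rng = rng or np.random.default_rng()
    theta = rng.uniform(0.0, 360.0)   # rotation angle in degrees (assumed range)
    sigma = rng.uniform(0.0, 1.5)     # Gaussian standard deviation (assumed range)
    # Rotate all channels around the patch centre; reshape=False keeps 128 x 128.
    out = ndimage.rotate(patch, theta, axes=(0, 1), reshape=False, mode='reflect')
    # Blur only the spectral channels (R, G, B); keep elevation (Z) sharp.
    for c in range(3):
        out[..., c] = ndimage.gaussian_filter(out[..., c], sigma=sigma)
    return out
```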

For the classification task, the training data were labeled with one-hot encoded categories, namely, 1 for vegetation and 0 for non-vegetation. The labeling was first done manually, based on the original point clouds. The RGB and additional elevation information of the manmade labels for the three study areas were then generated from the original image matching point clouds. All labels were divided into two groups for model training and validation; forty percent of the dataset was randomly sampled as training data. Since DL requires a large number of training samples, a tool was developed in the ArcGIS Pro software [54], using a Python script, for multi-scale replication of the training samples. Finally, 10,000 samples of 4 dimensions (R, G, B, Z) with 128 × 128 cells were automatically generated.

#### 2.3.2. Feature Selection

The data for a neural network are represented as a multidimensional feature array, also known as a tensor, a container for the numerical data of images. All transformations learned by the neural network can be summed up as tensor operations on these numerical data. Spectral information (R, G, and B values) and elevation provide theoretical feasibility for separating vegetation. The training data generated from the original point clouds therefore carried RGB values and elevation, and the input data were normalized to reasonably eliminate the scale effect.
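One way to realise this normalization is per-channel min–max scaling, sketched below; the exact scheme is an assumption, as the text does not specify it.

```python
import numpy as np

def normalize_channels(patch):
    """Min-max normalize each channel of a (H, W, C) patch to [0, 1].

    RGB values and elevation (Z) live on very different scales, so each
    channel is rescaled independently to remove the scale effect.
    """
    patch = patch.astype(np.float32)
    mins = patch.min(axis=(0, 1), keepdims=True)
    maxs = patch.max(axis=(0, 1), keepdims=True)
    return (patch - mins) / np.maximum(maxs - mins, 1e-8)
```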

#### 2.3.3. Design of the U-Net Network

An improved U-Net framework with a slightly altered structure was used for vegetation detection. The improved U-Net produced segmentation maps of the same size as the input data and preserved the continuity of the resolution.

A predictive model describes the relationship between input x (features) and desired output y (answer). The system repeatedly 'learns' the relationship between data and output through differentiable operations and random deviations and obtains the values of a series of unknown parameters, thus forming a set of rules on its own. These rules are then applied to untrained data, allowing the model to predict the corresponding answers. This process is the core of the image segmentation task. Using the FCN (Fully Convolutional Network, [55]) architecture, the relationship between the predicted output and the input can be expressed simply as follows (Equation (3)).

$$\hat{y} = f\left(\sum_{j=1}^{m} \left(w_j \left(\sum_{i=1}^{n} w_i x_i - \theta_n\right) - \theta_m\right)\right) \tag{3}$$

where x is the input; ŷ is the predicted output; m is the number of hidden layers, which determines the depth of the network to a certain extent and represents its complexity; n is the number of neurons in each network layer, where each neuron in a convolutional neural network is represented as a filter (nine neurons in this study); w is the weight assigned to a neuron connecting input information for signaling; and f is an activation function for nonlinear mapping.

Three specific network architectures with different hyper-parameters were designed (Figure 2) for the vegetation detection tasks. In the down-sampling path, convolution was performed to extract features and activation values at different levels. Each convolution was based on the result of the previous convolution layer, thus bringing the model to a certain depth. The convolved feature values were reduced in dimension through pooling, to cut a large amount of computational consumption. The vegetation characteristics were thereby summarized and a wide range of features extracted, making the data easier to learn and enhancing the model's learning ability. In the up-sampling path, the image size was expanded layer by layer to interpolate the feature maps at all levels. Details of the three models' hyper-parameters are shown in Table 3.
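As a concrete illustration of this encoder–decoder layout, a minimal tf.keras sketch of a U-Net-style network for 128 × 128 × 4 (R, G, B, Z) patches follows; the layer counts and filter widths are illustrative assumptions, not the exact hyper-parameters of models A–C in Table 3.

```python
import tensorflow as tf
from tensorflow.keras import layers, Model

def conv_block(x, filters):
    """Two 3x3 convolutions with ReLU, the basic U-Net building block."""
    x = layers.Conv2D(filters, 3, padding='same', activation='relu')(x)
    x = layers.Conv2D(filters, 3, padding='same', activation='relu')(x)
    return x

def build_unet(input_shape=(128, 128, 4)):
    inputs = layers.Input(shape=input_shape)   # R, G, B, Z channels
    # Down-sampling path: convolution extracts features, pooling reduces size.
    c1 = conv_block(inputs, 16)
    p1 = layers.MaxPooling2D(2)(c1)
    c2 = conv_block(p1, 32)
    p2 = layers.MaxPooling2D(2)(c2)
    c3 = conv_block(p2, 64)                    # bottleneck
    # Up-sampling path: expand layer by layer and merge skip connections.
    u2 = layers.UpSampling2D(2)(c3)
    c4 = conv_block(layers.concatenate([u2, c2]), 32)
    u1 = layers.UpSampling2D(2)(c4)
    c5 = conv_block(layers.concatenate([u1, c1]), 16)
    # A 1x1 convolution maps features to a per-cell vegetation probability,
    # so the output map has the same size as the input.
    outputs = layers.Conv2D(1, 1, activation='sigmoid')(c5)
    return Model(inputs, outputs)

model = build_unet()
model.compile(optimizer='adam', loss='binary_crossentropy', metrics=['accuracy'])
```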


**Figure 2.** Three designed U-net model structures.

**Table 3.** Comparison of model hyper-parameters.


#### 2.3.4. Vegetation Detection Accuracy Assessment

The detection accuracy was assessed through a comparison with the reference. The reference data were manually interpreted from the original point cloud. The confusion matrix was applied to calculate the accuracy in the rasterized results.
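A minimal sketch of this cell-wise comparison, assuming the detection and reference rasters are binary NumPy arrays (1 = vegetation, 0 = ground):

```python
import numpy as np

def confusion_and_accuracy(detected, reference):
    """Cell-wise confusion matrix and overall accuracy for binary rasters."""
    tp = np.sum((detected == 1) & (reference == 1))  # vegetation hit
    tn = np.sum((detected == 0) & (reference == 0))  # ground hit
    fp = np.sum((detected == 1) & (reference == 0))  # ground called vegetation
    fn = np.sum((detected == 0) & (reference == 1))  # vegetation missed
    accuracy = (tp + tn) / (tp + tn + fp + fn)
    return np.array([[tn, fp], [fn, tp]]), accuracy
```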

#### *2.4. Terrain Correction*

After vegetation detection, the terrain information could be modified using the vegetation results. In terrain modeling, the ability to reasonably eliminate the vegetation points determines the accuracy of the DTM result. In urban areas, a cross-section is usually used to completely eliminate the vegetation points and then interpolate complementary points to obtain the DTM [56]: the ground is fitted as a 2D terrain plane, and the points higher than the plane are removed. However, this trend approach often fails because the planes are difficult to estimate, owing to the dramatic relief of mountainous terrain (e.g., the Loess Plateau). The alternative practice for mountainous areas is usually to uniformly lower the vegetation points, based on an estimate of the average vegetation height [37]. This method is effective for continuous vegetation in mountainous areas and maintains the original terrain fluctuation, but is restricted for discrete vegetation, due to elevation fragmentation and convex terrain [57,58]. To solve this problem, this study divided the terrain correction into two scenarios, namely, discrete and continuous vegetation areas (Figure 3). The vegetation points in discrete areas were directly removed and then interpolated, and terrain correction was applied to the points in continuous areas.

Step 1: Identification of discrete and continuous vegetation areas.

The vegetation detection result was first rasterized and then converted into polygons with the Raster to Polygon tool in the ArcGIS Pro software [54]. An expert-chosen threshold of 30 m<sup>2</sup> was then used to separate discrete and continuous vegetation areas: vegetation areas of less than 30 m<sup>2</sup> were classified as discrete, and those greater than 30 m<sup>2</sup> were labeled as continuous.
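The same split can be sketched as follows, assuming the vegetation polygons are available as Shapely geometries (the study performs the equivalent step with ArcGIS Pro tools):

```python
from shapely.geometry import Polygon

AREA_THRESHOLD = 30.0  # m^2, the expert-chosen threshold from the text

def split_vegetation_areas(polygons):
    """Classify vegetation polygons as discrete (< 30 m^2) or continuous."""
    discrete, continuous = [], []
    for poly in polygons:
        (discrete if poly.area < AREA_THRESHOLD else continuous).append(poly)
    return discrete, continuous

# Example: one small shrub patch and one large continuous stand.
patches = [Polygon([(0, 0), (3, 0), (3, 3), (0, 3)]),       # 9 m^2 -> discrete
           Polygon([(0, 0), (10, 0), (10, 10), (0, 10)])]   # 100 m^2 -> continuous
discrete, continuous = split_vegetation_areas(patches)
```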


**Figure 3.** Workflow of terrain correction.

Step 2: Point removal and spatial interpolation in discrete vegetation areas.

The original point cloud obtained from UAV photogrammetry represents a surface model that includes the vegetation information. To achieve a terrain model, all vegetation points should be excluded. The points in the discrete vegetation areas could be eliminated directly: since the 'holes' left after removal were relatively small, they did not affect the overall trend of the terrain, and the terrain could then be interpolated.

Step 3: Terrain correction in continuous vegetation areas considering vegetation height.

The commonly used local polynomial interpolation ignores the terrain's own fluctuations; thus, elevation information would be lost if the points in the continuous vegetation areas were simply removed. A possible solution was to estimate the terrain elevation and then modify the elevation of the vegetation points in the point cloud. To handle the varying heights of each individual continuous vegetation area, an adaptive process with little human interaction was proposed. The vegetation height was estimated from the elevation in a 0.5 m buffer zone around each polygon, computed with the Zonal Statistics tool of ArcGIS Pro [54] using the original point clouds. The difference between the vegetation elevation points and the ground elevation in each polygonal area from the DSM was treated as the unified height value of the area, and the final fine DTM was obtained by subtracting the estimated mean height of each polygon.
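A sketch of this buffer-based correction for one continuous vegetation patch follows, assuming the point cloud is an (N, 3) array of x, y, z coordinates and the patch is a Shapely polygon; the 0.5 m buffer follows the text, while the rest is illustrative.

```python
import numpy as np
from shapely.geometry import Point, Polygon

def correct_vegetation_points(points, veg_polygon, buffer_m=0.5):
    """Lower vegetation points by the mean height estimated from a buffer zone."""
    # 0.5 m buffer ring around the patch, assumed to contain bare-ground points.
    ring = veg_polygon.buffer(buffer_m).difference(veg_polygon)
    inside = np.array([veg_polygon.contains(Point(x, y)) for x, y, _ in points])
    in_ring = np.array([ring.contains(Point(x, y)) for x, y, _ in points])
    # Ground elevation: mean z of the buffer-zone points.
    ground_z = points[in_ring, 2].mean()
    # Vegetation height: mean surface z inside the patch minus ground elevation.
    veg_height = points[inside, 2].mean() - ground_z
    corrected = points.copy()
    corrected[inside, 2] -= veg_height  # subtract the unified height of this patch
    return corrected
```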

#### *2.5. Terrain Modeling Result Validation*

Evaluating the elevation of the generated DTM is the key to measuring the accuracy of the terrain modeling results. For validation, the final generated DTM was compared with the CPs from the GNSS field survey. The Xining area was selected for the validation.
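A sketch of this check-point comparison, assuming the DTM is a grid with a known origin and cell size and the CPs are (x, y, z) tuples from the GNSS survey:

```python
import numpy as np

def dtm_mse(dtm, origin_x, origin_y, cell, checkpoints):
    """Mean square error between DTM elevations and GNSS check points."""
    errors = []
    for x, y, z_gnss in checkpoints:
        col = int((x - origin_x) / cell)       # grid column of the CP
        row = int((origin_y - y) / cell)       # grid row (origin at top-left)
        errors.append(dtm[row, col] - z_gnss)  # DTM minus field elevation
    return float(np.mean(np.square(errors)))
```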

#### **3. Results**

#### *3.1. Vegetation Detection Results*

Xining was selected for model training. After a performance comparison of the designed U-Net network structures, U-Net model C was finally chosen for vegetation detection; details of the three structures' performance are discussed in Section 4.1. After model training, the model was applied to the two other study areas. Figure 4 shows the results for the three study areas.

**Figure 4.** Vegetation detection results. (**a**) Xining; (**b**) Wangjiamao; and (**c**) Wucheng.

Table 4 shows the confusion matrix of the vegetation detection results in the three areas against the reference. The detection accuracies were acceptable: 90.9% for Xining, 96.4% for Wangjiamao, and 87.2% for Wucheng. The vegetation detection in Wucheng was not as accurate as in the other two areas because the tie points in its southwest corner were relatively insufficient during automatic image matching; hence, the accuracy of the original image matching point cloud was reduced.

#### *3.2. Vegetation Identification Results*

Identification was conducted in the three study areas, based on the adaptive treatment of discrete and continuous vegetation (Figure 5). The manmade vegetation spatial distribution patterns in Xining and the natural patterns in Wucheng and Wangjiamao were successfully identified. Vegetation height estimations ranged from 0.01 to 2.26 m (mean 1.81 m) in Xining, 0.01 to 7.12 m (mean 4.23 m) in Wangjiamao, and 0.66 to 6.38 m (mean 4.21 m) in Wucheng.

**Table 4.** Confusion matrix of vegetation detection results in three areas for architecture C. Rows are the reference classes and columns the detected classes (in cells).

| Reference | Xining: Ground | Xining: Vegetation | Wangjiamao: Ground | Wangjiamao: Vegetation | Wucheng: Ground | Wucheng: Vegetation |
| --- | --- | --- | --- | --- | --- | --- |
| Ground | 4457886 (62.3%) | 225949 (3.1%) | 127645071 (90.0%) | 2049627 (1.4%) | 2095418 (69.4%) | 135462 (4.5%) |
| Vegetation | 425710 (6.0%) | 2039464 (28.6%) | 3181941 (2.2%) | 9075223 (6.4%) | 252464 (8.3%) | 535952 (17.8%) |

**Figure 5.** Vegetation identification results. (**a**) Xining; (**b**) Wangjiamao; (**c**) Wucheng. Base map is the digital surface model (DSM), and the estimated vegetation height is colored.

#### *3.3. Terrain Correction Results*

After the vegetation identification, terrain correction was performed and DTMs with 1 m resolution were then interpolated (Figure 6). The proposed method removed the vegetation points without losing the terrain details and restored the fine DTM. Compared with the orthophotos, the terrain reliefs were well presented in the modeling results. The smooth color rendering of the DTMs indicated that the vegetation removal was good and the DTM was visually refined.

**Figure 6.** Digital terrain model (DTM) (left) and orthophoto (right) results after terrain corrections. (**a**) Xining; (**b**) Wangjiamao; (**c**) Wucheng. Two detailed windows of each study area are enlarged.


#### *3.4. Terrain Modeling Result Validation with Field Measurement Data*

Ground control points in Xining from the field survey were evaluated to verify the DTM results (Figure 7).



**Figure 7.** Elevation uncertainty assessment in Xining.

Table 5 shows the elevation comparison of the CPs. The MSE was 0.024 m, which met the standard of accurate terrain modeling. Points D and H, which were originally ground points, had the highest prediction accuracy; correctly predicting the vegetation points ensured that the ground elevation values were preserved. Point G failed to achieve an accurate elevation because it was located at a hole where the vegetation detection was incorrect. The terrain correction of the remaining vegetation points was guaranteed.


**Table 5.** Elevation comparison of CPs.

| CP | Measured Elevation (m) | DTM Elevation (m) | Difference (m) |
| --- | --- | --- | --- |
| H | 2340.806 | 2340.800 | −0.006 |

#### **4. Discussion**

In this section, additional analyses are presented to discuss the keys to the success of the vegetation detection. A hyper-parameter (network structure and epoch) influence analysis was done first to obtain an optimized parameter setting. A comparison with two published methods (perceptron and adaptive filtering) was then carried out for a deeper analysis of the performance of the proposed vegetation detection method.


#### *4.1. U-Net Hyper-Parameter Influence on Vegetation Detection Performance*

The performance of the three designed U-net networks was assessed in terms of training loss, validation accuracy, and training accuracy, to understand the influence of the parameters and architecture on vegetation detection.

Figure 8a shows the training loss of the three models under different epoch settings. Model A is simple, with a small network layer count and capacity; its training loss reached a local minimum at 48 epochs. The training loss of model B bounced at the 16th and 38th epochs but declined overall faster than that of model A. The training loss of model C declined smoothly and reached a local minimum at the 45th epoch. Figure 8b shows the training accuracy of the three models under different epoch settings. All three models generally showed an increasing trend. Model A did not reach saturation between the 8th and 40th epochs. Model B showed declines in training accuracy at the 17th and 38th epochs. Model C achieved its local maximum accuracy at the 45th epoch. Figure 8c shows the validation accuracy of the three models under different epoch settings. Model A had the lowest validation accuracy. Model B was moderately complex, with convolution occurring during pooling, and its validation accuracy was high; however, a substantial decline at the 15th epoch to 0.92 indicated slightly weakened stability. Model C was the most stable and accurate, with a high accuracy of 0.94 at the 45th epoch (Figure 8c).

**Figure 8.** (**a**) Training loss, (**b**) training accuracy, and (**c**) validation accuracy of the three different network structures, with different epoch settings.

Model C with an epoch of 45 was selected for vegetation detection, due to its lowest loss and highest accuracy during training and validation. When the network structure was large, the number of epochs had to be increased appropriately to ensure that the parameters were updated. Merging combined the convolution features and enhanced the up-sampling of the data.

#### *4.2. Comparison of Vegetation Detection Performance with Other Methods*

Two other methods, namely, a perceptron [59] and adaptive filtering [60], were selected for an additional assessment of the vegetation detection. Precision, recall, and F-score values were used for validation. Precision indicates the extent to which the extraction results represent real targets, i.e., the commission error of the model. A true positive (TP) is a correctly extracted vegetation cell, while a false positive (FP) is a ground sample predicted as vegetation. A true negative (TN) is a ground sample correctly predicted as ground, and a false negative (FN) is a vegetation sample predicted as ground, i.e., vegetation that was not extracted. Recall represents the extent to which real targets can be extracted and indicates the omission error of the model. Precision is the share of correctly predicted positives among all predicted positives, and recall is the share among all positives in the reference. The F-value is the harmonic mean of precision and recall. The formulas for precision, recall, and F-value are as follows (Equations (4)–(6)).

$$\text{Precision} = \text{TP}/(\text{TP} + \text{FP}) \tag{4}$$

$$\text{Recall} = \text{TP}/(\text{TP} + \text{FN}) \tag{5}$$

$$F = 2 \times \text{Precision} \times \text{Recall}/(\text{Precision} + \text{Recall}) \tag{6}$$
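Equations (4)–(6) translate directly into code; a minimal sketch with hypothetical counts:

```python
def precision_recall_f(tp, fp, fn):
    """Precision, recall, and F-value (Eqs. 4-6) from confusion-matrix counts."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f_value = 2 * precision * recall / (precision + recall)
    return precision, recall, f_value

# Example with hypothetical counts:
p, r, f = precision_recall_f(tp=9000, fp=900, fn=600)
```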

Figure 9 shows the precision, recall, and F-value of the three methods. Our improved U-Net architecture had the highest values in all three study areas. In particular, the best identification result was observed in Xining, with a precision of 0.91. The performances of the other two methods were considerably weaker than that of our improved U-Net architecture. The perceptron lacked a hidden layer and did not introduce random deviations; its final classification was based on a hyperplane, which could not adapt to the complex terrain, resulting in a low accuracy of vegetation detection. Adaptive filtering was excessive in vegetation recognition, and its results depended on the sketched vegetation range; this required the manual sketching of a training area, as a supervised area for vegetation recognition, in each study area.

**Figure 9.** Comparison of accuracy under the three methods (FCN by our U-net based method, perceptron by Kwak et al., 2007, and adaptive filtering by Hu et al., 2019). (**a**) Xining; (**b**) Wangjiamao; and (**c**) Wucheng. Dark green, orange, and blue bars are precision, recall, and F-value, respectively.

#### **5. Conclusions**

This study proposed a UAV photogrammetric framework for terrain modeling in dense vegetation areas. With the Loess Plateau of China as the study area, a DL and terrain correction ensemble method was proposed and applied. An improved U-net network for vegetation segmentation was presented, using the feature combination of RGB + DSM for vegetation detection. According to four-fold cross-validation, the accuracy was 94.97%, and the model had a good generalization ability. The influences of the U-Net architecture and epoch setting on vegetation detection performance were also assessed, and comparison with other methods confirmed the better performance of the proposed technique. A fine DTM generation method for terrain modeling was also put forward: the vegetation area was divided into discrete and continuous parts, and an adaptive terrain correction was proposed and realised. The DTM accuracy was evaluated against field measurements. This framework can be applied in dense vegetation, with the advantage of low-cost UAV photogrammetry where laser scanning is limited.

**Author Contributions:** Conceptualization, J.N. and K.X.; algorithm, J.N. and K.X.; classification analysis, J.N.; process the data, K.X.; writing—original draft preparation, J.N. and K.X.; writing—review and editing, H.D., J.S. and N.P.; supervision, L.X. and G.T.; funding acquisition, L.X. and G.T. All authors have read and agreed to the published version of the manuscript.

**Funding:** This work was financially supported by the Natural Science Foundation of China, grant numbers 41930102, 41971333, and the Priority Academic Program Development of Jiangsu Higher Education Institutions (No. 164320H116).

**Acknowledgments:** The authors sincerely thank the anonymous reviewers and the members of the editorial team for their comments.

**Conflicts of Interest:** The authors declare no conflict of interest.

#### **References**


© 2020 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).
