**Landslides Information Extraction Using Object-Oriented Image Analysis Paradigm Based on Deep Learning and Transfer Learning**

**Heng Lu 1,2, Lei Ma 3,4,5, Xiao Fu 6, Chao Liu 1,2, Zhi Wang 1,2, Min Tang 7 and Naiwen Li 1,2,\***


Received: 31 December 2019; Accepted: 22 February 2020; Published: 25 February 2020

**Abstract:** How to acquire landslide disaster information quickly and accurately has become a focus and a difficulty of remote-sensing-based disaster prevention and relief. Landslide disasters generally occur suddenly, placing high demands on emergency data acquisition. Low-altitude Unmanned Aerial Vehicle (UAV) remote sensing is widely applied to acquire landslide disaster data, owing to its convenience, high efficiency, and ability to fly at low altitude below cloud cover. However, the spectral information of UAV images is generally deficient, and manual interpretation cannot meet the need for rapid acquisition of emergency data. Accordingly, UAV images of areas with a high incidence of landslide disasters in Wenchuan County and Baoxing County, Sichuan Province, China were selected for this study. Firstly, the acquired UAV images were pre-processed to generate orthoimages. Subsequently, multi-resolution segmentation was carried out to obtain image objects, and the barycenter of each object was calculated to generate a landslide sample database (including positive and negative samples) for deep learning. Next, four landslide feature models based on deep learning and transfer learning, namely Histograms of Oriented Gradients (HOG), Bag of Visual Words (BOVW), Convolutional Neural Network (CNN), and Transfer Learning (TL), were compared; the TL model showed the best feature extraction effect, so a landslide extraction method based on the TL model and object-oriented image analysis (TLOEL) was proposed. Finally, the TLOEL method was compared with the object-oriented nearest neighbor classification (NNC) method. The results show that the TLOEL method is more accurate than the NNC method: it can not only extract the edges of large landslides, but also accurately detect and extract scattered middle and small landslides.

**Keywords:** landslides information extraction; unmanned aerial vehicle imagery; convolutional neural network; transfer learning; object-oriented image analysis

#### **1. Introduction**

Human beings are facing, and will long continue to face, challenges that affect the harmony and sustainable development of society, such as population, resources, and the environment. Among environmental problems, the geological environment is one of the most prominent [1–3]. On the one hand, the geological environment is a necessary carrier and basic setting for all human life and engineering activities; on the other, it is fragile and difficult, or even impossible, to restore. Landslides are a dire threat to people's lives and property and to public safety [4–6]. Geological disasters occur very frequently in China and cause tremendous losses, especially in the western mountainous areas with complex topographic and geological conditions. Landslides in these areas are generally characterized by suddenness: because they are triggered by external factors (such as heavy rainfall or earthquakes), there is no forewarning that can be directly observed and perceived beforehand. Therefore, quick and automatic information extraction of sudden landslides has become a pressing and challenging topic in landslide research worldwide [7,8].

In the field of remote sensing, the extraction of landslide information from satellite or aerial images mainly relies on the spectral, shape, and texture features by which landslides differ from other surface features in the imagery [9]. In early applications, landslide identification and boundary extraction were mainly carried out by manual digitization. This method offers high accuracy, but when a large region must be processed or a disaster emergency must be met, manual digitization has no advantage in time or cost. In addition, if the region is divided among several interpreters, the subjectivity of each interpreter is inevitably brought into the interpretation results [10,11]. With the development of digital image processing technology, a growing number of image classification algorithms have been applied to landslide information extraction. The reflectivity difference (spectral information) of surface features in remote sensing images is used to separate landslide regions from non-landslide regions. Generally, a landslide region shows high reflectivity in the imagery, which makes it easy to distinguish from surface features with low reflectivity, but easy to confuse with bare land, which is also highly reflective [12]. Additionally, this kind of pixel-oriented method makes the classification result prone to "salt and pepper" noise. In recent years, with the launch of more and more earth observation satellites carrying high-resolution sensors, data sources for research have become increasingly abundant, and the object-oriented image analysis method has emerged.
In the object-oriented image analysis method, the available information, such as spectrum, shape, texture, context semantics, and terrain on remote sensing images are comprehensively selected to extract the information of surface features [13–17].

In recent years, with the development of machine learning technology, more and more algorithms have been applied to the remote sensing identification of landslides. At present, the widely used machine learning algorithms include the support vector machine (SVM), random forest (RF), artificial neural network (ANN), convolutional neural network (CNN), deep convolutional neural network (DCNN), etc. [18–23]. The conventional object-oriented image analysis method requires acquiring a large number of image features for subsequent classification and carrying out many feature selection experiments, which is very time-consuming and makes it difficult to obtain a complete set of accurate features. Deep learning (DL) and transfer learning (TL) are the fastest-developing machine learning methods applied to remote sensing image classification in recent years; they can automatically extract features from the original images, and the extracted deep features are often very effective for processing complex images [24–26]. However, the features output by deep learning methods are highly abstract, so the boundaries of actual surface features cannot be accurately obtained and the classification result differs from those boundaries. In contrast, the object-oriented image analysis method segments images based on homogeneity, and the segmentation results are usually consistent with the actual boundaries of surface features. Therefore, a method integrating transfer learning and object-oriented image analysis is proposed in this paper by combining the respective advantages of the two methods: the multi-resolution segmentation algorithm first obtains irregular segmented objects, and a regular image block is then generated according to the barycenter of each segmented object, so that the segmented objects obtained by the object-oriented method are combined with transfer learning.

The rest of this paper is organized as follows: In Section 2, the three research sites selected for the experiment are introduced. In Section 3, the suitability of four feature models (Histograms of Oriented Gradients (HOG), Bag of Visual Words (BOVW), CNN, and TL) for landslide feature extraction is compared, and the realization process of the proposed method and the experiment steps are described in detail. The experiment results are given in Section 4 and discussed in Section 5. Finally, the full text is summarized in Section 6.

#### **2. Study Sites**

The experimental area is located in Wenchuan and Baoxing counties, Sichuan Province, China. The experimental data consist of UAV images of areas at high risk of geological hazards. The images, with a spatial resolution of 0.2 m, were taken in May 2017. The UAV used in the experiment is a fixed-wing aircraft equipped with a Sony Cyber-shot DSC-RX1 digital camera. The predetermined relative flying height for this experiment is 600 m, the longitudinal overlap is 70%, and the lateral overlap is 40%. Small squares (marked in green, blue, and red) were selected for the experiment, as shown in Figure 1.

**Figure 1.** Location of study area.

#### **3. Methods**

Firstly, the acquired UAV images were pre-processed to generate orthoimages. Subsequently, image objects were obtained by multi-resolution segmentation, and the barycenter of each object was calculated to generate the landslide samples for deep learning (including positive and negative samples). Next, the HOG, BOVW, CNN, and TL landslide feature models were compared, and the TL model was found to have the best feature extraction effect. Therefore, a landslide extraction method based on the TL model and object-oriented image analysis (TLOEL) was proposed. Finally, the TLOEL method was compared with the NNC method. Figure 2 shows the research process of this paper.

**Figure 2.** The workflow of landslides information extraction from high-resolution UAV imageries.

#### *3.1. Preprocess of High-Resolution Images*

The digital camera on the UAV is non-metric, so the images are subject to serious lens distortion. Therefore, distortion correction must be carried out based on the distortion parameters of the camera. Meanwhile, varying exposure intervals and weather conditions during the flight cause chromatic aberration, so color and light uniformizing is carried out with the mask method. Preliminary image sorting and positioning for matching homologous points of adjacent image pairs can be carried out based on the aircraft attitude parameters recorded by the flight control system. After the homologous points are matched, block adjustment is made based on the collinearity equations. The coordinates of ground control points are then incorporated to realize absolute orientation and obtain the corrected orthoimages [27,28].

#### *3.2. Segmentation*

Image segmentation is the first step of the experiment and forms the basic classification units (objects) with high homogeneity. Multi-resolution segmentation has proved to be a successful segmentation algorithm in many applications. In this research, the multi-resolution segmentation algorithm is used in the eCognition 9.0 software to generate the segmented objects for the three experiment areas [29]. Using the multi-resolution segmentation algorithm requires setting three parameters: segmentation scale, color/shape ratio, and smoothness/compactness ratio. The most important parameter is the segmentation scale, which determines the heterogeneity allowed inside an object. Specifically, when the segmentation scale is too large for the classification target, under-segmentation occurs and small objects are "submerged" by large objects, resulting in mixed objects. When the segmentation scale is too small, over-segmentation occurs, which causes the segmentation result to be "broken" and increases the computational burden of the subsequent classification. The color/shape ratio reflects the weight of spectral homogeneity against shape homogeneity, and the smoothness/compactness ratio defines the smoothness or compactness of each object.

#### *3.3. Constructing Landslide Sample Library*

Image objects with irregular boundaries must be transformed into image blocks with regular shapes and fixed sizes in order to combine the TL model with the object-oriented image analysis method. The size of the image block is related to the depth of the CNN and is limited by the computer hardware (e.g., memory capacity). Through experiments and comparison with the literature, 256 × 256 pixels was found to be the most suitable image block size, avoiding the need for a very large CNN learning framework [30].

During the experiment, it was found that manually building a landslide sample library with positive and negative samples involves a great deal of work, so this paper studies how to cut image blocks automatically in batches based on the ArcGIS Python secondary development package (ArcPy). The cut image blocks are then divided into positive and negative samples by visual interpretation.

The specific process flow for building a landslide sample library is as follows:

(1) Calculate the barycentric point of each segmented object, and take it as the center of the image block.

(2) Automatically generate the boundary of the image block on the ArcPy platform, clip the image according to the boundary, and store the resulting image block.

(3) Visually identify the image blocks containing a landslide and store them as positive samples; store the remaining image blocks as negative samples, removing blocks with no practical significance or with overly cluttered surface features.
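The geometric part of steps (1) and (2) can be sketched without ArcPy (the batch clipping itself is tied to ArcGIS). The following is a minimal illustration, assuming each segmented object is given as an array of its pixel coordinates and taking the barycenter as the mean of those coordinates:

```python
import numpy as np

BLOCK = 256  # block size in pixels, per Section 3.3

def block_extent(object_pixels, image_shape, block=BLOCK):
    """Barycenter of a segmented object and the 256x256 block centered on it.

    object_pixels: (N, 2) array of (row, col) pixel coordinates of one object.
    image_shape: (rows, cols) of the orthoimage, used to keep the block inside it.
    """
    cy, cx = object_pixels.mean(axis=0)       # barycentric point (step 1)
    half = block // 2
    # Shift the window so it never extends past the image border.
    r0 = min(max(int(round(cy)) - half, 0), image_shape[0] - block)
    c0 = min(max(int(round(cx)) - half, 0), image_shape[1] - block)
    return (r0, c0, r0 + block, c0 + block)   # clip window (step 2)

# Example: an object near the top-left corner of a 1000 x 1000 orthoimage.
obj = np.array([[30, 40], [32, 44], [28, 42]])
print(block_extent(obj, (1000, 1000)))  # → (0, 0, 256, 256), clamped to the border
```

The clamping choice (shifting the window rather than padding) is one of several reasonable conventions; the paper does not specify how border objects are handled.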

#### *3.4. Building Landslide Interpretation Model*

#### 3.4.1. Landslides Feature Extraction Based on HOG Model

Dalal and Triggs proposed HOG at CVPR in 2005 [31]. Compared with deep learning features, it is a common shallow feature used in computer vision and pattern recognition to describe the local texture of an image, applied as a feature descriptor for object detection. Figure 3 shows the process of HOG feature extraction.

First, divide the image into several blocks. Then, calculate the histogram of edge (gradient) intensities within each block separately. Finally, assemble the block histograms together to obtain the feature descriptor of the image. As HOG operates on local grid cells of the image over a small spatial field, it is robust to geometric and photometric deformations of the original image. The combination of HOG features and SVM classifiers has been widely used in image identification. Considering that there is a gradient change between a landslide area and its surrounding environment in a high-resolution image, the HOG feature can be used to distinguish them.

**Figure 3.** The flow chart of Histograms of Oriented Gradients (HOG) feature extraction.
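As an illustration of this pipeline, here is a deliberately simplified HOG sketch in plain NumPy. It omits the block normalization and overlapping blocks of the full descriptor, and the cell size and bin count are illustrative choices, not values from the paper:

```python
import numpy as np

def hog_descriptor(image, cell=8, bins=9):
    """Simplified HOG: per-cell histograms of gradient orientations.

    A minimal sketch of the idea in Figure 3 (no block normalization or
    overlapping blocks, unlike the full descriptor).
    """
    gy, gx = np.gradient(image.astype(float))
    mag = np.hypot(gx, gy)
    # Unsigned orientation in [0, 180) degrees, as in the original HOG.
    ang = np.rad2deg(np.arctan2(gy, gx)) % 180.0
    h_cells, w_cells = image.shape[0] // cell, image.shape[1] // cell
    desc = np.zeros((h_cells, w_cells, bins))
    for i in range(h_cells):
        for j in range(w_cells):
            sl = np.s_[i * cell:(i + 1) * cell, j * cell:(j + 1) * cell]
            # Magnitude-weighted orientation histogram of one cell.
            desc[i, j], _ = np.histogram(ang[sl], bins=bins,
                                         range=(0, 180), weights=mag[sl])
    return desc.ravel()  # concatenated cell histograms = feature descriptor

rng = np.random.default_rng(0)
patch = rng.random((256, 256))          # stand-in for one 256 x 256 image block
print(hog_descriptor(patch).shape)      # → (9216,), i.e. 32 * 32 cells * 9 bins
```

In practice the resulting descriptor would be fed to an SVM or similar classifier, as the text describes.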

#### 3.4.2. Landslides Feature Extraction Based on BOVW Model

BOVW is an image representation model derived from the Bag of Words (BOW) model. It maps two-dimensional image information to a set of visual keywords, which effectively compresses the description of the image while preserving its local features. The BOVW model extracts low-level features from the images in the sample library and then, given the number of cluster centers, clusters these low-level features with an unsupervised algorithm such as K-means [32]. Consider a sequence of observations (*x*1, *x*2, ... , *xn*), where each observation is a d-dimensional real-valued vector. The goal of K-means is to divide these n observations into k clusters *s* = (*s*1,*s*2, ... ,*sk*), *k* < *n*, such that:

$$\operatorname\*{argmin}\_{s} \sum\_{i=1}^{k} \sum\_{\mathbf{x}\_{j} \in s\_{i}} \|\mathbf{x}\_{j} - \mu\_{i}\|^{2}\tag{1}$$

where μ*i* is the mean of the observations in *si*.

Visual keywords (the vocabulary, represented by w1, w2, ..., wp, wr, ..., wm in Figure 4) are obtained from the cluster centers. Each feature extracted from an image is mapped to the nearest visual word, so that the image can be represented as a histogram feature descriptor. Figure 4 shows the feature extraction process.
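The clustering and histogram steps above can be sketched as follows, with a plain NumPy K-means standing in for the clustering stage; the local features, vocabulary size, and iteration count are illustrative, not the paper's:

```python
import numpy as np

def kmeans(X, k, iters=20, seed=0):
    """Plain K-means (Eq. 1): alternate nearest-center assignment and update."""
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), k, replace=False)]
    for _ in range(iters):
        # Assign each feature to its nearest visual word (cluster center).
        d = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
        labels = d.argmin(axis=1)
        for c in range(k):
            if np.any(labels == c):          # skip empty clusters
                centers[c] = X[labels == c].mean(axis=0)
    return centers, labels

def bovw_histogram(features, centers):
    """Map local features to their nearest word; return the word histogram."""
    d = np.linalg.norm(features[:, None, :] - centers[None, :, :], axis=2)
    words = d.argmin(axis=1)
    hist = np.bincount(words, minlength=len(centers)).astype(float)
    return hist / hist.sum()                 # normalized histogram descriptor

rng = np.random.default_rng(1)
local_feats = rng.random((200, 16))   # e.g. 200 local descriptors from one image
vocab, _ = kmeans(local_feats, k=8)   # 8 visual words, chosen for illustration
h = bovw_histogram(local_feats, vocab)
print(h.shape, round(h.sum(), 6))     # → (8,) 1.0
```

The normalized histogram `h` is the fixed-length image descriptor that a downstream classifier would consume.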

#### 3.4.3. Landslides Feature Extraction Based on CNN Model

In previous applications, CNN was mainly used to identify distortion-invariant (e.g., displaced or scaled) two-dimensional graphics [33]. As CNN's feature detection layers learn from training data, feature extraction is implicit rather than explicit when CNN is used. Neurons of the same feature map share the same weights, so the network can learn in parallel, which is a major advantage of CNN over a simple neural network [34,35].

Feature extraction abstracts the image information and obtains a set of feature vectors that can describe the image. Feature extraction and selection are key to image classification and determine the final result. Manual visual interpretation can produce good classification results for new data and tasks, but it is accompanied by a large workload, low efficiency, subjective randomness, and non-quantitative analysis [36]. Middle- and low-level features perform well in specific classification and identification tasks. However, a high-resolution image is a far cry from an ordinary natural image, as its spatial spectrum changes greatly, so middle- and low-level feature extraction cannot produce good results on high-resolution images. With deep learning, features are extracted layer-by-layer from bottom to top from the input data, and mappings between bottom-level signals and top-level semantics can be established, so that high-level features of the landslide can be obtained to better represent it in a high-resolution image [37].

**Figure 4.** The flow chart of Bag of Visual Words (BOVW) feature extraction.

CNN avoids explicit feature design and learns implicitly from the training data, which distinguishes it from other neural-network-based classifiers. Feature extraction is integrated into the multi-layer perceptron through structural reorganization and weight reduction, so that gray-scale images can be processed directly; therefore, CNN can be applied directly to image-based classification [38]. It has many advantages in image processing: (1) input images match the network topology well; (2) feature extraction and pattern classification are performed simultaneously during training; and (3) weight sharing reduces the training parameters of the network, making the structure of the neural network simpler and more adaptable [39,40]. A basic convolutional neural network structure can be divided into three layers, namely the feature extraction layer, the feature mapping layer, and the feature pooling layer. A deep convolutional network can be established by stacking multiple basic network structures, as shown in Figure 5, where conv1 and conv2 represent convolutional layers 1 and 2, and pool1 and pool2 represent pooling layers 1 and 2.

**Figure 5.** Construction of deep convolution neural network.

(1) Feature extraction layer: the input of each neuron is connected with the local receptive field of the previous layer and extracts local features. Assume that the input image *I* is a two-dimensional matrix of size γ × *c*; a trainable filter group *k* of size *w* × *w* is used to compute the convolution with step size *l*. The output *Y* is then of size ((γ − *w*)/*l* + 1) × ((*c* − *w*)/*l* + 1), where:

$$y\_i = b\_i + \sum\_j k\_{ij} \* x\_j \tag{2}$$

where *xj* represents the input of the convolutional layer, *kij* represents the convolutional kernel parameters, *bi* represents the bias value, and \* represents the convolution operation. Each filter corresponds to a specific feature.

(2) Feature mapping layer: A nonlinear function is used to map the results of the filter layer to ensure the validity of the feature, and the feature map *F* is obtained.

$$f\_i = \delta(b\_i + \sum\_j k\_{ij} \* x\_j) \tag{3}$$

where δ is the activation function. Tanh, sigmoid, and softplus are common activation functions; tanh is a variant of the sigmoid whose value range is [−1, 1]. The rectified linear unit (ReLU) is the closest to the activation behavior of biological neurons after stimulation and has a certain sparsity. Its calculation is simple, which helps improve the effect [41].

(3) Feature pooling layer: Theoretically, the features acquired through convolution could be used directly to train the classifier. However, after convolution the feature dimensions of even a medium-sized image run into the millions, and a directly trained classifier is easily overfitted. Therefore, pooling (downsampling) of the convolution features is needed. The convolution feature map *F* is divided into disjoint regions of size *m* × *m*, and the average value (or maximum value) of each region is calculated and taken as the pooling feature *P*, whose size is ⌊((γ − *w*)/*l* + 1)/*m*⌋ × ⌊((*c* − *w*)/*l* + 1)/*m*⌋. The pooled feature dimension is greatly reduced, which avoids overfitting and improves robustness. Figure 6 is the flow chart of the CNN-based landslide interpretation model.

**Figure 6.** The flow chart of the convolutional neural network (CNN)-based landslide interpretation model.
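The convolution and pooling output sizes given above can be checked numerically. The sketch below assumes a 256 × 256 input block (the size chosen in Section 3.3) with an illustrative 5 × 5 kernel, stride 1, and 2 × 2 pooling; these hyperparameters are not specified in the paper:

```python
def conv_out(n, w, l):
    """Output length of a valid convolution along one axis: (n - w) / l + 1."""
    return (n - w) // l + 1

def pool_out(n, w, l, m):
    """Pooled length along one axis: floor(((n - w) / l + 1) / m)."""
    return conv_out(n, w, l) // m

# A 256 x 256 image block, 5 x 5 kernels, stride 1, 2 x 2 non-overlapping pooling:
rows, cols = 256, 256
print(conv_out(rows, 5, 1), conv_out(cols, 5, 1))        # → 252 252
print(pool_out(rows, 5, 1, 2), pool_out(cols, 5, 1, 2))  # → 126 126
```

Integer floor division stands in for the ⌊·⌋ of the pooled-size formula.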

#### 3.4.4. Landslides Feature Extraction Based on TL Model

In the process of training a deep learning model, problems such as insufficient training samples often occur, and the emergence of TL has therefore attracted extensive attention and research. Technologies related to machine learning and data mining have been used in many practical applications [42]. In the traditional machine learning framework, the task is to learn a classification model from sufficient training data. In the field of image identification, the first step is to label a large amount of image data manually; a classification model is then obtained by machine learning and used to classify and identify the test images. Traditional machine learning needs a large amount of calibrated training data for each field, which costs considerable manpower and material resources, yet without a large number of labeled data, many learning-related studies and applications cannot be carried out. Moreover, traditional machine learning generally assumes that training data and test data obey the same distribution. In many cases this assumption cannot be met, which may render the training data outdated. Meeting the needs of training then requires re-labeling a large number of training samples, which is very expensive in manpower and material resources.

Target detection in remote sensing images and in natural images are closely connected in nature; in many ways, they can be regarded as the same problem. The goal of transfer learning is to transfer knowledge from existing prior sample data, using what has been learned in one environment to help the learning task in a new environment. Moreover, it imposes no strict assumption, as traditional machine learning theory does, that training data and test data share the same distribution [43]. The weights for new categories and a classification model applicable to the target task are obtained by pre-training a model on an existing classification data set, removing the top layers of the pre-trained network, and retraining an output layer on the target task data set. This approach can shorten the training time of the model and improve work efficiency [44,45]. At present there are many labeled natural image libraries; the typical example is the ImageNet library built by Stanford University, which contains more than 15 million labeled high-resolution images divided into more than 22,000 categories, making it the largest labeled image library in the image identification field. The pre-training model is obtained by learning image feature extraction from the ImageNet library, and transfer learning is then applied at the feature level.

Figure 7 shows the framework of the landslide interpretation model described in this paper, which is obtained through transfer learning. It mainly includes three parts: feature learning, feature transfer, and landslide interpretation model training. The source task is the scene classification of the original deep learning model; the classification model of the target task is built by transferring the network parameters and results of the source task to the target task and optimizing them. In Figure 7, conv represents a convolutional layer, pool a pooling layer, and FC the fully-connected layer.
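The freeze-the-body, retrain-the-head idea can be illustrated with a toy NumPy sketch. Everything here is a simplified stand-in: a fixed random projection plays the role of the ImageNet-pre-trained layers, the data are synthetic, and the new output layer is a logistic head trained by gradient descent:

```python
import numpy as np

rng = np.random.default_rng(0)

# Stand-in for the pre-trained network body: a fixed (frozen) random
# projection plus ReLU. In the paper this role is played by a CNN
# pre-trained on the ImageNet library.
W_frozen = rng.standard_normal((64, 16))

def features(x):
    """Frozen layers: their weights are never updated during transfer."""
    return np.maximum(x @ W_frozen, 0.0)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-np.clip(z, -30.0, 30.0)))

def train_head(X, y, lr=0.1, epochs=200):
    """Retrain only the new output layer (logistic head) on the target task."""
    F = features(X)                      # frozen features, computed once
    w, b = np.zeros(F.shape[1]), 0.0
    for _ in range(epochs):
        g = sigmoid(F @ w + b) - y       # gradient of the log-loss
        w -= lr * F.T @ g / len(y)
        b -= lr * g.mean()
    return w, b

# Toy target task: two well-separated clusters of 64-dim "image" vectors.
X = np.vstack([rng.normal(-1.0, 1.0, (50, 64)), rng.normal(1.0, 1.0, (50, 64))])
y = np.array([0] * 50 + [1] * 50)
w, b = train_head(X, y)
accuracy = ((sigmoid(features(X) @ w + b) > 0.5).astype(int) == y).mean()
print(accuracy)   # training accuracy of the retrained head on the toy task
```

The point of the sketch is only the division of labor: the body's weights stay fixed while the small head is retrained, which is what shortens training on the target task.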

#### 3.4.5. Reliability Evaluation of Landslide Feature Extraction Model

There are many methods for evaluating the landslide feature extraction results. The Confusion Matrix is used to verify the accuracy of the interpretation model, in accordance with the quantitative research needs of this paper [46]. Table 1 shows the evaluation indicator system.

**Figure 7.** Feature extraction of landslides by transfer learning.

**Table 1.** Related indicators of the confusion matrix.

| | Predicted Positive | Predicted Negative |
|---|---|---|
| Actual Positive | TP | FN |
| Actual Negative | FP | TN |

where TP (true positive) means a positive sample is predicted as positive; TN (true negative) means a negative sample is predicted as negative; FP (false positive) means a negative sample is predicted as positive; and FN (false negative) means a positive sample is predicted as negative. The false positive rate (FPR), true positive rate (TPR), and accuracy (ACC) are defined as shown in Formula (4).

$$\begin{array}{l} \text{FPR} = \frac{\text{FP}}{\text{FP} + \text{TN}};\\ \text{TPR} = \frac{\text{TP}}{\text{TP} + \text{FN}};\\ \text{ACC} = \frac{\text{TP} + \text{TN}}{\text{Total}}. \end{array} \tag{4}$$
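Assuming the standard definitions of these rates, the indicators follow directly from the four counts; the example counts below are hypothetical:

```python
def rates(tp, tn, fp, fn):
    """FPR, TPR and ACC from the four confusion-matrix counts."""
    fpr = fp / (fp + tn)                   # false alarms among actual negatives
    tpr = tp / (tp + fn)                   # detections among actual positives
    acc = (tp + tn) / (tp + tn + fp + fn)  # overall fraction correct
    return fpr, tpr, acc

# Hypothetical example: 90 landslide blocks detected, 10 missed,
# 5 false alarms, 95 correctly rejected non-landslide blocks.
print(rates(tp=90, tn=95, fp=5, fn=10))   # → (0.05, 0.9, 0.925)
```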

#### *3.5. Combination of Object-oriented Image Analysis and TL Model*

The core concept of the object-oriented classification method is that, at a specific scale, the single pixel is taken as the minimum cell and merged with its neighborhood pixels according to a best-fit criterion to obtain segmented objects with the best homogeneity; when segmentation at a certain scale is complete, each new segmented object is taken as a cell, adjacent objects are compared and merged, and objects at the new scale are generated until merging at that scale is complete [47]. Because the object-oriented classification method treats a surface feature as an integral object, the object carries spatial form, geometric length, and neighborhood relationships in addition to spectral characteristics; its accuracy is therefore improved to a certain extent compared with the pixel-oriented method. Although object-oriented image segmentation technology can effectively segment ground object targets, it is difficult to obtain specific attribute information about the ground objects. A landslide has certain spatial geometry and texture features, yet it is very difficult to identify landslides accurately from massive remote sensing data by traditional classification methods, whereas CNN interpretation has obvious advantages in capturing the structural features of such a complex system.

In the process of extracting landslide information from high-resolution images, feature extraction determines the final accuracy. A great number of samples is required to support feature extraction, because high-resolution images differ greatly from ordinary natural images and their spatial spectrum changes substantially; however, the number of landslides in the research region, and hence of collectible samples, is limited. This article therefore combines the aforesaid TL model with object-oriented image analysis to establish a method for landslide information extraction from high-resolution UAV images that can realize large-scale extraction of scattered landslides. The process of the TLOEL method is shown in Figure 8.

**Figure 8.** The flow chart of landslides extraction by the transfer learning model and object-oriented image analysis (TLOEL) method.

#### *3.6. Landslide Information Extraction by NNC Method*

The NNC method is a mature classification method with simple operation, high efficiency, and a wide application scope [48]. The control group in this paper uses the nearest neighbor classification method. The specific process is as follows: First, the sample objects are selected and statistically analyzed to obtain relevant feature values, such as texture, spectrum, shape, and neighborhood information, so as to build a multi-dimensional feature space. Then the feature distance between the object to be classified and each sample is calculated, and the object is assigned, according to the feature distances and the membership function, to the class of the nearest sample, as shown in Formula (5).

$$d = \sqrt{\sum\_{f} \left[ \frac{v\_f^{(s)} - v\_f^{(\sigma)}}{\sigma\_f} \right]^2} \tag{5}$$

where *d* is the feature distance, *f* is the feature, *s* is the sample object, σ is the object to be classified, σ<sub>*f*</sub> is the standard deviation of the values of feature *f*, *v*<sub>*f*</sub><sup>(*s*)</sup> is the value of feature *f* of sample object *s*, and *v*<sub>*f*</sub><sup>(σ)</sup> is the value of feature *f* of the object to be classified σ.
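Formula (5) and the nearest-sample assignment can be sketched as follows; the three object features and their standard deviations are hypothetical, chosen only for illustration:

```python
import numpy as np

def nnc_distance(v_sample, v_object, sigma):
    """Standardized feature distance of Formula (5) between a sample object
    and an object to be classified; sigma holds each feature's std deviation."""
    v_sample, v_object, sigma = map(np.asarray, (v_sample, v_object, sigma))
    return np.sqrt((((v_sample - v_object) / sigma) ** 2).sum())

def nearest_class(obj, class_samples, sigma):
    """Assign the object to the class of its nearest sample (Section 3.6)."""
    return min(class_samples,
               key=lambda c: min(nnc_distance(s, obj, sigma)
                                 for s in class_samples[c]))

# Hypothetical 3-feature objects (e.g. mean brightness, a spectral index, area).
samples = {"landslide": [[0.8, 0.1, 50.0]], "bare land": [[0.7, 0.2, 500.0]]}
sigma = [0.1, 0.05, 100.0]
print(nearest_class([0.78, 0.12, 80.0], samples, sigma))  # → landslide
```

Dividing by σ<sub>*f*</sub> puts features with very different ranges (here, reflectance-like values against an area in pixels) on a comparable footing before the distances are compared.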

#### *3.7. Accuracy Evaluation*

The accuracy of information extraction for different ground objects is known as the classification accuracy, and it is a common standard for testing the quality of the classification rules. The final classification results are usually revised according to the accuracy assessment; if the assessed accuracy is low, the rule definitions need to be improved. Methods for assessing classification accuracy generally fall into two categories: qualitative assessment by artificial visual interpretation and quantitative assessment [49]. The artificial visual interpretation method offers a certain reliability on the premise of rapid assessment, but only trained interpreters can perform it, which introduces considerable subjectivity into the results. A serious divergence between the results of field investigation and of visual interpretation indicates an undesirable classification; in that case the feature rules must be reset for reclassification, and the accuracy assessment is only made once the classification results are relatively consistent. Accuracy is usually expressed as a percentage, and the most widely applied quantitative assessment method is the confusion matrix, which is defined as follows:

$$M = \begin{bmatrix} m\_{11} & m\_{12} & \dots & m\_{1n} \\ m\_{21} & m\_{22} & \dots & m\_{2n} \\ \vdots & \vdots & \ddots & \vdots \\ m\_{n1} & m\_{n2} & \dots & m\_{nn} \end{bmatrix} \tag{6}$$

where *mij* represents the total number of pixels belonging to category *i* in the research region that are assigned to category *j*, and *n* represents the total number of categories. In the confusion matrix, greater values on the leading diagonal indicate more reliable classification results.

The common indexes for assessing classification accuracy, in addition to the confusion matrix itself, include the overall accuracy, producer's accuracy, user's accuracy, and Kappa coefficient.

(1) Overall Accuracy (OA) is the ratio of the total number of correctly classified samples to the total number of samples, and reflects the overall correctness of the image classification results. It is calculated with the following formula:

$$\text{OA} = \frac{\sum\_{i=1}^{n} m\_{ii}}{\sum\_{j=1}^{n} \sum\_{i=1}^{n} m\_{ij}} \tag{7}$$

(2) Producer's Accuracy (PA) is the ratio of the number of correctly classified pixels in a single category to the total number of pixels of that category in the reference data. It is calculated with the following formula:

$$\text{PA} = \frac{m\_{ii}}{\sum\_{j=1}^{n} m\_{ij}} \tag{8}$$

(3) User's Accuracy (UA) is the ratio of the number of correctly classified pixels in a single category to the total number of pixels classified into that category; it indicates the probability that a classified pixel authentically represents that category. It is calculated with the following formula:

$$\text{UA} = \frac{m\_{ii}}{\sum\_{j=1}^{n} m\_{ji}} \tag{9}$$

(4) The Kappa coefficient is an index that assesses the degree of agreement between two images and ranges from 0 to 1. It indicates how much better the selected classification method performs than randomly assigning each pixel to a category. It is calculated with the following formula:

$$\mathbf{K} = \frac{\mathbf{N} \sum\_{i=1}^{n} m\_{ii} - \sum\_{i=1}^{n} m\_{i+} m\_{+i}}{N^2 - \sum\_{i=1}^{n} m\_{i+} m\_{+i}} \tag{10}$$

where *n* represents the total number of categories, *mij* represents the value at row *i* and column *j* of the confusion matrix, *N* represents the total number of samples, and *mi*+ and *m*+*i* are the sums of row *i* and column *i* of the confusion matrix, respectively. Because the Kappa coefficient makes comprehensive use of all the information in the confusion matrix, it can serve as a comprehensive index for assessing classification accuracy.
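Equations (7) to (10) can be implemented in a few lines; the sketch below computes all four indexes from a confusion matrix, with a two-category toy example (the numbers are illustrative, not the paper's results).

```python
def accuracy_metrics(M):
    """Compute OA, per-class PA and UA, and the Kappa coefficient
    (Equations (7)-(10)) from an n x n confusion matrix M, where
    M[i][j] counts reference-category-i pixels assigned to category j."""
    n = len(M)
    N = sum(sum(row) for row in M)
    diag = sum(M[i][i] for i in range(n))
    row_sums = [sum(M[i]) for i in range(n)]                        # m_i+
    col_sums = [sum(M[i][j] for i in range(n)) for j in range(n)]   # m_+j
    oa = diag / N
    pa = [M[i][i] / row_sums[i] for i in range(n)]
    ua = [M[i][i] / col_sums[i] for i in range(n)]
    chance = sum(row_sums[i] * col_sums[i] for i in range(n))
    kappa = (N * diag - chance) / (N * N - chance)
    return oa, pa, ua, kappa

# Two-category example: 90 + 85 correct out of 200 samples.
M = [[90, 10],
     [15, 85]]
oa, pa, ua, kappa = accuracy_metrics(M)
```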

#### **4. Results**

#### *4.1. Preprocessing of High-Resolution Images*

First, the image distortion was corrected based on the camera's distortion parameters. Then, uniform color and light processing was performed with the mask method, and preliminary sequencing and locating for homonymy-point matching of adjacent image pairs was carried out using the aircraft attitude parameters recorded by the flight control system. Finally, block adjustment was performed according to the collinearity condition equation. After the block adjustment was completed, the coordinates of the ground control points were added to achieve absolute orientation, yielding the corrected orthoimages. Figure 9 shows the preprocessed UAV images.

**Figure 9.** The preprocessed UAV images. (**a**) Experimental image 1; (**b**) Experimental image 2; and, (**c**) Experimental image 3.

#### *4.2. Segmentation Results*

This paper used eCognition 9.0 software for multi-resolution segmentation of the experimental regions, and the optimal segmentation scale was determined with the ESP (Estimation of Scale Parameter) tool. Table 2 shows the optimal image segmentation parameters and Figure 10 shows the segmentation results.


**Table 2.** Optimal image segmentation parameters.


**Figure 10.** Segmentation results. (**a**) Segmentation result of experimental image 1; (**b**) Segmentation result of experimental image 2; and, (**c**) Segmentation result of experimental image 3.

#### *4.3. Establishment of Landslide Sample Library*

Based on the principle described in Section 3.3, a batch-processing script was built with ArcPy to process the images of the research region: taking each object's barycenter as the center of an image block, clipping, numbering, and storage were automated. Figure 11 shows some of the clipped and stored sample examples.

**Figure 11.** Examples of clipped and stored samples.

It should be noted that the landslide samples were normalized to a size of 256 × 256 pixels for the convenience of rapid computer processing (feature calculation), considering both the computing capacity available during recognition and the spatial resolution of the landslide sample images. The samples collected by visual interpretation were screened; a large number of positive and negative samples were labeled for the different landslide types in the experimental region, and the samples were expanded by rotation and mirroring. As the number of positive samples of landslides of different types and structures increases, the deep learning algorithm can represent landslide features at a deeper level and thus extract landslide information from the images more accurately and effectively, yielding higher recognition accuracy. The final sample library includes 5000 positive and 10,000 negative landslide samples, all scaled to 256 × 256 pixels. Figure 12 shows sample examples obtained by the above method.

**Figure 12.** Example sample library. (**a**) Positive samples; (**b**) Negative samples.
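The rotation and mirroring expansion described above can be sketched as follows; the eightfold expansion (four rotations of the original image and of its left-right mirror) is an illustrative assumption, as the paper does not state the exact number of variants per sample.

```python
import numpy as np

def augment(sample):
    """Expand one sample into 8 variants: rotations by 0/90/180/270 degrees
    of the original image and of its left-right mirror."""
    variants = []
    for img in (sample, np.fliplr(sample)):
        for k in range(4):
            variants.append(np.rot90(img, k))
    return variants

# A dummy 256 x 256 single-band "sample" standing in for a clipped image block.
sample = np.zeros((256, 256), dtype=np.uint8)
aug = augment(sample)
```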

#### *4.4. Results of Landslide Interpretation Model Construction*

The HOG parameters in the experiment were as follows: eight histogram bins were used for the gradient histogram projection, each cell was 32 × 32 pixels, each block consisted of 2 × 2 cells, the sample size was unified to 256 × 256 pixels, and the length of the feature vector finally obtained by concatenation was 1568. Figure 13 shows the visualization results of the HOG features of experimental landslide samples.

**Figure 13.** The visualization results of HOG features of experimental landslide samples. (**a**) landslide sample 1 and HOG feature; (**b**) landslide sample 2 and HOG feature; (**c**) landslide sample 3 and HOG feature; (**d**) landslide sample 4 and HOG feature.
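The feature length of 1568 quoted above can be checked arithmetically: a 256 × 256 sample contains 8 × 8 cells of 32 × 32 pixels; with 2 × 2-cell blocks and a one-cell block stride there are 7 × 7 block positions, each contributing 2 × 2 × 8 = 32 histogram values. A small sketch (the one-cell stride is an assumption consistent with the stated length):

```python
def hog_feature_length(img_size, cell_size, block_cells, n_bins):
    """Length of a HOG descriptor assuming a one-cell block stride."""
    cells = img_size // cell_size        # cells per image side
    blocks = cells - block_cells + 1     # block positions per side
    return blocks * blocks * block_cells * block_cells * n_bins

length = hog_feature_length(img_size=256, cell_size=32, block_cells=2, n_bins=8)
# 7 * 7 * 2 * 2 * 8 = 1568, matching the feature length stated in the text
```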

The BOVW parameters in the experiment were as follows: 128-dimensional SURF features were extracted as local features, the "dictionary" was built with K-means during dictionary learning, and the dictionary size was set to 400 [50]. Finally, each SURF feature was matched against the dictionary, and the histogram of dictionary-word occurrences was taken as the BOVW feature of the image.
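The final step, assigning local descriptors to dictionary words and counting them, can be sketched as below; the random arrays are stand-ins for the 128-dimensional SURF descriptors and the 400-word K-means dictionary, which are not reproduced here.

```python
import numpy as np

def bovw_histogram(descriptors, dictionary):
    """Assign each local descriptor to its nearest dictionary word
    (Euclidean distance) and return the normalized word histogram."""
    # Pairwise distances, shape (n_descriptors, n_words).
    d = np.linalg.norm(descriptors[:, None, :] - dictionary[None, :, :], axis=2)
    words = d.argmin(axis=1)
    hist = np.bincount(words, minlength=len(dictionary)).astype(float)
    return hist / hist.sum()

rng = np.random.default_rng(0)
dictionary = rng.normal(size=(400, 128))   # stand-in for the K-means dictionary
descriptors = rng.normal(size=(50, 128))   # stand-in for an image's SURF features
feature = bovw_histogram(descriptors, dictionary)
```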

At the feature learning stage, this article uses Decaf, an ImageNet classification model pre-trained by the computer vision research group of the University of California, Berkeley, to establish the TL feature; it is composed of three convolutional layers and three fully-connected layers. At the feature transfer stage, the output of the fc6 layer is selected as the feature, with the parameters of the three convolutional layers and the fc6 layer of the whole model kept fixed, because the research results of that group indicate that the fc6 output generalizes better than that of the other layers and its feature vector has a suitable length (4096). The feature vector obtained by transfer learning serves as the training input of the SVM used to train the landslide information extraction model. This article establishes a landslide interpretation model based on feature transfer and analyzes, on the landslide data of the research region, whether the transferred feature extraction method is suitable for landslide interpretation. The experiment uses the t-SNE (t-Distributed Stochastic Neighbor Embedding) method to visualize the sample clustering of the TL landslide interpretation model; as a dimensionality reduction method similar in purpose to PCA, t-SNE provides a visual representation of high-dimensional data points in two- or three-dimensional space and is widely applied for high-dimensional visualization in computer vision and machine learning.

Figure 14 shows the t-SNE visualization of the transfer learning feature of the landslide interpretation model. Here, the 4096-dimensional feature vector is mapped into two-dimensional space by t-SNE to obtain the distribution of the transfer learning feature. As the red circled region in Figure 14 shows, the landslide samples present obvious aggregation in the transfer learning feature space; the feature extraction method obtained by training on the original image library has thus captured the mapping relation between natural images and landslide images, which indicates that it is feasible to establish a landslide interpretation model based on transfer learning.

**Figure 14.** The t-Distributed Stochastic Neighbor Embedding (t-SNE) visualization result of landslide interpretation model feature based on TL feature.

The experiment uses an SVM with a linear kernel as the output classifier, with a squared error (L2) loss function and L2 regularization. 10% of the samples (500 positive and 1000 negative samples) were held out for testing during training. The training results are as follows: the optimal cross-validation parameters for the HOG, BOVW, CNN, and TL features are {'C': 8000}, {'C': 6000}, {'C': 3000}, and {'C': 4000}, respectively, where the key parameter C was obtained by six-fold cross validation. Table 3 shows the confusion matrix.


**Table 3.** Confusion matrix.
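The six-fold selection of C described above can be sketched generically; the candidate grid and the scoring callback below are illustrative stand-ins, since the paper does not report its search grid or training code.

```python
def k_fold_indices(n_samples, k):
    """Split sample indices into k contiguous, nearly equal folds."""
    folds, start = [], 0
    for f in range(k):
        size = n_samples // k + (1 if f < n_samples % k else 0)
        folds.append(list(range(start, start + size)))
        start += size
    return folds

def select_c(candidates, n_samples, k, score):
    """Pick the C maximizing the mean validation score over k folds.
    `score(c, train_idx, val_idx)` stands in for training a linear SVM
    with parameter C and evaluating it on the held-out fold."""
    folds = k_fold_indices(n_samples, k)
    best_c, best = None, float("-inf")
    for c in candidates:
        scores = []
        for f in range(k):
            val = folds[f]
            train = [i for g in range(k) if g != f for i in folds[g]]
            scores.append(score(c, train, val))
        mean = sum(scores) / k
        if mean > best:
            best_c, best = c, mean
    return best_c

# Dummy scorer that peaks at C = 4000, mimicking the TL feature's optimum.
best = select_c([1000, 2000, 3000, 4000, 6000, 8000], n_samples=13500, k=6,
                score=lambda c, tr, va: -abs(c - 4000))
```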

It can be seen from the training results in Table 3 that CNN (precision: 95.1%, ACC: 97.6%) and TL (precision: 96.4%, ACC: 98.0%) have clear advantages on high-resolution images compared with BOVW (precision: 78.6%, ACC: 87.6%) and HOG (precision: 66.5%, ACC: 79.1%), so this article selects the CNN and TL features to establish the landslide extraction model.

#### *4.5. Landslides Information Extraction*

Based on the principles described in Sections 3.5 and 3.6, Figure 15 shows the results of the NNC method and the TLOEL method in the three experimental areas.


**Figure 15.** Results of landslide information extraction based on two methods. (**a**) Experimental image 1, based on NNC; (**b**) Experimental image 1, based on TLOEL; (**c**) Experimental image 2, based on NNC; (**d**) Experimental image 2, based on TLOEL; (**e**) Experimental image 3, based on NNC; and, (**f**) Experimental image 3, based on TLOEL.

Considering the high spatial resolution of the UAV images, the validation data could be obtained directly by visual interpretation combined with field investigation. For experimental image 1, 280 landslide and 320 non-landslide verification points were randomly acquired by manual visual interpretation and field investigation; for experimental image 2, 320 landslide and 350 non-landslide verification points; and for experimental image 3, 160 landslide and 200 non-landslide verification points. The verification points were superimposed on the extracted results to determine whether the extracted landslides were correct. Statistical calculation shows that, for experimental image 1, the overall accuracy of landslide information extraction by the NNC method is 89.5% with a Kappa coefficient of 0.788, while that of the TLOEL method is 90.7% with a Kappa coefficient of 0.812; Table 4 shows the specific results. For experimental image 2, the overall accuracy of the NNC method is 90.3% with a Kappa coefficient of 0.838, while that of the TLOEL method is 91.9% with a Kappa coefficient of 0.862; Table 5 shows the specific results. For experimental image 3, the overall accuracy of the NNC method is 88.9% with a Kappa coefficient of 0.842, while that of the TLOEL method is 89.4% with a Kappa coefficient of 0.871; Table 6 shows the specific results.


**Table 4.** Confusion matrix of experimental image 1.


**Table 5.** Confusion matrix of experimental image 2.

**Table 6.** Confusion matrix of experimental image 3.


#### **5. Discussion**

Analysis and field investigation show that the landslides in experimental images 1 and 2 are secondary disasters caused by earthquakes. Most of them are rock landslides, whose spectral and texture characteristics are very similar to those of bare rock masses; this leads the NNC method to misclassify some bare rock masses as landslides, while the TLOEL method misses some landslides of this type. Some are muddy landslides, whose spectral characteristics are similar to those of bare land; the NNC method misclassifies some bare land as landslides, and although the TLOEL method produces fewer misclassifications, it also misses some landslides. The landslide in experimental image 3 was caused by rainfall, and it is difficult to separate it from turbid water and bare rock. The NNC method both misses landslides and misclassifies bare rock as landslides; the TLOEL method produces fewer misclassifications, but also some omissions.

Generally speaking, once the sample database has been constructed, the TLOEL method has a certain universality. For the same workload, it has obvious advantages in interpretation efficiency compared with visual interpretation, and it achieves higher accuracy than the NNC landslide extraction method. However, in object-oriented image analysis, simple image segmentation produces a large number of image objects, so subsequent information extraction involves a large amount of computation. Landslides usually occur in areas with large topographic relief. During landslide extraction, the Digital Elevation Model (DEM) and a slope stability model (e.g., a shallow landsliding stability model) can be used to calculate the stability degree of the surface in a given area [51]. By assigning the stability degree as a weight to each segmented image object, regions with high stability can be eliminated, reducing the number of objects involved in subsequent computation.
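The object-filtering idea proposed above could look like the following minimal sketch; the object identifiers, stability values, and threshold are all hypothetical, since the paper only outlines this as a possible optimization.

```python
def filter_by_stability(objects, stability, threshold):
    """Drop segmented image objects whose terrain stability degree (e.g.,
    from a shallow-landsliding stability model driven by a DEM) exceeds
    the threshold, so only potentially unstable objects are classified."""
    return [obj for obj, s in zip(objects, stability) if s <= threshold]

# Hypothetical object IDs with stability degrees in [0, 1].
objects = ["obj1", "obj2", "obj3", "obj4"]
stability = [0.2, 0.9, 0.55, 0.95]
candidates = filter_by_stability(objects, stability, threshold=0.6)
# Only the low-stability objects remain for landslide classification.
```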

The above results show that the TLOEL method proposed in this paper has advantages for extracting large-scale and scattered landslides, but obtaining training samples remains a complicated task. In fact, many regions have historical images and historical thematic maps; if these historical data can be used in current interpretation tasks, the accuracy and efficiency of interpretation will inevitably improve. In future research, besides feature transfer learning, we can also consider designing a transfer method for surface feature category labels (associated knowledge) based on bi-temporal invariant object detection, so as to transfer the "category interpretation knowledge of invariant surface features" from the source domain to the target domain and to establish a new feature-object mapping relationship.

#### **6. Conclusions**

In this paper, mountainous areas of southwestern China with widespread landslide hazards were studied: a landslide sample database was set up; the optimal feature extraction model, namely the transfer learning model, was selected by comparison; and a high-resolution remote sensing image landslide extraction method was proposed by combining this model with object-oriented image analysis. This method effectively brings deep learning from the field of computer vision into disaster remote sensing, improving the automation of landslide information acquisition from high-resolution remote sensing imagery. In addition, the landslide sample database established in this paper will provide an important data reference for research on landslides of the same type in southwestern China. Considering that historical archived images and surface feature category maps are available for some of the researched areas, how to further explore the relationship between historical data and current images and how to establish a knowledge transfer framework between them will be key research topics in the next step.

**Author Contributions:** H.L. drafted the manuscript and was responsible for the research design, experiment and analysis. L.M. and N.L. reviewed and edited the manuscript. X.F., Z.W., C.L. and M.T. supported the data preparation and the interpretation of the results. All of the authors contributed to editing and reviewing the manuscript. All authors have read and agreed to the published version of the manuscript.

**Funding:** This research was supported by the National Natural Science Foundation of China (41701499), the Sichuan Science and Technology Program (2018GZ0265), the Geomatics Technology and Application Key Laboratory of Qinghai Province, China (QHDX-2018-07), the Major Scientific and Technological Special Program of Sichuan Province, China (2018SZDZX0027), and the Key Research and Development Program of Sichuan Province, China (2018SZ027, 2019-YF09-00081-SN).

**Conflicts of Interest:** The authors declare no conflict of interest.

#### **References**


© 2020 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).

MDPI St. Alban-Anlage 66 4052 Basel Switzerland Tel. +41 61 683 77 34 Fax +41 61 302 89 18 www.mdpi.com

*Remote Sensing* Editorial Office E-mail: remotesensing@mdpi.com www.mdpi.com/journal/remotesensing
