## **1. Introduction**

Lakes in the arid and semi-arid regions of the world are generally in rapid decline [1–4]. The Inner Mongolia Plateau is a semi-arid area where evaporation far exceeds precipitation [5], yet hundreds of lakes with an area of more than 1 km<sup>2</sup> are distributed across the plateau [6], such as Daihai Lake, Huangqihai Lake, Dali Lake, and Wuliangsuhai Lake. These lakes are mainly arranged like a chain of beads in the Altun boreal margin of the Yinshan mountain fracture, which is one of the two giant faults in China [7,8]. Although the area of the plateau's lakes decreased by 30.3% between 1987 and 2010 [9], this value was still below the 35.3% reported for Central Asia, another typical arid and semi-arid area, from 1990 to 2007 [6]. Determining the recharge and discharge relationship of the lakes is a key step toward explaining this phenomenon.

Faced with shrinking lakes, researchers hold two different views on the relationship between water recharge and discharge. According to the first view, local precipitation is the only source of recharge for these lakes: the lakes are fed by rivers and groundwater runoff, all of which originate from regional rainfall within the basins [10]. According to the second view, local precipitation can hardly recharge groundwater through soil infiltration, and the main recharge of the lakes comes from exogenous groundwater, which reaches the lakes through fault zones and other water transmission channels [11,12]. Supporters of the first view calculated the amount of precipitation that infiltrates the soil and recharges groundwater using a hydraulic water balance model [13]. Supporters of the second view rejected this conclusion on the basis of isotope and hydrochemistry analysis. They reasoned that if groundwater recharge in the basin came from precipitation, the weighted average of the hydrogen and oxygen isotopes of precipitation should be the same as that of groundwater. However, their experimental analysis showed that the isotopes of these lakes are clearly different from those of local precipitation and that soil salinity in the basin is excessively high. They concluded that precipitation cannot recharge the groundwater and that the lakes receive recharge from external water [14].

The current research methods have some shortcomings. The first method assumes that groundwater transport is confined to the basin, an assumption that is difficult to satisfy in a plateau region where basal fault zones are developed. In the second method, limited sampling makes it difficult to collect continuous data on changes in lake water, river water, groundwater, and precipitation. As a result, the error of the analysis is large and it is hard to determine groundwater recharge quantitatively.

To overcome these problems, this paper studies the continuous change of lake area by remote sensing, combined with on-site observation data and isotope analysis results, to determine whether these lakes receive external groundwater recharge. Daihai Lake (the third largest inland lake [15]) and Huangqihai Lake (formerly the fourth largest inland lake in the Inner Mongolia Plateau [16]) are selected as typical research objects. Moreover, we explore the causes of lake shrinkage and its impact on the surrounding ecological environment through the surface changes of the basins.

## **2. Study Region**

The Daihai basin and Huangqihai basin are located in Inner Mongolia [17], in the northwest of China, and have an arid and semi-arid climate (Figure 1). Daihai and Huangqihai are ancient inland tectonic lakes [18] that formed in the early Quaternary, and the distance between the two lakes is about 64 km [19,20]. The main characteristics of this region are a dry climate, sparse precipitation, low surface wetness, and poor ecological stability. The average precipitation and evaporation in this area are 350 to 450 mm and 1800 to 2100 mm, respectively [21]. Lakes in this difficult natural environment often play an extremely important role in the survival of animals and plants and in human activities.

The area of the Daihai basin is about 2312 km<sup>2</sup>. Liangcheng County lies within the basin (Figure 1), and the total population of the area was about 249,000 as of 2013. Daihai is a closed lake and the third largest inland lake in Inner Mongolia, with a maximum depth of 19.1 m [22]. Because of the large number of dams in the Daihai basin, only a few surface streams flow into the lake [23]. Therefore, surface precipitation and groundwater are the main water supply for Daihai Lake. The lake shrinks almost every year, which has become a focus of concern for local people.

The area of the Huangqihai basin is about 4480 km<sup>2</sup>. Ulanchab City, Chaharyouyiqianqi County, and Chaharyouyizhongqi County lie within the basin (Figure 1) [23]. The total population of the region was about 720,000 as of 2012. Huangqihai is a closed lake whose average water depth was three meters in 1986 [24]. Its main water supply is surface precipitation, seven seasonal rivers, and groundwater. It dried out completely in 2006 [16]. The drying of Huangqihai Lake has had significant impacts on local biology, ecology, and human activities.

The surface runoff flowing into Huangqihai Lake comes from spring water. The hydrogen and oxygen isotopes of groundwater in the basin are significantly different from those of precipitation and spring water. Researchers have reported that the multi-year average isotopic composition of surface precipitation in the Daihai and Huangqihai basins is δ<sup>18</sup>O = −5.4‰, δD = −35‰; in addition, 44 groundwater samples (including deep well water and spring water) taken in the study region gave average values of δ<sup>18</sup>O = −9.4‰, δD = −74.1‰ [14]. This demonstrates that the main source of supply of the lake is groundwater. The massive exploitation of groundwater for agricultural irrigation is the main reason for the shrinking, or even disappearance, of the lake.

Section 3 describes how we processed the remote sensing images with a deep learning method. Section 4 combines the remote sensing data with other related data for analysis. Finally, we discuss and summarize the results of the analysis.

**Figure 1.** Location and ground observation station distribution map of Daihai basin and Huangqihai basin, China.

## **3. Materials and Methods**

#### *3.1. Flowchart and Datasets*

The processing flowchart is shown in Figure 2. It has four parts, distinguished by background color. First, the light orange part is the data and method of the article. Second, the light blue part is the main evidence of the analysis. Third, the light yellow part is the main angle of the analysis. Fourth, the light purple part is the extended analysis of the conclusion of the article.

This study had two fundamental aspects. The first was to explore the time series of the surface area of the two lakes. A former study [23] showed that the area of the lakes had changed dramatically. In this study, data from three satellite sensors are used: Landsat-5 TM, Landsat-7 ETM+, and Landsat-8 OLI\_TIRS. All Landsat data were obtained from the United States Geological Survey (USGS) website (http://glovis.usgs.gov/) and the Geospatial Data Cloud of the Computer Network Information Center, Chinese Academy of Sciences (http://www.gscloud.cn/). The necessary image preprocessing steps include radiometric calibration and atmospheric correction (top-of-atmosphere, TOA) [25], which were carried out in ENVI 5.3. To observe the variation of lake area in more detail, we selected 22 remote sensing images of each lake over the past three decades, from 1984 to 2018. Where possible, we selected images acquired from April to June, because water storage is relatively stable during this period. The lake surface data sources are listed in Table 1.

**Figure 2.** Processing flowchart of this study.


**Table 1.** Remote sensing data source of study region

The second aspect was to examine the relationship between surface material composition and lake surface area in each basin. Because of cloud cover and the limited acquisition times available, we selected eight images of each basin over three decades, from 1986 to 2018, spaced about four to five years apart. The selected images are concentrated in May to September, because summer vegetation is relatively stable during this period and water is relatively abundant and easy to observe. The surface material composition data sources are listed in Table 2.


**Table 2.** Remote sensing data source of study basin

#### *3.2. Calculation of the Lakes Area*

At present, there are many mature methods for extracting water bodies from optical images [26]. In this study, we chose a deep learning approach to process the images. Deep learning has become prevalent in recent years, especially in image classification, and the continuous development of high-quality models has brought higher classification accuracy. Considering the multispectral properties of Landsat images and the accuracy of deep learning models, we decided to use the Pyramid Scene Parsing Network (PSPNet) [27]. All experiments were conducted in Python with tensorflow-gpu 1.14.0 on a desktop computer running Windows 10, equipped with an Intel(R) Core(TM) i7-6800K CPU and an NVIDIA GeForce GTX 1080 8G GPU.

Traditional semantic analysis only obtains the pixel labels of known objects, whereas PSPNet performs semantic segmentation based on scene parsing, which obtains the category label of every pixel in the image. Its integrated global features are more conducive to the accurate acquisition of target pixel labels, and it performs better than traditional methods [27]. Because this work requires parsing all the pixels in the whole image, this method was adopted. However, it has so far seldom been used to process optical remote sensing images, so this experiment combines optical image processing with computer vision methods.

The pyramid pooling module combines features at four different scales (Figure 3) [27]. The coarsest level, highlighted by the red square frame in Figure 3, is a single-bin output generated by global pooling. The remaining three levels divide the input feature map into several sub-regions, pool each sub-region, and combine the pooled bins, which retain location information. Different pyramid levels therefore output feature maps of different sizes. To maintain the weight of the global features, we applied a 1 × 1 convolution kernel after each pyramid level. If the pyramid has N levels, this reduces the dimension of the context feature to 1/N of the original feature. The low-dimensional feature map is then upsampled directly by bilinear interpolation to the same size as the original feature map. Finally, the feature maps of the different levels are concatenated into the final pyramid-pooled global feature.
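The pooling, upsampling, and concatenation steps can be illustrated with a minimal pure-Python sketch on a single-channel feature map. It is not the TensorFlow implementation used in this study: the bin sizes 1, 2, 3, and 6 are an assumption taken from the original PSPNet design, the function names are hypothetical, and the 1 × 1 convolution is omitted because there is only one channel to reduce.

```python
def ceil_div(a, b):
    """Integer ceiling of a / b for positive b."""
    return -(-a // b)


def adaptive_avg_pool(fmap, bins):
    """Average-pool a 2-D feature map (list of rows) into a bins x bins grid."""
    h, w = len(fmap), len(fmap[0])
    pooled = []
    for bi in range(bins):
        r0, r1 = (bi * h) // bins, ceil_div((bi + 1) * h, bins)
        row = []
        for bj in range(bins):
            c0, c1 = (bj * w) // bins, ceil_div((bj + 1) * w, bins)
            cells = [fmap[r][c] for r in range(r0, r1) for c in range(c0, c1)]
            row.append(sum(cells) / len(cells))
        pooled.append(row)
    return pooled


def bilinear_upsample(grid, out_h, out_w):
    """Resize a small grid to out_h x out_w by bilinear interpolation."""
    in_h, in_w = len(grid), len(grid[0])
    out = []
    for y in range(out_h):
        sy = y * (in_h - 1) / (out_h - 1) if out_h > 1 else 0.0
        y0 = int(sy)
        y1 = min(y0 + 1, in_h - 1)
        fy = sy - y0
        row = []
        for x in range(out_w):
            sx = x * (in_w - 1) / (out_w - 1) if out_w > 1 else 0.0
            x0 = int(sx)
            x1 = min(x0 + 1, in_w - 1)
            fx = sx - x0
            top = grid[y0][x0] * (1 - fx) + grid[y0][x1] * fx
            bottom = grid[y1][x0] * (1 - fx) + grid[y1][x1] * fx
            row.append(top * (1 - fy) + bottom * fy)
        out.append(row)
    return out


def pyramid_pooling(fmap, bin_sizes=(1, 2, 3, 6)):
    """Return the original map plus one upsampled context map per pyramid level."""
    h, w = len(fmap), len(fmap[0])
    levels = [bilinear_upsample(adaptive_avg_pool(fmap, b), h, w) for b in bin_sizes]
    return [fmap] + levels  # stands in for channel-wise concatenation
```

With `bin_sizes=(1, 2, 3, 6)`, the first context map is constant and equal to the global mean of the input, which is the single-bin global pooling level described above.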

**Figure 3.** Neural network framework of PSPNet.

Although neural networks can provide good performance through deep pre-training, increasing network depth may introduce additional optimization difficulties for image classification. In PSPNet, ResNet is used as the module that extracts the feature map of the input image, and it solves this problem by using a skip connection in each block. In a deep ResNet, each later layer mainly learns the residual passed on by the previous layer. In addition to the main branch, which trains the final classifier with a softmax loss, PSPNet adds an auxiliary loss at the fourth stage of the original ResNet. A weight is applied to balance the auxiliary loss, and the two weighted losses are combined to optimize the parameters together.
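As a rough sketch of how the two losses can be combined, the snippet below computes a per-pixel softmax cross-entropy for the main and auxiliary branches and adds them with a balancing weight. The weight used in this study is not stated; the default of 0.4 is an assumption borrowed from the original PSPNet paper, and the function names are hypothetical.

```python
import math


def softmax_cross_entropy(logits, label):
    """Cross-entropy for one pixel: -log(softmax(logits)[label])."""
    m = max(logits)  # subtract the maximum for numerical stability
    log_sum = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_sum - logits[label]


def mean_loss(pixel_logits, labels):
    """Average cross-entropy over a list of pixels."""
    losses = [softmax_cross_entropy(z, y) for z, y in zip(pixel_logits, labels)]
    return sum(losses) / len(losses)


def combined_loss(main_logits, aux_logits, labels, aux_weight=0.4):
    """Main-branch loss plus the weighted auxiliary-branch loss.

    aux_weight=0.4 is an assumed value, not one reported in this study.
    """
    return mean_loss(main_logits, labels) + aux_weight * mean_loss(aux_logits, labels)
```

The auxiliary branch is only a training aid in this scheme; at prediction time only the main branch output would be used.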

Therefore, for the input image in Figure 3, we used a pre-trained ResNet model with a dilated network strategy to extract the feature map. The final feature map is 1/8 the size of the input image (Figure 3). The pyramid pooling module (Figure 3) was then used to obtain context information from this feature map. The module has four levels, whose pooling kernels cover the whole, half, and smaller portions of the image, and their outputs are merged into the global feature. In the final part of the module, the global feature is concatenated with the original feature map. Finally, the prediction map is generated by a convolutional layer (Figure 3).
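Once a binary water prediction map is available, the lake area follows from the number of water pixels and the ground size of one pixel. The sketch below assumes a 30 m pixel, which is the nominal resolution of the Landsat TM, ETM+, and OLI multispectral bands; the function name and the 0/1 mask convention are hypothetical and are not taken from this study.

```python
def lake_area_km2(mask, pixel_size_m=30.0):
    """Area of the water pixels in a binary mask, in square kilometres.

    mask: 2-D list where truthy values mark water pixels (assumed convention).
    pixel_size_m: ground size of one pixel edge in metres (assumed 30 m).
    """
    water_pixels = sum(1 for row in mask for value in row if value)
    return water_pixels * pixel_size_m ** 2 / 1e6
```

For example, a mask containing 100,000 water pixels corresponds to about 90 km<sup>2</sup> under the 30 m assumption.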

Creating a data set is a basic step: the pre-processed Landsat images must be turned into a data set for training and testing [28]. Pseudo-color composites of the Red, SWIR1, and SWIR2 bands were selected for the training process [29]. For the water data set, we selected eight images of each lake as training samples and two images as test samples, and labeled them manually. The annotated data set was then cut into 600 sample blocks of size 256 × 256 × 3. This number of samples is too small for training. Therefore, we used operations such as panning, rotation, zooming, and adding noise to expand the number of image samples to 30,000.
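The block cutting and augmentation can be sketched as follows. This is an illustrative outline under simplifying assumptions rather than the pipeline used in this study: blocks are single-band 2-D lists instead of 256 × 256 × 3 arrays, zooming is omitted, panning is a cyclic column shift, and all function names and the noise level are hypothetical. The main point is that every geometric operation is applied identically to the image and its label mask, while noise is added to the image only.

```python
import random


def tile_offsets(height, width, size=256):
    """Top-left corners of the non-overlapping size x size blocks that fit."""
    return [(r, c)
            for r in range(0, height - size + 1, size)
            for c in range(0, width - size + 1, size)]


def rotate90(block):
    """Rotate a 2-D block clockwise by 90 degrees."""
    return [list(row) for row in zip(*block[::-1])]


def flip_horizontal(block):
    """Mirror a 2-D block left to right."""
    return [row[::-1] for row in block]


def pan(block, dx):
    """Shift columns cyclically to the right by dx pixels."""
    return [row[-dx:] + row[:-dx] for row in block]


def augment(image, mask, rng, noise_sigma=0.01):
    """Return one randomly transformed (image, mask) pair."""
    for _ in range(rng.randrange(4)):  # 0 to 3 quarter turns
        image, mask = rotate90(image), rotate90(mask)
    if rng.random() < 0.5:
        image, mask = flip_horizontal(image), flip_horizontal(mask)
    dx = rng.randrange(len(image[0]))
    image, mask = pan(image, dx), pan(mask, dx)
    # Noise perturbs the image only; labels stay exact.
    image = [[v + rng.gauss(0.0, noise_sigma) for v in row] for row in image]
    return image, mask
```

Expanding 600 blocks to 30,000 samples corresponds to 50 variants per block, which could be produced by calling `augment` repeatedly with a seeded `random.Random` instance.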

Finally, we adopted the overall accuracy (*OA*) to assess the accuracy of the extracted water area. The *OA* is given by

$$OA = \frac{TP + TN}{TP + FN + FP + TN} \tag{1}$$

where *TP* and *TN* are the numbers of pixels labeled positive and negative, respectively, whose predictions agree with their labels, while *FN* and *FP* are the numbers of pixels labeled positive and negative, respectively, whose predictions disagree with their labels. The *OA* is about 98.9%.
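Equation (1) can be evaluated directly from a predicted water mask and a manually labeled reference mask. The following is a minimal sketch; the function name and the convention that truthy values mark water pixels are assumptions made for illustration.

```python
def overall_accuracy(predicted, reference):
    """Overall accuracy (TP + TN) / (TP + FN + FP + TN) of a binary mask.

    predicted, reference: 2-D lists of equal shape; truthy values mark water.
    """
    tp = tn = fp = fn = 0
    for p_row, r_row in zip(predicted, reference):
        for p, r in zip(p_row, r_row):
            if p and r:
                tp += 1  # water predicted, water labeled
            elif not p and not r:
                tn += 1  # non-water predicted, non-water labeled
            elif p and not r:
                fp += 1  # water predicted, non-water labeled
            else:
                fn += 1  # non-water predicted, water labeled
    return (tp + tn) / (tp + fn + fp + tn)
```

Because *OA* counts both classes together, it can remain high when water covers only a small share of the image, so per-class figures are a useful complement.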

#### *3.3. Land Surface Classification*

Similarly, we employed the PSPNet method for land surface classification. OA1 and OA2 denote the land surface classification accuracy for the Daihai basin and the Huangqihai basin, respectively. The classification accuracy of each category is listed in Table 3.


**Table 3.** Overall accuracy for ground classification

The classification accuracy of farmland and woodland is clearly low. We identified several reasons: the number of training samples was small because of the mediocre quality of the original images, and farmland and woodland pixels share too many similar features. Most importantly, over-reclamation in forest areas has produced complex zones where farmland and forest are interlaced, which contain many misclassified pixels. We therefore had to devise a new way to distinguish between these two similar classes.

Through an investigation of the basins, we found that farmland lies in relatively gentle areas, whereas woodland lies in relatively rugged areas. In other words, the classification of farmland and woodland can be converted into the recognition of land type. Based on this feature, we used ASTER GDEM 30 m data to look for the relationship between the DEM and land type. A random unclassified pixel is selected as the central point and then extended outward to form a larger cell of 5 × 5 pixels (Figure 4).

**Figure 4.** (**a**) Schematic diagram of unclassified pixel processing; (**b**) Standard deviation of farmland (σ1) and woodland (σ2) with DEM.

Taking the central point as the mean value and the peripheral points as the sample values, we calculated the fluctuation between the target pixel and its surrounding pixels using the standard deviation in Equation (2):

$$\sigma = \sqrt{\frac{1}{25} \sum_{i=0}^{4} \sum_{j=0}^{4} \left(x_{ij} - \mu\right)^{2}} \tag{2}$$

where $x_{ij}$ is the DEM value at row $i$ and column $j$ of the 5 × 5 cell and $\mu$ is the DEM value of the central pixel.

As shown in Figure 4, we calculated σ for farmland pixels and woodland pixels, using 15 sample points for each. In the figure, σ1 and σ2 represent farmland and woodland, respectively. We conclude that the standard deviation of farmland pixels is generally less than two, whereas that of woodland pixels is generally greater than two. With this method, the OA of farmland and woodland classification is 93.7%.
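The terrain test can be sketched in a few lines. The snippet assumes that σ is the root-mean-square deviation of the 25 DEM values in the 5 × 5 cell from the central value, that the DEM is a 2-D list of elevations, and that the pixel lies at least two cells away from the DEM border; the function names are hypothetical, and the threshold of two is the one reported above.

```python
import math


def window_sigma(dem, row, col, half=2):
    """Spread of the (2*half+1)^2 DEM window around (row, col).

    The central elevation plays the role of the mean, as described in the text.
    """
    mu = dem[row][col]
    cells = [dem[r][c]
             for r in range(row - half, row + half + 1)
             for c in range(col - half, col + half + 1)]
    return math.sqrt(sum((x - mu) ** 2 for x in cells) / len(cells))


def classify_by_terrain(dem, row, col, threshold=2.0):
    """Label an ambiguous pixel as farmland (gentle) or woodland (rugged)."""
    return "farmland" if window_sigma(dem, row, col) < threshold else "woodland"
```

A perfectly flat window gives σ = 0 and is labeled farmland, while a window on a steep slope gives a large σ and is labeled woodland.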
