**1. Introduction**

Wetlands play an important role in the global ecosystem. Wetlands are easy to degrade, and their shrinkage is more obvious in arid areas. Wetlands, as an important carrier of water resources in arid areas, have irreplaceable ecological functions. In an arid area, wetlands are very sensitive to changes in soil environmental factors. In arid and semi-arid areas, changes in soil physical and chemical properties may lead to changes in wetland ecology and climate [1]. Especially in arid and semi-arid areas, the changes in soil physical and chemical properties may lead to regional ecological and climate change. Comprehensive

**Citation:** Wang, J.; Wang, W.; Hu, Y.; Tian, S.; Liu, D. Soil Moisture and Salinity Inversion Based on New Remote Sensing Index and Neural Network at a Salina-Alkaline Wetland. *Water* **2021**, *13*, 2762. https://doi.org/ 10.3390/w13192762

Academic Editors: Ying Zhao, Jianguo Zhang, Jianhua Si, Jie Xue and Zhongju Meng

Received: 16 August 2021 Accepted: 25 September 2021 Published: 6 October 2021

**Publisher's Note:** MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.

**Copyright:** © 2021 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https:// creativecommons.org/licenses/by/ 4.0/).

stress of soil moisture and salinity is one of the principal factors that restrict wetland restoration in arid and semi-arid areas [2]. For example, soil salinization exerts a strong stress on the wetland and its surrounding vegetation, which leads to blocked vegetation growth and land degradation to a certain extent, and may lead to abnormal succession of ecological community, ultimately leading to the decline of wetland ecological function and productivity [3,4]. Soil moisture is used to measure available water in the soil. Soil moisture stress can result in serious desertification, degradation of vegetation productivity and change of regional microclimate in waterless areas [5]. Therefore, it is of great significance to accurately grasp the spatial distribution and formation of soil water and salt in arid and semi-arid areas for controlling natural disasters, such as land desertification and drought, as well as for protecting and restoring wetlands in arid and semi-arid areas.

In order to obtain detailed information of soil moisture and salinity distribution, traditional measurements of soil properties require large numbers of experimental samples, long-term observation of soil characteristics and surrounding vegetation information as support, which requires a huge amount of manpower, financial resources and time [6] and is not suitable for long-term monitoring due to the influence of weather. For example, in the case of bad weather conditions (rainstorm, strong wind, etc.), the field survey work of field surveyors is greatly hindered. Therefore, remote sensing has become an ideal technology for identifying, monitoring and successfully retrieving soil moisture and salinity content with its unique advantages of macroscopic, comprehensive, fast information acquisition, short cycle and dynamic reflection of the changes on the ground [7]. Soil moisture is usually retrieved by optical remote sensing or microwave remote sensing. In optical remote sensing, the common method includes using Landsat, IKONOS or MODIS multispectral data to establish the corresponding water index, drought index or vegetation index (such as the most commonly used vegetation index—the normalized vegetation index, NDVI [8,9]) to extract soil moisture [10–12]. It also contains the use of surface temperature or thermal inertia to realize soil moisture inversion [13]. Sinha et al. studied the emission characteristics of visible and near infrared spectra of alluvial soil, red loam and black cotton soil, and found that the three soils under different conditions were negatively correlated with soil moisture [14]. At the same time, some researchers have carried out studies on soil moisture and soil multispectral characteristics, indicating that visible, mid-infrared and near-infrared lights are significantly correlated with soil moisture [15]. Comparatively, spectral data in the mid-infrared band have the highest correlation with soil moisture, indicating the best effect. Ghulam et al. proposed the modified vertical drought index (MPDI) to monitor soil water content by analyzing the spectral characteristic space of soil water and found that although MPDI had a good monitoring effect, its application scope was limited [16]. Subsequently, Zhang et al. proposed a new index, RDMI, to monitor soil moisture based on visible red light and near-infrared spectral information. After verification, they found that RDMI had a strong negative correlation with soil moisture and the monitoring effect was better than that of the MPDI index [17]. Microwave remote sensing inversion of soil moisture has a certain theoretical basis, which is that there is a definite linear relationship between the backscattering coefficient and soil moisture [18]. Commonly used models are, among others, the Oh model, Dubois model and water cloud model [19]. For remote sensing inversion of soil moisture, optical sensors have a spatial resolution superior to microwave sensors and can also provide vegetation information. The algorithm is mature, but it is restricted to regional studies due to atmospheric influence. Microwave remote sensing can penetrate clouds and fog and observe all weather events. It has a complete theory, but it is severely affected by vegetation and has poor spatial resolution, which is suitable for large-scale research. Wang et al. studied the influence of different surface roughness on soil water inversion [17]. Bindlish et al. retrieved soil moisture by constructing microwave soil water scattering model (AIEM model) and found that the correlation between AIEM model and actual soil moisture was as high as 0.95, indicating that the microwave remote sensing model could well retrieve soil moisture under certain conditions [20]. In addition, some new algorithms, such as neural network [21],

support vector machine [22], have been integrated into soil moisture inversions. There is a spatial correlation between soil salinity and soil moisture, and the inversion method is similar to that of soil moisture. Currently, the commonly used method is to extract salinity indices from multi-spectral data and establish a multivariate model with soil salinity. In addition, innovative modeling techniques (such as genetic algorithm, etc.) are also applied [21]. Microwave remote sensing cannot directly measure soil salinity, but its surface reflectance has a significant correlation with soil conductivity [23]. In addition, microwave remote sensing needs to remove the influence of vegetation in a vegetationcovered area. Therefore, it is most suitable for areas with a bare ground surface.

The Ebinur Lake Basin is suffering from consequences of many severe salt dust storms and the lake is drying up gradually, which is why we chose the Ebinur Lake Basin as the study area. The purpose of this study was to: (1) find a fast index and method to calculate the soil moisture and salinity of the Ebinur Lake Basin; (2) obtain the spatial distribution law of soil moisture and salinity in the Ebinur Lake Basin; (3) explain the reason of the difference of soil moisture and salinity distribution in the Ebinur Lake Basin. In this paper, based on the field measured data, the indexes with high correlation with soil salinity and soil moisture were selected from the common vegetation index, salinity index (including our newly constructed indexes D1, D2) and water index. Prediction models that included regression function model, neural network model and support vector machine regress model were established, from which the models with high fitting accuracy and verification accuracy were selected for spatial inversion of soil characteristics using Landsat 8. The distribution rules and causes of soil water and salt in the study region were analyzed to provide a scientific basis for wetland restoration in arid areas.

#### **2. Materials and Methods**

#### *2.1. Study Area*

The Ebinur Lake is the largest saltwater lake in Xinjiang, China, situated on the southwest of the Junggar Basin in central Asia [24]. The Ebinur Lake Basin boundary used in this study was obtained from DEM hydrological analysis. The Ebinur Lake Basin is located between 44◦240 N and 45◦120 N, and between 81◦560 E and 83◦510 E (Figure 1a). The Ebinur Lake is at the lowest elevation in the basin, about 189 m. The basin belongs to the temperate continental arid climate, with annual average temperature of 7–8 ◦C, annual average precipitation of 90.9 mm and evaporation of 1662 mm. The evaporation is much higher than the precipitation. The Boltala River, Jinghe River, Kuitun River and Akeqisu River flow into the lake from different directions, becoming the main water source for the lake area [25], but the incomes are not sufficient, and the lake is shrinking day by day. From 2004 to 2015, the area of the Ebinur Lake in a dry and wet season decreased by 461.98 km<sup>2</sup> and 322.04 km<sup>2</sup> , respectively [26], which greatly accelerated the desertification process in the surrounding areas of the basin. The speed of desertification has reached 38 square kilometers per year. The northwest part of the study area is the Alashan Pass, with 164 days of annual heavy windy days, up to 185 days at most and 55.0 m/s of maximum wind speed [27]. The Ebinur Lake has a high mineralization degree and a prosperous salt ion content. The shrinkage of the lake surface results in the exposure of a large range of dry lake bottom, a large amount of salt dust in the saltwater lake was dried up and exposed to the ground, which causes serious soil salinization. The unique topography of the Ebinur Lake Basin includes a variety of landscapes, such as rocky desert, gravel desert, desert, salt desert, swamp and salty lake. The corresponding typical zonal soil is grey desert soil, grey brown desert soil and aeolian sandy soil, and the intrazonal soil is salt (salinity) soil, meadow soil and marsh soil [28] (Figure 1c). Built on keen wind and abundant salt sources, sandstorms are very common in this area, and salt dust storms frequently erupt [21]. The Ebinur Lake Basin has been the second largest source of salt and dust storms as well as sandstorms in the world. The drought is intensifying, and the ecological environment is deteriorating.

**Figure 1.** General map of the study area: (**a**) location of the study area; (**b**) land use in the Ebinur Lake Basin; (**c**) soil type in the Ebinur Lake Basin; (**d**) distribution map of soil sampling points.

#### *2.2. Field Soil Sampling Experiment*

Field soil experiment was made on 15 April 2017. In order to better map the soil moisture and salinity situation in the Ebinur Lake Basin, 60 sampling points were randomly distributed. The distribution of sampling points is shown in Figure 1d. At each sampling point, Stevens Portable Hydra Data Reader (Hydra-Reader) was used to obtain soil basic property data directly with Hydra probe, GPS was used to record longitude and latitude. The probe was cleaned with a polishing cloth before use to avoid increasing the error. In addition, soil specific calibration was required to ensure accuracy before measuring soil moisture [29]. At each sample point, the metal probe of the instrument was inserted vertically clockwise into the soil at a depth of about 5 cm to ensure the full contact of the soil with the metal probe. The response time was 10–20 s. The data posted by the instrument were the actual measured values, including soil type, soil temperature, soil volumetric moisture (m<sup>3</sup> m−<sup>3</sup> ), soil electrical conductivity (S/m) and temperature-corrected conductivity (S/m). Hydra-Reader has a data download service for later analysis. The electrical conductivity can be regarded as another form of expression of soil salinity. In this study, soil electrical conductivity was invoked as a measure of soil salinity, and soil volumetric moisture was used as a measure of soil moisture.

In order to ensure the accuracy of the data at the sampling points, we measured each sampling point for 6 times and took the average value as the final value. In addition, a ground object spectrometer was used to measure the reflection spectrum of the sampling point. In order to facilitate the correspondence with Landsat data, we selected the band corresponding to the central wavelength of Landsat band range in the reflection spectrum as the calculation index, so as to reduce the scale problems caused by a direct use of Landsat images. For the measured data of 60 sampling points, we randomly select 35 points as regression fitting data and 25 points as verification analysis data.

#### *2.3. Satellite Image and Data Processing*

The Landsat 8 OLI has become the data source of this study because of its convenient acquisition, moderate spatial resolution and rich band information. Its spatial resolution is 30 m, and it is free to access, including 9 bands of information, which are coastal, blue, green, red, near infrared, two short-wave infrared, two thermal infrared, panchromatic (15 m resolution) and cirrus band [30]. The date of image acquisition was 19 April 2017, which was close to the field experiment time, reducing the time error and providing more accurate soil water and salt information. The datum was obtained from the official website of the U.S. Geological Survey (USGS, https://earthexplorer.usgs.gov/ accessed on 30 July 2019).

Landsat 8 OLI data were pre-processed by the ENVI 5.3 software, including radiation calibration, atmospheric correction and mask. The ENVI 5.3 FLAASH module was used for atmospheric correction.

#### 2.3.1. Calculation of Spectral Index

For this study, we selected some spectral indices with a certain universality and applicability related to water and salt, used by previous research institutes, as shown in Table 1. The criterion for index selection is to select those indexes that have been successfully applied in the soil water-salt inversion based on Landsat 8 data, or indexes whose application regions are in arid and semi-arid areas. SI, SI1, SI2, SI3 (four different salinity indices), S1, S2, S3, S4, S5, S6 (six different salinity calculation indicators) and NDSI (normalized salinity index) are salinity indices calculated from remote sensing data, from which surface salinity information can be obtained [31,32]. OLI\_SI (Landsat 8 OLI salinity index) is a soil salinity index in an irrigation region based on Landsat 8 data. It has a strong correlation with conductivity EC [33]. SIvir (visible light remote sensing salinity index) is a newly developed spectral salinity index, which has the potential to reveal soil salinity in arid climatic conditions. Int1 and Int2 are intensity indices in the visible light range, BI is a brightness index [34], which can highlight and detect different soil salinity levels [35]. NDWI (normalized water index) and LSWI (land surface water index) are water body indices, which can be used to identify land surface water body and to distinguish moisture in soil. VCI is a vegetation status index, which uses crop growth changes to reflect the degree of water information threatening crops in the region, so that it can express the regional soil moisture difference [36]. VSDI (visible and shortwave infrared drought index) is a surface drought index, which is suitable for different land cover types and less affected by vegetation. It can reflect the difference of surface soil moisture from another perspective [37]. ATI is an apparent thermal inertia, which can reflect the thermal characteristics of soil by the difference of surface temperature. It is closely related to soil moisture and has a useful application in bare soil or low vegetation coverage area [38]. EVI (enhanced vegetation index), NDVI (normalized vegetation index), SAVI (soil-adjusted vegetation index) and OSAVI (optimized soil-adjusted vegetation index) are widely used vegetation indices, which can reflect the growth status of vegetation and soil characteristics.

D1 and D2 are the spectral indices newly created for the soil salinity inversion in arid regions based on the Landsat 8 spectral characteristics of the arid region. We analyze the correlation between various bands of Landsat 8 data and the measured data, select several bands with high correlation, and combine them together. The difference between D1 and D2 is that the weight coefficients of each band is different. The weight coefficients are two sets of data obtained by analyzing the regional environment of the Ebinur Lake Basin.

The above spectral indices were calculated in the Band Math module of ENVI5.3 software.


**Table 1.** Spectral indices and their calculation formulas.

Note: RCoastal, RGreen, RBlue, RRed, RNIR, RSWIR are the corresponding bands of Landsat 8, NDVImin is NDVI minimum, NDVImax is NDVI maximum, the pixel-by-pixel value is NDVI<sup>i</sup> , α is full-band reflectance; Td, T<sup>n</sup> are daytime and night temperatures on the same day, respectively.

#### 2.3.2. Inversion of Surface Temperature

The remote sensing inversion method of land surface temperature adopted in this study was the radiation transfer equation method. Its basic principle is to divide the radiation received by the sensor into three parts: surface, up-going and down-going radiation. If the intensity of atmospheric radiation is estimated, the surface radiation intensity can be obtained, and using it, the surface temperature can be calculated. The calculation formula is the Equations (1)–(3) [41].

$$\mathcal{L} = \left[ \varepsilon \mathcal{B}(\mathcal{T}\_s) + (1 - \varepsilon) \mathcal{L}\_{\text{down}} \right] \tau + \mathcal{L}\_{\text{up}} \tag{1}$$

$$\mathbf{B}(\mathbf{T\_s}) = \left[\mathbf{L} - \mathbf{L}\_{\text{up}} - \tau(\mathbf{1} - \varepsilon)\mathbf{L}\_{\text{down}}\right] / \tau\varepsilon \tag{2}$$

$$\mathbf{T\_s} = \mathbf{K\_2} / \ln \left( \frac{\mathbf{K\_1}}{\mathbf{B(T\_s)}} + 1 \right) \tag{3}$$

where L is surface specific emissivity; T<sup>s</sup> is surface temperature, the unit is K; B(Ts) is blackbody thermal radiation brightness, Lup and Ldown are upward and downward atmospheric radiations, respectively; T is atmospheric transmittance in the thermal infrared band. For Landsat 8 TIRS, K<sup>1</sup> = 774.89 w/(m<sup>2</sup> <sup>×</sup> <sup>µ</sup><sup>m</sup> <sup>×</sup> sr), K<sup>2</sup> = 1321.08 <sup>×</sup> <sup>K</sup>1; *<sup>τ</sup>*, *<sup>e</sup>* are available on the NASA website (http://atmcorr.gsfc.nasa.gov/ accessed on 30 July 2019).

#### 2.3.3. Construction of Temperature Vegetation Drought Index (TVDI)

TVDI is based on the simplified NDVI-Ts feature space, which is a parameter to represent soil surface characteristics. They consider that there is a certain soil moisture isoline in the characterized space [44]. In NDVI-Ts space, the pixel drought index is 1 on the dry edge, representing complete water shortage; 0 on the wet edge, representing complete water stress; and 0–1 on the data points between the dry and wet edges. The greater the TVDI value, the more serious the relative drought. The TVDI calculation is presented in Equation (4).

$$\text{TVDI} = \frac{\text{T\_S} - \text{T\_{Smin}}}{\text{T\_{Smax}} - \text{T\_{Smin}}} \tag{4}$$

where T<sup>s</sup> is the surface temperature of a pixel; Tsmin is the lowest temperature (wet edge temperature) at the NDVI value of the pixel; and Tsmax is the highest temperature (dry edge temperature) at the NDVI value of the pixel.

By calculating the maximum and minimum surface temperature corresponding to each NDVI range at a certain interval, the equation of dry edge and wet edge can be fitted (Equations (5) and (6)).

$$\mathbf{T\_{smax}} = \mathbf{a\_l} \times \mathbf{N} \mathbf{D} \mathbf{V} \mathbf{I} + \mathbf{b\_l} \tag{5}$$

$$\mathbf{T\_{smin}} = \mathbf{a\_2} \times \text{NDVI} + \mathbf{b\_2} \tag{6}$$

#### 2.3.4. Spike Cap Transformation

Spike cap transform is an orthogonal transformation of a remote sensing image, projecting image information into multi-dimensional space to obtain six components that are important to vegetation, soil and water body, such as greenness, humidity and brightness, and their shapes are similar in multidimensional space. Its shape in multidimensional space is the same as that of a hat, so it is called spike cap transformation. Spike cap transformation has been applied to observe the relationship between soil moisture, vegetation cover and canopy conditions [45]. Humidity band can be used to extract soil moisture information. For Landsat 8 OLI images, the conversion coefficient of humidity component is shown in Equation (7).

$$\text{Netness } = 0.1511B\_2 + 0.1973B\_3 + 0.3283B\_4 + 0.3407B\_5 - 0.7117B\_6 - 0.4559B\_7 \quad \text{(7)}$$

where B2, B3, B4, B5, B<sup>6</sup> and B<sup>7</sup> are Landsat 8 blue band (Blue), green band (Green), red band (Red), near infrared band (NIR), short wave infrared 1 (SWIR1) and shortwave infrared 2 (SWIR2), respectively.

## *2.4. BP Neural Network Model*

BP neural network is a multilayer feedforward neural network. Its main characteristics are forward signal transmission and back error propagation. In forward transmission, the input signal is processed layer by layer from the input layer to the output layer. The state of neurons in each layer only affects the state of neurons in the next layer [46]. If the output layer cannot get the expected output, it will turn to back propagation, adjust the weights and thresholds of the network according to the prediction error, so that the predictive output of BP neural network keeps approaching the expected output [47]. Before BP neural network prediction, the network must be trained to have associative memory and prediction ability. The main steps are as follows: (1) network initialization; (2) hidden layer output calculation; (3) output layer output calculation; (4) error calculation; (5) threshold updating. The commonly used algorithm of BP neural network model is gradient descent, which adjusts the changes of threshold and weight of neurons along the direction of negative gradient [48].

It can be seen from the above process that the BP neural network model has strong nonlinear mapping ability and can establish a multivariate nonlinear relationship between independent variables and dependent variables, so it is widely used in remote sensing monitoring. Soil water content is a complex nonlinear coupling system, which is influenced by topography, artificial irrigation and natural environment. Only by fully considering the influence of various factors can the inversion accuracy of soil water content and salt content be improved.

Depending on the characteristics of fitting non-linear function, the BP neural network constructed in this study determined an input parameter, namely various spectral indices, an output parameter, namely soil volume moisture or temperature-corrected conductivity. The BP neural network structure was 1–3–1; that is, there were one node in the input layer, three nodes in the hidden layer and one node in the output layer. All operations were conducted in MATLAB R2015.

#### *2.5. SVR Support Vector Machine Regress Model*

Support vector machine (SVM), such as multilayer perceptron network and radial basis function network, can be employed to pattern classification and nonlinear regression [49]. The core idea of an SVM is to establish a classification hyperplane as a decision surface to maximize the isolation edge between positive and negative examples. The theoretical basis of SVR is statistical learning theory, and more precisely, SVR is an approximate realization of structural risk minimization. SVR follows the principle of structural risk minimization and shows advantage in solving small sample, non-linear and high-dimensional pattern recognition problems. Unlike traditional machine learning methods, such as artificial neural networks, which follow the principle of empirical risk minimization, SVR avoids over-fitting, poor local optimization ability, difficulty in parameter adjustment and slow convergence. In support vector machine regression, penalty parameter C and nuclear parameter γ determine the complexity, accuracy and type of the regression model.

The introduction of kernel function can greatly improve the ability of support vector machines to deal with nonlinear problems, and at the same time maintain the intrinsic linearity of support vector machines in high dimensional space. The commonly used kernel functions mainly include linear kernel function, polynomial kernel function, radial basis kernel function and sigmoid kernel function. Since different kernel function types and parameters in SVR have a great influence on the generalization ability of the model, it is necessary to study and determine the kernel function types and parameters. Gaussian radial basis kernel function (RBF kernel function) has a good generalization ability and can support nonlinear regression [50]. In this study, we selected the RBF kernel function.

The grid method is to try the combination of various penalty parameters and RBF kernel function parameters in a certain range, and then conduct data training for each parameter combination and select the parameter combination with the best effect as the optimal parameter [51]. During data training, the k-fold cross-validation method was adopted. N-1 data were set as training data, and the rest were set as test data. The generalization error was determined by the mean value of MSE (root mean square error) after K times of calculation. The grid method parameter optimization is a violent enumeration method; if the amount of data is very large, it is a time-consuming approach, but it is also a very safe approach.

The specific operation was implemented by running the LIBSVM toolkit in the environment of MATLAB R2015b.

#### *2.6. Correlation Analysis Model*

Correlation analysis is a statistical analysis method that studies the correlation between two or more random variables with equal statuses. It is a process of describing the closeness of the relationship between objective things and using appropriate statistical indicators. The degree of correlation between the two variables is represented by the correlation coefficient r. The value of the correlation coefficient r is between −1 and 1; it can be any value within this range. In the case of positive correlation, the r value is between 0 and 1, and the scatter plot is obliquely upward. In this case, one variable increases, and the other variable increases. When the correlation is negative, the r value is between −1 and 0, and the scatter plot is diagonally downward, at which point one variable increases and the other variable decreases. The closer the absolute value of r is to 1, the stronger the correlation between the two variables, and the closer the absolute value of r is to 0, the weaker the degree of association between the two variables.

#### *2.7. Statistical Analysis Method*

The field observation data were randomly divided into analysis data and verification data, their ratio being 2:1. The basic statistical analysis of the analysis data included correlation analysis, regression analysis, linear and non-linear model fitting accuracy analysis. Among them, Pearson correlation coefficient was used as the correlation index, and its significance test was conducted. Regression analysis refers to a statistical analysis

method that determines the quantitative relationship between two or more variables. Regression analysis was performed using analytical data to establish a regression model. The model with superior fitting accuracy was validated and analyzed with validation data, and the index used was root-mean-square error.

#### **3. Results**
