2.1. Study Area
Chuzhou City is located in the easternmost part of Anhui Province, China. Most of the city lies within the plain of the lower Yangtze River basin and the hilly Jianghuai region between the Yangtze and Huaihe rivers, and the terrain is higher in the west and lower in the east. Chuzhou City has a humid monsoon climate with four distinct seasons, a mild climate, and rainfall that coincides with the hot season. Three landform types occur in the city: plain areas, hilly areas, and downland areas, which account for 8.2%, 40.4%, and 39.2% of the total land area, respectively. The Jianghuai watershed divides Chuzhou into two major basins, the Yangtze River Basin and the Huai River Basin. Mingguang City, Fengyang County, Dingyuan County, and part of Tianchang City belong to the Huai River Basin, which covers about 66.9% of the city's total land area; the Chuzhou urban area, Lai'an County, Quanjiao County, and the rest of Tianchang City belong to the Yangtze River Basin, which covers the remaining 33.1%.
The hilly areas generally have an elevation of more than 100 m and a relative relief of more than 50 m. On the hill platforms, slope deposits and alluvium mostly occur with exposed bedrock or gravel; below the slopes, diluvium and coarse slope deposits occur, also with exposed bedrock or gravel. Downland areas, consisting of platforms and undulating terrain surrounding the hills, are found in the northwest of Dingyuan County, around Fengyang Mountain in the southwest of Fengyang County, in the northwest of Mingguang City, in the northeast and southwest of Nanqiao District, in the northwest of Lai'an County, in the southeast of Quanjiao County, and along the southwest border of Tianchang City. Their average elevation is between 50 and 100 m. The soil there is mostly derived from Xiashu loess and has a deep soil layer; the Xiashu loess is a Late Pleistocene (Quaternary) eolian loess found in the middle and lower reaches of the Yangtze River. The plain areas are mostly located along the banks of the Chuhe and Huaihe rivers and around Nvshan Lake and Gaoyou Lake, with a relative relief of less than 10 m and an elevation of less than 50 m.
Chuzhou City has a large area of cultivated land, dominated by paddy fields, which account for 68% of the city's cultivated land. Dingyuan County, Fengyang County, Mingguang City, and other counties have large areas of cultivated land, together accounting for 59.15% of the city's cultivated land. Quanjiao County, Lai'an County, Dingyuan County, Tianchang City, and other counties have large areas of paddy fields, together accounting for 71.10% of the city's paddy fields. The geographic location of the study area is shown in Figure 1.
2.3. Image Data Acquisition and Preprocessing
Based on regional crop cultivation practices and the availability of cloud-free remote sensing images over Chuzhou City, images from several periods were selected for land use information extraction and cultivated land quality inversion. Gaofen-6 is a low-orbit optical remote sensing satellite and China's first high-precision agricultural observation satellite, combining high resolution with wide coverage. It carries a 2 m panchromatic/8 m multispectral high-resolution camera (PMS) and a 16 m multispectral medium-resolution wide-field-of-view camera (WFV), with swath widths of 90 km for PMS and 800 km for WFV. The revisit period of Gaofen-6 is four days, which shortens to two days when it is networked with Gaofen-1. Details of the remote sensing images are given in Table 1; the Gaofen-6 PMS images were used to extract land use information and the soil spectral index. The images are cloud-free, cover the entire study area, and are of excellent quality.
The image was preprocessed with ENVI 5.3 (Exelis Visual Information Solutions, Boulder, CO, USA). First, radiometric calibration was applied to convert the image DN values to radiance:

$$L = \mathrm{Gain} \times DN + \mathrm{Bias}$$

where $L$ is the radiance, $DN$ is the digital number of the pixel, and $\mathrm{Gain}$ and $\mathrm{Bias}$ are calibration coefficients. The radiometric calibration coefficients were taken from the 2021 Land Observing Satellite Parameters published by the China Center for Resources Satellite Data and Application; where no Bias value is listed, Bias is taken as 0. The band information and radiometric calibration parameters of the Gaofen-6 image are shown in Table 2.
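The per-band linear calibration $L = \mathrm{Gain} \times DN + \mathrm{Bias}$ can be sketched in NumPy as below. This is a minimal illustration, assuming the DN arrays have already been read from the image file (reading is not shown) and that coefficients are supplied per band name; the gains in the usage example are placeholders, not the Table 2 values.

```python
import numpy as np


def dn_to_radiance(dn, gain, bias=0.0):
    """Apply L = Gain * DN + Bias to a DN array and return radiance as float64."""
    return gain * np.asarray(dn, dtype=np.float64) + bias


def calibrate_bands(bands, gains, biases):
    """Calibrate a dict of band name -> DN array.

    A band with no entry in `biases` is treated as Bias = 0, following the
    convention stated for unlisted Bias values.
    """
    return {
        name: dn_to_radiance(dn, gains[name], biases.get(name, 0.0))
        for name, dn in bands.items()
    }
```

For example, `calibrate_bands({"B1": dn_b1}, {"B1": 0.06}, {})` uses a placeholder gain of 0.06 and an implicit Bias of 0. Casting to float64 before scaling avoids any integer overflow from the raw unsigned DN type.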
Atmospheric correction was then performed with ENVI's FLAASH module to obtain the true surface reflectance of ground features. The image of the study area was geometrically corrected using field-collected control points and a 30 m resolution DEM. Finally, image mosaicking, vector clipping, and mask extraction produced a remote sensing image covering the entire study area.
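The control-point step of geometric correction can be illustrated with a first-order polynomial (affine) transform fitted by least squares. This is only a sketch under an assumption: the source does not state which warping model was configured in ENVI, and because a DEM was used, the actual correction likely also accounted for terrain, which an affine fit does not. The function names are hypothetical.

```python
import numpy as np


def fit_affine(image_xy, map_xy):
    """Least-squares affine fit: map coordinates = [col, row, 1] @ coeffs."""
    image_xy = np.asarray(image_xy, dtype=float)
    map_xy = np.asarray(map_xy, dtype=float)
    if len(image_xy) < 3:
        raise ValueError("an affine fit needs at least 3 control points")
    design = np.column_stack([image_xy, np.ones(len(image_xy))])  # (n, 3)
    coeffs, *_ = np.linalg.lstsq(design, map_xy, rcond=None)       # (3, 2)
    return coeffs


def apply_affine(coeffs, image_xy):
    """Transform image (col, row) positions to map (x, y) coordinates."""
    image_xy = np.asarray(image_xy, dtype=float)
    design = np.column_stack([image_xy, np.ones(len(image_xy))])
    return design @ coeffs


def gcp_residuals(coeffs, image_xy, map_xy):
    """Distance between predicted and surveyed position at each control point."""
    resid = apply_affine(coeffs, image_xy) - np.asarray(map_xy, dtype=float)
    return np.sqrt((resid ** 2).sum(axis=1))
```

Inspecting the per-point residuals is the usual way to spot a misidentified control point before resampling the image.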
2.4. Spectral Indicators and Topographic Indicators
Based on a review of relevant studies, remote sensing spectral indicators potentially related to cultivated land quality were selected. Bands B1, B3, B4, and B6 of the Gaofen-6 WFV images were used as single-band spectral indicators; they correspond to the blue, red, near-infrared, and red-edge bands, the last of which is sensitive to vegetation condition. When bare-soil images are difficult to obtain because of weather and surface cover, cultivated land quality can be estimated indirectly from surface vegetation. A vegetation index combines visible and near-infrared reflectance, which are sensitive to vegetation, and can be obtained directly from remote sensing to reflect vegetation growth status [23,24]. Because vegetation indices can indirectly reflect soil quality, they were selected as model variables in this study. Five widely used and broadly applicable vegetation indices were analyzed; their calculation formulas are listed in Table 3. The indices were calculated with the Band Math tool in ENVI, and outliers and invalid values arising during the calculation were excluded to improve data accuracy.
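The five indices can be sketched in NumPy from surface reflectance arrays. The formulas below are the standard textbook forms (EVI with coefficients 2.5, 6, 7.5, and 1; SAVI with soil factor 0.5); they are an assumption and may differ in detail from those in Table 3. Non-finite results from division by zero are set to NaN so they can be excluded, which covers invalid values but not any additional outlier rule the study may have applied.

```python
import numpy as np


def vegetation_indices(blue, red, nir, soil_l=0.5):
    """Common vegetation indices from surface reflectance (0-1 scale).

    Standard forms of NDVI, DVI, RVI, EVI, and SAVI; `soil_l` is the SAVI
    soil-adjustment factor.
    """
    blue, red, nir = (np.asarray(a, dtype=float) for a in (blue, red, nir))
    with np.errstate(divide="ignore", invalid="ignore"):
        indices = {
            "NDVI": (nir - red) / (nir + red),
            "DVI": nir - red,
            "RVI": nir / red,
            "EVI": 2.5 * (nir - red) / (nir + 6.0 * red - 7.5 * blue + 1.0),
            "SAVI": (1.0 + soil_l) * (nir - red) / (nir + red + soil_l),
        }
    # Division by zero yields inf or NaN; mark those pixels invalid (NaN).
    return {name: np.where(np.isfinite(v), v, np.nan) for name, v in indices.items()}
```

With reflectances of 0.05 (blue), 0.1 (red), and 0.5 (NIR), NDVI is about 0.667 and RVI is 5.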
The terrain of the study area comprises hilly areas, downland areas, and plain areas; its elevation and terrain are shown in Figure 2. Topographic indicators were added to the inversion model because terrain relief varies greatly across the study area and strongly influences cultivated land quality [28]. Slope is one of the topographic factors that most directly reflects terrain relief and elevation change. In soil erosion and surface runoff simulation, slope is also a key factor controlling soil erosion resistance and flow paths, and in the topographic wetness index, slope characterizes the runoff-generating capacity of the soil [29].
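Slope can be derived from a DEM grid with finite differences. The sketch below uses central differences via `np.gradient`, assuming a projected DEM with square cells of known size in metres; GIS packages often use Horn's 3×3 method instead, so values may differ slightly from those produced in the study.

```python
import numpy as np


def slope_degrees(dem, cell_size):
    """Slope in degrees from a DEM using central differences.

    np.gradient returns derivatives along rows (y) then columns (x); edge
    cells use one-sided differences.
    """
    dz_dy, dz_dx = np.gradient(np.asarray(dem, dtype=float), cell_size)
    return np.degrees(np.arctan(np.hypot(dz_dx, dz_dy)))
```

A plane rising 0.1 m per metre in one direction gives a slope of about 5.71° everywhere.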
The stream power index (SPI) measures the erosive power of surface water flow [30] and indicates the spatial distribution and intensity of flow. Larger SPI values indicate greater runoff concentration and therefore a higher potential for soil erosion.
The topographic wetness index (TWI) is a DEM-based quantitative index that reflects flow path length, runoff-contributing area, and the runoff-generating capacity of the soil [31]. The two indices are calculated as follows:

$$SPI = SCA \times \tan\beta$$

$$TWI = \ln\left(\frac{SCA}{\tan\beta}\right)$$

where $SCA$ is the specific catchment area and $\beta$ is the slope. The specific catchment area is the upstream catchment area per unit contour length; it describes the catchment capacity of the surface and is an important parameter in many geomorphic and hydrological models.
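The sketch below computes SCA with single-flow-direction D8 routing and then SPI = SCA × tan β and TWI = ln(SCA / tan β). D8 is swapped in as one common routing choice; the source does not say which algorithm produced SCA, and multiple-flow-direction methods give smoother results. Taking the contour width as one cell size and flooring tan β at a small value for flat cells are further assumptions.

```python
import numpy as np

# D8 neighbour offsets (row, col)
_OFFSETS = [(-1, -1), (-1, 0), (-1, 1), (0, -1), (0, 1), (1, -1), (1, 0), (1, 1)]


def d8_specific_catchment_area(dem, cell_size):
    """Specific catchment area (m^2 per m of contour width) via D8 routing.

    Each cell passes its accumulated area to its steepest downslope neighbour.
    Cells with no lower neighbour (pits, flats, grid-edge outlets) keep theirs.
    """
    dem = np.asarray(dem, dtype=float)
    rows, cols = dem.shape
    area = np.full(dem.shape, cell_size * cell_size)  # each cell's own area
    # Visit cells from highest to lowest so all upstream cells finish first.
    for flat_idx in np.argsort(dem, axis=None)[::-1]:
        r, c = divmod(int(flat_idx), cols)
        best_drop, target = 0.0, None
        for dr, dc in _OFFSETS:
            rr, cc = r + dr, c + dc
            if 0 <= rr < rows and 0 <= cc < cols:
                dist = cell_size * (2 ** 0.5 if dr and dc else 1.0)
                drop = (dem[r, c] - dem[rr, cc]) / dist
                if drop > best_drop:
                    best_drop, target = drop, (rr, cc)
        if target is not None:
            area[target] += area[r, c]
    return area / cell_size  # contour width taken as one cell


def spi_twi(sca, slope_deg, min_tan=1e-6):
    """SPI = SCA * tan(beta) and TWI = ln(SCA / tan(beta)).

    tan(beta) is floored at `min_tan` so flat cells do not divide by zero.
    """
    tan_b = np.maximum(np.tan(np.radians(slope_deg)), min_tan)
    sca = np.asarray(sca, dtype=float)
    return sca * tan_b, np.log(sca / tan_b)
```

On a uniformly tilted 3 × 4 grid with 30 m cells, every cell drains straight downhill, so SCA grows from 30 m in the top column to 120 m at the outlet column.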
2.6. Cultivated Land Quality Level Evaluation
Cultivated land quality is a comprehensive concept that cannot be represented by only one or two indicators; it involves the health status of cultivated land, site conditions, soil profile characteristics, soil physicochemical properties, soil nutrients, and farmland management [34,35]. In this study, 15 evaluation indicators were selected from these six aspects to establish the evaluation index system for cultivated land quality level. The classification of the evaluation indicators is shown in Table 4.
Farmland forest reticulation and topographic position represent site conditions; obstacle factors, effective soil layer thickness, and texture configuration represent soil profile characteristics; bulk density, acidity/alkalinity (pH), and soil texture of the tillage layer represent soil physicochemical properties; soil available phosphorus, soil available potassium, and soil organic matter content characterize soil nutrients; cleanliness and biodiversity reflect farmland health; and drainage capacity and irrigation capacity reflect farmland management.
Farmland forest reticulation is the ratio of the farmland area protected by forest belts to the total farmland area; forest belts play an important role in regulating the microclimate, preventing wind and sand damage, and alleviating pollution in farmland and its surroundings. The protected area of forest belts and the total farmland area were investigated on site, the farmland forest network rate was calculated, and the degree of farmland forest reticulation was comprehensively judged and classified as high, medium, or low.
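The rate calculation and three-level grading can be sketched as follows. The cut-off values are hypothetical placeholders: the source does not publish its thresholds, and its final grade also involves field judgement.

```python
def forest_network_rate(protected_area, farmland_area):
    """Ratio of farmland area protected by forest belts to total farmland area."""
    if farmland_area <= 0:
        raise ValueError("farmland_area must be positive")
    return protected_area / farmland_area


# HYPOTHETICAL cut-offs (low < 0.3 <= medium < 0.6 <= high), for illustration only.
HYPOTHETICAL_CUTOFFS = (0.3, 0.6)


def forest_network_grade(rate, cutoffs=HYPOTHETICAL_CUTOFFS):
    """Map a forest network rate to "low", "medium", or "high"."""
    low_max, medium_max = cutoffs
    if rate < low_max:
        return "low"
    if rate < medium_max:
        return "medium"
    return "high"
```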
Cleanliness is an important indicator of the health status of cultivated soil; it refers mainly to the degree to which pollutants in the soil have no adverse or harmful effects on the ecosystem or human health. In this study, the Nemerow index was calculated from the heavy metal contents at the sampling points to determine the cleanliness level of the study area.
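The Nemerow integrated pollution index combines single-factor indices $P_i = C_i / S_i$ (measured concentration over a reference standard) as $P_N = \sqrt{(\bar{P}^2 + P_{\max}^2)/2}$. The sketch below computes it for one sampling point; the grading bands (0.7, 1, 2, 3) are a commonly used scheme and are an assumption here, since the study's cleanliness classes and reference standards are not given in this section.

```python
import numpy as np


def nemerow_index(concentrations, standards):
    """Nemerow integrated pollution index for one sampling point.

    P_i = C_i / S_i for each heavy metal; P_N = sqrt((mean(P)^2 + max(P)^2) / 2).
    """
    p = np.asarray(concentrations, dtype=float) / np.asarray(standards, dtype=float)
    return float(np.sqrt((p.mean() ** 2 + p.max() ** 2) / 2.0))


def nemerow_grade(pn):
    """Commonly used grading bands; the study's own classes may differ."""
    if pn <= 0.7:
        return "clean (safe)"
    if pn <= 1.0:
        return "clean (warning)"
    if pn <= 2.0:
        return "slightly polluted"
    if pn <= 3.0:
        return "moderately polluted"
    return "heavily polluted"
```

Because the maximum single-factor index enters the formula directly, one strongly elevated metal can raise $P_N$ even when the average is low.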
Irrigation capacity is key to meeting crop water demand and directly affects the farming system and the production capacity of farmland. The type and location of water sources, the irrigation method, and the irrigation volume were investigated on site, and the degree to which crop water demand can be met in irrigation years was comprehensively judged and classified as fully satisfied, satisfied, basically satisfied, or not satisfied.
Drainage capacity reflects the ability of farmland to remove accumulated surface water promptly and to effectively control and lower the groundwater level, ensuring normal crop growth. Drainage methods and the current status of drainage facilities were investigated on site, and drainage capacity was comprehensively assessed and classified as fully satisfied, satisfied, basically satisfied, or not satisfied.
The 15 evaluation indicators were divided into textual and numerical indicators, and attribute data were assigned to each evaluation unit. Table 4 lists the types and weights of the evaluation indicators. The textual indicators (topographic position, farmland forest network, texture configuration, obstacle factors, tillage layer texture, biodiversity, cleanliness, irrigation capacity, and drainage capacity) were assigned to each evaluation unit using surface substitution and attribute linking. The numerical indicators (effective soil layer thickness, bulk density, pH, soil organic matter, soil available phosphorus, and soil available potassium) were spatially interpolated, and the interpolated surfaces were overlaid on the evaluation unit map and assigned to each unit by zonal statistics.
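The interpolate-then-aggregate step for numerical indicators can be sketched as follows. Inverse distance weighting (IDW) is swapped in here for illustration because the interpolation method is not named in this section (kriging is also common), and the zonal mean assumes a raster of non-negative integer evaluation unit IDs aligned with the interpolated grid.

```python
import numpy as np


def idw_interpolate(xy_points, values, xy_grid, power=2.0):
    """Inverse-distance-weighted estimate at each grid location.

    A grid location that coincides with a sample takes that sample's value.
    """
    pts = np.asarray(xy_points, dtype=float)   # (n, 2)
    vals = np.asarray(values, dtype=float)     # (n,)
    grid = np.asarray(xy_grid, dtype=float)    # (m, 2)
    dist = np.linalg.norm(grid[:, None, :] - pts[None, :, :], axis=2)  # (m, n)
    out = np.empty(len(grid))
    for k, d in enumerate(dist):
        hit = d == 0.0
        if hit.any():
            out[k] = vals[hit][0]
        else:
            w = 1.0 / d ** power
            out[k] = np.sum(w * vals) / np.sum(w)
    return out


def zonal_mean(values, unit_ids):
    """Mean of `values` within each evaluation unit ID (non-negative ints)."""
    values = np.asarray(values, dtype=float).ravel()
    ids = np.asarray(unit_ids).ravel()
    sums = np.bincount(ids, weights=values)
    counts = np.bincount(ids)
    with np.errstate(invalid="ignore", divide="ignore"):
        return sums / counts  # NaN for IDs with no cells
```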
The weights of the indicators used to evaluate cultivated land quality level were determined by combining the analytic hierarchy process (AHP) with the Delphi method. Indicator weights were first assigned by the Delphi method; these weights were then used in the AHP to group the indicators into categories, and judgment matrices comparing the relative importance of the indicators within each category were used to determine the weight of each indicator. After the evaluation index system was established and the indicator weights were determined, the comprehensive cultivated land quality index of each evaluation unit was computed with the weighted sum method:

$$P = \sum_{i=1}^{n} C_i \times F_i$$

where $P$ is the comprehensive cultivated land quality index, $F_i$ is the evaluation score of the $i$-th factor, $C_i$ is the combined weight of the $i$-th factor, and $n$ is the number of evaluation factors.
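The AHP weighting and the weighted sum can be sketched as below, assuming the principal-eigenvector method with Saaty's random index for the consistency ratio (CR); the expert (Delphi) scoring that produces the judgment matrices is not modeled, and the helper names are hypothetical. The combined weight of an indicator is its category weight times its within-category weight, and the quality index is $P = \sum_i C_i F_i$.

```python
import numpy as np

# Saaty's random consistency index (RI) for matrix orders 1-10.
RANDOM_INDEX = {1: 0.0, 2: 0.0, 3: 0.58, 4: 0.90, 5: 1.12,
                6: 1.24, 7: 1.32, 8: 1.41, 9: 1.45, 10: 1.49}


def ahp_weights(judgment):
    """Principal-eigenvector weights and consistency ratio of a pairwise matrix."""
    a = np.asarray(judgment, dtype=float)
    n = a.shape[0]
    eigvals, eigvecs = np.linalg.eig(a)
    k = int(np.argmax(eigvals.real))
    w = np.abs(eigvecs[:, k].real)
    w /= w.sum()
    lambda_max = eigvals[k].real
    ci = (lambda_max - n) / (n - 1) if n > 1 else 0.0
    ri = RANDOM_INDEX[n]
    cr = ci / ri if ri > 0 else 0.0
    return w, cr  # CR < 0.1 is the usual acceptance threshold


def combined_weights(category_weight, within_weights):
    """Combined weight = category weight x weight within the category."""
    return category_weight * np.asarray(within_weights, dtype=float)


def quality_index(scores, weights):
    """Weighted sum P = sum_i C_i * F_i over all evaluation factors."""
    return float(np.dot(scores, weights))
```

A perfectly consistent matrix built from weights (0.5, 0.3, 0.2) returns those weights with CR ≈ 0, and scores (80, 60, 100) with those weights give P = 78.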
Taking into account the actual conditions of the study area, the comprehensive cultivated land quality index of each evaluation unit was graded with the natural breaks method into five grades: higher, high, medium, low, and lower. The range of the comprehensive index for each quality grade is shown in Table 5.
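Natural breaks grading can be sketched with Fisher's exact (Jenks) optimization, which chooses class limits that minimize the total within-class sum of squared deviations. This is a plain dynamic-programming illustration; the GIS implementation used in the study may differ slightly, and in pure Python it is slow for large datasets (libraries such as mapclassify provide optimized versions).

```python
import numpy as np


def jenks_breaks(values, n_classes):
    """Upper bound of each class from Fisher's exact (Jenks) optimization."""
    x = np.sort(np.asarray(values, dtype=float))
    n = x.size
    if not 1 <= n_classes <= n:
        raise ValueError("need 1 <= n_classes <= number of values")
    # Prefix sums give the squared deviation of any run x[i:j] in O(1).
    s1 = np.concatenate(([0.0], np.cumsum(x)))
    s2 = np.concatenate(([0.0], np.cumsum(x * x)))

    def ssd(i, j):
        total = s1[j] - s1[i]
        return (s2[j] - s2[i]) - total * total / (j - i)

    cost = np.full((n_classes + 1, n + 1), np.inf)
    start = np.zeros((n_classes + 1, n + 1), dtype=int)
    cost[0, 0] = 0.0
    for k in range(1, n_classes + 1):
        for j in range(k, n + 1):          # first j sorted values in k classes
            for i in range(k - 1, j):      # last class is x[i:j]
                c = cost[k - 1, i] + ssd(i, j)
                if c < cost[k, j]:
                    cost[k, j], start[k, j] = c, i
    breaks, j = [], n
    for k in range(n_classes, 0, -1):      # walk back through the chosen splits
        breaks.append(float(x[j - 1]))
        j = start[k, j]
    return breaks[::-1]


def assign_grades(values, breaks):
    """0-based class index per value (class 0 holds the lowest index values)."""
    return np.searchsorted(breaks, values, side="left")
```

For values clustered around 2, 11, and 21, three classes break at 3, 12, and 22.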
2.7. Cultivated Land Quality Inversion Model
To facilitate the use of machine learning algorithms and ensure comparability among modeling methods, this study used the evaluation units containing the 1233 sampling points as the sample dataset. The 1233 sampling points are distributed across every topographic subregion of the study area, and their locations are shown in Figure 1.
This study adopted 10-fold cross-validation: the dataset was divided into 10 subsets, and in turn 9 subsets were used for training and the remaining 1 for testing. Each round yielded an accuracy value, and the average of the 10 results was used as the estimate of the algorithm's accuracy. Because every sample is used for both training and testing, 10-fold cross-validation makes full use of the available data, reduces the randomness of a single train/test split, and gives a more reliable estimate of how well the model generalizes.
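The fold construction can be sketched in NumPy as follows. Shuffling before splitting and the fixed seed are assumptions; the section does not say whether samples were shuffled.

```python
import numpy as np


def kfold_indices(n_samples, n_folds=10, seed=0):
    """Yield (train_idx, test_idx) pairs; every sample is tested exactly once."""
    rng = np.random.default_rng(seed)
    order = rng.permutation(n_samples)
    for test_idx in np.array_split(order, n_folds):
        train_idx = np.setdiff1d(order, test_idx)
        yield train_idx, test_idx
```

With 1233 samples, `np.array_split` produces three folds of 124 and seven of 123, so every sample appears in exactly one test fold.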
A random forest model was built with the measured indicators, spectral indicators, vegetation indices, and topographic indicators as input variables and the comprehensive cultivated land quality index as the output variable. The accuracy of the inversion model was evaluated with the coefficient of determination (R²) and the root-mean-square error (RMSE).
Random forest, proposed by Breiman, is an ensemble of decision trees that introduces randomness in both the sample space and the feature space. Each tree depends on a random vector whose parameters are determined during training: bagging draws an independent bootstrap sample of the training data for each tree, and a random subset of the features is considered at each node split when the tree is constructed.
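The 10-fold evaluation of a random forest can be sketched with scikit-learn, assuming a feature matrix `X` whose columns are the input indicators and a target `y` holding the comprehensive quality index. The number of trees and the other hyperparameters are placeholders; the study's settings are not reported in this section.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import KFold


def r2_rmse(y_true, y_pred):
    """Coefficient of determination and root-mean-square error."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    ss_res = np.sum((y_true - y_pred) ** 2)
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)
    return 1.0 - ss_res / ss_tot, float(np.sqrt(np.mean((y_true - y_pred) ** 2)))


def cross_validate_rf(X, y, n_folds=10, n_trees=500, seed=0):
    """Mean R^2 and RMSE of a random forest regressor over k folds."""
    scores = []
    folds = KFold(n_splits=n_folds, shuffle=True, random_state=seed)
    for train, test in folds.split(X):
        model = RandomForestRegressor(n_estimators=n_trees, random_state=seed)
        model.fit(X[train], y[train])
        scores.append(r2_rmse(y[test], model.predict(X[test])))
    r2s, rmses = zip(*scores)
    return float(np.mean(r2s)), float(np.mean(rmses))
```

Averaging per-fold R² and RMSE matches the "average of the 10 results" procedure; pooling all out-of-fold predictions and scoring once is a common alternative that gives slightly different numbers.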
A total of 15 variable indicators were entered into the model, divided into four categories: measured indicators, spectral indicators, vegetation indices, and topographic indicators. The measured indicators were soil organic matter, soil bulk density, effective soil thickness, and pH; the spectral indicators were bands B1, B3, B4, and B6 of the GF-6 WFV images; the vegetation indices were NDVI, DVI, RVI, EVI, and SAVI; and the topographic indicators were slope, TWI, and SPI. The overall research workflow is shown in Figure 3.