*Article* **Tillage-Depth Verification Based on Machine Learning Algorithms**

**Jing Pang <sup>1,</sup>\*, Xuwen Zhang <sup>1</sup>, Xiaojun Lin <sup>1</sup>, Jianghui Liu <sup>2</sup>, Xinwu Du <sup>1</sup> and Jiangang Han <sup>2</sup>**


**\*** Correspondence: pangjing@haust.edu.cn

**Abstract:** In an analysis of the penetration resistance and tillage depth of post-tillage soil, four surface-layer discrimination methods, namely three machine learning algorithms (Kmeans, DBSCAN, and GMM) and a curve-fitting method, were used to analyze data collected from the cultivated and uncultivated layers. The three machine learning algorithms locate the boundary between the tilled and untilled layers by assigning each data point to a layer, which gives the depth of the soil in the tilled layer. The curve-fitting method takes the intersection of the curves fitted to the ploughed-layer and un-ploughed-layer data as the tillage depth. The three machine learning algorithms were evaluated on a standard data set. On this data set, DBSCAN reached a discrimination accuracy of 0.9890 and an F1 score of 0.9934, both superior to those of the other two algorithms. Under standard experimental conditions, DBSCAN clustering determined the soil depth best among the four discrimination methods, with a discrimination accuracy of 90.63% for errors within 15 mm. During field-test verification, DBSCAN clustering was still the best of the four methods. However, the soil blocks encountered in the field test affected the test data, resulting in large errors in the processing results. Therefore, a combination of RANSAC robust regression and DBSCAN clustering, which can eliminate interference from soil blocks in the cultivated layer and thereby reduce the large depth errors that soil blocks cause in the field, was used to process the data. When the combined RANSAC and DBSCAN method was applied to all field samples, the accuracy reached 82.69% for errors of less than 20 mm. This combined method improves the applicability of discrimination methods and provides a new method of determining soil depth.

**Keywords:** Kmeans; DBSCAN; GMM; tilling depth

**Citation:** Pang, J.; Zhang, X.; Lin, X.; Liu, J.; Du, X.; Han, J. Tillage-Depth Verification Based on Machine Learning Algorithms. *Agriculture* **2023**, *13*, 130. https://doi.org/10.3390/agriculture13010130

Academic Editors: Mustafa Ucgul and Chung-Liang Chang

Received: 29 November 2022 Revised: 24 December 2022 Accepted: 29 December 2022 Published: 4 January 2023

**Copyright:** © 2023 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

#### **1. Introduction**

Soft soil is beneficial to the growth of crops, but fields that have been cultivated for many years form a solid plough bottom, which hinders the penetration of precipitation into deep soil and is not conducive to the absorption of water by the crop roots. The depth of the tillage layer has a significant impact on agroecosystems and on crop yield and quality [1–4]. After the soil is loosened, the plow bottom can be broken, the plow-layer thickness can be increased, and the soil structure and the quality of cultivated land can be improved. In recent years, China has attached great importance to the quality of subsoiling operations, has encouraged farmers to prepare land by subsoiling, and has issued subsidies for subsoiling operations. At present, no unified standard has been accepted for automatically measuring the tillage depth in subsoiling operations. Manual sampling methods are often adopted but require manual cleaning of the ditch bottom, and human factors and soil conditions affect the measurement accuracy [5]. Moreover, manual determination of the tillage depth has a low efficiency, is labor-intensive, takes a long time, and provides a limited description of the tillage depth. Due to the importance of subsoiling land preparation and the need for supervision by the Ministry of Agriculture of China, subsidies for subsoiling operations must be based on accurate information about the front-line operations of subsoiling machinery; thus, the demand for information-based identification and testing is quite high. Tillage depth is an important evaluation index of subsoiling quality, and accurate tillage-depth detection and control are very important. The determination of tillage depth by agricultural machinery equipment is mostly realized by one or more angle sensors, inclination sensors, attitude sensors, ultrasonic sensors, and infrared sensors.

The latest research on tillage-depth detection by domestic and foreign scholars can be divided into two types according to the detection method: non-contact and contact. Non-contact measurement methods commonly install ultrasonic sensors or optical rangefinders, for example on a replica wheel of the plow. Kim et al. [6] installed inclinometers and optical distance sensors on the left and right axes of linear potentiometers and used linear potentiometers and optical distance sensors to measure the depth of soil penetration. In addition, a single-axis inclinometer was used to measure the inclination angle during the tillage process, and the tillage depth was calculated based on the depth of the soil penetration and the pitch angle of the attached equipment. The conservation tillage popularized and applied in recent decades usually breaks up crop residues and mixes them with soil as nutrients for the next crop. However, neither ultrasonic sensors nor optical rangefinders can accurately distinguish crop residues from the ground. This affects the tillage-depth measurement and greatly reduces the precision of the non-contact measurement method, and research on non-contact measurement methods has gradually decreased in recent years. In contrast, the contact measurement method, in which the sensor touches the actual soil surface, has been increasingly studied. Mouazen et al. [7] connected a linear variable displacement sensor with a stroke length of 0–2 m to the frame of a metal wheel, with the height sensor located at the axle, and measured the swing-arm frame height. As the distance between the soil surface and the frame changes, an analytical–statistical hybrid model was built to calculate the change in the height of the frame. Xie et al. [8] developed a contact tillage-depth measurement method based on measuring the lift-arm inclination and the geometric relationship between the unit and the tractor. When measuring the tillage depth with this method, it is necessary to ensure that the plane of the plough body or the beam of the plough frame remains horizontal when the implement frame reaches the maximum tillage depth. An irregular surface or a change in the slope affects the applicability and accuracy of the model. Afterwards, Jia Honglei et al. [9] designed an adaptive tillage-depth monitoring system that uses a photoelectric encoder to measure the angle of the adjustable swing arm. Based on different cases and corresponding mathematical models, they developed a LabVIEW program adapted to the measurement, which was slightly lacking in terms of real-time processing and visualization of the tillage depth. Overall, most non-contact technologies use ultrasonic rangefinders and optical rangefinders to indirectly measure the tillage depth, and most contact technologies use inclination sensors and angle-measuring instruments to indirectly measure the tillage depth. Although these methods provide an effective way to detect tillage depth and tillage quality online, each of them has certain problems. For example, when an inclination sensor is used to calculate the tillage depth, the sensor needs to be re-calibrated, which is inconvenient. Additionally, the detection accuracies of the angle-measuring instrument and the ultrasonic sensor are affected by surface debris.

To detect tillage depth, two methods are used: one is real-time monitoring of the tillage depth, and the other is verification testing. For real-time monitoring, the testing equipment is usually mounted on the machine performing the rotary tillage operation in order to obtain real-time data of the operation. The experiments conducted by Kim et al. and Xie Bin et al., as described above, aimed to test the tillage depth in real time, as well as to verify and test the effect of their special testing equipment on rotary tillage. The goals of the abovementioned subsoiling preparation subsidies issued by the Ministry of Agriculture of China are to verify and test the subsoiling-preparation quality by measuring the tillage depth. However, some problems are found in the currently accepted methods of inspection, such as having inconsistent standards, producing large manual-measurement errors, and being labor-intensive. With the state strongly advocating for deep loose-land preparation, the Ministry of Agriculture and the Ministry of Finance attach great importance to the subsoiling operations of the land, and subsidies for subsoiling operations must be issued based on accurate information about the front-line operations of subsoiling machinery. At present, the demand for information-based acceptance testing is quite strong. With the rapid development of autonomous navigation technology and information sensing technology in recent years, cultivated-land precision-monitoring technology has become more refined and smarter. Additionally, using sensors to determine tillage depth for verification and detection has become a development trend, as it provides a quantitative basis for evaluating the quality of subsoiling operations.

Machine learning algorithms often solve practical problems more efficiently because they exploit the regularities inherent in data. In order to improve the automation and applicability of discriminating tillage depth, data on penetration resistance and tillage depth were analyzed. We decided to use four discrimination methods, namely three machine learning algorithms (Kmeans, DBSCAN, and GMM) and the curve-fitting method, to analyze and compare the data from the cultivated layer and the uncultivated layer obtained in a laboratory experiment. Among them, the Kmeans algorithm clusters according to the distance similarity between data points. The DBSCAN algorithm clusters according to the density of the data points, while the GMM algorithm clusters according to the assumption that the data points of each cluster conform to a Gaussian distribution of the corresponding cluster. The DBSCAN algorithm has the best effect in terms of determining tillage depth without any soil disturbance. In this paper, the Affinity propagation, Mean-shift, and OPTICS algorithms were also considered for use in distinguishing the surface layer. However, Affinity propagation clustering [10] works by sending messages between sample pairs until convergence, and the number of clusters is determined from the data provided, so it is ineffective for the binary classification problem of distinguishing the cultivated layer from the uncultivated layer. Mean-shift [11] is a density-based nonparametric clustering algorithm that needs to specify the candidate centroid and automatically sets the number of clusters. As the purpose of this experiment is to classify the data into a specified number of clusters, this algorithm cannot be applied to distinguish the surface layer. The OPTICS algorithm [12] is a generalization of the DBSCAN algorithm. Clustering is performed according to density, but the number of clusters is automatically matched. Compared with DBSCAN, it is too sensitive and will divide the data into several clusters, meaning that it cannot solve the problem of plough layer discrimination. Furthermore, a field experiment was carried out to verify the ability of the methods to determine soil depth without any soil disturbance. The RANSAC robust regression algorithm fits a regression model even when the data contain outliers or errors. Combining this algorithm with the DBSCAN algorithm can solve the problem of soil disturbance discrimination and can accurately determine the tillage depth. The hybrid algorithm performs reasonably well in field discrimination. The hybrid machine learning algorithm can greatly improve the production efficiency, reduce the labor intensity, and reduce the production cost. Thus, it provides a new method of determining topsoil depth.

#### **2. Principle of Tillage-Depth Identification Based on Soil-Penetration Resistance**

#### *2.1. Test Instruments and Data Collection*

In this paper, soil samples from Luoyang, Henan (34°39′47″ N, 112°26′4″ E), were used. First, the soil water content and density were measured by sampling in the field. The samples were measured with the sieving method and the hydrometer method. The soil moisture content in the field ranged from 10% to 20%, and the soil density ranged from 1100 kg·m<sup>−3</sup> to 1300 kg·m<sup>−3</sup>. Before conducting the soil-sample test to obtain the experimental data of the soil in the tilled and un-ploughed layers, the soil moisture content and density were each divided into three levels for the orthogonal tests: the moisture content was taken as 10%, 15%, and 20%, and the density was taken as 1.1 × 10<sup>3</sup> kg/m<sup>3</sup>, 1.2 × 10<sup>3</sup> kg/m<sup>3</sup>, and 1.3 × 10<sup>3</sup> kg/m<sup>3</sup>.

The indoor test, conducted over the soil moisture content and density ranges measured during field sampling, uses a probe (material: 65Mn steel; length: 530 mm; maximum tip diameter: 14 mm), a universal testing machine (DNS02-1KW), and a barrel (inner diameter: 124 mm; height: 400 mm).

The specific operation steps in the test process are as follows:


#### *2.2. The Principle of Machine Learning Algorithm in Discriminating the Depth of Cultivation*

Soil-penetration resistance refers to the resistance acting on a conical or plunger-shaped penetration needle when it penetrates soil vertically at a constant rate, due to the combined effects of soil particle friction and extrusion. At the same sampling frequency, the soil in the cultivated layer is soft, so the penetration resistance of the cultivated layer increases gradually with increasing penetration depth of the probe, and the collected data points are closely distributed. After years of cultivation in the field, a solid plough bottom layer is formed under the plough layer. When the probe enters the plough bottom layer, due to the soil firmness, the penetration resistance of the soil increases steeply, so the data points collected in the plough bottom layer are loosely distributed. Using a scatter plot and its corresponding mathematical analysis, the value of the cultivated depth at the measured point can be determined, which verifies that this method of determining the cultivated depth is feasible.

**Figure 1.** Probe-penetration laboratory test.

**Figure 2.** The relationship between penetration resistance and penetration depth.

#### **3. Basic Discriminant Methods and Their Comparisons**

*3.1. Kmeans Clustering Algorithm to Discriminate Cultivated Layer*

The Kmeans clustering algorithm is one of the most commonly used methods in clustering. According to the similarity of the distance between points, the samples are separated into n groups with equal variance for clustering [13–18]. The Kmeans algorithm first randomly selects two data points as the centroids *μ*<sub>1</sub><sup>*t*</sup> and *μ*<sub>2</sub><sup>*t*</sup> from all the data points of the collected penetration resistance and penetration depth, with *t* as the number of iterations. The centroids chosen before iteration are marked as *μ*<sub>1</sub><sup>(0)</sup> and *μ*<sub>2</sub><sup>(0)</sup>, and these two centroids serve as the initial cluster centers of the tilled and un-tilled layers [13]. Based on the relationships between the data points, the objective of the optimization is defined before the clustering starts:

$$\mathbf{J}(\mathbf{c}, \mu) = \min \sum\_{i=1}^{M} \left\| \mathbf{x}\_i - \mu\_{\mathbf{c}\_i} \right\|^2 \tag{1}$$

where *c* is the division of the data into two categories, *c*<sub>1</sub> and *c*<sub>2</sub>, to distinguish between the cultivated layer and the uncultivated layer, and *M* is the number of data points, i.e., the total sample size.

In each iteration, the distance between each data point *x*<sub>*i*</sub> and the two cluster centers *μ*<sub>1</sub><sup>*t*</sup> and *μ*<sub>2</sub><sup>*t*</sup> is calculated, and the data point is assigned to the category *c*<sub>*i*</sub><sup>*t*</sup> of the nearest cluster center at the *t*-th iteration:

$$c\_{i}^{t} \leftarrow \arg\min\_{k} \left\| \mathbf{x}\_{i} - \mu\_{k}^{t} \right\|^{2} \tag{2}$$

where *k* (*k* = 1, 2) indexes the two clusters, which are obtained by processing all the data. The mean position of all sample points assigned to the same cluster is then calculated and used as the new centroid of this iteration, that is, the new cluster centers *μ*<sub>1</sub><sup>*t*+1</sup> and *μ*<sub>2</sub><sup>*t*+1</sup>, which must meet the following:

$$\mu\_{k}^{t+1} \leftarrow \arg\min\_{\mu} \sum\_{i:\, c\_{i}^{t}=k} \left\| \mathbf{x}\_{i} - \mu \right\|^{2} \tag{3}$$

The iterative process is repeated until J converges, that is, the centroids of all clusters no longer change and the Kmeans discrimination of the tillage layer is completed.
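The assignment and update steps of Equations (2) and (3) can be illustrated with a minimal pure-Python sketch. It is not the implementation used in this study: the function names, the synthetic (depth, resistance) profile, the fixed initial centroids passed through the hypothetical `init` argument (used here only for reproducibility in place of random selection), and the rule of reading the boundary as the first point whose label changes are all assumptions made for illustration.

```python
import random


def kmeans_two_layers(points, init=None, max_iter=100, seed=0):
    """Split (depth, resistance) points into two clusters with plain Kmeans."""
    if init is None:
        # Random initial centroids, as described in the text.
        init = random.Random(seed).sample(points, 2)
    centroids = [tuple(map(float, c)) for c in init]
    labels = []
    for _ in range(max_iter):
        # Assignment step, Equation (2): index of the nearest centroid.
        labels = [
            min((0, 1), key=lambda k: (d - centroids[k][0]) ** 2 + (f - centroids[k][1]) ** 2)
            for d, f in points
        ]
        # Update step, Equation (3): centroid = mean position of its members.
        updated = []
        for k in (0, 1):
            members = [p for p, c in zip(points, labels) if c == k]
            if members:
                updated.append((sum(p[0] for p in members) / len(members),
                                sum(p[1] for p in members) / len(members)))
            else:
                updated.append(centroids[k])  # keep the centroid of an empty cluster
        if updated == centroids:  # centroids unchanged: J has converged
            break
        centroids = updated
    return labels, centroids


def boundary_depth(points, labels):
    """Depth of the first point (ordered by depth) whose label differs from the first one."""
    for (depth, _), label in zip(points, labels):
        if label != labels[0]:
            return depth
    return None


# Synthetic profile (depth in mm, resistance in N), ordered by depth; not measured data.
profile = [(0, 10), (20, 12), (40, 14), (60, 16), (80, 18), (100, 20),
           (120, 80), (140, 140), (160, 200)]
kmeans_labels, kmeans_centroids = kmeans_two_layers(profile, init=[profile[0], profile[-1]])
kmeans_boundary = boundary_depth(profile, kmeans_labels)
```

Because both coordinates enter the squared distance unscaled, the split depends on the relative units of depth and resistance, which is one reason a distance-based method can place the boundary less reliably than a density-based one.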

Figure 3 shows the effect of data discrimination using Kmeans clustering. The centroids are marked with "+" in the figure. Once the centroid positions no longer change, Kmeans has assigned all data to the two clusters, and the soil depth of the plough layer is determined from them.

All data points *x*<sub>*i*</sub> of the same cluster *c*<sub>*j*</sub> are marked with the same cluster number, and the boundary data of different clusters are found as the result of the discrimination of the soil depth of the tillage layer. After processing 64 groups of simulated surface-soil samples, the results show that four groups obtained absolute errors of less than 5 mm, twelve groups obtained absolute errors of less than 10 mm, twenty-six groups obtained absolute errors of less than 15 mm, and forty-seven groups obtained absolute errors of less than 20 mm.

#### *3.2. DBSCAN Density Clustering Algorithm to Discriminate Tillage Layer*

DBSCAN (density-based spatial clustering of applications with noise) density clustering treats clusters as high-density regions separated by low-density regions [19–22] and identifies a cluster class by classifying closely connected samples into one class [23,24]. The DBSCAN clustering algorithm is used to discriminate the soil depth of the cultivated layer. The radius eps (Eps-neighborhood of a point) of the cultivated layer data is set to 3.5, and the threshold min\_samples is 4, that is, a data point of the cultivated layer must have at least four samples within its neighborhood of radius 3.5.

**Figure 3.** Kmeans processing data-discrimination effect diagram.

Since the penetration resistance increases much faster in the un-ploughed layer than in the plow layer, the Euclidean distance between consecutively collected sample points a(*x*<sub>1</sub>, *y*<sub>1</sub>) and b(*x*<sub>2</sub>, *y*<sub>2</sub>) increases there, and the Euclidean distance *d* is as follows:

$$d = \sqrt{\left(x\_1 - x\_2\right)^2 + \left(y\_1 - y\_2\right)^2} \tag{4}$$

According to the data-distribution law of the cultivated layer and the uncultivated layer, a data point is randomly selected as the core point *x*1, and the Euclidean distance between other data points and the core point is used to determine whether other data points are densely connected. The basic algorithm principle model of DBSCAN is shown in Figure 4 below.

**Figure 4.** Principle model diagram of the basic algorithm of DBSCAN.

The core point *x*<sub>1</sub> is randomly selected and a circle with radius eps is drawn around it; every point inside the circle is directly density-reachable from the center, so *x*<sub>2</sub> is directly density-reachable from *x*<sub>1</sub>. Points *x*<sub>3</sub> and *x*<sub>4</sub> are density-reachable from *x*<sub>1</sub>, and *x*<sub>3</sub> and *x*<sub>4</sub> are density-connected. All density-connected points are marked as data points of the same cluster. The number of points within the eps radius of a point is compared with the threshold min\_samples: if it is greater than or equal to min\_samples, the point is considered a point in the same cluster; if it is less than min\_samples, the point is marked as a noise point with the label −1.

Due to the increase in penetration resistance, the data points collected in the un-ploughed layer cannot be densely connected with the data points of the cultivated layer and are marked as −1, that is, as outliers. The first data point marked as an outlier is the demarcation point between the tilled and uncultivated layers. The discrimination of the soil tillage layer by DBSCAN is shown in Figure 5 below.
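The first-outlier rule described above can be sketched in pure Python. This is an illustrative re-implementation of the standard DBSCAN procedure, not the code used in this study; the function names and the synthetic profile are assumptions, and only eps = 3.5 and min\_samples = 4 come from the text. The neighbor count includes the point itself, which is a convention chosen here.

```python
import math

NOISE = -1


def dbscan(points, eps, min_samples):
    """Label each point with a cluster index, or NOISE (-1) for outliers."""
    n = len(points)

    def neighbors(i):
        # Indices within Euclidean distance eps of point i (Equation (4)), including i.
        return [j for j in range(n) if math.dist(points[i], points[j]) <= eps]

    labels = [None] * n
    cluster = -1
    for i in range(n):
        if labels[i] is not None:
            continue
        seeds = neighbors(i)
        if len(seeds) < min_samples:
            labels[i] = NOISE  # not a core point; may later become a border point
            continue
        cluster += 1
        labels[i] = cluster
        queue = [j for j in seeds if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == NOISE:
                labels[j] = cluster  # border point reached from a core point
            if labels[j] is not None:
                continue
            labels[j] = cluster
            reachable = neighbors(j)
            if len(reachable) >= min_samples:  # j is a core point: keep expanding
                queue.extend(reachable)
    return labels


def first_outlier_depth(points, labels):
    """Depth of the first point (ordered by depth) labelled as noise."""
    for (depth, _), label in zip(points, labels):
        if label == NOISE:
            return depth
    return None


# Synthetic profile: closely spaced tilled-layer points, then a steep rise (not measured data).
dbscan_profile = [(float(d), 5.0 + 0.2 * d) for d in range(10)]
dbscan_profile += [(10.0, 20.0), (11.0, 35.0), (12.0, 50.0), (13.0, 65.0)]
dbscan_labels = dbscan(dbscan_profile, eps=3.5, min_samples=4)
dbscan_boundary = first_outlier_depth(dbscan_profile, dbscan_labels)
```

In this sketch the tilled-layer points form a single cluster because consecutive points lie well within eps of each other, while each point of the steep segment has no neighbor within eps and is therefore labelled −1.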

**Figure 5.** Discrimination effect diagram of a soil tillage layer by DBSCAN.

The results obtained after clustering 64 groups of soil samples in the simulated tillage layer using the DBSCAN algorithm show that thirty-two groups had an absolute error of less than 5 mm, fifty-two groups had an absolute error of less than 10 mm, fifty-eight groups had an absolute error of less than 15 mm, and sixty groups had an absolute error of less than 20 mm.

#### *3.3. Gaussian Mixture Model Clustering (GMM)*

The Gaussian mixture model is usually referred to as GMM. It first assumes that there are *k* Gaussian distributions and then assesses the probability that each sample conforms to each distribution [25–27]. To determine the tillage depth, the samples are divided into two clusters, the tilled layer and the un-tilled layer, that is, *k* is 2. The sample data are assigned to the distribution cluster with the highest probability, maximum likelihood estimation is performed, the probability of conforming to the distribution of the cultivated layer and the uncultivated layer is recalculated based on the new distribution, and the Gaussian distribution parameters are iteratively updated with the EM algorithm [28,29] until the model converges to a local optimal solution. The Gaussian function has good computational performance, and the probability density is often recorded as follows:

$$\mathbf{f}(\mathbf{x}) = p(\mathbf{x}|\mu\_i, \Sigma\_i) \tag{5}$$

The Gaussian mixture distribution function required to define clustering is as follows:

$$p(X) = \sum\_{i=1}^{k} \alpha\_i p(X | \mu\_{i}, \Sigma\_i) \tag{6}$$

Among them, *α*<sub>*i*</sub> represents the probability of each cluster, and the probabilities sum to 1, that is, Σ<sub>*i*=1</sub><sup>*k*</sup> *α*<sub>*i*</sub> = 1. The data points are assigned to the corresponding clusters with prior probability *p*(*z*<sub>*j*</sub> = *i*) = *α*<sub>*i*</sub>. Finally, the conditional probability formula of whether a data point conforms to the cultivated layer or the uncultivated layer is as follows:

$$p(z\_{j} = i | \mathbf{x}\_{j}) = \frac{P(z\_{j} = i)\,p(\mathbf{x}\_{j} | z\_{j} = i)}{p(\mathbf{x}\_{j})} = \frac{\alpha\_{i}\,p(\mathbf{x}\_{j} | \mu\_{i}, \Sigma\_{i})}{\sum\_{l=1}^{k} \alpha\_{l}\,p(\mathbf{x}\_{j} | \mu\_{l}, \Sigma\_{l})}\tag{7}$$

For each datum *j*, the probabilities *γ*<sub>*ji*</sub> = *p*(*z*<sub>*j*</sub> = *i* | *x*<sub>*j*</sub>) of conforming to each cluster distribution are compared, and the data point is assigned to the cluster with the highest probability.

In this formula, each component density in the numerator and the denominator is Gaussian. The parameters are estimated with the maximum likelihood method, that is, by maximizing the logarithm of the product of the probabilities of the data under the mixture distribution:

$$\text{LL}(D) = \ln\left(\prod\_{j=1}^{m} p(\mathbf{x}\_{j})\right) = \sum\_{j=1}^{m} \ln\left(\sum\_{i=1}^{k} \alpha\_{i}\,p(\mathbf{x}\_{j}|\mu\_{i}, \Sigma\_{i})\right) \tag{8}$$

Then, take the derivative of the above function, set the derivative to 0, and find the corresponding *μ* and Σ:

$$\mu\_i = \frac{\sum\_{j=1}^m \gamma\_{ji} \mathbf{x}\_j}{\sum\_{j=1}^m \gamma\_{ji}} \tag{9}$$

$$\Sigma\_{i} = \frac{\sum\_{j=1}^{m} \gamma\_{ji} (\mathbf{x}\_{j} - \mu\_{i}) (\mathbf{x}\_{j} - \mu\_{i})^{T}}{\sum\_{j=1}^{m} \gamma\_{ji}} \tag{10}$$

Additionally, when calculating the corresponding Gaussian mixture coefficient *α*, the constraint that the coefficients sum to 1 is enforced with the Lagrange multiplier method, and the equation is obtained as follows:

$$\alpha\_i = \frac{1}{m} \sum\_{j=1}^{m} \gamma\_{ji} \tag{11}$$
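The E-step of Equation (7) and the M-step updates of Equations (9)–(11) can be illustrated with a minimal sketch. To keep it short, it assumes one-dimensional data (for example, resistance values only) with *k* = 2, so the covariance matrix of Equation (10) reduces to a scalar variance; the function names, the initialization from the minimum and maximum of the data, the variance floor, and the synthetic values are all assumptions for illustration and are not taken from this study.

```python
import math


def gaussian_pdf(x, mu, var):
    """Density of a one-dimensional Gaussian with mean mu and variance var."""
    return math.exp(-(x - mu) ** 2 / (2.0 * var)) / math.sqrt(2.0 * math.pi * var)


def em_two_gaussians(xs, n_iter=50, var_floor=1e-6):
    """Fit a two-component 1-D Gaussian mixture with EM and label each point."""
    m = len(xs)
    mean_all = sum(xs) / m
    var_all = sum((x - mean_all) ** 2 for x in xs) / m
    mu = [min(xs), max(xs)]   # assumed initialization: extremes of the data
    var = [var_all, var_all]
    alpha = [0.5, 0.5]
    gamma = []
    for _ in range(n_iter):
        # E-step, Equation (7): posterior probability gamma[j][i] of cluster i for point j.
        gamma = []
        for x in xs:
            weighted = [alpha[i] * gaussian_pdf(x, mu[i], var[i]) for i in range(2)]
            total = sum(weighted)
            gamma.append([w / total for w in weighted])
        # M-step, Equations (9)-(11): re-estimate mean, variance and mixing weight.
        for i in range(2):
            n_i = sum(g[i] for g in gamma)
            mu[i] = sum(g[i] * x for g, x in zip(gamma, xs)) / n_i
            spread = sum(g[i] * (x - mu[i]) ** 2 for g, x in zip(gamma, xs)) / n_i
            var[i] = max(spread, var_floor)  # floor avoids a collapsed component
            alpha[i] = n_i / m
    # Each point goes to the cluster with the larger posterior probability.
    labels = [0 if g[0] >= g[1] else 1 for g in gamma]
    return alpha, mu, var, labels


# Synthetic resistance values: a low group and a high group (not measured data).
gmm_values = [0.8, 0.9, 1.0, 1.1, 1.2, 9.5, 9.8, 10.0, 10.2, 10.5]
gmm_alpha, gmm_mu, gmm_var, gmm_labels = em_two_gaussians(gmm_values)
```

The sketch does not guard against a component receiving zero total responsibility, which would need handling on real data.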

The algorithm is as follows:


The schematic diagram of the GMM clustering density estimation is shown in Figure 6. The darker the color is, the larger the density estimation is, and the lighter the color is, the smaller the density estimation is. The GMM discriminant effect is shown in Figure 7.

**Figure 6.** Schematic diagram of the density estimation of the GMM.

**Figure 7.** GMM clustering discrimination effect diagram.

The clustering result of the GMM algorithm marks the cultivated layer as cluster 1, which is the yellow data point in Figure 7; the un-ploughed layer data are marked as cluster 0, which is the black and purple data point in Figure 7. The results obtained after processing 64 groups of soil samples in the simulated tillage layer show that the number of samples with an absolute error less than 5 mm is 31, the number of samples with an absolute error less than 10 mm is 48, the number of samples with an error less than 15 mm is 52, and the number of samples with an error less than 20 mm is 59.

#### *3.4. Data Fitting to Determine Tillage Depth*

After the soil is compacted, physical properties such as bulk density, porosity, and temperature of the soil change compared with the soil before compaction, and the root penetration resistance also increases. The working principle of the rotary tiller is that the blade rotates at a certain speed while moving forward with the machine, so that the milling process consists of soil cutting, crushing, and throwing back. Therefore, physical properties such as bulk density, porosity, and temperature of the soil in the plough layer that has been rotary tilled tend to be the same. Since the soil in the plough bottom layer has not been ploughed for many years, it forms a solid plough bottom layer under the plough layer. Thus, a plow bottom layer 40–50 cm below the plough layer tends to maintain the same physical properties.

According to Yan Ben [29], depth and penetration resistance share a quadratic polynomial relationship, and the soil-sample data were collected at a uniform speed. Therefore, data fitting was performed according to the theory that depth and penetration resistance share a quadratic polynomial relationship.

A portion of the data for the tilled layer and a portion of the data for the uncultivated layer were extracted, and a quadratic curve of penetration resistance against depth was fitted to each portion. The penetration resistance of the soil in the plough layer increases with the penetration depth, so the point at which its fitted curve intersects the fitted curve of the un-ploughed layer is inferred to be the boundary where the plough layer meets the un-ploughed layer. The original data before data fitting are shown in Figure 8, and the fitted quadratic curves are shown in Figure 9.

**Figure 8.** Schematic diagram of the original data points in the cultivated layer and the un-ploughed layer.

Sixty-four groups of soil samples were processed by fitting quadratic curves to find the intersection points. The number of soil samples with an error of less than 5 mm was 24, the number of samples with errors less than 10 mm was 47, the number of samples with errors less than 15 mm was 55, and the number of samples with errors less than 20 mm was 60.
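The intersection procedure can be sketched in pure Python as follows. The study performed the fitting in MATLAB; this sketch is only an illustration, and its function names, the synthetic noise-free data, and the rule of choosing the intersection closest to the gap between the two data segments are assumptions.

```python
import math


def fit_quadratic(xs, ys):
    """Least-squares fit y = a*x^2 + b*x + c via the 3x3 normal equations."""
    s = [sum(x ** k for x in xs) for k in range(5)]               # sums of x^0 .. x^4
    t = [sum(y * x ** k for x, y in zip(xs, ys)) for k in range(3)]
    # Augmented matrix for the unknowns in the order (c, b, a).
    mat = [[s[0], s[1], s[2], t[0]],
           [s[1], s[2], s[3], t[1]],
           [s[2], s[3], s[4], t[2]]]
    for col in range(3):  # Gaussian elimination with partial pivoting
        pivot = max(range(col, 3), key=lambda r: abs(mat[r][col]))
        mat[col], mat[pivot] = mat[pivot], mat[col]
        for r in range(col + 1, 3):
            factor = mat[r][col] / mat[col][col]
            for k in range(col, 4):
                mat[r][k] -= factor * mat[col][k]
    sol = [0.0, 0.0, 0.0]
    for r in (2, 1, 0):   # back substitution
        sol[r] = (mat[r][3] - sum(mat[r][k] * sol[k] for k in range(r + 1, 3))) / mat[r][r]
    c, b, a = sol
    return a, b, c


def intersection_depths(p, q):
    """Real depths where the quadratics p and q, given as (a, b, c), are equal."""
    a, b, c = (p[i] - q[i] for i in range(3))
    if abs(a) < 1e-12:
        return [] if abs(b) < 1e-12 else [-c / b]
    disc = b * b - 4.0 * a * c
    if disc < 0:
        return []
    root = math.sqrt(disc)
    return sorted([(-b - root) / (2.0 * a), (-b + root) / (2.0 * a)])


# Synthetic segments (depth in mm, resistance in N); the curves are made up for illustration.
tilled_d = [0, 20, 40, 60, 80]
tilled_f = [0.01 * d ** 2 for d in tilled_d]
untilled_d = [110, 120, 130, 140, 150]
untilled_f = [0.05 * d ** 2 - 4.4 * d + 40 for d in untilled_d]

tilled_curve = fit_quadratic(tilled_d, tilled_f)
untilled_curve = fit_quadratic(untilled_d, untilled_f)
gap_mid = (tilled_d[-1] + untilled_d[0]) / 2.0
candidates = intersection_depths(tilled_curve, untilled_curve)
# Assumed selection rule: take the intersection closest to the gap between the segments.
fitted_depth = min(candidates, key=lambda r: abs(r - gap_mid))
```

Two quadratics can intersect at two depths, so some selection rule is needed; the one used here is only one plausible choice.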

**Figure 9.** Schematic diagram of MATLAB fitting curve.

#### *3.5. Comparison of Evaluation Results of Three Kinds of Machine Learning*

This paper mainly deals with the problem of binary classification and takes the above curve data as a standard data set of the simulated soil samples. The accuracy, precision, recall, and F1 score of the clustering results from the three machine learning algorithms were evaluated [30]:

TP (True Positive): a positive sample is correctly predicted to be positive

FP (False Positive): a negative sample is incorrectly predicted to be positive

FN (False Negative): a positive sample is incorrectly predicted to be negative

TN (True Negative): a negative sample is correctly predicted to be negative

Accuracy refers to the percentage of cases in which predictions are correct:

$$\text{Accuracy} = \frac{\text{TP} + \text{TN}}{\text{TP} + \text{FP} + \text{FN} + \text{TN}} \tag{12}$$

Precision refers to the percentage of cases predicted to be positive that are actually positive:

$$\text{Precision} = \frac{\text{TP}}{\text{TP} + \text{FP}} \tag{13}$$

Recall refers to the percentage of cases that are actually positive and are predicted correctly:

$$\text{Recall} = \frac{\text{TP}}{\text{TP} + \text{FN}} \tag{14}$$

The F1 score takes both the precision and the recall into consideration. It is their harmonic mean, so it is high only when both are high:

$$F\_1 = \frac{2 \ast \text{Precision} \ast \text{Recall}}{\text{Precision} + \text{Recall}} \tag{15}$$
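Equations (12)–(15) can be evaluated directly from a pair of label lists, as in the following minimal sketch. The function name, the choice of label 1 as the positive class, and the example labels are hypothetical and do not reproduce the standard data set of this study.

```python
def binary_metrics(y_true, y_pred, positive=1):
    """Accuracy, precision, recall and F1 from true and predicted labels."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    tn = sum(t != positive and p != positive for t, p in pairs)
    accuracy = (tp + tn) / len(pairs)                              # Equation (12)
    precision = tp / (tp + fp) if tp + fp else 0.0                 # Equation (13)
    recall = tp / (tp + fn) if tp + fn else 0.0                    # Equation (14)
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)                          # Equation (15)
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


# Hypothetical labels: TP = 8, FP = 2, FN = 0, TN = 10.
example_true = [1] * 8 + [0] * 12
example_pred = [1] * 8 + [1] * 2 + [0] * 10
example_metrics = binary_metrics(example_true, example_pred)
```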

The above three machine learning algorithms are evaluated against the results of the standard data set, and the results are shown in Table 1 below.


**Table 1.** Results from the comparison of the three machine learning algorithms.

It can be seen from the above table that the DBSCAN algorithm has the best performance for the classification of this standard data set, with an accuracy of 0.9890 and an F1 score of 0.9934, which are superior to those of the other two algorithms.

*3.6. Comparison of the Effects of Several Methods in Discriminating the Plough Layer*

The 64 groups of simulated soil samples are summarized in Table 2 according to the statistical data on the clustering effect.

**Table 2.** Statistical data on the clustering effect of the simulated soil samples.


According to the above table, the best clustering effect is clearly DBSCAN. When the error is within 15 mm, the accuracy of the DBSCAN clustering algorithm in discriminating the tillage layer is 90.63%; when the error is less than 20 mm, the accuracy rate reaches 93.75%. In terms of the discriminant effect, DBSCAN clustering has the best discriminative effect, data fitting has the second-best discriminant effect, GMM clustering has the third best discriminant effect, and Kmeans clustering has the worst discriminant effect.
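The percentages quoted above follow from the sample counts reported in Sections 3.1–3.4 divided by the 64 simulated samples. The short sketch below redoes that arithmetic; the variable names are illustrative, and ranking the methods by their accuracy at the 15 mm tolerance is an assumption about how the ordering was obtained.

```python
TOTAL_SAMPLES = 64
TOLERANCES_MM = (5, 10, 15, 20)

# Number of samples with an absolute error below each tolerance (Sections 3.1-3.4).
counts = {
    "Kmeans": (4, 12, 26, 47),
    "DBSCAN": (32, 52, 58, 60),
    "GMM": (31, 48, 52, 59),
    "Curve fitting": (24, 47, 55, 60),
}

# Discrimination accuracy in percent for every method and tolerance.
accuracy_pct = {
    method: {tol: 100 * n / TOTAL_SAMPLES for tol, n in zip(TOLERANCES_MM, row)}
    for method, row in counts.items()
}

# Methods ordered from best to worst at the 15 mm tolerance.
ranking_15mm = sorted(counts, key=lambda method: accuracy_pct[method][15], reverse=True)
```

For DBSCAN this gives 58/64 = 90.625% at 15 mm and 60/64 = 93.75% at 20 mm, matching the rounded values quoted in the text.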

#### **4. Field-Test-Specific Situation and Hybrid Algorithm**

#### *4.1. Field-Data Collection*

The equipment used in the field experiment was a multi-point probe-type tillage section detection vehicle designed by our team (Figure 10). The inspection vehicle has six probes, with a side-by-side spacing of 20 cm. Each probe has a force sensor and a displacement sensor to collect data at the same time. The data from the six probes were collected and transmitted to the computer for subsequent data analysis. After collecting the data for a section, the inspection vehicle moves forward by 20 cm. The location of the field experiment was Mengjin County, Luoyang City (34.819° N, 112.3901° E).

Because the firmness of the soil in the field is much greater than that of the soil samples in the laboratory, it was found in the actual field experiment that the data collected by the probe contain a disturbance area; the disturbance area lies in the ploughed layer, close to the plow bottom. When the probe enters the disturbance area of the soil, the penetration resistance increases significantly before it even touches the plow bottom. In the simulation of soil penetration resistance in the field, our team found that the shear modulus of the soil at the bottom of the plough is significantly larger than that of the tilled layer, such that the soil firmness is greater than that of the tilled layer. When the disturbance area is close to the bottom of the plow, the probe continues to penetrate, and the depth of the disturbance area gradually decreases. However, the existence of a disturbance area causes the machine learning algorithm to make a larger error when discriminating the soil depth of the ploughed layer. Our team simulated a large amount of data on the soil ploughed layer in the field and determined that the disturbed area is located 20–30 mm above the boundary between the ploughed and un-ploughed layers. In order to determine the applicability of the algorithm, an average value of 25 mm was selected as the systematic error for discriminating the cultivated layer in the field test.

**Figure 10.** Image of multi-point probe-type tillage section inspection vehicle used in the field test.

#### *4.2. Discrimination Effect of the Four Methods for Discriminating the Plough Layer in the Field Test*

According to agronomic requirements, rotary tillage allows clods within a certain size range to remain in the soil. Because the maximum diameter of the probe is 14 mm, a probe travelling vertically downwards will inevitably hit clods in some measurements. The field experiment therefore yields two types of data: data collected without encountering clods and data collected when clods were encountered. The data collected without encountering clods and the discrimination results of Kmeans, DBSCAN, and GMM clustering are shown in Figure 11.

Among the methods, Kmeans clustering sets the parameter k to 2 to obtain two clusters, and the data point at which the output leaves the first cluster is used as the boundary at which Kmeans clustering discriminates the cultivated layer. DBSCAN sets the parameter eps to 7 and min\_samples to 5, that is, a neighbourhood of radius 7 must contain at least 5 sample points, and the first data point output as an outlier is used as the boundary at which DBSCAN clustering discriminates the cultivated layer. The parameter k of GMM clustering is set to 2, and the first data point assigned to a different cluster is used as the boundary at which GMM clustering discriminates the cultivated layer. Additionally, MATLAB data fitting takes the intersection of the fitted curves as the cut-off point of the tillage layer. A total of 52 groups of field samples were collected, of which 41 groups did not encounter soil blocks. The clustering statistics for these cultivated-layer data are shown in Table 3 below.
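To make the "first outlier" rule for DBSCAN concrete, the following is a minimal pure-Python sketch. It is not the authors' implementation: the feature space (depth in mm against resistance), the synthetic profile, and the function names `dbscan_labels` and `first_outlier_depth` are assumptions made for illustration. Only eps = 7 and min\_samples = 5 are taken from the text, and the neighbour count includes the point itself.

```python
import math

def dbscan_labels(points, eps, min_samples):
    """Minimal O(n^2) DBSCAN: one cluster id per point, -1 for noise."""
    n = len(points)
    neighbors = [
        [j for j in range(n) if math.dist(points[i], points[j]) <= eps]
        for i in range(n)
    ]
    labels = [None] * n  # None means "not visited yet"
    cluster = -1
    for i in range(n):
        if labels[i] is not None:
            continue
        if len(neighbors[i]) < min_samples:
            labels[i] = -1  # provisional noise; may become a border point
            continue
        cluster += 1
        labels[i] = cluster
        queue = list(neighbors[i])
        while queue:
            j = queue.pop()
            if labels[j] == -1:
                labels[j] = cluster  # border point: joins, does not expand
            if labels[j] is not None:
                continue
            labels[j] = cluster
            if len(neighbors[j]) >= min_samples:
                queue.extend(neighbors[j])  # core point: keep expanding
    return labels

def first_outlier_depth(depths, resistances, eps=7.0, min_samples=5):
    """Depth of the first point (in order of increasing depth) labelled noise."""
    labels = dbscan_labels(list(zip(depths, resistances)), eps, min_samples)
    for depth, label in zip(depths, labels):
        if label == -1:
            return depth
    return None

# Synthetic profile (hypothetical values): resistance rises slowly in the
# tilled layer and steeply once the probe passes 150 mm.
depths = [2.0 * i for i in range(100)]
resistances = [0.2 * d if d <= 150 else 30.0 + 10.0 * (d - 150) for d in depths]

boundary = first_outlier_depth(depths, resistances)
print(boundary)  # 152.0
```

In this synthetic profile the tilled-layer points are dense enough to form one cluster, whereas the steeply rising points below 150 mm are each isolated, so the first of them is reported as the boundary.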

**Table 3.** Statistical data on clustering effect of field soil samples.


**Figure 11.** Discrimination-effect of the clustering methods in the field test.

It can be seen from Table 3 that when the error is less than 20 mm, the accuracy rate of DBSCAN in discriminating the cultivated layer is 90.24%. DBSCAN clustering still gives the best discrimination of the cultivated layer in the field test, followed by data fitting and then GMM clustering, while Kmeans still gives the worst results.
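The accuracy rate quoted here can be read as the share of samples whose absolute depth error falls below the tolerance. The sketch below illustrates that calculation under this reading. The function name `accuracy_within` and the individual error values are hypothetical, and the count of 37 out of 41 samples is inferred from the reported 90.24% rather than stated in the text.

```python
def accuracy_within(errors_mm, tolerance_mm):
    """Share of samples whose absolute depth error is below the tolerance."""
    hits = sum(1 for e in errors_mm if abs(e) < tolerance_mm)
    return hits / len(errors_mm)

# Hypothetical errors for 41 samples: 37 inside the 20 mm tolerance, 4 outside.
errors_mm = [5.0] * 37 + [30.0] * 4
rate = accuracy_within(errors_mm, tolerance_mm=20.0)
print(f"{100 * rate:.2f}%")  # 90.24%
```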

#### *4.3. RANSAC Robust Regression Algorithm to Deal with the Error of the Soil Block*

When collecting the data, 11 groups of sample data were found to have encountered soil blocks. The sample data collected for the soil blocks are shown in Figure 12.

When clods were encountered, the clustering methods took the soil-block data as the boundary point of the cultivated layer, resulting in a particularly large discrimination error. Thus, a new method is needed to handle clods. In this paper, a hybrid of the RANSAC (random sample consensus) robust regression algorithm and the DBSCAN clustering algorithm is proposed to discriminate the tillage layer.

**Figure 12.** The original data collected when encountering clods in the field.

The RANSAC algorithm fits a regression model in the presence of bad data (i.e., there are outliers or errors in the data) [31]. RANSAC is a non-deterministic algorithm that produces a reasonable result with a certain probability [32]. The basic assumption of the RANSAC algorithm is that the sample contains correct data (data that can be described by the model), called "inliers". Data that cannot fit the mathematical model are called "outliers", i.e., the data set contains noise. The basic assumptions of the RANSAC algorithm are as follows:


Through an analysis of the curve of penetration resistance against penetration depth, it can be found that, when no soil clods are encountered, all the data of the cultivated layer fit the calculated model and are identified as "inliers". When soil clods are encountered, the data points collected do not fit the calculated model and are identified as "outliers". Data points determined to be inliers by RANSAC robust regression are marked as true, and data points determined to be outliers are marked as false. It follows that the data points before the last one marked true by the RANSAC robust regression algorithm were all collected in the cultivated layer. RANSAC fits the regression model to the cultivated-layer data and allows for noise. DBSCAN density clustering cannot place the soil-clod data in the same cluster as the other cultivated-layer data [33]; however, because RANSAC attributes these data to the cultivated layer, they are not taken as the plough-layer boundary output by DBSCAN. The RANSAC algorithm thus solves the problem of the inability of the DBSCAN algorithm to cluster soil-clod data together with other cultivated-layer data.
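To illustrate the true/false marking of inliers and outliers, the following is a minimal pure-Python RANSAC line fit. It is a sketch under stated assumptions, not the authors' implementation: the straight-line model of resistance against depth, the residual threshold of 2, the 200 random trials, the fixed seed, and the synthetic profile with a three-point clod spike are all hypothetical choices.

```python
import random

def ransac_line_mask(depths, resistances, threshold, n_trials=200, seed=0):
    """Fit a line by RANSAC; return True (inlier) or False (outlier) per point."""
    rng = random.Random(seed)  # fixed seed so the sketch is repeatable
    n = len(depths)
    best_mask, best_count = [False] * n, 0
    for _ in range(n_trials):
        i, j = rng.sample(range(n), 2)  # minimal sample: two points give a line
        if depths[i] == depths[j]:
            continue  # vertical line, cannot be written as r = a*d + b
        slope = (resistances[j] - resistances[i]) / (depths[j] - depths[i])
        intercept = resistances[i] - slope * depths[i]
        mask = [abs(r - (slope * d + intercept)) <= threshold
                for d, r in zip(depths, resistances)]
        count = sum(mask)
        if count > best_count:  # keep the model with the largest consensus set
            best_mask, best_count = mask, count
    return best_mask

# Synthetic cultivated-layer profile (hypothetical values): resistance rises
# slowly with depth, and a clod adds a spike at 40, 42 and 44 mm.
depths = [2.0 * i for i in range(50)]
resistances = [0.2 * d for d in depths]
for k in (20, 21, 22):
    resistances[k] += 25.0

mask = ransac_line_mask(depths, resistances, threshold=2.0)
clod_depths = [d for d, ok in zip(depths, mask) if not ok]
print(clod_depths)
```

Points on the slowly rising trend are marked true, and the spike points are marked false, which is the information the hybrid method later uses to skip over clods.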

#### *4.4. The Hybrid RANSAC and DBSCAN Algorithm Identifies the Topsoil Layer*

The cultivated layer is identified jointly by the two parts of the hybrid algorithm. That is, after RANSAC determines the last true data point, the first subsequent data point considered an outlier by DBSCAN is used as the boundary between the cultivated layer and the uncultivated layer. The steps in the hybrid RANSAC robust regression and DBSCAN clustering algorithm are as follows:


The flow chart of the discrimination procedure and a diagram of its effect are shown in Figure 13. The results for the data sets in which soil clods were encountered are shown in Figure 14.

**Figure 13.** Data flow diagram of RANSAC and DBSCAN hybrid algorithm processing.

**Figure 14.** Schematic diagram of the RANSAC and DBSCAN hybrid algorithm processing data collected when clods were encountered. (**a**) The hybrid algorithm processes clod data set 1. (**b**) The hybrid algorithm processes clod data set 2.
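The combination rule stated in this section (the first DBSCAN outlier after the last data point that RANSAC marks as true) can be sketched as follows. This is one reading of that sentence, not the authors' code: the function name `hybrid_boundary` and the flag lists are hypothetical, and the per-point flags are assumed to have been computed beforehand by a RANSAC fit and a DBSCAN run.

```python
def hybrid_boundary(depths, ransac_inlier, dbscan_outlier):
    """Depth of the first DBSCAN outlier located after the last RANSAC inlier."""
    true_indices = [i for i, ok in enumerate(ransac_inlier) if ok]
    if not true_indices:
        return None  # RANSAC found no inliers, so no boundary can be placed
    last_true = true_indices[-1]
    for i in range(last_true + 1, len(depths)):
        if dbscan_outlier[i]:
            return depths[i]
    return None

def dbscan_only_boundary(depths, dbscan_outlier):
    """Depth of the first DBSCAN outlier, with no RANSAC screening."""
    for depth, is_outlier in zip(depths, dbscan_outlier):
        if is_outlier:
            return depth
    return None

# Hypothetical flags for ten readings: a clod at 60-80 mm is rejected by both
# algorithms, and the uncultivated layer starts at 160 mm.
depths = [0, 20, 40, 60, 80, 100, 120, 140, 160, 180]
ransac_inlier = [True, True, True, False, False, True, True, True, False, False]
dbscan_outlier = [False, False, False, True, True, False, False, False, True, True]

print(dbscan_only_boundary(depths, dbscan_outlier))           # 60
print(hybrid_boundary(depths, ransac_inlier, dbscan_outlier))  # 160
```

In this toy case DBSCAN alone stops at the clod, while the hybrid rule skips it because inliers still follow the clod, and it reports the later boundary.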

The hybrid algorithm can also discriminate the tillage layer when no soil block is encountered, and its discriminant effect is essentially unaffected in that case. The discriminant effect is shown in Figure 15 below.

**Figure 15.** Schematic diagram of the RANSAC and DBSCAN hybrid algorithm processing data that did not encounter clods.

The 41 groups of field samples that did not encounter soil blocks were processed with both DBSCAN and the combined RANSAC and DBSCAN algorithm, and all 52 groups of field samples were processed with the combined RANSAC and DBSCAN algorithm. The results are shown in Table 4 below.


**Table 4.** Comparison of DBSCAN clustering and hybrid clustering effects.

By comparing the results of DBSCAN clustering and the hybrid algorithm on the samples in which no soil blocks were encountered, it can be found that the hybrid algorithm has almost no influence on the discrimination of the surface layer in that case.

Moreover, the hybrid algorithm can solve the problem of soil blocks encountered by the probe in the field, avoids interference from soil blocks in the surface-layer discrimination, and makes the discrimination method more widely applicable. When the error is less than 20 mm, the accuracy of the RANSAC and DBSCAN hybrid algorithm is 82.69%, and the performance of the hybrid algorithm in the field experiment is fair.

#### **5. Conclusions**

In this paper, three machine learning algorithms were used to process a standard data set for model evaluation in an indoor manual soil-sample test. Among them, the discrimination accuracy of DBSCAN for the data set reached 0.9890 and the F1 score reached 0.9934, which were better than those of the other two algorithms. In addition, 64 groups of simulated soil-sample data were processed with DBSCAN, and the discrimination accuracy was 90.63% when the error was less than 15 mm and 93.75% when the error was less than 20 mm, which verified that DBSCAN had a better discrimination effect than Kmeans clustering, GMM clustering, and data fitting.

In the field experiment, the maximum diameter of the probe tip is 14 mm, and it is almost impossible for a mechanical operation to loosen all of the soil in the tilled layer. Whether in rotary tillage or deep tillage, soil clods within the range allowed by agronomy can remain. That is to say, it is inevitable that soil clods with a diameter of more than 14 mm will be present in the soil after a rotary tillage operation. When the probe goes straight down to measure penetration resistance, it inevitably hits small clods that are not completely broken. Therefore, data processing in the field can be divided into two conditions: no clods encountered and clods encountered. When no soil clods were encountered and the error was less than 20 mm, the accuracy of DBSCAN in identifying the plough layer was 90.24%. Among the four discrimination methods, DBSCAN still had the best discrimination effect on the soil layer. However, the soil clods encountered in the field experiment affect the test data and cause significant errors in the processing results. Therefore, the combined RANSAC robust regression and DBSCAN clustering algorithm was used in this paper to process the collected data. When the hybrid algorithm processed the data collected without encountering soil clods, the accuracy was not affected: when the error was less than 20 mm, the accuracy of the hybrid algorithm reached 92.68%. At the same time, the hybrid algorithm eliminates the disturbance due to soil clods when determining the ploughing depth, which solves the problem of the large ploughing-depth error caused by soil clods in the field. When the combined RANSAC and DBSCAN method was applied to all field samples, the accuracy reached 82.69% for an error of less than 20 mm.

The combined RANSAC robust regression and DBSCAN clustering algorithm improves the applicability of the discrimination method and provides a new method for verifying and discriminating topsoil depth. It can be widely used to evaluate the quality of subsoiling operations and to assess whether subsoiling land-preparation subsidies should be issued by the Ministry of Agriculture of China. The tillage depth can be accurately determined by measuring only the penetration resistance and depth, which makes information-based acceptance tests more refined and intelligent.

**Author Contributions:** Conceptualization, J.P.; methodology, X.Z. and X.L.; software, X.Z.; formal analysis, J.L., X.D. and J.H.; writing—original draft preparation, J.P. and X.Z. All authors have read and agreed to the published version of the manuscript.

**Funding:** This work was supported by the National Key R&D Program of China (Grant No. 2017YFD0700300).

**Institutional Review Board Statement:** Not applicable.

**Informed Consent Statement:** Informed consent was obtained from all subjects involved in the study.

**Data Availability Statement:** All data sets in this article were collected by the team.

**Conflicts of Interest:** The authors declare no conflict of interest.

#### **References**


