1. Introduction
Sampling design is fundamental in research and monitoring of natural resources. Proximal soil sensing (PSS) technology is currently available to produce soil attribute maps in high spatial resolution, aiming to support sustainable variable rate input management in precision agriculture [
1,
2]. However, operational standards for optimal sampling designs using continuous PSS surveys are still lacking, which may compromise the evaluation of map uncertainty. Along with the spatial distribution determined by the distance between survey track lines, the sample density along each transect line may affect output map accuracy. These are fundamental sampling design parameters for detailed soil attribute mapping, and they can affect the efficient use of so-called on-the-go PSS technology [
3]. Although optimal transect spacing frameworks for soil sensing are not new [
4,
5,
6], recent works show that methodological research is still needed on approaches customized to the specific landscape, crop type, soil management, survey strategy, sensor type, and target variable [
7,
8].
Conventional techniques to obtain soil data samples are laborious from field collection to laboratory analysis, which may constrain sampling design and the consequent mapping uncertainties. To provide proper spatial and temporal scales required for within-field variability analysis, PSS technology has been successfully used for management zone delineation and variable rate applications. PSS is described as using field-based sensors to obtain signals from the ground when the sensor’s detector is in contact or close to (within 2 m) the soil [
9].
Similar to sensors mounted on different platforms, such as satellites or aircraft, PSS technology is offered in a wide variety of models and sensing methods, which differ in operational protocols and in the physical-chemical principles involved in data collection and analysis. One methodology widely used in agriculture involves commercial devices that monitor soil apparent electrical conductivity (ECa) by contact or by means of electromagnetic induction (EMI). Over several decades, numerous EMI sensors have been used in soil science research and applied precision agriculture solutions. Popular commercial devices include models from VERIS® (Veris Technologies, Salina, KS, USA), DUALEM (DUALEM Inc., Milton, ON, Canada) and Geonics (Geonics Limited, Mississauga, ON, Canada). All of these companies provide multiple models for different applications, such as the Geonics EM31, EM34, EM38, EM38DD, and EM38-MK2.
Soil ECa mapping has been recognized as a promising method in precision agriculture [
6] to measure the spatial variability of soil properties at field and landscape scales [
8,
9]. Electromagnetic induction (EMI) sensors measure soil ECa based on the behavior of soil chemical elements [
10,
11,
12,
13,
14]. EMI devices do not require direct ground contact, so data collection is relatively easier, faster, and less invasive when compared with contact models. This technology is particularly suitable for on-the-go intensive soil monitoring operations, providing a significant number of sample points and more comprehensive area coverage than would be feasible using traditional core-sampling methods and laboratory analysis.
EMI devices use active ground conductivity meter (GCM) sensors, consisting of transmitting and receiving coils that induce alternating electrical currents in the ground. An alternating current in the transmitting coil generates a primary time-varying electromagnetic field. This primary field induces eddy currents in the soil, which create a secondary electromagnetic field that depends on the soil electrical conductivity. The amplitude and phase of the electrical current generated in the receiving coil by the secondary electromagnetic field are then measured [
15]. Under conditions known as “low induction number operation” [
16], the secondary electromagnetic field is proportional to earth induced currents, and can be used to calculate soil ECa, commonly expressed in units of millisiemens per meter (mS/m) [
15].
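The proportionality mentioned above can be illustrated with a minimal sketch, assuming the low-induction-number approximation commonly attributed to McNeill, in which ECa equals 4 (Hs/Hp) / (ω μ0 s²), with Hs/Hp the quadrature-phase ratio of secondary to primary field, ω the angular frequency, and s the coil spacing. The function name, the 14.5 kHz operating frequency, and the example ratio are illustrative assumptions, not values reported in this study.

```python
import math

MU_0 = 4 * math.pi * 1e-7  # magnetic permeability of free space (H/m)


def eca_from_quadrature(ratio, coil_spacing_m, frequency_hz):
    """Apparent conductivity (mS/m) under the low-induction-number approximation.

    ratio is the quadrature-phase part of Hs/Hp (dimensionless).
    """
    omega = 2 * math.pi * frequency_hz
    sigma_s_per_m = 4.0 * ratio / (omega * MU_0 * coil_spacing_m ** 2)
    return sigma_s_per_m * 1000.0  # convert S/m to mS/m


# Hypothetical reading: 1 m coil spacing, assumed 14.5 kHz operating frequency.
eca = eca_from_quadrature(ratio=1.0e-3, coil_spacing_m=1.0, frequency_hz=14_500.0)
print(f"ECa = {eca:.1f} mS/m")
```

Because the relation is linear, doubling the measured ratio doubles the computed ECa, which is why the instrument output can be scaled directly in mS/m.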
A popular PSS device to measure soil ECa is the EM38 (including its sub-models, EM38-MK2 and EM38DD), which has been used in several applications. From its early use, researchers have employed some version of the EM38 to monitor soil salinity and relate it to laboratory analyses [
17,
18,
19,
20,
21]. Using the principles of data fusion, some studies have used DUALEM-1S and DUALEM-421S to measure soil water concentrations [
22,
23,
24]. Veris® MSP and 3100 instruments, as well as EM38 and EM31, have been used to measure texture [
25,
26,
27]. DUALEM-421S and EM38DD have been used for mapping cation exchange capacity [
28,
29]. More recent applications have used EM38, EM34 and EM31, Veris® 3100 and KT-5 (SatisGeo, Brno, Czech Republic) for mapping within-field spatial variation of soil types [
30,
31,
32]. Soil mapping in 3D has been studied with DUALEM-421S [
28] while EM38-MK2 has been employed in tailored data acquisition structures [
33]. In all these applications, large georeferenced datasets are gathered by continuous recording during less invasive and high-speed operations.
High density data sampling can provide an efficient characterization of soil property variations [
32,
33]. Sudduth et al. [
34] recorded more than 5000 observations, corresponding to a 4–6 m data spacing, and Islam et al. [
35] recorded more than 14,000 observations in a 1.4 ha ECa dataset. They operated the EM38-MK2 at a 1 s interval, stating that this number of points could enable automatic variogram fitting and provide proper kriging interpolations, optimal designs, and fast navigation paths, as required for cost-effective survey interventions. Whereas high-density datasets may reduce bias in sampling designs [
25], sample density directly affects output map uncertainties. Therefore, studies to tailor transect spacings and sample densities should be considered to reduce soil ECa map uncertainties when integrating mobile visualization (on-the-fly) and on-the-go monitoring for variable rate application decision support. In this context, this work aims to contribute to these questions by establishing efficiency thresholds that maintain output map accuracy. Different sample designs are investigated to reduce the number of transect lines and of sampling observations, which relate to minimum track distance and maximum operational speed, respectively. The overall objective is to evaluate interpolated on-the-go ECa maps from different sample designs using common validation indices. It is believed that a proper combination of sampling designs can improve operational efficiency while preserving high quality and low uncertainty in map generation. The paper introduces a preliminary analysis of continuous EM38 operational frameworks in Brazil, and it could provide basic information on optimal PSS sampling designs in tropical soils that are relevant to central pivot no-till grain production.
2. Materials and Methods
This section details the materials and methods used to investigate how sample distribution and density in a soil ECa survey dataset influence output map accuracy. Specific objectives are addressed by considering four transect spacing subsets; four sampling density subsets obtained with random point selection and four with the Douglas-Peucker selection algorithm; and kriging interpolations evaluated against a standard external validation subset using the mean error (ME) and root mean square error (RMSE) indices.
2.1. Study Area
The study was carried out on a grain crop rotation production system (i.e., beans, soybeans, wheat, and oats) under central pivot irrigation and no-till soil management. The farm is located in the Itaí district, São Paulo State, Brazil (
Figure 1a). It has central coordinates of 23.58544° South latitude and 48.9395° West longitude, in a subtropical climate, with annual average maximum and minimum temperatures of 26 °C and 16 °C, respectively, and an average rainfall of 119 mm [
36]. The paddock area is 72 ha at a maximum elevation of approximately 712 m above sea level (
Figure 1b). The regional soil characterization is an association of LATOSSOLOS VERMELHOS Distróficos (“
Ferralsols”) and ARGISSOLOS VERMELHOS-AMARELOS Distróficos (“
Acrisols”) (Brazilian Institute of Geography and Statistics, 1:5,000,000). The pivot area was in the off-season, with wheat straw cover, during the survey.
2.2. EM38-MK2: Mobile Data Acquisition Structure and Survey Operation
This ECa survey used the EM38-MK2 EMI meter, which includes two receiver coils, separated by 1 m and 0.5 m from a single transmitter coil. This device provides two simultaneous ECa datasets with readings in millisiemens per meter (mS/m), in either vertical or horizontal dipole orientation. The effective depth ranges of ECa readings are 1.5 m and 0.75 m in the vertical position (ECaV), or 0.75 m and 0.375 m in the horizontal position (ECaH).
Field operations started with sensor calibrations at a height of 1.5 m above the ground in both horizontal and vertical dipole orientations. After calibration, the sensor was placed perpendicular to the earth’s surface in a mobile data acquisition system, detailed below, providing vertical dipole readings (ECaV). As no measurements were taken in the horizontal dipole orientation, the resulting dataset is hereafter referred to as ECa.
Data were stored in a single continuous run, using Bluetooth connections between an Archer Rugged Handheld (Juniper System Inc., Bromsgrove, UK) PDA and two professional georeferencing devices: an XGPS-100A (Dual Electronics Corporation, Lake Mary, FL, USA) rooftop GPS and a GeoExplorer 3000 (Trimble Inc., Sunnyvale, CA, USA).
A mobile data acquisition system was built as a wooden box with no metal parts. The box was assembled using wood glue and Velcro tapes, which wrapped it over a high-resistance rubber mat (1 cm thick). The rubber mat was attached to long nylon straps connected to the back of a 4 × 4 pickup, which towed the structure 3 m behind it to avoid magnetic interference from the metallic body (
Figure 2).
Navigation speed through the entire study area was kept constant at 15 km/h, taking 90 min for the total operation and collecting parallel transect lines in a back-and-forth path. The average distance between transect lines was approximately 40 m (
Figure 1b). The raw soil ECa survey dataset comprised 5788 observation points in total. The navigation path included 26 transect lines, with the EM38-MK2 sensor set to a 1 s reading interval.
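The nominal along-track point spacing implied by these survey settings can be checked with a short calculation. This is a minimal sketch that assumes constant speed and no stops; the function name is illustrative.

```python
def along_track_spacing_m(speed_kmh, interval_s):
    """Distance travelled between consecutive readings at constant speed."""
    speed_m_per_s = speed_kmh * 1000.0 / 3600.0
    return speed_m_per_s * interval_s


# Survey settings reported in the text: 15 km/h and a 1 s reading interval.
spacing = along_track_spacing_m(15.0, 1.0)
print(f"Nominal along-track spacing: {spacing:.2f} m")
```

Under these assumptions, readings fall roughly 4 m apart along each transect, about one tenth of the approximately 40 m distance between transect lines.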
2.3. EM38-MK2 Data Filtering and External Validation
Exploratory data analysis was applied to the EM38-MK2 raw dataset to check for outliers caused by potential electromagnetic interference from metallic parts of the irrigation pivot framework, which can create high conductivity readings at specific locations. Spatial query filters were used to remove ECa observations repeatedly measured at the same location when brief stops for operational maintenance were necessary. In addition, sample points that drifted off the transect lines were removed to obtain a more parallel sampling path. The remaining clean dataset comprised 4306 points. The final preprocessing step used an automatic random subset sampling algorithm in the R statistical software [
37], to subset points, reserving 400 points for use in external map validation of the simulations from the different sampling designs (
Figure 3).
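The random hold-out step can be sketched as follows. This is a Python stand-in for illustration only, not the R routine used in the study; the function name, the seed, and the placeholder points are assumptions.

```python
import random


def split_validation(points, n_validation, seed=42):
    """Randomly reserve n_validation points; return (calibration, validation)."""
    rng = random.Random(seed)  # fixed seed so the split is reproducible
    held_out = set(rng.sample(range(len(points)), n_validation))
    calibration = [p for i, p in enumerate(points) if i not in held_out]
    validation = [p for i, p in enumerate(points) if i in held_out]
    return calibration, validation


# Placeholder (x, y, ECa) records standing in for the 4306 cleaned observations.
points = [(float(i), 0.0, 10.0) for i in range(4306)]
calibration, validation = split_validation(points, 400)
print(len(calibration), len(validation))
```

Sampling indices without replacement guarantees that the calibration and validation subsets never share a point, which is what makes the validation external.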
2.4. Sampling Designs
2.4.1. Approach 1—Different Transect Spacings
The clean dataset of 4306 ECa samples in 26 transect lines was used to generate another three sample subsets by increasing the distance between transect lines and consequently reducing the total number of parallel lines in each subsequent set. The most detailed dataset had 26 transects, approximately 40 m apart, with 3906 points after extraction of the external validation subset (i.e., 4306 − 400 = 3906); the other three simulation datasets used transect distances of 80 m, 150 m, and 300 m, respectively (
Figure 3).
Table 1 details all four resulting simulation datasets with their respective numbers of transect lines and the remaining dataset size.
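One way to derive wider-spaced subsets from a set of parallel transects is sketched below. It assumes an idealized layout of 26 lines exactly 40 m apart and a greedy rule that keeps a line only if it lies at least the target spacing from the last kept line. Both the rule and the function name are assumptions for illustration; the resulting counts are not the values in Table 1.

```python
def thin_transects(offsets_m, min_spacing_m):
    """Keep transects whose offset is at least min_spacing_m from the last kept one."""
    kept = []
    for offset in sorted(offsets_m):
        if not kept or offset - kept[-1] >= min_spacing_m:
            kept.append(offset)
    return kept


# Idealized cross-track offsets for 26 parallel lines, 40 m apart.
offsets = [40.0 * i for i in range(26)]
for target in (80.0, 150.0, 300.0):
    kept = thin_transects(offsets, target)
    print(f"target {target:.0f} m: {len(kept)} transects kept")
```

With a 40 m base grid, targets that are not multiples of 40 m (150 m and 300 m) can only be approximated, here by 160 m and 320 m, which is one reason real subsets report approximate spacings.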
2.4.2. Approach 2—Different Sample Densities Using Random and Douglas-Peucker Algorithm
From the standard survey dataset (3906 points), another four simulation subsets were extracted for sampling designs using different sample densities. The first algorithm used was the automatic random subset sampling algorithm from the R statistical software [
37], which eliminated 25%, 50%, 75%, and 95% of sample points from the original dataset (
Table 1 and
Figure 3). The same removal percentages were applied using the
DouglasPeuckerNbPoints function implemented in the
kmlShape package [
38]. This algorithm applies a proximity rule, whereby all original data points must lie within a certain distance of the simplified line. A polyline is created using the input dataset coordinates as vertices, and either a tolerance distance or a target number of points is predefined. The algorithm recursively creates new segments approximating the original polyline until all of its vertices satisfy the predefined condition [
39]. Both sampling design approaches were then evaluated with kriging interpolation accuracy metrics computed on an external validation subset, as detailed in the next section.
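The Douglas-Peucker idea with a predefined number of points can be sketched as follows. This is an illustrative pure-Python version, not the kmlShape implementation; it assumes the variant that starts from the two end points and repeatedly adds the vertex farthest from the current simplified line until the target count is reached. Function names and the toy track are hypothetical.

```python
import math


def point_segment_distance(p, a, b):
    """Shortest distance from point p to the segment a-b."""
    (px, py), (ax, ay), (bx, by) = p, a, b
    dx, dy = bx - ax, by - ay
    seg_len_sq = dx * dx + dy * dy
    if seg_len_sq == 0.0:
        return math.hypot(px - ax, py - ay)
    # Projection parameter, clamped so the foot stays on the segment.
    t = max(0.0, min(1.0, ((px - ax) * dx + (py - ay) * dy) / seg_len_sq))
    return math.hypot(px - (ax + t * dx), py - (ay + t * dy))


def douglas_peucker_n_points(polyline, n_points):
    """Simplify a polyline to n_points vertices, always keeping both ends."""
    if n_points < 2 or n_points > len(polyline):
        raise ValueError("n_points must be between 2 and len(polyline)")
    kept = [0, len(polyline) - 1]  # indices of retained vertices, sorted
    while len(kept) < n_points:
        best_dist, best_idx = -1.0, None
        for start, end in zip(kept, kept[1:]):
            for i in range(start + 1, end):
                d = point_segment_distance(polyline[i], polyline[start], polyline[end])
                if d > best_dist:
                    best_dist, best_idx = d, i
        kept.append(best_idx)
        kept.sort()
    return [polyline[i] for i in kept]


# Toy polyline with one pronounced peak at (3, 2).
track = [(0.0, 0.0), (1.0, 0.1), (2.0, 0.0), (3.0, 2.0), (4.0, 0.0), (5.0, 0.1), (6.0, 0.0)]
simplified = douglas_peucker_n_points(track, 3)
print(simplified)
```

Unlike random thinning, this selection favors vertices where the line changes most, so the retained points are not uniformly distributed along the track.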
2.5. Statistics, Interpolation, and Mapping Uncertainties
Kriging has been used for many decades for spatial interpolation [
40] and is one of the most widely used and best-documented geostatistical tools [
41]. Ordinary kriging uses only one variable and is one of the most robust and widely used types of kriging [
42]. The main objective of kriging is to estimate the value of a random variable,
Z, where it was not measured.
The study considers two assumptions for sampling and geostatistics: (1) sampling by transects to represent the variation in two dimensions; (2) irregular sampling in two dimensions [
42]. The necessary assumptions and equations are briefly summarized here. The spatial variability of ECa for each group, selected by approach 1 or 2, was analyzed using variograms. In this analysis, the spatial dependence of an observation at a given point
z(x) is determined by comparison with observations at its neighboring points
z(x + h), where
h denotes the lag distance and
N(h) is the number of data pairs separated by a particular lag vector
h. The average semivariance calculated for each lag of the variogram is given by
γ(h) [
41], Equation (1):
$$\gamma(h) = \frac{1}{2N(h)} \sum_{i=1}^{N(h)} \left[ z(x_i) - z(x_i + h) \right]^2 \tag{1}$$
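The calculation behind Equation (1) can be sketched in a few lines, assuming the standard semivariance estimator (half the mean squared difference of all pairs falling in a lag bin) and an isotropic binning by distance only. This is an illustrative pure-Python version, not the gstat routine; the function name and toy data are hypothetical.

```python
import math


def experimental_variogram(coords, values, lag_width, n_lags):
    """Return (mean pair distance, semivariance, pair count) per non-empty lag bin."""
    sq_diff_sums = [0.0] * n_lags
    dist_sums = [0.0] * n_lags
    counts = [0] * n_lags
    for i in range(len(coords)):
        for j in range(i + 1, len(coords)):
            dist = math.dist(coords[i], coords[j])
            k = int(dist // lag_width)  # index of the lag bin for this pair
            if k < n_lags:
                sq_diff_sums[k] += (values[i] - values[j]) ** 2
                dist_sums[k] += dist
                counts[k] += 1
    return [
        (dist_sums[k] / counts[k], sq_diff_sums[k] / (2 * counts[k]), counts[k])
        for k in range(n_lags)
        if counts[k] > 0
    ]


# Toy transect: four points 1 m apart with increasing values.
coords = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0), (3.0, 0.0)]
values = [1.0, 2.0, 4.0, 7.0]
for lag, gamma, n_pairs in experimental_variogram(coords, values, lag_width=1.0, n_lags=3):
    print(f"h = {lag:.1f}, gamma = {gamma:.3f}, N(h) = {n_pairs}")
```

The all-pairs loop scales with the square of the number of points, so for datasets of several thousand observations a library implementation is the practical choice.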
The result of the experimental variogram is the mean of the semivariance of the pairs of points z(x_i) and z(x_i + h), sampled over a lag distance
h. A variogram model can be fitted to the experimental variogram. Based on the variogram model, values can be estimated at locations that were not sampled using kriging. Moreover, the variogram model provides a value for the nugget variance
C0, which is the theoretical semivariance at the sampling location. It is extrapolated from the shape of the variogram model at short lag distances to
h equal to zero. The nugget variance includes the variance that is associated with the small-scale variability that cannot be further distinguished by the sampling procedure, and it also includes the variability that is caused by analytical and sampling error [
43].
When the nugget variance is subtracted from the sill, the structured variance C is obtained, i.e., the variance being explainable from neighboring observations. The range (a) is the distance at which neighboring observations become spatially independent.
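To make the roles of the nugget (C0), structured variance (C), and range (a) concrete, the sketch below evaluates a spherical variogram model, the model type referred to in the text for the mean correlation distance. The parameter values are hypothetical and are not fitted values from this study.

```python
def spherical_variogram(h, nugget, partial_sill, range_a):
    """Spherical model: rises from the nugget to the sill, which is reached at h = range_a."""
    if h <= 0.0:
        return 0.0  # by definition the semivariance is zero at zero lag
    if h >= range_a:
        return nugget + partial_sill  # sill: observations are spatially independent
    r = h / range_a
    return nugget + partial_sill * (1.5 * r - 0.5 * r ** 3)


# Hypothetical parameters: C0 = 1, C = 4 (sill = 5), a = 200 m.
for lag in (0.0, 50.0, 100.0, 200.0, 300.0):
    print(lag, round(spherical_variogram(lag, 1.0, 4.0, 200.0), 4))
```

The jump from zero at h = 0 to C0 at very short lags is the nugget effect, and the curve stays flat at C0 + C beyond the range.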
Therefore, to capture the spatial continuity of the observations, the sampling distance must be shorter than the range, since observations become less and less related as their separation approaches the range. The maximum sampling distance (upper limit of cell size) can be determined with the “mean correlation distance” (MCD) [
44], which can be calculated for spherical variogram models with Equation (2):
$$\mathrm{MCD} = \frac{3}{8} \cdot \frac{C}{C_0 + C} \cdot a \tag{2}$$
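A worked example of the MCD follows, assuming the form commonly cited for spherical models, in which the range is scaled by 3/8 and by the proportion of structured variance in the sill. The parameter values are hypothetical.

```python
def mean_correlation_distance(nugget, partial_sill, range_a):
    """MCD for a spherical model, assumed form: (3/8) * C / (C0 + C) * a."""
    return 3.0 / 8.0 * partial_sill / (nugget + partial_sill) * range_a


# Hypothetical parameters: C0 = 1, C = 4, a = 200 m.
mcd = mean_correlation_distance(nugget=1.0, partial_sill=4.0, range_a=200.0)
print(f"MCD = {mcd:.1f} m")
```

In this example the MCD is 60 m, well below the 200 m range; a larger nugget relative to the sill shortens it further, pointing to denser sampling.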
A by-product of ordinary kriging is the kriging variance, and the standard error can be calculated as its square root. This by-product is a function of the spatial variation of the data (i.e., as modeled by the variogram) and of the spatial configuration of the data relative to each estimated value. The variance of the estimate is the expected value of the squared difference between the estimated and the true values.
Thus, the kriging variance of each map produced under approach 2 was calculated for the two selection algorithms. Then, to assess the spatial difference between the two methods, the variance maps from the two algorithms were subtracted from each other at each removal level to identify the spatial variation of the estimated variances.
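How ordinary kriging yields both an estimate and its variance at one unsampled location can be sketched as follows. This is a minimal pure-Python illustration, assuming a spherical variogram with hypothetical parameters and a plain Gaussian elimination solver; it is not the gstat or krigeTg implementation, and all names are illustrative.

```python
import math


def spherical(h, nugget=0.0, partial_sill=1.0, range_a=200.0):
    """Spherical variogram with hypothetical default parameters."""
    if h <= 0.0:
        return 0.0
    if h >= range_a:
        return nugget + partial_sill
    r = h / range_a
    return nugget + partial_sill * (1.5 * r - 0.5 * r ** 3)


def solve_linear_system(a, b):
    """Gaussian elimination with partial pivoting for a small dense system."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def ordinary_kriging(coords, values, target, variogram):
    """Return (estimate, kriging variance) at target from all data points."""
    n = len(coords)
    # Semivariances between data points, bordered by the unbiasedness constraint.
    a = [[variogram(math.dist(coords[i], coords[j])) for j in range(n)] + [1.0]
         for i in range(n)]
    a.append([1.0] * n + [0.0])
    b = [variogram(math.dist(c, target)) for c in coords] + [1.0]
    solution = solve_linear_system(a, b)
    weights, lagrange = solution[:n], solution[n]
    estimate = sum(w * z for w, z in zip(weights, values))
    variance = sum(w * g for w, g in zip(weights, b[:n])) + lagrange
    return estimate, variance


# Four hypothetical ECa readings at the corners of a 100 m square.
coords = [(0.0, 0.0), (100.0, 0.0), (0.0, 100.0), (100.0, 100.0)]
values = [10.0, 20.0, 30.0, 40.0]
estimate, variance = ordinary_kriging(coords, values, (50.0, 50.0), spherical)
print(f"estimate = {estimate:.2f}, standard error = {math.sqrt(variance):.3f}")
```

The variance depends only on the variogram and on the geometry of the data around the target, not on the measured values, which is why it reflects the sampling configuration of each design.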
ECa sample subsets for the two approaches were evaluated for normality, in line with kriging interpolation assumptions. If the data were not normally distributed, they were transformed using the natural logarithm. A manual variogram fitting procedure for all simulations in both approaches used variogram analysis tools from the gstat package [
45] in R software [
37] for isotropic variogram fitting. Ordinary kriging interpolations used the
krigeTg function, applied to either normally distributed or natural-logarithm-transformed subsets.
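ECa map accuracy of all combinations was evaluated with an external validation subset using the mean error (ME) index in Equation (3) and the root mean square error (RMSE) in Equation (4):
$$\mathrm{ME} = \frac{1}{N} \sum_{i=1}^{N} \left[ z(x_i) - \hat{z}(x_i) \right] \tag{3}$$
$$\mathrm{RMSE} = \sqrt{\frac{1}{N} \sum_{i=1}^{N} \left[ z(x_i) - \hat{z}(x_i) \right]^2} \tag{4}$$
where
N is the number of observations,
z(x_i) is the observed value, and
ẑ(x_i) is the predicted value.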
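The two validation indices can be computed as in the sketch below. It assumes the usual definitions and an observed-minus-predicted sign convention for ME, which is an assumption here; the sample numbers are hypothetical and are not results from this study.

```python
import math


def mean_error(observed, predicted):
    """Average of (observed - predicted); values near zero indicate low bias."""
    return sum(o - p for o, p in zip(observed, predicted)) / len(observed)


def root_mean_square_error(observed, predicted):
    """Square root of the mean squared (observed - predicted) difference."""
    squared = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    return math.sqrt(squared / len(observed))


# Hypothetical validation pairs (mS/m).
observed = [10.0, 12.0, 14.0, 16.0]
predicted = [11.0, 11.0, 15.0, 14.0]
me = mean_error(observed, predicted)
rmse = root_mean_square_error(observed, predicted)
print(f"ME = {me:.2f}, RMSE = {rmse:.3f}")
```

Positive and negative errors cancel in ME, so it measures bias, whereas RMSE accumulates their magnitudes and measures overall accuracy; the two are therefore read together.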