The Use of Spatial Interpolation to Improve the Quality of Corn Silage Data in Case of Presence of Extreme or Missing Values

Koutsos, Thomas M.; Menexes, Georgios C.; Eleftherohorinos, Ilias G.

doi:10.3390/ijgi11030153

by Thomas M. Koutsos *, Georgios C. Menexes and Ilias G. Eleftherohorinos

School of Agriculture, Faculty of Agriculture, Forestry and Natural Environment, Aristotle University of Thessaloniki, 54124 Thessaloniki, Greece

* Author to whom correspondence should be addressed.

ISPRS Int. J. Geo-Inf. 2022, 11(3), 153; https://doi.org/10.3390/ijgi11030153

Submission received: 3 January 2022 / Revised: 13 February 2022 / Accepted: 21 February 2022 / Published: 22 February 2022

Abstract

Agricultural spatial analysis has the potential to offer new ways of analyzing crop data considering the spatial information of the measurements. Moving from farmers’ estimates and crop-cuts techniques to interpolation is a new challenge, and a promising path to achieving more reliable results, especially in the case of field data with extreme or missing values. By comparing the main descriptive statistics of three types of crop parameters (fresh weight, dry weight, and ear weight) in three randomly taken maize plots, we found that the issue of missing values can be addressed by using interpolation to calculate estimated values of given parameters in non-sampling locations. Moreover, based on the descriptive statistics, the implementation of interpolation can reduce crop field variability (extreme values) and achieve an improvement of coefficient of variation (CV) values up to 30%, compared with other methods used, such as the replacing of missing values by the average of all data, or the average of the row or column, with an improvement of only up to 15%. These findings strongly suggest that the implementation of an interpolation method in case of extreme or missing values in crop data is an effective process for improving their quality, and consequently, their reliability. As a result, the application of spatial interpolation to existing crop data can provide more dependable estimations of average crop parameters values, compared to the usual farmers’ estimates.

Keywords:

spatial analysis; experiments; agricultural experimentation

1. Introduction

Crop yield and other crop parameters such as fresh and dry weight are the most commonly computed estimates to provide productivity metrics in both plot-level analysis, and in larger scale design. However, although a wide variety of methodological protocols and sampling methods was used for the estimation of crop parameters, no improvements were observed, and in some cases, the estimated accuracy is not clearly understood or clarified [1,2,3,4,5,6,7]. Among these methods, the crop-cutting techniques have been regarded as the most reliable and objective for estimating crop yield and other crop parameters, although they are usually dependent on a number of factors, such as administrative setup, type and size of field staff, farmer cooperation, land configuration, field shape, soil differences and harvest time, crop type, cropping pattern, available skills, and resources [6,8,9,10,11,12,13]. Furthermore, the existence of missing values and outliers, along with the arbitrary process of removing measurements from field areas with no adequate plant density, are the most important issues that further affect the expected shortcomings in the field estimates.

Data quality dimensions, such as completeness, consistency, accuracy, and validity [14,15,16,17,18], need to be assessed before the analysis, because low-quality data lead to poor decision-making and misleading conclusions, and, in some cases, do not reflect real-world situations. Completeness is mainly related to the presence of missing values in a data set, which, from a statistical point of view, reduce the representativeness of the sample, and can therefore distort parameter estimates and inferences about the parent statistical population. In addition, the statistical analysis of data with many missing values becomes complicated and risky, since, according to [19,20], the analysis and the imputation of missing values in a data set derived from an experiment are not easy tasks, even in cases where only one observation is missing. In this context, [21] reported that there are several ways to overcome the issue of missing values, such as (a) imputation, by filling the “gaps” with other values (e.g., mean or median values), (b) omission, by discarding cases with missing values from further statistical analyses, and (c) the use of specialized statistical methods that are rather robust and unaffected by the presence of missing values.

The prediction of missing values at non-sampled points, using existing measured data, is of great interest in many scientific disciplines, and is highly appreciated as the best linear unbiased prediction (BLUP) method for delivering accurate predictions at non-sampling locations [22,23,24]. Among the most successful statistical methods, Kriging, a geostatistical interpolation gridding method, has been proven useful and popular in many fields, producing visually appealing contour maps from irregularly spaced data. Regarding agriculture, Kriging is highly recommended as an effective method to derive single-year yield maps, such as durum wheat yield data [25], sugarcane data [26], corn grain and corn silage yield data [24], or multi-year crop yield data, including both spatial and temporal variation in yield. Kriging has also been successfully applied to soil parameter mapping [27], such as iron content, alone or via Co-Kriging, to explore the spatial cross-correlation of soil parameters and pH or electrical conductivity [28].

The main idea of the Kriging interpolation method is to give more weight, when predicting a missing value, to the sample points nearest to the prediction location. Kriging is based on knowledge of the spatial structure of the data, expressed through a variogram or covariance function, and calculates a weighted average of the data. New values can be estimated either in non-sampled locations (missing values) or in locations with extreme values (outliers). The interpolated value is given as the linearly weighted sum of the values of its surrounding points. The weighting factors are calculated by minimizing the error variance under the model of spatial continuity fitted to the data. Therefore, the final aim of the Kriging interpolation method is to achieve the best linear unbiased prediction based on the prediction variance, i.e., the variance of the difference between the linear predictor and the measured data. In general, interpolation has been thoroughly examined for its importance in precision crop production [29] to explore the drivers of the within-field spatial and temporal variability of crop yield [30], or to predict the spatial patterns of within-field crop yield variability [31]. Recently, a new, interesting approach has been proposed for locating and removing extreme values from crop field data using spatial analysis [32].

This work aims to examine whether the implementation of a spatial interpolation technique can improve crop data quality by reducing crop field variability. We also aim to explore the way in which interpolation can manage extreme and missing values in experimental plot data, and therefore its potential to provide more reliable and representative mean values as productivity metrics for crop parameters. The main idea is to take advantage of the power of an exact interpolator, such as Kriging, to transform the measurements taken in the field into a surface of new estimated values throughout the area of interest. If interpolation can improve data quality, then estimates can be more accurate, and therefore, the calculation of mean values as crop metrics can be more dependable, reliable, and representative for the statistical population of any crop parameter under consideration. Therefore, given the most commonly measured maize parameters (fresh weight—FW, dry weight—DW, and ear weight—EW) of three randomly taken experimental plots, we examined the benefit of applying an interpolation method to the original set of measurements and how safe it is to draw conclusions for larger cultivating areas based on small experimental plots.

Therefore, the main objectives of this work are the following:

–: Examine whether an interpolation method can improve field data quality by reducing the expected crop field variability, and to what extent this can be achieved.
–: Examine whether an interpolation method can effectively address the problems of extreme or missing values in data.

2. Materials and Methods

A maize (hybrid AGN720, Italy) crop was established in a three-ha field area of the AUTH Farm (latitude 40°32′1.75″ N and longitude 22°59′26.98″ E) during the 2016 growing season. The seedbed was prepared according to the agricultural practices applied in the area, and fertilization was made with 200 and 100 kg N and P/ha, respectively. In late April (27 April 2016), the field was sown with a 4-row Gaspardo pneumatic sowing machine. Weed control was achieved with the recommended rate of the early post-emergence herbicide Modett 25/28 SE (25% terbuthylazine + 28% dimethenamid-p), whereas irrigation was performed according to the requirements of the crop plants. After crop emergence, three randomly taken plots (4 m × 4.25 m) with six rows per plot were marked. The distance between the crop rows was 80 cm, and the distance between plants in the same row was 17 cm. Therefore, there were 25–26 plants/row, 150 plants/plot, and 450 plants in total. The distance between the three plots was 20 m. For the purpose of the study, the individual plants in the three plots (Figure 1) were considered as the units of the target statistical population from which samples had been extracted. A grid of 25 (rows) × 6 (columns) was used for the representation of the harvested plants (Figure 1) in each plot (where: row is x dimension and column is y dimension). A unique ID was assigned to each measurement/plant based on the row and column of each plant in each plot. IDs were also assigned in places where no measurement was available (the missing values appear as places with no dots in Figure 1), to be filled in later with an average value: either the total average of all measurements (RT method), the average of the measurements in the same row (RR method), or the average of the measurements in the same column (RC method).
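The three mean-replacement schemes (RT, RR, RC) can be illustrated with a minimal Python sketch. It is not the authors' workflow (the study used Excel); it assumes that a plot is stored as a 2-D NumPy array with NaN marking missing plants, and the function name `impute_means` and the small demo grid are hypothetical.

```python
import numpy as np

def impute_means(grid, method):
    """Fill NaN cells of a 2-D plot grid with a mean value.

    method: "RT" (mean of all measured values in the plot),
            "RR" (mean of the measured values in the same grid row),
            "RC" (mean of the measured values in the same grid column).
    A row or column with no measurements at all stays NaN.
    """
    grid = np.asarray(grid, dtype=float)
    missing = np.isnan(grid)
    if method == "RT":
        fill = np.nanmean(grid)
    elif method == "RR":
        fill = np.nanmean(grid, axis=1, keepdims=True)
    elif method == "RC":
        fill = np.nanmean(grid, axis=0, keepdims=True)
    else:
        raise ValueError(f"unknown method: {method}")
    # Broadcasting picks the matching plot, row, or column mean per cell.
    return np.where(missing, fill, grid)

# Hypothetical 3 x 3 excerpt of a plot grid (arbitrary weight units).
demo = np.array([[1.0, 2.0, 3.0],
                 [4.0, np.nan, 9.0],
                 [7.0, 12.0, 12.0]])
rt = impute_means(demo, "RT")   # centre cell filled with 6.25
rr = impute_means(demo, "RR")   # centre cell filled with 6.5
rc = impute_means(demo, "RC")   # centre cell filled with 7.0
```

With only six cells per grid row in the real 25 × 6 layout, a row mean rests on very few plants, which is one reason the RR replacement can behave erratically.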

At the silage stage (when the kernels began to glaze) of maize (14 weeks after sowing), all plants were harvested from each subplot, and the silage yield (fresh weight—FW, dry weight—DW, and ear weight—EW) of each plant was recorded. The silage stage was determined by breaking the ears of maize and visually evaluating the kernels’ stage of development. The silage yield data (measurements for fresh weight, dry weight, ear weight) were then entered in Excel, and organized in one table containing all information available, such as the plot and the exact position of each record (column: x dimension, row: y dimension) based on a 25 (rows) × 6 (columns) grid that represents the harvested plants in each plot (Figure 1). For achieving better visual results, we used the centers of the squares of the grid of each plot, instead of the absolute numbers of rows and columns of the measurements (i.e., for the plant located in the first row and first column, we assigned dimensions x = 0.5 and y = 0.5 instead of x = 1 and y = 1).
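The cell-center convention described above can be sketched in Python as follows. This is only an illustration: it assumes a NaN-marked 2-D array per plot and follows the convention stated in this paragraph (column as x, row as y); the function name `grid_to_samples` is hypothetical.

```python
import numpy as np

def grid_to_samples(grid):
    """Turn a plot grid into (x, y) cell-center coordinates and values.

    Cells holding NaN (no plant measured) are skipped. The plant in the
    first row and first column gets x = 0.5 and y = 0.5.
    """
    grid = np.asarray(grid, dtype=float)
    rows, cols = np.nonzero(~np.isnan(grid))
    xy = np.column_stack([cols + 0.5, rows + 0.5])  # x from column, y from row
    z = grid[rows, cols]
    return xy, z

# Hypothetical 2 x 2 excerpt with one missing plant.
xy, z = grid_to_samples([[1.0, np.nan],
                         [3.0, 4.0]])
```

The resulting `xy` and `z` arrays are the kind of irregular point data that a gridding program takes as input.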

The measurements of the three determined crop parameters (FW, DW, EW) obtained from the three field plots were finally used to construct the corresponding Kriging grids and contour maps to perform comparisons and present the spatial variability of data. For each plot and crop parameter (fresh weight, dry weight, and ear weight), a grid of the new estimated values (software used: Surfer for Windows), with size: 100 rows × 24 columns, was constructed using Ordinary Kriging.

For each plot and each crop parameter, contour maps were created based on these new grids (Figure 2). Estimated values were then exported from each grid to an ASCII file (.grd converted to .dat), to be used in Excel to calculate metrics for the descriptive analysis and to draw boxplots that present the statistics of each distribution and make comparisons between them easier.

The implementation of Kriging and the production of the new grids of estimated data were done through ‘Surfer for Windows’, a very popular and effective Windows program from Golden Software. Surfer for Windows is one of the best modelling geo-spatial tools for data interpolation, producing accurate grids for several interpolation methods. The aim of using the Kriging interpolation method is to achieve the best linear unbiased prediction, based on the calculation of prediction variance as the variance of the difference of the linear predictor and the measured data. Therefore, through the implementation of Kriging, the value of the random function $Z = Z(x)$ at any arbitrary location of interest $x_0$ can be estimated based on the nearest measured observations $z(x_i)$ of $Z(x)$ at the $n \in \mathbb{N}$ sample points $x_i$. Kriging uses a weighted average of the nearest measured observations at the sample points $x_i$ to reveal the spatial structure of data and calculate weights based on the assumptions on $\mu(x)$ and on the variogram or covariance function of $Z(x)$. As this prediction variance decreases, the accuracy of the linear predictor increases, and the best linear prediction is achieved.
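To make the weighted-average idea concrete, the following is a minimal NumPy sketch of ordinary Kriging with a linear variogram and a global search neighborhood. It is not Surfer's implementation: the function names, the default slope and nugget, and the four demo points are assumptions made for illustration only.

```python
import numpy as np

def linear_variogram(h, slope=1.0, nugget=0.0):
    """Linear semivariogram: gamma(0) = 0, gamma(h) = nugget + slope * h."""
    h = np.asarray(h, dtype=float)
    return np.where(h > 0, nugget + slope * h, 0.0)

def ordinary_kriging(xy, z, xy0, slope=1.0, nugget=0.0):
    """Predict Z at the targets xy0 from the samples (xy, z), using all data.

    Returns predictions, kriging variances, and the weights
    (one column of weights per target location).
    """
    xy = np.asarray(xy, dtype=float)
    z = np.asarray(z, dtype=float)
    xy0 = np.atleast_2d(np.asarray(xy0, dtype=float))
    n = len(z)
    # Distances between samples, and between samples and targets.
    d = np.linalg.norm(xy[:, None, :] - xy[None, :, :], axis=-1)
    d0 = np.linalg.norm(xy[:, None, :] - xy0[None, :, :], axis=-1)
    # Kriging system: semivariances bordered by the unbiasedness
    # constraint (the weights must sum to one).
    a = np.zeros((n + 1, n + 1))
    a[:n, :n] = linear_variogram(d, slope, nugget)
    a[:n, n] = 1.0
    a[n, :n] = 1.0
    b = np.ones((n + 1, xy0.shape[0]))
    b[:n, :] = linear_variogram(d0, slope, nugget)
    sol = np.linalg.solve(a, b)
    weights, mu = sol[:n, :], sol[n, :]   # mu: Lagrange multiplier
    pred = weights.T @ z                  # linearly weighted sum of samples
    var = np.sum(weights * b[:n, :], axis=0) + mu
    return pred, var, weights

# Hypothetical samples at the corners of a unit square.
xy = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
z = [10.0, 20.0, 30.0, 40.0]
pred, var, weights = ordinary_kriging(xy, z, [[0.5, 0.5], [0.0, 0.0]])
```

In this toy case the center of the square receives four equal weights, while a target that coincides with a sample point reproduces the measured value with zero prediction variance, which is the sense in which Kriging with a zero nugget is an exact interpolator.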

Concerning the general information about the gridding process, the following parameters were used (all the default settings in creating a grid using Surfer, the gridding software used): no trend removal was used, and the automatic variogram fitting mode was selected, which attempts to find a better set of parameters for the current model. Regarding the values used by the autofit feature, the fit criterion was set to “Least Squares” with maximum iterations set to 50, target precision (%) set to 0.0001, and maximum distance set to 1 × 10³⁸. Concerning the experimental variogram, the lag size was set to 0.5, the number of lags was 15, the direction equaled 0, and tolerance equaled 90. For the semi-variogram model: the variogram component type selected was linear for all grids created (anisotropy angle set to 0 and anisotropy ratio set to 1). Concerning the Kriging parameters and gridding rules, Kriging type was set to point (point weight measurements for fresh, dry and ear weight corresponding to plants). Regarding the “search neighborhood”, all data were used for the estimation of the values of the new grids. The main descriptive statistics were calculated for the selected three types of crop parameters (FW, DW, EW) for all the plants harvested in the three plots for three different cases: (a) measured data or data for all harvested plants (M); (b) data without the measurements from the excluded area (MC) due to reduced plant density; (c) data after interpolation (I). The descriptive statistics for the FW, DW, and EW are shown in Table 1, Table 2 and Table 3, respectively.
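The experimental variogram settings listed above (lag size 0.5, 15 lags, all directions) can be approximated with a short NumPy sketch. The binning below is a simplification, with each pair assigned to the class [k·lag, (k+1)·lag), and is not claimed to match Surfer's lag-width handling or its autofit procedure; the function name and demo data are hypothetical.

```python
import numpy as np

def experimental_variogram(xy, z, lag_size=0.5, n_lags=15):
    """Omnidirectional experimental semivariogram.

    Returns lag-class centers, the mean semivariance per class
    (NaN for empty classes), and the number of pairs per class.
    """
    xy = np.asarray(xy, dtype=float)
    z = np.asarray(z, dtype=float)
    i, j = np.triu_indices(len(z), k=1)           # every pair once
    dist = np.linalg.norm(xy[i] - xy[j], axis=1)
    semi = 0.5 * (z[i] - z[j]) ** 2               # per-pair semivariance
    bins = np.floor(dist / lag_size).astype(int)  # lag class index
    keep = bins < n_lags                          # drop pairs beyond last lag
    counts = np.bincount(bins[keep], minlength=n_lags)
    sums = np.bincount(bins[keep], weights=semi[keep], minlength=n_lags)
    with np.errstate(invalid="ignore", divide="ignore"):
        gamma = sums / counts
    centers = (np.arange(n_lags) + 0.5) * lag_size
    return centers, gamma, counts

# Hypothetical transect: values rise by one unit per unit distance.
pts = [[0.0, 0.0], [1.0, 0.0], [2.0, 0.0], [3.0, 0.0]]
vals = [0.0, 1.0, 2.0, 3.0]
centers, gamma, counts = experimental_variogram(pts, vals, lag_size=1.0, n_lags=4)
```

A model such as the linear component mentioned above would then be fitted to the non-empty classes of `gamma`, for example by least squares.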

Among other descriptive statistics, the main metric used to compare the performance (reduction of data variability and improved mean values) of the above sampling methods was the coefficient of variation (CV), which also indicates the degree of precision with which the treatments are compared, and is generally a good index of the reliability of the experiment [19]. The average CV was calculated for each of the three types of crop weight, and for each of the three plots, to explore and compare the variation per sampling method, parameter, and plot. Suppose that there are $n$ locations $z_1, z_2, \ldots, z_n$ in a single plot, with weight values $Y(z_1), \ldots, Y(z_n)$ and mean value $\bar{Y}$. The CV can then be defined as (Equation (1)):

$$CV = \frac{\sqrt{\frac{1}{n - 1} \sum_{i = 1}^{n} \left(Y(z_{i}) - \bar{Y}\right)^{2}}}{\bar{Y}} \qquad (1)$$

The CV represents the experimental error as a percentage of the total mean, and therefore, the lower the CV, the higher the reliability of the experiment. The performance of each sampling method under consideration was assessed based on the descriptive statistics, and mainly on the mean values and the corresponding CV values in each case.
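Equation (1) translates directly into a few lines of Python. The sketch below assumes the weights of one plot are available as a flat sequence, with NaN for missing plants; the function name and the sample values are hypothetical.

```python
import numpy as np

def coefficient_of_variation(values):
    """CV as in Equation (1): sample standard deviation (n - 1) over the mean."""
    y = np.asarray(values, dtype=float)
    y = y[~np.isnan(y)]                 # ignore missing measurements
    return np.std(y, ddof=1) / np.mean(y)

# Hypothetical weights of eight plants (arbitrary units).
cv = coefficient_of_variation([2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0])
print(f"CV = {100 * cv:.1f}%")          # prints "CV = 42.8%"
```

Note the `ddof=1` argument: NumPy's default is the population standard deviation (division by n), whereas Equation (1) divides by n − 1.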

3. Results

The Kriging contour maps (Figure 2) clearly represent the variability of the crop parameter values under consideration, and outliers are easily located (either visually, by the red color area, or by analyzing the estimated values from the corresponding grids). Based on the color patterns of Figure 2, the estimated fresh weight in all plots shows a pattern similar to that of ear weight, but different from that of dry weight. However, in plots 1, 2, and 3, the Kriging standard deviations derived from the spatial interpolation of fresh weight were similar to the respective standard deviations of dry weight and ear weight (Figure 3).

Regarding the coefficient of variation (CV) in all data from three plots, the fresh weight, dry weight and ear weight, CV values in most cases of applying interpolation were almost three times as good (CV lower values) as in any other sampling method. Interpolation performed better in all plots and for all types of crop weight, with the only exception for the RR method in plot 3 for dry weight, while for fresh and ear weights, the CV values were lower, but close to those of the RR (missing values replaced by the average of the row where the missing value belongs) method (Table 1, Table 2 and Table 3). In plot 1, the CV values for the interpolation were better than in any other method, and specifically twice as good as those for RT (missing values replaced by the total average of all values) and RC (missing values replaced by the average of the column where the missing value belongs) method, while CV values for MC (data measured cutting off the measurements from the lines 21 to 25), and RR was no different to the CV values in measured data (M). In plot 2, the CV values for the interpolation were also better than in any other method (MC, RT, RC, RR). The CV values for RT and RC were better than those in plot 1, but even so, the CV values for the interpolation were twice as good. The CV values for MC and RR methods were very close to those in measured data (M) that showed no improvement. In plot 3, the CV values for the RR method were (surprisingly) slightly better than those for the interpolation, and markedly better than in the rest of the methods. However, the CV values for the interpolation were still better than those for MC, RT, and RC methods. In general, the improvements in CV values of fresh weight, dry weight, and ear weight data derived from (Kriging) interpolation were 26.3–33.7% (range: 7.4%), 30.0–31.1% (range: 1.1%), and 25.2–33.2% (range: 8%), respectively, compared with the measured data (Table 4). 
The corresponding min–max improvements in CV values for RR were from 0.5 to 39.5% (range: 39%) for the fresh weight, from 1.4 to 44.9% (range: 43.5%) for the dry weight, and from 1.4 to 34.9% (range: 33.5%) for the ear weight (Table 4). Therefore, although the RR method performed better than the interpolation approach in plot 3, overall, the range of improvement in CV values was much narrower for interpolation, which indicates a much more robust improvement.

Kriging, compared to the sampling methods under consideration (RT, RC, RR) in all plots, showed a better performance in handling the outliers issue (Table 1) for the types of weight (FW, DW, EW) under consideration, and for all three plots, with the only exception the RR method in plot 1 and 2. For all the data of three plots, the upper and lower outliers % for fresh weight, dry weight, and ear weight in most cases of interpolation were better than in any other method (RT, RC, RR). More specifically, in plot 1 and plot 2, the upper outliers % for the interpolation approach were better than those for RT and RC method, and only the RR method performed better than the interpolation. However, in plot 2 for the lower outliers %, the interpolation approach performed better than RT and RC, and the same as the RR method for all three types of weight. In plot 3, the upper outliers % were better for the interpolation compared to RT, RC, and RR methods, and the lower outliers % were also better for the interpolation, with the only exception being those that correspond to the RC method for the fresh weight. In a nutshell, the fresh weight upper outliers % for the interpolation ranged from 0.3 to 1.5% (range: 1.2%), while for the RT method, it was 2.1 to 5.3% (range: 3.2%), for the RC method, it was 1.4 to 4.0% (range: 2.6%), and for the RR method, it was 0 to 6.3% (range: 6.3%). Respectively, for the dry weight, the upper outliers % for the interpolation ranged from 0.5 to 1.6% (range: 1.1%), while for the RT method, it was 3.3% in all plots, and for the RC method, it was 2.0 to 3.3% (range: 1.3%), and for the RR method, it was 1.3 to 4.7% (range: 3.4%). Moreover, for the ear weight, the upper outliers % for the interpolation ranged from 0 to 0.1% (range: 0.1%), for the RT method, it ranged from 0 to 2.7% (range: 2.7%), for the RC method, it also ranged from 0 to 2.7% (range: 2.7%), and for the RR method, it ranged from 0 to 0.7% (range: 0.7%). 
Therefore, the above findings indicate that interpolation improved CV values (lower values), replaced missing values, and lowered the number of both upper and lower outliers (%), in most cases compared to the rest of the sampling methods (RT, RC, and RR).
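The text does not spell out the rule used to flag upper and lower outliers. The sketch below assumes the usual boxplot convention (Tukey fences at 1.5 × IQR beyond the quartiles), which may differ from the authors' criterion; the function name and demo values are hypothetical.

```python
import numpy as np

def outlier_percentages(values, k=1.5):
    """Percentage of values below and above the Tukey fences.

    ASSUMPTION: outliers are values outside [Q1 - k*IQR, Q3 + k*IQR];
    the source does not state which rule was actually used.
    """
    y = np.asarray(values, dtype=float)
    y = y[~np.isnan(y)]
    q1, q3 = np.percentile(y, [25, 75])
    iqr = q3 - q1
    lower = 100.0 * np.mean(y < q1 - k * iqr)
    upper = 100.0 * np.mean(y > q3 + k * iqr)
    return lower, upper

# Hypothetical sample with one extreme value among ten.
lower_pct, upper_pct = outlier_percentages([1, 2, 3, 4, 5, 6, 7, 8, 9, 100])
```

Here only the value 100 lies above the upper fence, so one of the ten values counts as an upper outlier and none counts as a lower outlier.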

Since the coefficient of variation (CV) values have been improved (lower values), missing values have been replaced by estimated values via interpolation, and the number of both upper and lower outliers (%) was lower compared to the rest of the sampling methods (RT, RC, and RR), we can conclude that the mean values after the implementation of Kriging are more dependable and reliable, and thus, the overall quality of crop data has been improved.

The differences in CV values (%) between all measured data and those derived from the sampling methods MC, RT, RC, RR, and I (Table 4) show the efficiency of each sampling method compared to interpolation (I). Notably, interpolation achieved up to 30% lower CV values (column: I diff% from M), whereas the other methods (RT, RC, RR), which replace missing values with an average value (total, row, or column average), achieved only up to 15% lower CV values (Figure 4). Thus, interpolation performed about twice as well in reducing data variability as the other commonly used methods. It must also be noted that, despite the changes in CV values due to interpolation, mean values were largely unaffected (Figure 5), which supports the efficiency of the method.
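The “diff% from M” figures are read here as the relative reduction of a method's CV with respect to the CV of the measured data. That reading is an assumption, since the formula is not given explicitly, and the numbers in this sketch are hypothetical rather than taken from Table 4.

```python
def cv_improvement_pct(cv_measured, cv_method):
    """Relative CV reduction (%) of a method versus the measured data (M).

    ASSUMPTION: improvement = 100 * (CV_M - CV_method) / CV_M.
    """
    if cv_measured <= 0:
        raise ValueError("cv_measured must be positive")
    return 100.0 * (cv_measured - cv_method) / cv_measured

# Hypothetical CVs: 0.40 for measured data, 0.28 after interpolation.
improvement = cv_improvement_pct(0.40, 0.28)   # about 30 (%)
```

Under this reading, a positive value means that the method reduced the variability relative to the measured data, and a negative value means that it increased it.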

Concerning the mean values of fresh weight, dry weight, and ear weight, they were more or less similar between the methods used (Table 5), and only the RR showed an overestimation (plot 2 for all weights) or underestimation (plot 3 for all weights). The MC approach (“cutting off” the area with low data density) showed similar mean values to the measured data (mean values ranged from −0.9 to 1.9%), while the RT and RC methods performed quite similarly, with mean values ranging from −2.5 to 3.1%, and −2.2 to 3.0%, respectively. The RR method performed worse than any other method with mean values ranging from −8.5 to 11.7%, whereas the mean values of interpolation method were slightly higher than the RT and RC methods. In a nutshell, for all data of three plots, the RT method had no difference compared to the measured data (M), while MC, RC, and RR methods had similar mean values to the measured data in plot 1, same or slightly overestimated in plot 2, and the same or slightly underestimated in plot 3. For the total values from all plots, the interpolation approach had slightly higher mean values (ranging from 0.5% to 1.5%) compared to the rest of the methods (0 to 0.4%), while, based on handling missing and extreme values, this difference is considered to be a “correction” of the measured mean values. More specifically, in plot 1, the interpolation had same mean values compared to the measured data, in plot 2, slightly overestimated mean values, and in plot 3, the mean values were almost similar to the measured data for the fresh weight, slightly underestimated for the dry weight, and slightly overestimated for the ear weight.

The boxplots (Figure 4, Figure 5 and Figure 6) for the different sampling methods (MC, RT, RC, RR, I) indicated that the approach of excluding areas of measurements with low data density (MC) has no great effect on the data variability (the boxes and lower/upper whiskers are almost identical) compared to the corresponding boxplot resulting from all measured data (M). In addition, the other approaches (RT, RR, RC) to replacing the missing values seem to provide better overall performance (the boxes are shrunken). However, replacing the missing values with the row average (RR) can be unsafe due to the limited number of observations (less than six). This is confirmed in the case of ear weight for plot 3, where the RR approach performed quite well, but for plot 2, the results lead to an increased data variability. Finally, Kriging interpolation (I), for the three crop parameters and for all plots, shows a better performance (more shrunken boxes compared to the corresponding boxplots from all measured data) compared to the previous approaches of replacing missing values by an average (either row, column, or total). In addition, the variability and the symmetry of the distributions in each case (three types of weight, three plots and data from all plots) remain almost stable and consistent.

4. Discussion

The similar estimated CV values for all crop parameters, obtained using either all the measured data (M) of the three plots or the data remaining after the removal of the measurements at the upper part of the plots due to low plant density (MC approach, “cutting-off” an area with reduced plant density/measurements), strongly suggest that this approach is not adequate to reduce data variability, and that it is rather a biased and unsafe method. In addition, these results indicate that the common practice (for agro-scientists and farmers) of excluding areas with low data density to address the issue of missing values cannot safely lead to improved results.

The reduction in CV values by 7.9–15.8% or 7.8–15.3% in the case of replacing the missing values with the total average value of all the data in the plot or with the total average value of the column (where the missing value belongs to) shows that these two approaches could be safely used for the improvement of the data quality before their statistical analyses. However, the reduction in CV values by 0.5–44.9% in the case of replacing the missing values with the total average value of the row (which the missing value belongs to) indicates that this method is less safe than the other two methods due to the wide percentage range of improvement. This is confirmed by the fact that, for plots 1 and 2, the replacement of the missing values with row average improved CV values by only 0.5–3.7%, while in plot 3, the respective improvement for the three crop parameters was much higher and ranged from 34.9 to 39.6%.

The improvement of CV values by 25.2 to 33.7%, in the case of addressing the issue of missing values through the implementation of an interpolation method such as Kriging, shows a much more solid approach. In particular, the range difference % in CV values was similar for all the plots, and for all the crop parameters, it was 1.3–1.8% in all cases, 1.1% in plot 1 (25.2 to 26.3%), 2.2% in plot 2 (27.6 to 29.8%), and 2.6% in plot 3 (31.1 to 33.7%). The fact that the overall average improvement in CV values for the MC approach is only 1.7%, for the RT 12%, for the RC 11.8%, for the RR 14.5%, and for I (interpolation) 29.8%, suggests that the implementation of Kriging was found to be the best method to reduce data variability as compared to other methods (MC, RT, RC, and RR). In a nutshell, Kriging performed at least twice as well in achieving better CV values compared to all other sampling methods.

The best improvement of CV values in the case of interpolation can be explained by the fact that this process aims to create a grid of estimated data as close as possible to the original measurements at non-sampled locations (Figure 7). This is because Kriging can achieve an accurate and precise prediction of the missing values as it calculates them, by considering the rest of the data, depending on their values and distance (weight) from the estimated value location. The effectiveness of this approach is also confirmed by comparing the Kriging results with those obtained from the other sampling methods that are based on the replacement of missing values with an average (total, row, or column average). Based on these, Kriging performed at least twice as well (up to 30% lower CV values) compared to the other sampling methods that achieved an improvement in CV values of only up to 15%. Therefore, after the implementation of Kriging, the dispersion of data is limited, and the variability is reduced, providing a remarkable improvement in interquartile range (IQR), standard deviation (SD), and coefficient of variation (CV) values, which lead to more consistent means, and make the predictions more dependable than those without spatial interpolation. In general, based on the comparison between measured data and data derived from the interpolation process, it is clear that the implementation of Kriging is a solid way to both estimate values at non-sampling locations, and to abate the impact of outliers, while the process offers an overall improvement on data quality.

The results of this study indicated that the Kriging implementation, through the construction of contour maps for each crop parameter, provides a visual and very useful way of understanding the existing variability of the three types of weight (fresh, dry, and ear weight) measured for all the harvested plants in each plot. The Kriging contour maps, apart from representing the spatial variability of crop parameters, have the potential to offer a basis for the further analysis and interpretation of this variability, and to provide insights that support decisions on the delineation of management zones. The differences in variability observed for the same parameter between plots show that the spatial information of the measurements cannot be ignored, and should be considered before the calculation of crop mean values as metrics for larger areas. The high variability in the values of the crop parameters between adjacent plants, either within the same plot or between plots, is contrary to expectations, since the crop cultivar used was a hybrid and all seeds have the same genotype. These differences in variability, which could be attributed to the soil and climatic conditions prevailing during the growing season, along with the agricultural practices applied, were also identified and confirmed by the Kriging contour maps.

5. Conclusions

The results of this study indicated that the quality of experimental data can be improved before the calculation of mean values via the implementation of Kriging interpolation. More specifically, Kriging managed to refine data suffering from missing values by estimating values in non-sampled locations (plants that did not emerge). Moreover, the Kriging spatial interpolation method managed to balance the outliers by estimating new values in these locations based on the value and weight (distance) of their neighboring values that reduced the effect of the existing extreme values. As a result of this process, crop field variability (CV values) in crop parameters measurements was reduced up to 30% compared to other sampling methods (based on the replacement of missing values by an average value) that achieved lower CV values of only up to 15%.

Based on this significant reduction in crop data variability and the effective handling of missing values and outliers, we can conclude that the implementation of Kriging can lead to much more reliable and representative crop mean values, which can be used as crop metrics for further calculations and estimates. In addition, the lower CV values provide greater precision in the data available for decision-making, supporting the role of interpolation of crop field data in profitability estimates.

Author Contributions

Conceptualization, Thomas M. Koutsos; methodology, Thomas M. Koutsos, Georgios C. Menexes and Ilias G. Eleftherohorinos; software, Thomas M. Koutsos; validation, Thomas M. Koutsos, Georgios C. Menexes and Ilias G. Eleftherohorinos; formal analysis, Thomas M. Koutsos and Georgios C. Menexes; investigation, Thomas M. Koutsos; resources, Thomas M. Koutsos; data curation, Thomas M. Koutsos; writing – original draft preparation, Thomas M. Koutsos; writing – review and editing, Thomas M. Koutsos, Georgios C. Menexes and Ilias G. Eleftherohorinos; visualization, Thomas M. Koutsos. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Acknowledgments

We would like to thank the university students A. Pesios and A. Chatzopoulos for providing the data, and the staff of the A.U.Th. Farm for their assistance throughout the experiment.

Conflicts of Interest

The authors declare no conflict of interest.

Figure 1. Schematic presentation of all the maize plants (dots) harvested in each plot (M). The upper part of the M plots (rows 21–25) was marked as “excluded” (MC plots) because of the limited number of plants in this area (some plants had not emerged), to distinguish it from the rest of the area, where measurements have adequate density, and to improve the overall quality of the mean estimates. Where: M: All measured (all sampled data); MC: Measured cut (without the measurements from the upper part area, lines 21 to 25).

Figure 2. Spatial interpolation using Kriging on the three crop parameters: (a) Fresh weight (FW); (b) Dry weight (DW); (c) Ear weight (EW) of all harvested individual plants.

Figure 3. Kriging standard deviations derived from the implementation of spatial interpolation on the three crop parameters: (a) Fresh weight (FW); (b) Dry weight (DW); (c) Ear weight (EW) of all harvested individual plants. Areas with higher standard deviation (SD) values appear as “data holes” in the plots, and they correspond to the missing values. Areas with even lower data density (upper part of plots) have still higher estimated Kriging standard deviation values.

Figure 4. Fresh weight (FW) descriptive statistics for the different sampling methods: M: Measured (all data); MC: Measured cut (without the measurements from the upper part area, lines 20 to 25); RT: Missing values were replaced by the total average of all values; RC: Missing values were replaced by the average of the column to which the missing value belongs; RR: Missing values were replaced by the average of the row to which the missing value belongs; I: Interpolation.

Figure 5. Dry weight (DW) descriptive statistics for the different sampling methods: M: Measured (all data); MC: Measured cut (without the measurements from the upper part area, lines 20 to 25); RT: Missing values were replaced by the total average of all values; RC: Missing values were replaced by the average of the column to which the missing value belongs; RR: Missing values were replaced by the average of the row to which the missing value belongs; I: Interpolation.

Figure 6. Ear weight (EW) descriptive statistics for the different sampling methods: M: Measured (all data); MC: Measured cut (without the measurements from the upper part area, lines 20 to 25); RT: Missing values were replaced by the total average of all values; RC: Missing values were replaced by the average of the column to which the missing value belongs; RR: Missing values were replaced by the average of the row to which the missing value belongs; I: Interpolation.

Figure 7. Schematic illustration of the Kriging interpolating approach on how to address missing and extreme values in original experimental data. This is a real data section across plot 1 (section AB) comparing sampled and estimated (via Kriging interpolation) fresh weight (FW) values. Kriging calculates values at non-sampled locations (missing values) and applies a correction in case of outliers or extreme values (based on the neighboring data values).

Table 1. Descriptive statistics for fresh weight (FW) (g).

	All Plots						Plot 1
	M	MC	RT	RC	RR	I	M	MC	RT	RC	RR	I
Min	114.0	114.0	114.0	114.0	114.0	142.7	250.0	267.0	250.0	250.0	240.0	287.6
Q₁	475.8	479.5	529.0	529.0	529.0	535.7	454.5	459.3	523.3	523.3	454.3	529.1
Median	649.0	652.0	659.0	657.6	654.5	655.3	627.0	651.0	659.0	657.6	634.5	649.7
Q₃	834.5	844.0	770.0	770.0	790.5	783.9	793.5	790.3	747.5	747.5	790.5	759.9
Max	1423.0	1423.0	1423.0	1423.0	1423.0	1370.0	1337.0	1337.0	1337.0	1337.0	1337.0	1299.2
Mean	659.5	661.9	659.4	660.4	660.0	665.5	648.6	657.5	650.7	652.2	646.2	654.7
IQR	358.8	364.5	241.0	241.0	261.5	248.3	339.0	331.0	224.3	224.3	336.3	230.8
IQR/2	179.4	182.3	120.5	120.5	130.8	124.1	169.5	165.5	112.1	112.1	168.1	115.4
SD	254.5	260.3	224.6	225.5	227.3	182.7	239.5	245.1	213.2	214.2	237.5	178.3
CV	38.6	39.3	34.1	34.1	34.4	27.5	36.9	37.3	32.8	32.8	36.8	27.2
KURT	−0.1	−0.1	0.7	0.6	0.5	−0.1	0.0	0.1	0.7	0.6	−0.1	0.3
SKEW	0.3	0.3	0.4	0.4	0.4	0.3	0.5	0.6	0.6	0.6	0.5	0.4
Upper Outliers (%)	0.6	0.3	3.4	3.4	2.3	0.8	0.8	1.1	4.0	4.0	0.7	1.5
Lower Outliers (%)	0.0	0.0	0.9	0.9	0.5	0.0	0.0	0.0	0.0	0.0	0.0	0.0
For the Box
Q₂-Q₁	173.3	172.5	130.0	128.6	125.5	119.7	172.5	191.8	135.8	134.4	180.3	120.7
Q₃-Q₂	185.5	192.0	111.0	112.4	136.0	128.6	166.5	139.3	88.5	89.9	156.0	110.1
For the Whiskers
Q₃+1.5*IQR	1372.6	1390.8	1131.5	1131.5	1182.8	1156.4	1302.0	1286.8	1083.9	1083.9	1294.9	1106.1
Q₁−1.5*IQR	−62.4	−67.3	167.5	167.5	136.8	163.3	−54.0	−37.3	186.9	186.9	−50.1	182.8
Upper Whisker	1372.6	1390.8	1131.5	1131.5	1182.8	1156.4	1302.0	1286.8	1083.9	1083.9	1294.9	1106.1
Lower Whisker	114.0	114.0	167.5	167.5	136.8	163.3	250.0	267.0	250.0	250.0	240.0	287.6
W_upper-Q₃	538.1	546.8	361.5	361.5	392.3	372.4	508.5	496.5	336.4	336.4	504.4	346.2
Q₁-W_lower	361.8	365.5	361.5	361.5	392.3	372.4	204.5	192.3	273.3	273.3	214.3	241.5
	Plot 2						Plot 3
	M	MC	RT	RC	RR	I	M	MC	RT	RC	RR	I
Min	114.0	114.0	114.0	114.0	114.0	142.7	133.0	133.0	133.0	133.0	281.0	166.7
Q₁	420.0	428.8	489.0	489.0	475.8	487.8	564.0	543.5	611.0	582.6	593.5	613.3
Median	568.0	563.5	659.0	651.4	655.5	587.1	708.5	727.0	659.0	680.0	654.5	716.0
Q₃	756.0	737.8	680.3	717.8	859.3	731.0	930.0	930.0	891.3	891.3	720.3	857.6
Max	1352.0	1352.0	1352.0	1352.0	1423.0	1251.1	1423.0	1423.0	1423.0	1423.0	1150.0	1370.0
Mean	597.6	593.3	613.5	614.6	659.3	613.6	729.9	731.1	716.1	716.6	675.2	728.1
IQR	336.0	309.0	191.3	228.8	383.5	243.2	366.0	386.5	280.3	308.7	126.8	244.4
IQR/2	168.0	154.5	95.6	114.4	191.8	121.6	183.0	193.3	140.1	154.3	63.4	122.2
SD	248.7	251.9	215.4	216.8	276.7	179.2	259.8	266.2	234.7	235.1	145.2	171.9
CV	41.6	42.5	35.1	35.3	42.0	29.2	35.6	36.4	32.8	32.8	21.5	23.6
KURT	0.1	0.3	1.0	0.9	−0.1	0.2	−0.1	−0.1	0.6	0.6	2.3	0.1
SKEW	0.5	0.5	0.3	0.3	0.3	0.5	0.0	0.0	0.2	0.2	0.8	0.0
Upper Outliers (%)	0.9	2.0	5.3	4.0	0.0	1.0	0.0	0.0	2.1	1.4	6.3	0.3
Lower Outliers (%)	0.0	0.0	3.3	0.7	0.0	0.0	0.0	0.0	2.1	0.0	3.5	0.4
For the Box
Q₂-Q₁	148.0	134.8	170.0	162.4	179.8	99.4	144.5	183.5	48.0	97.4	61.0	102.7
Q₃-Q₂	188.0	174.3	21.3	66.4	203.8	143.8	221.5	203.0	232.3	211.3	65.8	141.6
For the Whiskers
Q₃+1.5*IQR	1260.0	1201.3	967.1	1061.0	1434.5	1095.8	1479.0	1509.8	1311.6	1354.2	910.4	1224.2
Q₁−1.5*IQR	−84.0	−34.8	202.1	145.8	−99.5	122.9	15.0	−36.3	190.6	119.6	403.4	246.8
Upper Whisker	1260.0	1201.3	967.1	1061.0	1423.0	1095.8	1423.0	1423.0	1311.6	1354.2	910.4	1224.2
Lower Whisker	114.0	114.0	202.1	145.8	114.0	142.7	133.0	133.0	190.6	133.0	403.4	246.8
W_upper-Q₃	504.0	463.5	286.9	343.2	563.8	364.8	493.0	493.0	420.4	463.0	190.1	366.5
Q₁-W_lower	306.0	314.8	286.9	343.2	361.8	345.1	431.0	410.5	420.4	449.6	190.1	366.5

Where: M: Measured (all data); MC: Measured cut (without the measurements from the upper part area, lines 20 to 25); RT: Missing values were replaced by the total average of all values; RC: Missing values were replaced by the average of the column to which the missing value belongs; RR: Missing values were replaced by the average of the row to which the missing value belongs; I: Interpolation; Q₁-Q₂-Q₃: Quartiles; IQR: Interquartile Range; SD: Standard Deviation; CV: Coefficient of Variation. Note: numbers in bold highlight the parameters of interest.
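The box and whisker rows of Table 1 follow from the quartiles and the extremes. As a minimal sketch, assuming the conventional 1.5 × IQR fences and whiskers clipped to the observed minimum and maximum (which is what the tabulated rows suggest), the Plot 1 values for method M can be recomputed as follows. The function name `box_stats` and the dictionary keys are hypothetical, chosen to mirror the row labels.

```python
def box_stats(minimum, q1, q3, maximum):
    """Box-plot quantities derived from the quartiles and the extremes."""
    iqr = q3 - q1
    upper_fence = q3 + 1.5 * iqr
    lower_fence = q1 - 1.5 * iqr
    # Whiskers are the fences clipped to the observed data range
    upper_whisker = min(upper_fence, maximum)
    lower_whisker = max(lower_fence, minimum)
    return {
        "IQR": iqr,
        "Q3+1.5*IQR": upper_fence,
        "Q1-1.5*IQR": lower_fence,
        "Upper Whisker": upper_whisker,
        "Lower Whisker": lower_whisker,
        "W_upper-Q3": upper_whisker - q3,
        "Q1-W_lower": q1 - lower_whisker,
    }


# Fresh weight, Plot 1, method M (Min, Q1, Q3, Max taken from Table 1)
stats = box_stats(minimum=250.0, q1=454.5, q3=793.5, maximum=1337.0)
for name, value in stats.items():
    print(f"{name}: {value:.1f}")
```

For other columns the recomputed values may differ from the table in the last digit, presumably because the tabulated statistics were derived from unrounded quartiles.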

Table 2. Descriptive statistics for dry weight (DW) (g).

	All Plots						Plot 1
	M	MC	RT	RC	RR	I	M	MC	RT	RC	RR	I
Min	46.0	46.0	46.0	46.0	46.0	50.2	65.0	79.0	65.0	65.0	65.0	76.6
Q₁	146.5	147.0	165.0	165.0	165.0	168.1	150.0	163.3	168.5	168.5	150.0	171.9
Median	200.0	200.0	202.0	201.7	200.0	201.3	200.0	203.0	202.0	206.3	200.0	202.6
Q₃	249.0	253.5	233.0	233.0	238.0	236.2	240.5	241.8	231.5	231.5	243.5	229.7
Max	464.0	464.0	464.0	464.0	464.0	447.5	410.0	410.0	410.0	410.0	410.0	392.6
Mean	202.0	202.9	202.0	202.0	202.1	203.1	201.3	205.2	201.5	201.7	200.4	201.3
IQR	102.5	106.5	68.0	68.0	73.0	68.1	90.5	78.5	63.0	63.0	93.5	57.8
IQR/2	51.3	53.3	34.0	34.0	36.5	34.1	45.3	39.3	31.5	31.5	46.8	28.9
SD	75.8	78.2	66.5	66.7	67.3	53.2	65.3	67.1	58.1	58.4	65.9	45.7
CV	37.5	38.5	32.9	33.0	33.3	26.2	32.4	32.7	28.8	29.0	32.9	22.7
KURT	0.4	0.4	1.5	1.4	1.3	0.4	0.7	0.7	1.6	1.5	0.3	0.7
SKEW	0.5	0.5	0.5	0.5	0.5	0.3	0.5	0.6	0.5	0.5	0.4	0.3
Upper Outliers (%)	1.4	1.0	3.3	3.3	3.3	1.2	1.7	3.2	3.3	3.3	1.3	1.6
Lower Outliers (%)	0.0	0.0	1.6	1.6	1.3	0.3	0.0	0.0	0.7	0.7	0.0	0.0
For the Box
Q₂-Q₁	53.5	53.0	37.0	36.7	35.0	33.2	50.0	39.8	33.5	37.8	50.0	30.7
Q₃-Q₂	49.0	53.5	31.0	31.3	38.0	34.9	40.5	38.8	29.5	25.2	43.5	27.1
For the Whiskers
Q₃+1.5*IQR	402.8	413.3	335.0	335.0	347.5	338.4	376.3	359.5	326.0	326.0	383.8	316.4
Q₁−1.5*IQR	−7.3	−12.8	63.0	63.0	55.5	65.9	14.3	45.5	74.0	74.0	9.8	85.2
Upper Whisker	402.8	413.3	335.0	335.0	347.5	338.4	376.3	359.5	326.0	326.0	383.8	316.4
Lower Whisker	46.0	46.0	63.0	63.0	55.5	65.9	65.0	79.0	74.0	74.0	65.0	85.2
W_upper-Q₃	153.8	159.8	102.0	102.0	109.5	102.2	135.8	117.8	94.5	94.5	140.3	86.7
Q₁-W_lower	100.5	101.0	102.0	102.0	109.5	102.2	85.0	84.3	94.5	94.5	85.0	86.7
	Plot 2						Plot 3
	M	MC	RT	RC	RR	I	M	MC	RT	RC	RR	I
Min	48.0	48.0	48.0	48.0	46.0	50.2	46.0	46.0	46.0	46.0	84.0	54.2
Q₁	131.5	133.0	145.3	145.3	140.0	149.1	177.0	170.5	187.8	183.0	185.3	185.6
Median	170.0	169.5	202.0	192.0	192.0	186.1	220.0	220.0	202.0	206.3	203.8	213.7
Q₃	228.0	225.0	210.5	215.3	259.3	223.8	271.5	274.0	259.3	259.3	220.0	256.0
Max	436.0	436.0	436.0	436.0	464.0	398.6	464.0	464.0	464.0	464.0	362.0	447.5
Mean	180.4	179.2	186.0	185.8	201.5	188.6	223.5	223.5	218.6	218.6	204.4	219.4
IQR	96.5	92.0	65.3	70.0	119.3	74.7	94.5	103.5	71.5	76.3	34.8	70.3
IQR/2	48.3	46.0	32.6	35.0	59.6	37.3	47.3	51.8	35.8	38.1	17.4	35.2
SD	75.4	77.0	65.5	65.8	87.4	54.3	80.8	83.1	71.5	71.7	40.7	54.6
CV	41.8	43.0	35.2	35.4	43.4	28.8	36.1	37.2	32.7	32.8	19.9	24.9
KURT	0.5	0.7	1.4	1.3	0.2	0.0	0.4	0.3	1.3	1.3	3.5	0.5
SKEW	0.6	0.7	0.4	0.4	0.5	0.3	0.3	0.3	0.5	0.5	0.7	0.3
Upper Outliers (%)	1.8	2.0	3.3	2.0	1.3	0.5	1.7	1.9	3.3	3.3	4.7	1.0
Lower Outliers (%)	0.0	0.0	0.0	0.0	0.0	0.0	0.0	0.0	2.7	1.3	4.0	0.5
For the Box
Q₂-Q₁	38.5	36.5	56.8	46.8	52.0	37.0	43.0	49.5	14.3	23.3	18.5	28.0
Q₃-Q₂	58.0	55.5	8.5	23.3	67.3	37.7	51.5	54.0	57.3	53.0	16.2	42.3
For the Whiskers
Q₃+1.5*IQR	372.8	363.0	308.4	320.3	438.1	335.8	413.3	429.3	366.5	373.6	272.1	361.5
Q₁−1.5*IQR	−13.3	−5.0	47.4	40.2	−38.9	37.1	35.3	15.3	80.5	68.6	133.1	80.1
Upper Whisker	372.8	363.0	308.4	320.3	438.1	335.8	413.3	429.3	366.5	373.6	272.1	361.5
Lower Whisker	48.0	48.0	48.0	48.0	46.0	50.2	46.0	46.0	80.5	68.6	133.1	80.1
W_upper-Q₃	144.8	138.0	97.9	105.0	178.9	112.0	141.8	155.3	107.3	114.4	52.1	105.5
Q₁-W_lower	83.5	85.0	97.3	97.3	94.0	98.9	131.0	124.5	107.3	114.4	52.1	105.5

Where: M: Measured (all data); MC: Measured cut (without the measurements from the upper part area, lines 20 to 25); RT: Missing values were replaced by the total average of all values; RC: Missing values were replaced by the average of the column to which the missing value belongs; RR: Missing values were replaced by the average of the row to which the missing value belongs; I: Interpolation; Q₁-Q₂-Q₃: Quartiles; IQR: Interquartile Range; SD: Standard Deviation; CV: Coefficient of Variation. Note: numbers in bold highlight the parameters of interest.

Table 3. Descriptive statistics for ear weight (EW) (g).

	All Plots						Plot 1
	M	MC	RT	RC	RR	I	M	MC	RT	RC	RR	I
Min	19.0	19.0	19.0	19.0	19.0	32.0	40.0	40.0	40.0	40.0	40.0	52.2
Q₁	141.3	144.0	170.0	170.0	164.8	171.0	133.5	135.3	150.0	150.0	135.3	158.1
Median	221.0	220.0	227.0	227.5	222.1	225.0	209.0	208.5	227.0	208.9	208.5	216.1
Q₃	302.8	301.5	270.0	270.0	290.0	285.2	270.5	270.0	261.5	261.5	285.0	270.6
Max	572.0	572.0	572.0	572.0	572.0	562.3	572.0	572.0	572.0	572.0	572.0	562.3
Mean	226.9	226.5	226.9	227.9	227.4	230.3	217.3	216.6	219.3	219.7	218.0	221.1
IQR	161.5	157.5	100.0	100.0	125.3	114.3	137.0	134.8	111.5	111.5	149.8	112.5
IQR/2	80.8	78.8	50.0	50.0	62.6	57.1	68.5	67.4	55.8	55.8	74.9	56.3
SD	111.4	112.4	97.6	98.2	99.4	82.5	115.3	115.7	102.7	103.3	112.3	87.7
CV	49.1	49.6	43.0	43.1	43.7	35.8	53.1	53.4	46.8	47.0	51.5	39.7
KURT	−0.4	−0.4	0.4	0.3	0.2	−0.1	0.1	0.3	0.8	0.7	0.0	0.6
SKEW	0.3	0.3	0.3	0.3	0.3	0.3	0.7	0.7	0.7	0.6	0.6	0.6
Upper Outliers (%)	0.3	0.3	4.0	4.0	1.6	0.5	3.4	3.2	4.0	4.0	0.7	2.1
Lower Outliers (%)	0.0	0.0	0.4	0.4	0.0	0.0	0.0	0.0	0.0	0.0	0.0	0.0
For the Box
Q₂-Q₁	79.8	76.0	57.0	57.5	57.3	54.0	75.5	73.3	77.0	58.9	73.3	58.0
Q₃-Q₂	81.8	81.5	43.0	42.5	67.9	60.2	61.5	61.5	34.5	52.6	76.5	54.5
For the Whiskers
Q₃+1.5*IQR	545.0	537.8	420.0	420.0	477.9	456.6	476.0	472.1	428.8	428.8	509.6	439.5
Q₁−1.5*IQR	−101.0	−92.3	20.0	20.0	−23.1	−0.4	−72.0	−66.9	−17.3	−17.3	−89.4	−10.7
Upper Whisker	545.0	537.8	420.0	420.0	477.9	456.6	476.0	472.1	428.8	428.8	509.6	439.5
Lower Whisker	19.0	19.0	20.0	20.0	19.0	32.0	40.0	40.0	40.0	40.0	40.0	52.2
W_upper-Q₃	242.3	236.3	150.0	150.0	187.9	171.4	205.5	202.1	167.3	167.3	224.6	168.8
Q₁-W_lower	122.3	125.0	150.0	150.0	145.8	139.0	93.5	95.3	110.0	110.0	95.3	105.9
	Plot 2						Plot 3
	M	MC	RT	RC	RR	I	M	MC	RT	RC	RR	I
Min	19.0	19.0	19.0	19.0	19.0	32.0	29.0	29.0	29.0	29.0	82.0	43.2
Q₁	131.5	131.3	154.3	154.3	144.0	154.5	183.5	181.0	205.5	205.1	195.8	210.0
Median	201.0	196.0	227.0	211.0	230.5	199.3	255.5	260.0	227.0	245.5	230.3	257.2
Q₃	270.5	260.8	249.8	255.8	301.8	258.0	336.3	336.5	316.8	316.8	270.8	318.2
Max	481.0	481.0	481.0	481.0	495.0	464.7	495.0	495.0	495.0	495.0	439.0	469.5
Mean	207.1	205.3	212.3	213.0	227.7	209.5	255.7	255.8	249.2	251.0	236.5	260.3
IQR	139.0	129.5	95.5	101.5	157.8	103.5	152.8	155.5	111.3	111.7	75.0	108.2
IQR/2	69.5	64.8	47.8	50.8	78.9	51.8	76.4	77.8	55.6	55.8	37.5	54.1
SD	103.7	103.9	89.5	90.4	112.3	75.9	109.5	112.0	97.0	97.1	65.9	74.5
CV	50.1	50.6	42.2	42.4	49.3	36.2	42.8	43.8	38.9	38.7	27.9	28.6
KURT	−0.1	0.1	0.8	0.6	−0.5	0.1	−0.6	−0.6	0.0	0.0	1.1	−0.4
SKEW	0.4	0.5	0.3	0.2	0.1	0.5	−0.2	−0.2	0.0	0.0	0.7	−0.2
Upper Outliers (%)	0.9	3.1	4.0	3.3	0.0	0.9	0.0	0.0	1.3	1.3	4.0	0.0
Lower Outliers (%)	0.0	0.0	0.0	0.0	0.0	0.0	0.0	0.0	2.7	2.7	0.7	0.1
For the Box
Q₂-Q₁	69.5	64.8	72.8	56.8	86.5	44.8	72.0	79.0	21.5	40.4	34.5	47.2
Q₃-Q₂	69.5	64.8	22.8	44.8	71.3	58.7	80.8	76.5	89.8	71.3	40.4	61.0
For the Whiskers
Q₃+1.5*IQR	479.0	455.0	393.0	408.0	538.4	413.3	565.4	569.8	483.6	484.2	383.2	480.5
Q₁−1.5*IQR	−77.0	−63.0	11.0	2.0	−92.6	−0.8	−45.6	−52.3	38.6	37.6	83.3	47.8
Upper Whisker	479.0	455.0	393.0	408.0	495.0	413.3	495.0	495.0	483.6	484.2	383.2	469.5
Lower Whisker	19.0	19.0	19.0	19.0	19.0	32.0	29.0	29.0	38.6	37.6	83.3	47.8
W_upper-Q₃	208.5	194.3	143.3	152.3	193.3	155.3	158.8	158.5	166.9	167.5	112.4	151.3
Q₁-W_lower	112.5	112.3	135.3	135.3	125.0	122.5	154.5	152.0	166.9	167.5	112.4	162.3

Where: M: Measured (all data); MC: Measured cut (without the measurements from the upper part area, lines 20 to 25); RT: Missing values were replaced by the total average of all values; RC: Missing values were replaced by the average of the column to which the missing value belongs; RR: Missing values were replaced by the average of the row to which the missing value belongs; I: Interpolation; Q₁-Q₂-Q₃: Quartiles; IQR: Interquartile Range; SD: Standard Deviation; CV: Coefficient of Variation. Note: numbers in bold highlight the parameters of interest.

Table 4. Difference (Diff) in CV values (%) between the measured data (all data) (M) and the values derived from the sampling methods MC, RT, RC, RR, and I, where: M: Measured (all data); MC: Measured cut (without the measurements from the upper part area, lines 20 to 25); RT: Missing values were replaced by the total average of all values; RC: Missing values were replaced by the average of the column to which the missing value belongs; RR: Missing values were replaced by the average of the row to which the missing value belongs; I: Interpolation; FW: Fresh weight; DW: Dry weight; EW: Ear weight. Note: numbers in bold highlight the parameters of interest.

All Plots	M	MC	MC Diff% from M	RT	RT Diff% from M	RC	RC Diff% from M	RR	RR Diff% from M	I	I Diff% from M
FW	38.6	39.3	1.9	34.1	11.7	34.1	11.5	34.4	10.8	27.5	28.9
DW	37.5	38.5	2.6	32.9	12.3	33.0	12.0	33.3	11.3	26.2	30.2
EW	49.1	49.6	1.1	43.0	12.4	43.1	12.2	43.7	11.0	35.8	27.0
Plot 1	M	MC	MC Diff% from M	RT	RT Diff% from M	RC	RC Diff% from M	RR	RR Diff% from M	I	I Diff% from M
FW	36.9	37.3	1.0	32.8	11.3	32.8	11.0	36.8	0.5	27.2	26.3
DW	32.4	32.7	0.8	28.8	11.1	29.0	10.7	32.9	1.4	22.7	30.0
EW	53.1	53.4	0.7	46.8	11.8	47.0	11.4	51.5	2.9	39.7	25.2
Plot 2	M	MC	MC Diff% from M	RT	RT Diff% from M	RC	RC Diff% from M	RR	RR Diff% from M	I	I Diff% from M
FW	41.6	42.5	2.0	35.1	15.6	35.3	15.2	42.0	0.8	29.2	29.8
DW	41.8	43.0	2.7	35.2	15.8	35.4	15.3	43.4	3.7	28.8	31.1
EW	50.1	50.6	1.1	42.2	15.8	42.4	15.3	49.3	1.4	36.2	27.6
Plot 3	M	MC	MC Diff% from M	RT	RT Diff% from M	RC	RC Diff% from M	RR	RR Diff% from M	I	I Diff% from M
FW	35.6	36.4	2.3	32.8	7.9	32.8	7.8	21.5	39.6	23.6	33.7
DW	36.1	37.2	2.9	32.7	9.5	32.8	9.2	19.9	44.9	24.9	31.1
EW	42.8	43.8	2.2	38.9	9.2	38.7	9.7	27.9	34.9	28.6	33.2

Where Diff% = (|CV(X) − CV(M)| / CV(M)) × 100, with X being MC, RT, RC, RR, or I.
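The Diff% values can be checked numerically. The sketch below recomputes the fresh weight (FW) row for all plots from the rounded CV values in Table 4, assuming that Diff% is the magnitude of the change relative to CV(M), which is what the tabulated values imply. The helper name `diff_percent` is hypothetical.

```python
def diff_percent(cv_method, cv_measured):
    """Magnitude of the CV change relative to the measured data, in percent."""
    return abs(cv_method - cv_measured) / cv_measured * 100.0


# FW, all plots: rounded CV values (%) as printed in Table 4
cv_m = 38.6
cv_by_method = {"MC": 39.3, "RT": 34.1, "RC": 34.1, "RR": 34.4, "I": 27.5}

diffs = {name: diff_percent(value, cv_m) for name, value in cv_by_method.items()}
for name, value in diffs.items():
    print(f"{name}: {value:.1f}%")
```

The recomputed figures agree with the printed row to within a few tenths of a percentage point; the small deviations are expected if the table was computed from unrounded CV values.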

Table 5. Difference (Diff) in mean values (%) between the measured data (all data) (M) and the mean values estimated by the other sampling methods MC, RT, RC, RR, and I, where: M: Measured (all data); MC: Measured cut (without the measurements from the upper part area, lines 21 to 25); RT: Missing values were replaced by the total average of all values; RC: Missing values were replaced by the average of the column to which the missing value belongs; RR: Missing values were replaced by the average of the row to which the missing value belongs; I: Interpolation; FW: Fresh weight; DW: Dry weight; EW: Ear weight. Note: numbers in bold highlight the parameters of interest.

All Plots	M	MC	Diff%	RT	Diff%	RC	Diff%	RR	Diff%	I	Diff%
FW	659.5	661.9	0.4	659.4	0.0	660.4	0.1	660.0	0.1	665.5	0.9
DW	202.0	202.9	0.4	202.0	0.0	202.0	0.0	202.1	0.0	203.1	0.5
EW	226.9	226.5	−0.2	226.9	0.0	227.9	0.5	227.4	0.2	230.3	1.5
Plot 1	M	MC	Diff%	RT	Diff%	RC	Diff%	RR	Diff%	I	Diff%
FW	648.6	657.5	1.4	650.7	0.3	652.2	0.6	646.2	−0.4	654.7	0.9
DW	201.3	205.2	1.9	201.5	0.1	201.7	0.2	200.4	−0.4	201.3	0.0
EW	217.3	216.6	−0.3	219.3	0.9	219.7	1.1	218.0	0.4	221.1	1.8
Plot 2	M	MC	Diff%	RT	Diff%	RC	Diff%	RR	Diff%	I	Diff%
FW	597.6	593.3	−0.7	613.5	2.7	614.6	2.9	659.3	10.3	613.6	2.7
DW	180.4	179.2	−0.7	186.0	3.1	185.8	3.0	201.5	11.7	188.6	4.5
EW	207.1	205.3	−0.9	212.3	2.5	213.0	2.9	227.7	9.9	209.5	1.1
Plot 3	M	MC	Diff%	RT	Diff%	RC	Diff%	RR	Diff%	I	Diff%
FW	729.9	731.1	0.2	716.1	−1.9	716.6	−1.8	675.2	−7.5	728.1	−0.2
DW	223.5	223.5	0.0	218.6	−2.2	218.6	−2.2	204.4	−8.5	219.4	−1.8
EW	255.7	255.8	0.1	249.2	−2.5	251.0	−1.8	236.5	−7.5	260.3	1.8

Publisher’s Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.

© 2022 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
