Article

Soil Temperature Estimation with Meteorological Parameters by Using Tree-Based Hybrid Data Mining Models

1 Department of Water Engineering, Faculty of Agriculture, University of Tabriz, Tabriz 51666, Iran
2 Institute of Research and Development, Duy Tan University, Danang 550000, Vietnam
3 Department of Agricultural Engineering, Faculty of Agriculture, Ankara University, 06110 Ankara, Turkey
4 Department of Electrical Engineering, Electronics and Computer Science, Technical University of Cluj-Napoca, North University Center of Baia Mare, 400114 Cluj-Napoca, Romania
* Authors to whom correspondence should be addressed.
Mathematics 2020, 8(9), 1407; https://doi.org/10.3390/math8091407
Submission received: 2 July 2020 / Revised: 10 August 2020 / Accepted: 16 August 2020 / Published: 21 August 2020
(This article belongs to the Special Issue Recent Advances in Data Mining and Their Applications)

Abstract

Soil temperature at different depths is an important variable in many disciplines, including hydrology, soil science, civil engineering, construction, geotechnology, ecology, meteorology, agriculture, and environmental studies. In addition to physical and spatial variables, meteorological elements also drive changes in soil temperature at different depths. Machine-learning models are increasingly used in complex, nonlinear branches of science; these data-driven models seek solutions to such problems using previously observed data. In this research, decision tree (DT), gradient boosted trees (GBT), and hybrid DT–GBT models were used to estimate soil temperature. The soil temperatures at 5, 10, and 20 cm depths were estimated using daily minimum, maximum, and mean air temperature; sunshine intensity and duration; and precipitation data measured between 1993 and 2018 at Divrigi station in Sivas province, Turkey. To predict the soil temperature at different depths, the time windowing technique was applied to the input data. At 5 cm depth, the hybrid DT–GBT model gave the best estimates, followed by GBT and DT. At 10 and 20 cm depths, however, the best estimates were obtained with the DT model. The accuracy of the models also increased with soil depth. Sunshine duration and air temperature were identified as the most important factors in predicting soil temperature, and precipitation was the least important meteorological variable. Based on the evaluation criteria used (Nash–Sutcliffe coefficient, R, MAE, RMSE, and Taylor diagrams), all three data-based models (DT, GBT, and hybrid DT–GBT) can be recommended for predicting soil temperature.

1. Introduction

Knowing the temperature at different soil depths is important for planning in many disciplines and engineering fields. It is a parameter that must be measured or predicted in fields such as hydrology, soil science, construction, geotechnology, ecology, meteorology, agriculture, and environmental studies. Forecasting frost in the soil is also important for operating drinking and agricultural water networks and oil and natural gas distribution networks, and for determining their working season. In addition, soil temperature must be known for the heating and cooling of buildings and for solar applications in areas such as urbanism and construction. It is also a very important variable in evaluating the thermal performance of buildings with respect to upper soil temperature and in estimating the temperature change from the ground to the air.
Soil temperature is an important factor in the processes occurring in the inner layers of the soil, such as the presence, movement, and evaporation of water, microbiological activity, aeration, and vegetative activity. Various plant species and their growth depend on soil temperature at different depths, which affects the vegetative growth and yield of the plant. Soil temperature varies with other meteorological variables, especially air temperature. Upper soil temperature may also rise as air temperatures increase as a result of global warming.
Daily changes in soil temperature directly affect all of the biological and chemical processes occurring in the soil [1]. Chemical and biological processes in the soil require energy; if the temperature is too low, the biological processes in particular cannot proceed at a suitable rate. Soil temperature is therefore a vital agro-meteorological factor. For example, nitrification starts when the soil temperature rises above 4.5 °C and proceeds at its most favorable rate at 27–30 °C [2]. Like the release of nitrogen or carbon dioxide, the mineralization of plant nutrients also depends on soil temperature [3].
By affecting nutrient diffusion in the soil, soil temperature also affects the rate of organic matter in plants. It has a significant effect on plant root functions, such as water absorption and translocation. High soil temperature, as in tropical climates, causes seedling deaths, stunted plants, and excessive water consumption by plants, as well as a wide variety of plant diseases [4].
Because soil temperature is so important, models that predict it accurately would benefit many fields. Although studies that predict soil temperature have become more numerous in recent years, they remain fewer than prediction studies of other meteorological parameters, such as air temperature, wind, global solar radiation, or precipitation. Existing studies on soil temperature prediction mostly use statistical analysis methods, such as regression and moving averages, or artificial neural networks [5,6,7,8].
In recent years, frequent, accurate, and continuous measurements have been made in all branches of science, and large amounts of data have been recorded. At the same time, computers, software, internet access, and online measurement have improved. Under these conditions, data-based models such as the decision tree (DT) aim to make predictions without modeling the complex physical structure of the underlying processes. Data-based models try to learn the structure of the system from previously observed input and output data. The trained model is then tested, and its success rate is calculated [9].
Data-based models have been applied to many problems in hydrology and meteorology. Examples include the simulation of reservoir inflow for hydroelectric or irrigation purposes [10,11], estimation and comparison of air temperatures [12], seasonal and annual drought forecasting [13], rainfall–runoff forecasting [14], prediction of long-term maximum precipitation [15], groundwater level prediction [16], derivation of reservoir operation rules [17], and class A pan evaporation estimation [18].
Zounemat-Kermani [19] estimated soil temperature with artificial neural networks at daily and weekly time scales. Three meteorological parameters (air temperature, radiation, and relative humidity) and two hydrological variables (precipitation and flow) were taken as inputs. Artificial neural networks were found to be more successful in soil temperature estimation than multiple linear regression. Aslay and Ozen [20] estimated soil temperature at different depths at 88 stations in Turkey using artificial neural networks. Meteorological parameters were taken as model inputs, and the monthly average soil temperatures of the following year were successfully estimated. Hosseinzadeh [21] successfully predicted soil temperature in arid and semi-arid regions of Iran with the coactive neuro-fuzzy inference system method, using average, minimum, and maximum air temperature; relative humidity; sunshine duration; and solar radiation as model inputs. Kim et al. [22] estimated soil temperature with MLP-ANN and ANFIS methods, using different meteorological parameters as model inputs, and obtained successful results. Yener et al. [23] investigated the effect of meteorological parameters on soil temperature in Turkey and observed that soil temperature is affected by various parameters, such as thermal conductivity, short-term climatic conditions, and humidity. Sattari et al. [10] estimated soil temperature at different depths in an agricultural region of Iran’s Isfahan province with the help of meteorological parameters, making successful predictions with artificial neural networks and the M5 tree model. Samadianfard et al. [24] successfully predicted the daily average soil temperature in Tabriz, Iran, with wavelet artificial neural networks and gene expression programming methods.
According to the results of that study, air temperature, sunshine duration, and radiation were the most important factors affecting soil temperature. Feng et al. [25] estimated soil temperature at various depths at half-hourly intervals in China using meteorological variables (wind speed, air temperature, relative humidity, solar radiation, and vapor pressure deficit) and four machine-learning models. Among the models used, the extreme learning machine was found to be much more successful than the artificial neural network and random forest approaches. Costache et al. [26] successfully used gradient boosting trees (GBT) and the multilayer perceptron (MLP) to evaluate flood potential and to predict flood-sensitive areas in the Trotus river basin in Romania. Matei et al. [27,28] and Anton et al. [29] used various techniques, such as collaborative or context-aware data mining, to predict soil moisture in Transylvania, Romania. Wu et al. [30] used the gradient boosting decision tree (GBDT) algorithm to predict urban floods in Zhengzhou City. The factors used in modeling included precipitation amount, duration, and intensity; evaporation; land use; permeability; water collection area; and slope.
The aim of this study is to estimate soil temperature at depths of 5, 10, and 20 cm at the Divrigi meteorological station in Sivas province, Turkey, using the DT and GBT methods, and to compare the results with those of the proposed hybrid DT–GBT method. The effect of meteorological variables on soil temperature is investigated using different input combinations.

2. Materials and Methods

2.1. Material

This study was carried out using values measured at the weather station in the Divrigi district of Sivas province, Turkey (Figure 1). Sivas is Turkey’s second-largest province, with an area of 27,202 km², and 66.5% of its active population works in the agricultural sector. The province is an important crop production center that offers a wide variety of agricultural products, thanks to its large agricultural area and its microclimate agricultural basins. Of its land, 41% is suitable for agriculture, 27% is pasture, 13% is forest and shrubbery, and 19% is non-agricultural. By cultivated area in 2018, Sivas ranks first in the country for oats, second for trefoil, third for wheat, sixth for alfalfa, seventh for sugar beet, and eighth for potato [31,32].
Daily data measured in Turkish State Meteorological Service Sivas Divrigi station between 15 September 2009 and 31 December 2018 were used in the study. Measurements in the meteorological stations operated by the State Meteorological Service in Turkey are conducted according to standards set by the World Meteorological Organization. Measurements made manually in previous years are now made through automatic stations. Automatic meteorology stations consist of sensors sensitive to changes in meteorological parameters and measuring the amount of these changes. These stations have the main (central) processing unit that makes the necessary calculations to convert the measurements obtained by the sensors into meteorological information, the display units that enable the information to be displayed, and the communication units that enable the information to be transmitted to the center. The station also has a data acquisition unit, communication interface, and power supply [33,34].
Basic statistics of the data used are given in Table 1. Soil temperature values at a depth of 5 cm vary more than those at 10 cm and 20 cm. The daily change of the average soil temperature at different depths throughout the year is given in Figure 2; the difference between the minimum and maximum temperature values is large. A 70–30 ratio between training and test data was used, and the data were split chronologically. The initial data set had 3395 records and was divided into two repositories: the first 70%, from September 2009 to March 2016, was placed in the Training Data repository and used to train the model, while the remaining 30%, from March 2016 to December 2018, was placed in the Test Data repository and used to validate it. These two repositories were used in all the processes created. For each of the proposed algorithms, we evaluated the best method and scenario by implementing and running a process that covered all the chosen scenarios.
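A chronological 70–30 split of this kind can be sketched in a few lines of Python. This is only an illustration: the function name `chronological_split` is hypothetical, the records are stand-in integers instead of the station data, and rounding the cut index down is an assumption, since the text does not state the exact boundary record.

```python
def chronological_split(records, train_fraction=0.7):
    """Split time-ordered records into a leading training part and a trailing test part."""
    # Assumption: the cut index is rounded down; the paper does not specify this.
    cut = int(len(records) * train_fraction)
    return records[:cut], records[cut:]


# Stand-in for 3395 daily records already sorted by date (not the real data).
records = list(range(3395))
train, test = chronological_split(records)
print(len(train), len(test))
```

Because no shuffling takes place, every test record is later in time than every training record, which avoids leaking future information into training.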

2.2. Methods

The data mining processes were implemented in Rapid Miner Studio (version 9.4–Educational Edition, RapidMiner Inc., Boston, MA, USA). It is a tool that provides a comprehensive set of operators and offers easy to use and understand structures for modelling complex data mining processes [35]. The machine-learning algorithms used for predicting the soil temperature are described below.

2.2.1. Gradient Boosted Trees (GBT)

Gradient boosted trees (GBT) is an ensemble of regression or classification tree models; in the scenarios tested here, it is used for regression. According to Freund and Schapire [36], regression GBT is a generalization of boosting to arbitrary differentiable loss functions. The trees are learned sequentially by a forward stagewise procedure [37]. The GBT implementation in Rapid Miner uses the H2O 3.8.2.6 algorithm, which follows the algorithm specified by Hastie et al. [38].
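The forward stagewise idea can be illustrated with a deliberately small sketch. It is not the H2O implementation used in Rapid Miner: it assumes squared-error loss, a single input feature, one-split trees (stumps), and a fixed learning rate, and all function names and the toy data are hypothetical.

```python
def fit_stump(x, residuals):
    """Fit a one-split regression tree (stump) to the residuals by least squares."""
    overall = sum(residuals) / len(residuals)
    best = (float("inf"), x[0], overall, overall)  # fallback when no split is possible
    for threshold in sorted(set(x))[:-1]:
        left = [r for xi, r in zip(x, residuals) if xi <= threshold]
        right = [r for xi, r in zip(x, residuals) if xi > threshold]
        left_mean, right_mean = sum(left) / len(left), sum(right) / len(right)
        sse = (sum((r - left_mean) ** 2 for r in left)
               + sum((r - right_mean) ** 2 for r in right))
        if sse < best[0]:
            best = (sse, threshold, left_mean, right_mean)
    return best[1:]


def fit_gbt(x, y, n_trees=50, learning_rate=0.1):
    """Forward stagewise boosting: each stump is fitted to the current residuals."""
    base = sum(y) / len(y)
    predictions = [base] * len(y)
    stumps = []
    for _ in range(n_trees):
        # For squared-error loss the negative gradient is simply the residual.
        residuals = [yi - pi for yi, pi in zip(y, predictions)]
        threshold, left_value, right_value = fit_stump(x, residuals)
        stumps.append((threshold, left_value, right_value))
        predictions = [
            pi + learning_rate * (left_value if xi <= threshold else right_value)
            for xi, pi in zip(x, predictions)
        ]
    return base, stumps, learning_rate


def predict_gbt(model, xi):
    """Sum the base value and the damped contributions of all stumps."""
    base, stumps, learning_rate = model
    return base + sum(learning_rate * (lv if xi <= t else rv) for t, lv, rv in stumps)


# Toy data (hypothetical): a step-like relation between one input and the target.
x_demo = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0]
y_demo = [1.0, 1.2, 0.9, 1.1, 5.0, 5.2, 4.9, 5.1]
model = fit_gbt(x_demo, y_demo)

mean_y = sum(y_demo) / len(y_demo)
mse_mean = sum((yi - mean_y) ** 2 for yi in y_demo) / len(y_demo)
mse_gbt = sum((yi - predict_gbt(model, xi)) ** 2 for xi, yi in zip(x_demo, y_demo)) / len(y_demo)
print(mse_mean, mse_gbt)
```

Each round corrects only a fraction of the remaining error, so the ensemble improves gradually; production implementations add deeper trees, subsampling, and other loss functions.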

2.2.2. Decision Trees (DT)

A decision tree (DT) is a tree-like collection of nodes used to predict class membership or to estimate a numerical target value. Each node corresponds to a splitting rule for one specific attribute. It is a simple and widely used method in data mining [39].
The output of the algorithm is a tree model, which is later used for prediction. Minimization of the sum of squares is used as the splitting criterion.
As Hastie et al. [38] specify, tree size influences the complexity of the resulting model, and the optimal tree size should be chosen adaptively. In Rapid Miner, tree size is controlled by the “maximal depth” parameter, for which we tried different values in the optimization part.
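The following sketch shows how a regression tree can be grown with a sum-of-squares splitting criterion and a depth limit. It is a simplified illustration, not the Rapid Miner operator: the names `build_tree` and `predict_tree`, the `max_depth` and `min_size` parameters, and the toy data are assumptions made for the example.

```python
def sum_of_squares(values):
    """Sum of squared deviations from the mean (0 for an empty list)."""
    if not values:
        return 0.0
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values)


def build_tree(rows, targets, depth=0, max_depth=3, min_size=2):
    """Recursively grow a regression tree; leaves store the mean target value."""
    leaf_value = sum(targets) / len(targets)
    if depth >= max_depth or len(rows) < min_size or len(set(targets)) == 1:
        return leaf_value
    best = None
    for feature in range(len(rows[0])):
        for threshold in sorted({row[feature] for row in rows})[:-1]:
            left = [i for i, row in enumerate(rows) if row[feature] <= threshold]
            right = [i for i, row in enumerate(rows) if row[feature] > threshold]
            score = (sum_of_squares([targets[i] for i in left])
                     + sum_of_squares([targets[i] for i in right]))
            if best is None or score < best[0]:
                best = (score, feature, threshold, left, right)
    if best is None:  # all rows identical: no split is possible
        return leaf_value
    _, feature, threshold, left, right = best
    return {
        "feature": feature,
        "threshold": threshold,
        "left": build_tree([rows[i] for i in left], [targets[i] for i in left],
                           depth + 1, max_depth, min_size),
        "right": build_tree([rows[i] for i in right], [targets[i] for i in right],
                            depth + 1, max_depth, min_size),
    }


def predict_tree(tree, row):
    """Follow the splitting rules from the root down to a leaf value."""
    while isinstance(tree, dict):
        branch = "left" if row[tree["feature"]] <= tree["threshold"] else "right"
        tree = tree[branch]
    return tree


# Toy data (hypothetical): one input attribute, two clearly separated target levels.
rows = [[1.0], [2.0], [3.0], [10.0], [11.0], [12.0]]
targets = [5.0, 5.0, 5.0, 20.0, 20.0, 20.0]
tree = build_tree(rows, targets, max_depth=1)
print(predict_tree(tree, [2.5]), predict_tree(tree, [11.5]))
```

Raising `max_depth` lets the tree fit finer structure at the cost of higher model complexity, which is the trade-off the maximal depth setting controls.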

2.2.3. Hybrid DT–GBT

The proposed hybrid DT–GBT approach uses the vote operator capabilities offered by Rapid Miner. It is a nested operator, meaning it has a subprocess. It also requires at least two learners, called base learners.
For classification, this operator uses a majority vote, while for regression it uses the average on top of the predictions of the base learners provided in the subprocess. For classification, all the operators in the subprocess accept the given dataset and generate a classification model. For predicting an unknown example, this operator applies all the classification models from its subprocess and assigns the predicted class with maximum votes to the unknown example.
In case of regression, all the operators in the subprocess of the vote operator accept the given dataset and generate a regression model. In the proposed hybrid DT–GBT approach, GBT and DT are included in the subprocess and are considered base learners. To predict an unknown value, the operator uses the average on top of the predictions of the base learners defined.
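For regression, the voting step reduces to averaging the predictions of the base learners sample by sample. A minimal sketch follows, with a hypothetical function name and made-up prediction values standing in for the DT and GBT outputs:

```python
def vote_regression(base_predictions):
    """Average the per-sample predictions of several base learners."""
    n_learners = len(base_predictions)
    return [sum(values) / n_learners for values in zip(*base_predictions)]


# Made-up predictions from two base learners for three days (illustrative only).
dt_predictions = [10.0, 12.0, 15.0]
gbt_predictions = [11.0, 13.0, 14.0]
hybrid_predictions = vote_regression([dt_predictions, gbt_predictions])
print(hybrid_predictions)  # [10.5, 12.5, 14.5]
```

With two base learners, the hybrid prediction always lies between the two individual predictions, so it cannot be worse than the weaker learner on any single sample.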

2.2.4. Metrics Performed for Evaluation

Five well-known metrics were calculated to evaluate the models; Equations (1) and (2) define the last two.
  • Root mean squared error (RMSE)—the standard deviation of the residuals (prediction errors).
  • Pearson correlation coefficient (r)—used to obtain the strength and direction of the linear relationship between the predicted value and observed value for the soil temperature.
  • Mean absolute error (MAE)—it is commonly used in forecasting time series.
  • Nash–Sutcliffe coefficient (NS)—used to describe the accuracy of model outputs:
    $$\mathrm{NS} = 1 - \frac{\sum_{i=1}^{n} (p_i - d_i)^2}{\sum_{i=1}^{n} (p_i - \bar{p})^2}, \tag{1}$$
    where n is the number of outputs, pi is the i-th predicted output, and di is the i-th desired observed output [40,41].
  • Kling–Gupta efficiency (KGE)—first introduced by Gupta et al. [42] as an improvement to the Nash–Sutcliffe efficiency. It facilitates the separate analysis of the relative importance of correlation, bias, and variability in the process of hydrological modelling.
    $$\mathrm{KGE} = 1 - \sqrt{(r - 1)^2 + \left(\frac{\sigma_{sim}}{\sigma_{obs}} - 1\right)^2 + \left(\frac{\mu_{sim}}{\mu_{obs}} - 1\right)^2}, \tag{2}$$
    where r is the linear correlation between observed and predicted values, σobs is the standard deviation in observations, σsim the standard deviation in simulations, μsim the simulation mean, and μobs the observation mean.
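The five metrics can be computed with the Python standard library alone, as sketched below. Two points are assumptions, not statements from the text: the Nash–Sutcliffe coefficient is written in its conventional form, with the mean of the observed values in the denominator, and population (not sample) standard deviations are used. All function names are hypothetical.

```python
import math


def mean(values):
    return sum(values) / len(values)


def std(values):
    """Population standard deviation (an assumption; the text does not say which)."""
    m = mean(values)
    return math.sqrt(sum((v - m) ** 2 for v in values) / len(values))


def rmse(obs, sim):
    return math.sqrt(mean([(s - o) ** 2 for o, s in zip(obs, sim)]))


def mae(obs, sim):
    return mean([abs(s - o) for o, s in zip(obs, sim)])


def pearson_r(obs, sim):
    mo, ms = mean(obs), mean(sim)
    covariance = mean([(o - mo) * (s - ms) for o, s in zip(obs, sim)])
    return covariance / (std(obs) * std(sim))


def nash_sutcliffe(obs, sim):
    # Conventional form: squared errors relative to the variance of the observations.
    mo = mean(obs)
    return 1 - sum((s - o) ** 2 for o, s in zip(obs, sim)) / sum((o - mo) ** 2 for o in obs)


def kling_gupta(obs, sim):
    r = pearson_r(obs, sim)
    alpha = std(sim) / std(obs)    # variability ratio
    beta = mean(sim) / mean(obs)   # bias ratio
    return 1 - math.sqrt((r - 1) ** 2 + (alpha - 1) ** 2 + (beta - 1) ** 2)


# Illustrative values only: every prediction is exactly 1 degree too warm.
observed = [1.0, 2.0, 3.0, 4.0]
simulated = [2.0, 3.0, 4.0, 5.0]
print(rmse(observed, simulated), mae(observed, simulated),
      nash_sutcliffe(observed, simulated), kling_gupta(observed, simulated))
```

In this example the correlation is perfect, yet NS and KGE fall well below 1, which shows why a constant bias is invisible to R alone.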

2.2.5. Parameter Setup

To predict the soil temperature at different depths, the time windowing technique was used on the input data. Windowing is used to split time series into input vectors. A time series is a set of measurements performed on a specific process that are registered sequentially in time. As Koskela et al. [43] point out, by using the windowing technique, the problem is translated into deciding the length and type of the window to be used.
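As an illustration of windowing, the sketch below turns a series into pairs of an input vector (the previous `window_size` values) and a target (the next value). The one-step-ahead horizon and the function name `make_windows` are assumptions; the exact configuration of the Rapid Miner windowing operator is not given in the text.

```python
def make_windows(series, window_size=3):
    """Turn a time series into (previous window_size values, next value) pairs."""
    inputs, targets = [], []
    for end in range(window_size, len(series)):
        inputs.append(series[end - window_size:end])  # values of the preceding days
        targets.append(series[end])                   # value to be predicted
    return inputs, targets


# Illustrative series of five daily values.
inputs, targets = make_windows([1, 2, 3, 4, 5], window_size=3)
print(inputs)   # [[1, 2, 3], [2, 3, 4]]
print(targets)  # [4, 5]
```

A longer window gives the model more history per example but leaves fewer examples and more input attributes, so the window length is itself a parameter to tune.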

2.2.6. Scenarios and Implementation

In the study, 8 different input scenarios were taken into account to determine the meteorological variables that have the most impact on soil temperature and to evaluate the predictive power of the prediction models to be used based on these variables. The scenarios in Table 2 are based on the physics of soil temperature change and a literature search.
To validate the best combination for the machine-learning algorithms, we used a particularization of the configurable scenarios platform for designing prediction models described in Avram et al. [44] and Avram et al. [45], as the platform was considered general enough to support collaborative and context-aware data mining. As Anton et al. [46] specify, context-aware data mining follows the same steps as classical data mining but includes real-time context in the data mining process, while the collaborative scenario completes the data of the studied source with data taken from similar sources (for example, one or more locations in close proximity to the studied one). The current research focuses on the classical data mining approach, applied with the DT, GBT, and hybrid DT–GBT methods.
Below are the steps describing the modelled process behind each machine-learning method. Since there were three chosen models: DT, GBT, and hybrid DT–GBT, there were 3 Rapid Miner processes, following the presented structure:
  • load training data;
  • load testing data;
  • load test scenarios;
  • for each test scenario in the list:
    establish predicted value as specified in the scenario;
    select only attributes specified;
    generate model on the training data using windowing;
    apply generated model on the test data;
    store results.
  • aggregate results.
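The process structure listed above can be sketched as a generic loop over scenarios. This is a hedged outline, not the Rapid Miner process: the windowing step is omitted for brevity, the learner is passed in as `fit` and `predict` callables, and the column names (`MeanT`, `ST5`), the placeholder mean-value learner, and the toy rows are all hypothetical.

```python
def run_scenarios(train_rows, test_rows, scenarios, fit, predict, score):
    """Train and evaluate one model per scenario, then aggregate the results."""
    results = []
    for scenario in scenarios:
        target, attributes = scenario["target"], scenario["attributes"]
        # Select only the attributes specified by the scenario.
        x_train = [[row[a] for a in attributes] for row in train_rows]
        y_train = [row[target] for row in train_rows]
        x_test = [[row[a] for a in attributes] for row in test_rows]
        y_test = [row[target] for row in test_rows]
        model = fit(x_train, y_train)                    # generate model on training data
        predicted = [predict(model, x) for x in x_test]  # apply model on test data
        results.append({"target": target,
                        "attributes": tuple(attributes),
                        "score": score(y_test, predicted)})
    # Aggregate: order the scenarios from lowest to highest error.
    return sorted(results, key=lambda result: result["score"])


# Placeholder learner (assumption): always predicts the training mean.
def fit_mean(x_train, y_train):
    return sum(y_train) / len(y_train)


def predict_mean(model, x):
    return model


def mean_absolute_error(y_true, y_pred):
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


# Toy rows with hypothetical column names, not the station data.
train_rows = [{"MeanT": 10.0, "ST5": 12.0}, {"MeanT": 20.0, "ST5": 22.0}]
test_rows = [{"MeanT": 15.0, "ST5": 17.0}]
scenarios = [{"target": "ST5", "attributes": ["MeanT"]}]
results = run_scenarios(train_rows, test_rows, scenarios,
                        fit_mean, predict_mean, mean_absolute_error)
print(results)
```

Passing the learner in as callables lets the same loop be reused for each of the three methods, mirroring the one-process-per-model structure described in the text.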
The aggregated results were then subject to analysis, and conclusions were drawn based on these.

3. Results

To predict the soil temperature at different depths (5 cm, 10 cm, and 20 cm), the machine-learning algorithms were trained using windows of previous days. For establishing the best values for the window size, the values 3, 5, and 7 were tested in the beginning of the experiments. Table 3 presents the RMSE (°C) measured values per each algorithm used. It can be observed that the best results were obtained when using a window of 3 previous days, while increasing the number of days in the window did not improve the results.
Table 4 presents the obtained results for different maximal depth values. We used in the experiments the maximal depth of 10 for the decision tree algorithm applied. For a maximal depth higher than 10, the overall accuracy of the predictions starts to decrease.
Table 5 depicts the results obtained for the combinations tested for GBT on maximal depth and no. of trees. After this phase, the combination 200 trees and 20 as maximal depth was further used in the experiments. For the hybrid DT–GBT approach, the best obtained parameters were used for each algorithm.
In the study, the performance of the models and input scenarios used to estimate the temperature at different soil depths were determined. 70% of all data used in the study were used for training of models and the remaining 30% were used for testing.
RMSE (°C) was computed for all scenarios and algorithms chosen, as seen in Table 6. The results with the lowest RMSE were considered as best scenario combinations and analyzed in more details.
Table 7 shows the results of the best scenario selected for each depth. The DT model predicted soil temperature most accurately at 20 cm depth, followed by 10 cm and 5 cm. The soil temperature at a depth of 5 cm was predicted with relatively high accuracy and low error (NS = 0.9669, KGE = 0.957, R = 0.9833, MAE = 1.4533, and RMSE = 2.0188). Soil temperature at a depth of 5 cm was affected more by Sunshine Intensity and Sunshine Duration than by other variables.
According to Table 7, the soil temperature at 10 and 20 cm depth was affected mostly by the MinT-MaxT-MeanT-Sunshine Duration parameters. The DT model had high accuracy and low error at 10 and 20 cm depth (ST10: NS = 0.9846, KGE = 0.989, R = 0.9922, MAE = 0.9564, RMSE = 1.3165; ST20: NS = 0.9942, KGE = 0.995, R = 0.9971, MAE = 0.5171, RMSE = 0.7368). According to the evaluation criteria, the accuracy of the model increased and the margin of error decreased as depth increased.
Time series and scatter plots for all three depths are given in Figure 3. The DT model has successfully estimated the soil temperature at different depths.
Table 8 gives the performance of the inputs and scenarios that produced the best results for 5, 10, and 20 cm soil depths with the GBT model. As with the DT method, the GBT method gave its best estimates at 20 cm depth, followed by 10 cm and 5 cm.
According to Table 8, the MeanT variable alone is sufficient as an input to determine the temperature at a depth of 5 cm (NS = 0.9446, KGE = 0.857, R = 0.9793, MAE = 1.9144, RMSE = 2.6109). However, the input scenario consisting of four variables (MinT-MaxT-MeanT-Sunshine Duration) gave the best results for 10 and 20 cm soil depth. As seen in Table 8, the best results were NS = 0.9658, KGE = 0.861, R = 0.9915, MAE = 1.5442, RMSE = 1.9554 at 10 cm depth and NS = 0.9713, KGE = 0.866, R = 0.9939, MAE = 1.2689, RMSE = 1.6389 at 20 cm depth. With the GBT method, the success rate of the model increased with soil depth.
According to the results of the GBT model, the time series and scatter plots for all three depths are given in Figure 4. A very high level of agreement was achieved between the values predicted from the GBT model and the observed values at all depths except for a few days.
In Table 9, the performance of the input scenarios that give the best results for temperatures at 5, 10, and 20 cm soil depths according to the DT–GBT hybrid model is given.
According to Table 9, the best result for 5 cm soil depth was obtained when only the MeanT variable was taken as input (NS = 0.9642, KGE = 0.921, R = 0.9839, MAE = 1.5358, RMSE = 2.1007). For a depth of 10 cm, the input scenario consisting of two variables (MeanT-Sunshine Duration) gave the best results. For 20 cm depth, the input scenario consisting of four variables (MinT-MaxT-MeanT-Sunshine Duration) gave the best results. As seen in Table 9, the values obtained were NS = 0.9817, KGE = 0.922, R = 0.9934, MAE = 1.1025, RMSE = 1.4334 at 10 cm depth and NS = 0.9890, KGE = 0.930, R = 0.9968, MAE = 0.7779, RMSE = 1.0121 at 20 cm depth. The results in Table 9 show that the accuracy of the model increases with soil depth.
According to the DT–GBT hybrid model results, time series graphics and scatter plots for all three depths are given in Figure 5. Except for a few days, especially at 10 cm and 20 cm depths, a very high agreement was observed between the values estimated from the DT–GBT hybrid model and the observed values.
Next, the methods were compared with each other at each depth.
The performance of the methods in the test period for a depth of 5 cm is given in Table 10, where the basic statistics of the three methods in their best scenarios can be compared with the measured values. The results obtained from the methods are in the second, third, and fourth columns; the last column gives the figures for the measured values. In terms of R value, the hybrid DT–GBT method gave more accurate results than the other methods at 5 cm depth (R = 0.9954). However, DT was the most accurate in terms of minimum, maximum, and standard deviation values, while the GBT results were closest to the observed temperature values in terms of the mean. In general, the results show that all three methods can predict soil temperature accurately at a depth of 5 cm.
The performance of the methods in the test period for soil temperature at 10 cm depth is given in Table 11. In terms of R value, the DT method was more accurate than the other methods at 10 cm depth (R = 0.9983). The DT results are also very close to the observed temperature values in terms of minimum, maximum, mean, and standard deviation. The results show that all three methods successfully predicted soil temperature at 10 cm depth.
The performance of the methods in the test period for a depth of 20 cm is given in Table 12. In terms of R value, the DT method performed slightly better than the other methods at 20 cm depth (R = 0.9994). The DT results are also very close to the observed temperature values in terms of minimum, maximum, mean, and standard deviation. The DT method is followed very closely by the hybrid DT–GBT method. The results show that all three methods successfully predicted soil temperature at 20 cm depth.
Table 10, Table 11 and Table 12 indicate that sunshine duration affects soil temperature more than the other meteorological variables, especially at 10 and 20 cm depth. Sunshine duration appears to be the most important variable because it causes the soil to heat. After sunshine duration, the variables that express air temperature play an important role in soil warming, especially at 5 cm depth.
The performance of the models at different depths is shown as Taylor diagrams in Figure 6. As can be seen from Figure 6a, at 5 cm soil depth the hybrid DT–GBT method predicted soil temperature best, followed by GBT and DT. As seen in Figure 6b,c, at 10 and 20 cm soil depths the best results were obtained with DT, followed by hybrid DT–GBT and GBT.

4. Discussion

Estimating soil temperature is important for managing economic activities such as agriculture, construction, and agricultural insurance. Soil temperature depends on meteorological variables and can be measured at meteorological stations, but measurement is relatively costly and requires expert staff.
Unfortunately, outside metropolitan areas, meteorological parameters in a Turkish district such as Divrigi are often measured at only one location. The transferability of a data mining model trained at a single point is likely to be low. However, altitude does not vary greatly within the district, and there are no other long-term measuring stations. Naturally, the results obtained here cannot be generalized to other regions and other conditions.
The evaluation criteria were taken into account in the selection of the best scenario and the best model. The accuracy rate obtained under these operating conditions is quite good (NS: 0.9446–0.9942, KGE: 0.857–0.995, R: 0.9793–0.9971). Accuracy rate in all data-based models can be increased by discovering hidden patterns and minimizing the noise in the data. It is possible to make data smoother and more predictable with data preprocessing. With various preprocessing and filtering methods, the stochastic feature among the data can be reduced, and the accuracy of the model can be increased.
Using data-based models, soil temperature at different depths can be estimated from meteorological variables measured in the past. In this study, the performance of the hybrid DT–GBT method in estimating soil temperature at different depths was compared with that of the DT and GBT methods from which it was built. The meteorological variables associated with soil temperature were considered as input scenarios in eight different combinations. According to the results, at 5 cm soil depth the hybrid DT–GBT method gave the best predictions, followed by GBT and DT. At 10 and 20 cm soil depths, the best estimates were obtained with DT, followed by hybrid DT–GBT and GBT. The accuracy of the models also increased with soil depth. Sunshine duration was the most important meteorological variable for soil temperature at 10 and 20 cm depth, and air temperature was the most important at 5 cm depth. Precipitation had no notable effect on soil temperature in any model or at any depth. Overall, the DT, GBT, and hybrid DT–GBT models were used successfully to predict soil temperature.
Soil temperature is important for plant root development and the activity of microorganisms. Measuring it at different depths is often impractical, especially under field conditions where crops are grown, because measurement is costly and requires equipment and expert staff. However, if successful models can be established for different regions and conditions, soil temperature can be predicted without the need for field measurement, equipment, or labor. These predictions can assist in the management of agricultural soil, fertilizer, and water resources. Although three different artificial intelligence methods were used in this study, we were not able to test them under different climatic and regional conditions. The model that gave the best estimates here cannot be assumed to be valid under all conditions, but the successful results indicate that the methods can be used to estimate soil temperature.

Author Contributions

Conceptualization, O.M. and M.T.S.; methodology, A.A. and M.T.S.; software, A.A.; validation, A.A., M.T.S., and H.A.; formal analysis, A.A.; investigation, A.A.; resources, H.A.; data curation, H.A.; writing—original draft preparation, M.T.S. and A.A.; writing—review and editing, M.T.S. and O.M.; visualization, A.A.; supervision, O.M. and M.T.S.; funding acquisition, O.M. All authors have read and agreed to the published version of the manuscript.

Funding

This work has received funding from the CHIST-ERA BDSI BIG-SMART-LOG and UEFISCDI COFUND-CHIST-ERA-BIG-SMART-LOG Agreement no. 100/01.06.2019.

Conflicts of Interest

The authors declare no conflict of interest.

References

  1. Bond-Lamberty, B.; Wang, C.; Gower, S.T. Spatiotemporal measurement and modeling of stand-level boreal forest soil temperatures. Agric. For. Meteorol. 2005, 131, 27–40. [Google Scholar] [CrossRef]
  2. Buckman, H.O.; Brady, N.C. The Nature and Properties of Soils, 6th ed.; The Mac Millian Co.: New York, NY, USA, 1960. [Google Scholar]
  3. Seyfried, M.S.; Flerchinger, G.N.; Murdock, M.D.; Hanson, C.L.; Van Vactor, S. Long-Term Soil Temperature Database, Reynolds Creek Experimental Watershed, Idaho, United States. Water Resour. Res. 2001, 37, 2843–2846. [Google Scholar] [CrossRef]
  4. Tenge, A.; Kaihura, F.B.; Lal, R.; Singh, B. Diurnal soil temperature fluctuations for different erosion classes of an oxisol at Mlingano, Tanzania. Soil Tillage Res. 1998, 49, 211–217. [Google Scholar] [CrossRef]
  5. Zheng, D.; Hunt, E.; Running, S. A daily soil temperature model based on air temperature and precipitation for continental applications. Clim. Res. 1993, 2, 183–191. [Google Scholar] [CrossRef]
  6. Yang, C.-C.; Prasher, S.O.; Mehuys, G.R.; Patni, N.K. Application of artificial neural networks for simulation of soil temperature. Trans. ASAE 1997, 40, 649–656. [Google Scholar] [CrossRef]
  7. Paul, K.I.; Polglase, P.J.; Smethurst, P.J.; O’Connell, A.M.; Carlyle, C.J.; Khanna, P.K. Soil temperature under forests: A simple model for predicting soil temperature under a range of forest types. Agric. For. Meteorol. 2004, 121, 167–182. [Google Scholar] [CrossRef]
  8. Bilgili, M. Prediction of soil temperature using regression and artificial neural network models. Meteorol. Atmos. Phys. 2010, 110, 59–70. [Google Scholar] [CrossRef]
  9. Sattari, M.T.; Apaydin, H.; Shamshirband, S. Performance Evaluation of Deep Learning-Based Gated Recurrent Units (GRUs) and Tree-Based Models for Estimating ETo by Using Limited Meteorological Variables. Mathematics 2020, 8, 972. [Google Scholar] [CrossRef]
  10. Sattari, M.T.; Dodangeh, E.; Abraham, J. Estimation of daily soil temperature via data mining techniques in semi-arid climate conditions. Earth Sci. Res. J. 2017, 21, 85–93. [Google Scholar] [CrossRef]
  11. Apaydin, H.; Feizi, H.; Sattari, M.T.; Colak, M.S.; Shamshirband, S.; Chau, K.-W. Comparative Analysis of Recurrent Neural Network Architectures for Reservoir Inflow Forecasting. Water 2020, 12, 1500. [Google Scholar] [CrossRef]
  12. Keskiner, A.; Ibrikci, T.; Cetin, M. Estimation and Comparison of Probabilistic Temperatures through Using Artificial Neural Networks in Geographic Information Systems Media. J. Agric. Sci. 2012, 17, 242–252. [Google Scholar]
  13. Yurekli, K.; Sattari, M.T.; Anli, A.S.; Hinis, M.A. Seasonal and annual regional drought prediction by using data-mining approach. Atmosfera 2012, 25, 85–105. [Google Scholar]
  14. Terzi, O.; Barak, M. Rainfall-Runoff Forecasting with Wavelet-Neural Network Approach: A Case Study of Kızılırmak River. J. Agric. Sci. 2015, 21, 546–557. [Google Scholar]
  15. Nourani, V.; Sattari, M.T.; Molajou, A. Threshold-Based Hybrid Data Mining Method for Long-Term Maximum Precipitation Forecasting. Water Resour. Manag. 2017, 31, 2645–2658. [Google Scholar] [CrossRef]
  16. Sattari, M.T.; Mirabbasi, R.; Sushab, R.S.; Abraham, J.P. Prediction of Groundwater Level in Ardebil Plain Using Support Vector Regression and M5 Tree Model. Ground Water 2018, 56, 636–646. [Google Scholar] [CrossRef]
  17. Rouzegari, N.; Hassanzadeh, Y.; Sattari, M.T. Using the Hybrid Simulated Annealing-M5 Tree Algorithms to Extract the If-Then Operation Rules in a Single Reservoir. Water Resour. Manag. 2019, 33, 3655–3672. [Google Scholar] [CrossRef]
  18. Shabani, S.; Samadianfard, S.; Sattari, M.T.; Mosavi, A.; Shamshirband, S.; Kmet, T.; Várkonyi-Kóczy, A.R. Modeling Pan Evaporation Using Gaussian Process Regression K-Nearest Neighbors Random Forest and Support Vector Machines; Comparative Analysis. Atmosphere 2020, 11, 66. [Google Scholar] [CrossRef] [Green Version]
  19. Zounemat-Kermani, M. Hydrometeorological Parameters in Prediction of Soil Temperature by Means of Artificial Neural Network: Case Study in Wyoming. J. Hydrol. Eng. 2013, 18, 707–718. [Google Scholar] [CrossRef]
  20. Aslay, F.; Ozen, U. Estimating Soil Temperature with Artificial Neural Networks Using Meteorological Parameters. J. Polytech. 2013, 16, 139–145. [Google Scholar]
  21. Hosseinzadeh Talaee, P. Daily soil temperature modeling using neuro-fuzzy approach. Theor. Appl. Climatol. 2014, 118, 481–489. [Google Scholar] [CrossRef]
  22. Kim, S.; Singh, V.P. Modeling daily soil temperature using data-driven models and spatial distribution. Theor. Appl. Climatol. 2014, 118, 465–479. [Google Scholar] [CrossRef]
  23. Yener, D.; Ozgener, O.; Ozgener, L. Prediction of soil temperatures for shallow geothermal applications in Turkey. Renew. Sustain. Energy Rev. 2017, 70, 71–77. [Google Scholar] [CrossRef]
  24. Samadianfard, S.; Asadi, E.; Jarhan, S.; Kazemi, H.; Kheshtgar, S.; Kisi, O.; Sajjadi, S.; Manaf, A.A. Wavelet neural networks and gene expression programming models to predict short-term soil temperature at different depths. Soil Tillage Res. 2018, 175, 37–50. [Google Scholar] [CrossRef]
  25. Feng, Y.; Cui, N.; Hao, W.; Gao, L.; Gong, D. Estimation of soil temperature from meteorological data using different machine learning models. Geoderma 2019, 338, 67–77. [Google Scholar] [CrossRef]
  26. Costache, R.; Pham, Q.B.; Avand, M.; Thuy Linh, N.T.; Vojtek, M.; Vojteková, J.; Lee, S.; Khoi, D.N.; Thao Nhi, P.T.; Dung, T.D. Novel hybrid models between bivariate statistics, artificial neural networks and boosting algorithms for flood susceptibility assessment. J. Environ. Manag. 2020, 265, 110485. [Google Scholar] [CrossRef] [PubMed]
  27. Matei, O.; Rusu, T.; Petrovan, A.; Mihut, G. A data mining system for real time soil moisture prediction. Procedia Eng. 2017, 181, 837–844. [Google Scholar] [CrossRef]
  28. Matei, O.; Rusu, T.; Bozga, A.; Pop, P.; Anton, A. Context-aware data mining: Embedding external data sources in a machine learning process. In International Conference on Hybrid Artificial Intelligence Systems; Springer: Cham, Switzerland, 2017. [Google Scholar] [CrossRef]
  29. Anton, C.A.; Avram, A.; Petrovan, A.; Matei, O. Performance Analysis of Collaborative Data Mining vs Context Aware Data Mining in a Practical Scenario for Predicting Air Humidity. In Proceedings of the Computational Methods in Systems and Software; Springer: Cham, Switzerland, 2019; pp. 31–40. [Google Scholar] [CrossRef]
  30. Wu, Z.; Zhou, Y.; Wang, H.; Jiang, Z. Depth prediction of urban flood under different rainfall return periods based on deep learning and data warehouse. Sci. Total Environ. 2020, 716, 137077. [Google Scholar] [CrossRef]
  31. Anoynmous. Sivas Investment Guide; Central Anatolia Development Agency: Kayseri, Turkey, 2017. (In Turkish)
  32. Anoynmous. Activity Report; Republic of Turkey, Sivas Governorship Agriculture and Forest Provincial Directorate: Sivas, Turkey, 2019. (In Turkish)
  33. Anoynmous. Meteorological Instruments; State Meteorological Service. Available online: https://www.mgm.gov.tr/genel/meteorolojikaletler.aspx (accessed on 8 August 2020). (In Turkish)
  34. Anoynmous. Specifications of Meteorological Instruments; State Meteorological Service. Available online: https://www.mgm.gov.tr/FILES/kurumsal/mevzuat/ruzgar-gunes-ek.pdf (accessed on 8 August 2020). (In Turkish)
  35. Hofmann, M.; Klinkenberg, R. RapidMiner: Data Mining Use Cases and Business Analytics Applications; CRC Press: Boca Raton, FL, USA, 2016. [Google Scholar]
  36. Freund, Y.; Schapire, R.E. A decision-theoretic generalization of on-line learning and an application to boosting. In European Conference on Computational Learning Theory; Springer: Berlin/Heidelberg, Germany, 1995; pp. 23–37. [Google Scholar]
  37. Breiman, L.; Friedman, J.; Stone, C.J.; Olshen, R.A. Classification and Regression Trees; CRC Press: Boca Raton, FL, USA, 1984. [Google Scholar]
  38. Hastie, T.; Tibshirani, R.; Friedman, J. The Elements of Statistical Learning: Data Mining, Inference, and Prediction; Springer Science & Business Media: Berlin/Heidelberg, Germany, 2009. [Google Scholar]
  39. Rokach, L.; Oded, Z.M. Data Mining with Decision Trees: Theory and Applications; World Scientific: Singapore, 2008; Volume 69. [Google Scholar]
  40. Nash, J.E.; Sutcliffe, J.V. River flow forecasting through conceptual models part I—A discussion of principles. J. Hydrol. 1970, 10, 282–290. [Google Scholar] [CrossRef]
  41. Hyndman, R.J.; Koehler, A.B. Another look at measures of forecast accuracy. Int. J. Forecast. 2006, 22, 679–688. [Google Scholar] [CrossRef] [Green Version]
  42. Gupta, H.V.; Kling, H.; Yilmaz, K.K.; Martinez, G.F. Decomposition of the mean squared error and NSE performance criteria: Implications for improving hydrological modelling. J. Hydrol. 2009, 377, 80–91. [Google Scholar] [CrossRef] [Green Version]
  43. Koskela, T.; Markus, V.; Jukka, H.; Kimmo, K. Timeseries prediction using recurrent som with local linear models. Int. J. Knowl. Based Intell. Eng. Syst. 1998, 2, 60–68. [Google Scholar]
  44. Avram, A.; Matei, O.; Pintea, C.; Pop, P.; Anton, C. Context-aware data mining vs classical data mining: Case study on predicting soil moisture. In International Workshop on Soft Computing Models in Industrial and Environmental Applications; Springer: Cham, Switzerland, 2019. [Google Scholar] [CrossRef]
  45. Avram, A.; Matei, O.; Pintea, C.; Anton, C. Innovative Platform for Designing Hybrid Collaborative Context-Aware Data Mining Scenarios. Mathematics 2020, 8, 684. [Google Scholar] [CrossRef]
  46. Anton, C.A.; Matei, O.; Avram, A. Collaborative Data Mining in Agriculture for Prediction of Soil Moisture and Temperature. Computer Science On-Line Conference; Springer: Cham, Switzerland, 2019. [Google Scholar] [CrossRef]
Figure 1. Study region.
Figure 2. Average soil temperature change at different depths.
Figure 3. Comparison of observed and predicted soil temperatures at 5, 10, and 20 cm depth according to the DT model: (a) soil temperatures at 5 cm depth; (b) soil temperatures at 10 cm depth; (c) soil temperatures at 20 cm depth; (d) best scenario for ST5; (e) best scenario for ST10; and (f) best scenario for ST20.
Figure 4. Comparison of observed and predicted soil temperatures at 5, 10, and 20 cm depth according to the GBT model: (a) soil temperatures at 5 cm depth; (b) soil temperatures at 10 cm depth; (c) soil temperatures at 20 cm depth; (d) best scenario for ST5; (e) best scenario for ST10; and (f) best scenario for ST20.
Figure 5. Comparison of observed and predicted soil temperatures at 5, 10, and 20 cm depth according to the DT–GBT model: (a) soil temperatures at 5 cm depth; (b) soil temperatures at 10 cm depth; (c) soil temperatures at 20 cm depth; (d) best scenario for ST5; (e) best scenario for ST10; and (f) best scenario for ST20.
Figure 6. Taylor diagrams of the models used for different depths. (a) 5 cm depth; (b) 10 cm depth; (c) 20 cm depth.
Table 1. Statistical properties of daily data related to air and soil.
| Statistic | ST5 (°C) | ST10 (°C) | ST20 (°C) | MinT (°C) | MeanT (°C) | MaxT (°C) | Sunshine Intensity (cal/cm²) | Sunshine Duration (h) | Precip. (mm) |
|---|---|---|---|---|---|---|---|---|---|
| Minimum | −9.3 | −8.4 | −4.6 | −19.8 | −15.5 | −11.3 | 0 | 0 | 0 |
| Maximum | 41.6 | 36.6 | 31.2 | 26.3 | 31.7 | 41.1 | 750.78 | 13.1 | 33.6 |
| Mean | 14.53 | 14.34 | 14.34 | 6.47 | 12.41 | 19.09 | 383.66 | 7.12 | 1.02 |
| Stdev | 11.12 | 10.53 | 9.73 | 8.18 | 9.66 | 11.30 | 200.12 | 4.09 | 3.10 |
| Number of records | 3316 | 3347 | 3347 | 3358 | 3358 | 3367 | 3275 | 3345 | 3395 |
Table 2. Scenarios used and input variables.
| Scenario | Meteorological Variables |
|---|---|
| 1 | MinT-MaxT-MeanT-Sunshine Intensity-Sunshine Duration-Precipitation |
| 2 | MinT-MaxT-MeanT-Sunshine Intensity-Sunshine Duration |
| 3 | MinT-MaxT-MeanT-Sunshine Duration |
| 4 | MinT-MaxT-MeanT-Sunshine Intensity |
| 5 | Sunshine Intensity-Sunshine Duration |
| 6 | MinT-MaxT-MeanT |
| 7 | MeanT-Sunshine Duration |
| 8 | MeanT |
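The eight input scenarios of Table 2 can be written down as a small lookup structure. The sketch below is only an illustration: the scenario numbers and variable names follow Table 2, while the dictionary keys, the `select_inputs` helper, and the sample record are hypothetical and do not come from the study.

```python
# Scenario numbers and variable names follow Table 2. The key spellings
# (e.g. "SunshineDuration") are assumed labels for a hypothetical daily record.
SCENARIOS = {
    1: ["MinT", "MaxT", "MeanT", "SunshineIntensity", "SunshineDuration", "Precipitation"],
    2: ["MinT", "MaxT", "MeanT", "SunshineIntensity", "SunshineDuration"],
    3: ["MinT", "MaxT", "MeanT", "SunshineDuration"],
    4: ["MinT", "MaxT", "MeanT", "SunshineIntensity"],
    5: ["SunshineIntensity", "SunshineDuration"],
    6: ["MinT", "MaxT", "MeanT"],
    7: ["MeanT", "SunshineDuration"],
    8: ["MeanT"],
}


def select_inputs(record, scenario):
    """Return the input values of one daily record for the given scenario."""
    return [record[name] for name in SCENARIOS[scenario]]


# Hypothetical daily record; the values are invented for illustration only.
day = {
    "MinT": 4.0,
    "MaxT": 18.0,
    "MeanT": 11.0,
    "SunshineIntensity": 380.0,
    "SunshineDuration": 7.0,
    "Precipitation": 0.0,
}

print(select_inputs(day, 7))  # [11.0, 7.0]
```

Keeping the scenarios in one mapping makes it straightforward to loop over all eight combinations with the same model settings.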
Table 3. RMSE (°C) values for different window sizes per algorithm; the lowest value per algorithm is in bold.
| Algorithm | Window size 3 | Window size 5 | Window size 7 |
|---|---|---|---|
| DT | **1.3937** | 1.4219 | 1.4209 |
| GBT | **2.0939** | 2.0982 | 2.1058 |
| Hybrid DT–GBT | **1.8491** | 1.9546 | 1.9622 |
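The window sizes in Table 3 refer to the time windowing applied to the input data. A minimal sketch of one common form of windowing follows, assuming that each target day is paired with the inputs of the most recent `window` days, including the target day itself. The exact windowing settings used in the study are not reproduced here; the function name `make_windows` and the toy data are hypothetical.

```python
def make_windows(inputs, targets, window):
    """Pair each target with the inputs of the `window` most recent days.

    inputs  : list of per-day feature lists
    targets : list of per-day soil temperatures (same length as inputs)
    Returns (X, y), where X[i] concatenates `window` consecutive days of
    inputs ending on the day whose target is y[i].
    """
    if len(inputs) != len(targets):
        raise ValueError("inputs and targets must have the same length")
    if window < 1:
        raise ValueError("window must be at least 1")
    X, y = [], []
    for end in range(window - 1, len(inputs)):
        flat = []
        for day in inputs[end - window + 1 : end + 1]:
            flat.extend(day)
        X.append(flat)
        y.append(targets[end])
    return X, y


# Invented toy data: one feature (MeanT) per day and a soil temperature.
mean_t = [[10.0], [12.0], [11.0], [13.0], [14.0]]
st5 = [9.0, 10.0, 10.5, 11.0, 12.0]

X, y = make_windows(mean_t, st5, window=3)
print(X)  # [[10.0, 12.0, 11.0], [12.0, 11.0, 13.0], [11.0, 13.0, 14.0]]
print(y)  # [10.5, 11.0, 12.0]
```

A larger window gives the model more history per example but leaves fewer examples, since the first `window - 1` days have no complete window.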
Table 4. RMSE values for different values of the decision tree maximal depth.
| Value for Maximal Depth | Avg RMSE Per Scenario (°C) |
|---|---|
| 3 | 3.8837 |
| 5 | 2.6878 |
| 7 | 2.2810 |
| 10 | 2.2010 |
| 15 | 2.3768 |
| 20 | 2.2789 |
Table 5. RMSE (°C) values for different numbers of trees and maximal depths of the gradient boosted trees model. The original table used background color to emphasize the three main groups.
| No. of Trees | Max Depth | RMSE | No. of Trees | Max Depth | RMSE | No. of Trees | Max Depth | RMSE |
|---|---|---|---|---|---|---|---|---|
| 30 | 10 | 8.0326 | 100 | 10 | 4.3900 | 200 | 10 | 2.4910 |
| 30 | 20 | 8.0313 | 100 | 20 | 4.3869 | 200 | 20 | 2.4892 |
| 30 | 30 | 8.0313 | 100 | 30 | 4.3869 | 200 | 30 | 2.4893 |
| 50 | 10 | 6.6889 | 150 | 10 | 3.1269 | | | |
| 50 | 20 | 6.6864 | 150 | 20 | 3.1240 | | | |
| 50 | 30 | 6.6864 | 150 | 30 | 3.1240 | | | |
Table 6. RMSE results for all scenarios and algorithms for ST5, ST10, and ST20; the lowest value per algorithm and depth is in bold.
| Scenario | Algorithm | ST5 RMSE (°C) | ST10 RMSE (°C) | ST20 RMSE (°C) |
|---|---|---|---|---|
| MeanT | DT | 2.0624 | 1.3795 | 0.8209 |
| MeanT-Sunshine Duration | DT | 2.0454 | 1.3379 | 0.7703 |
| MinT-MaxT-MeanT | DT | 2.0419 | 1.3677 | 0.7935 |
| MinT-MaxT-MeanT-Sunshine Duration | DT | 2.0289 | **1.3165** | **0.7368** |
| MinT-MaxT-MeanT-Sunshine Intensity | DT | 2.1041 | 1.3196 | 0.7522 |
| MinT-MaxT-MeanT-Sunshine Int.-Sunshine Dur. | DT | 2.1226 | 1.3306 | 0.7481 |
| MinT-MaxT-MeanT-Sunshine Intensity-Sunshine Duration-Precipitation | DT | 2.1271 | 1.3271 | 0.7479 |
| Sunshine Intensity-Sunshine Duration | DT | **2.0188** | 1.3790 | 0.7694 |
| MeanT | GBT | **2.6109** | 1.9734 | 1.6583 |
| MeanT-Sunshine Duration | GBT | 2.6495 | 1.9783 | 1.6505 |
| MinT-MaxT-MeanT | GBT | 2.6435 | 1.9674 | 1.6462 |
| MinT-MaxT-MeanT-Sunshine Duration | GBT | 2.6673 | **1.9554** | **1.6389** |
| MinT-MaxT-MeanT-Sunshine Intensity | GBT | 2.6686 | 1.9654 | 1.6509 |
| MinT-MaxT-MeanT-Sunshine Int.-Sunshine Dur. | GBT | 2.6785 | 1.9640 | 1.6480 |
| MinT-MaxT-MeanT-Sunshine Intensity-Sunshine Duration-Precipitation | GBT | 2.6803 | 1.9663 | 1.6473 |
| Sunshine Intensity-Sunshine Duration | GBT | 2.6521 | 2.0109 | 1.6807 |
| MeanT | Hybrid | **2.1007** | 1.4851 | 1.0770 |
| MeanT-Sunshine Duration | Hybrid | 2.1440 | **1.4334** | 1.0295 |
| MinT-MaxT-MeanT | Hybrid | 2.1609 | 1.4473 | 1.0445 |
| MinT-MaxT-MeanT-Sunshine Duration | Hybrid | 2.1505 | 1.4351 | **1.0121** |
| MinT-MaxT-MeanT-Sunshine Intensity | Hybrid | 2.1875 | 1.4432 | 1.0194 |
| MinT-MaxT-MeanT-Sunshine Int.-Sunshine Dur. | Hybrid | 2.1878 | 1.4375 | 1.0157 |
| MinT-MaxT-MeanT-Sunshine Intensity-Sunshine Duration-Precipitation | Hybrid | 2.2018 | 1.4366 | 1.0146 |
| Sunshine Intensity-Sunshine Duration | Hybrid | 2.1315 | 1.4417 | 1.0644 |
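The best scenario per depth reported in Tables 7–9 corresponds to the lowest RMSE in Table 6. As a small worked example, the sketch below copies the DT rows of Table 6 and picks the minimum per depth. The names `DT_RMSE`, `DEPTHS`, and `best_scenarios` are hypothetical helpers introduced only for this illustration.

```python
# RMSE (°C) of the DT model per scenario, copied from Table 6.
# Each tuple holds the values for (ST5, ST10, ST20).
DT_RMSE = {
    "MeanT": (2.0624, 1.3795, 0.8209),
    "MeanT-Sunshine Duration": (2.0454, 1.3379, 0.7703),
    "MinT-MaxT-MeanT": (2.0419, 1.3677, 0.7935),
    "MinT-MaxT-MeanT-Sunshine Duration": (2.0289, 1.3165, 0.7368),
    "MinT-MaxT-MeanT-Sunshine Intensity": (2.1041, 1.3196, 0.7522),
    "MinT-MaxT-MeanT-Sunshine Int.-Sunshine Dur.": (2.1226, 1.3306, 0.7481),
    "MinT-MaxT-MeanT-Sunshine Intensity-Sunshine Duration-Precipitation": (2.1271, 1.3271, 0.7479),
    "Sunshine Intensity-Sunshine Duration": (2.0188, 1.3790, 0.7694),
}
DEPTHS = ("ST5", "ST10", "ST20")


def best_scenarios(rmse_table):
    """Return {depth: (scenario, rmse)} with the lowest RMSE for each depth."""
    best = {}
    for i, depth in enumerate(DEPTHS):
        name = min(rmse_table, key=lambda s: rmse_table[s][i])
        best[depth] = (name, rmse_table[name][i])
    return best


for depth, (scenario, rmse) in best_scenarios(DT_RMSE).items():
    print(depth, scenario, rmse)
```

For the DT model this selects Sunshine Intensity-Sunshine Duration at ST5 and MinT-MaxT-MeanT-Sunshine Duration at ST10 and ST20, which agrees with the inputs listed in Table 7.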
Table 7. Results of DT model at different depths.
| Inputs | Output | NS | R | MAE (°C) | RMSE (°C) | KGE |
|---|---|---|---|---|---|---|
| Sunshine Intensity-Sunshine Duration | ST5 | 0.9669 | 0.9833 | 1.4533 | 2.0188 | 0.975 |
| MinT-MaxT-MeanT-Sunshine Duration | ST10 | 0.9846 | 0.9922 | 0.9564 | 1.3165 | 0.989 |
| MinT-MaxT-MeanT-Sunshine Duration | ST20 | 0.9942 | 0.9971 | 0.5171 | 0.7368 | 0.995 |
Table 8. Results of GBT model at different depths.
| Inputs | Output | NS | R | MAE (°C) | RMSE (°C) | KGE |
|---|---|---|---|---|---|---|
| MeanT | ST5 | 0.9446 | 0.9793 | 1.9144 | 2.6109 | 0.857 |
| MinT-MaxT-MeanT-Sunshine Duration | ST10 | 0.9658 | 0.9915 | 1.5442 | 1.9554 | 0.861 |
| MinT-MaxT-MeanT-Sunshine Duration | ST20 | 0.9713 | 0.9939 | 1.2689 | 1.6389 | 0.866 |
Table 9. Results of DT–GBT model at different depths.
| Inputs | Output | NS | R | MAE (°C) | RMSE (°C) | KGE |
|---|---|---|---|---|---|---|
| MeanT | ST5 | 0.9642 | 0.9839 | 1.5358 | 2.1007 | 0.921 |
| MeanT-Sunshine Duration | ST10 | 0.9817 | 0.9934 | 1.1025 | 1.4334 | 0.922 |
| MinT-MaxT-MeanT-Sunshine Duration | ST20 | 0.9890 | 0.9968 | 0.7779 | 1.0121 | 0.930 |
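The NS, R, MAE, RMSE, and KGE columns of Tables 7–9 are standard goodness-of-fit measures. The sketch below shows how they are commonly computed from paired observed and predicted values, assuming the usual textbook definitions: Nash-Sutcliffe efficiency, Pearson correlation, and the Kling-Gupta efficiency in its 2009 form with variability ratio α = σ_pred/σ_obs and bias ratio β = μ_pred/μ_obs. The `evaluate` function and the toy data are hypothetical, and the study's own software may differ in detail.

```python
import math


def evaluate(observed, predicted):
    """Return NS, R, MAE, RMSE, and KGE for paired observations and predictions.

    Assumes the observations are not constant and have a non-zero mean;
    otherwise NS, R, or KGE are undefined (division by zero).
    """
    n = len(observed)
    if n != len(predicted) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")

    mean_o = sum(observed) / n
    mean_p = sum(predicted) / n
    errors = [p - o for o, p in zip(observed, predicted)]

    mae = sum(abs(e) for e in errors) / n
    rmse = math.sqrt(sum(e * e for e in errors) / n)

    ss_o = sum((o - mean_o) ** 2 for o in observed)
    ss_p = sum((p - mean_p) ** 2 for p in predicted)
    cov = sum((o - mean_o) * (p - mean_p) for o, p in zip(observed, predicted))

    ns = 1.0 - sum(e * e for e in errors) / ss_o   # Nash-Sutcliffe efficiency
    r = cov / math.sqrt(ss_o * ss_p)               # Pearson correlation
    alpha = math.sqrt(ss_p / ss_o)                 # variability ratio
    beta = mean_p / mean_o                         # bias ratio
    kge = 1.0 - math.sqrt((r - 1) ** 2 + (alpha - 1) ** 2 + (beta - 1) ** 2)

    return {"NS": ns, "R": r, "MAE": mae, "RMSE": rmse, "KGE": kge}


# Invented toy data: predictions that are uniformly 1 °C too warm.
obs = [1.0, 2.0, 3.0, 4.0]
pred = [2.0, 3.0, 4.0, 5.0]
print({k: round(v, 4) for k, v in evaluate(obs, pred).items()})
# {'NS': 0.2, 'R': 1.0, 'MAE': 1.0, 'RMSE': 1.0, 'KGE': 0.6}
```

The toy example shows why several measures are reported together: a constant bias leaves R at 1 while NS and KGE drop noticeably.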
Table 10. Statistics of the selected scenarios for each method (ST5).
| Methods | DT | GBT | Hybrid DT–GBT | Measured (ST5) |
|---|---|---|---|---|
| Best Scenario | Sunshine Intensity, Sunshine Duration | MeanT | MeanT | |
| Minimum | −4.21 | −2.00 | −3.03 | −9.30 |
| Maximum | 35.55 | 32.05 | 33.95 | 41.60 |
| Mean | 16.29 | 16.12 | 16.29 | 14.53 |
| Stdev | 10.89 | 9.54 | 10.25 | 11.12 |
| Correlation | 0.9800 | 0.9923 | 0.9954 | 1.0000 |
| Number of records | 965 | 965 | 965 | 3316 |
Table 11. Statistics of the selected scenarios for each method (ST10).
| Methods | DT | GBT | Hybrid DT–GBT | Measured (ST10) |
|---|---|---|---|---|
| Best Scenario | MinT-MaxT-MeanT-Sunshine Duration | MinT-MaxT-MeanT-Sunshine Duration | MeanT-Sunshine Duration | |
| Minimum | −5.25 | −1.43 | −2.28 | −8.40 |
| Maximum | 33.83 | 30.70 | 31.95 | 33.90 |
| Mean | 15.80 | 15.62 | 15.73 | 15.84 |
| Stdev | 10.50 | 9.12 | 9.77 | 10.58 |
| Correlation | 0.9983 | 0.9900 | 0.9900 | 1.0000 |
| Number of records | 998 | 998 | 998 | 3347 |
Table 12. Statistics of the selected scenarios for each method (ST20).
| Methods | DT | GBT | Hybrid DT–GBT | Measured (ST20) |
|---|---|---|---|---|
| Best Scenario | MinT-MaxT-MeanT-Sunshine Duration | MinT-MaxT-MeanT-Sunshine Duration | MinT-MaxT-MeanT-Sunshine Duration | |
| Minimum | −2.95 | −0.06 | −1.47 | −4.60 |
| Maximum | 31.02 | 28.53 | 29.78 | 31.20 |
| Mean | 15.739 | 15.56 | 15.68 | 15.79 |
| Stdev | 9.6425 | 8.38 | 9.00 | 9.66 |
| Correlation | 0.9994 | 0.9933 | 0.9974 | 1.0000 |
| Number of records | 998 | 998 | 998 | 3347 |

Share and Cite

MDPI and ACS Style

Sattari, M.T.; Avram, A.; Apaydin, H.; Matei, O. Soil Temperature Estimation with Meteorological Parameters by Using Tree-Based Hybrid Data Mining Models. Mathematics 2020, 8, 1407. https://doi.org/10.3390/math8091407
