Article

A Non-Linear Trend Function for Kriging with External Drift Using Least Squares Support Vector Regression

by Kanokrat Baisad 1, Nawinda Chutsagulprom 1,2 and Sompop Moonchai 1,2,*
1 Department of Mathematics, Faculty of Science, Chiang Mai University, Chiang Mai 50200, Thailand
2 Advanced Research Center for Computational Simulation (ARCCoS), Chiang Mai University, Chiang Mai 50200, Thailand
* Author to whom correspondence should be addressed.
Mathematics 2023, 11(23), 4799; https://doi.org/10.3390/math11234799
Submission received: 27 October 2023 / Revised: 24 November 2023 / Accepted: 24 November 2023 / Published: 28 November 2023

Abstract
Spatial interpolation of meteorological data has immense implications for risk management and climate change planning. Kriging with external drift (KED) is a spatial interpolation variant that uses auxiliary information in the estimation of target variables at unobserved locations. However, traditional KED methods with linear trend functions may not be able to capture the complex and non-linear interdependence between target and auxiliary variables, which can lead to inaccurate estimation. In this work, a novel KED method using least squares support vector regression (LSSVR) is proposed. This machine learning algorithm is employed to construct trend functions regardless of the type of variable interrelations being considered. To evaluate the efficiency of the proposed method (KED with LSSVR) relative to the traditional method (KED with a linear trend function), a systematic simulation study for estimating the monthly mean temperature and pressure in Thailand in 2017 was conducted. The KED with LSSVR is shown to have superior performance over the KED with the linear trend function.

1. Introduction

Spatial interpolation is a fundamental technique employed in spatial data analysis to estimate the variable of interest at unobserved locations based on available data. Kriging is a geostatistical approach for spatial interpolation that provides the best linear unbiased prediction with the minimum estimation variance. An approximation of the kriging models relies on the assumption that a random process can be decomposed into a trend function and a random residual component. Ordinary kriging (OK) is a commonly used method with a constant trend, which is not suitable in the presence of a strong trend structure. On the other hand, kriging with external drift (KED) allows for the inclusion of auxiliary variables that have a strong spatial correlation with the target variable in order to increase precision in estimates [1,2]. In the KED, a trend model that is fitted to both target data points and significant auxiliary samples is first generated. The empirical variogram is thereafter derived from residuals computed from the difference between the trend estimates and measured values. A final prediction of the target variable is obtained as a weighted linear combination of observations, in which the weights are calculated through the Lagrange multiplier method. The KED method has been applied in various fields, including meteorology [3,4,5,6], geology [7,8,9,10], environmental modeling [11,12,13,14], agronomy [15,16,17], and hydrology [18].
The trend term in KED is conventionally modeled by polynomial functions of degree one or two. In practice, however, a non-linear relationship often exists between influence factors and response variables, for which such polynomials are not adequate. Despite extensive research on prediction using KED, studies of non-linear trend functions for KED remain scarce. Snepvangers et al. [19] developed a non-linear trend represented by a logarithmic function to interpolate soil water content using the KED technique with net precipitation as an auxiliary variable. Freier and von Lieres [20] introduced a novel extension to universal kriging (UK), a specific instance of KED, aimed at handling non-linear trend patterns. They utilized a Taylor-based linearization approach in conjunction with an iterative parameter estimation procedure to construct a non-linear trend model. The method was applied to the Michaelis–Menten equation, which describes an enzymatic reaction. Freier et al. [21] subsequently employed this kriging technique to interpolate biocatalytic data with low and irregular density. Their method is particularly useful when an explicit expression of the non-linear trend function is available. Nevertheless, the interaction between design factors and system response in real-world applications is naturally described by diverse and complex behavior, which is difficult to establish in an explicit form.
Machine learning (ML) has recently been gaining attention as a computationally efficient tool for identifying implicit relationships between variables. This allows one to generate and optimize complex models based on the large amounts of data available for analysis. The support vector machine (SVM) is a kernel-based machine learning approach used for classification and regression. The particular use of SVMs for regression problems is called support vector regression (SVR), which was first introduced by Vapnik in 1992 [22]. The method adopts the structural risk minimization principle by minimizing the upper bound of the generalization errors. This leads to a linear decision function, whose estimation is essentially a convex quadratic programming (QP) problem. The core element of SVR is to search for the optimal hyperplane that fits the learning data while maximizing the distance between the hyperplane and the data points. In the case of non-linear problems, the SVR procedure starts by projecting input data into a high-dimensional feature space through some non-linear mapping, and the SVR subsequently performs linear regression to obtain the optimal hyperplane. Apart from producing high prediction accuracy for non-linear data, SVR is also suitable for applications characterized by small datasets [23,24,25]. Furthermore, this SVM regression-based algorithm has the generalization ability to reduce overfitting issues by introducing a regularization term into the loss function. Due to these advantages, the technique has been applied in diverse disciplines, including finance [26,27,28], economics [29,30,31], climate modeling [32,33], and healthcare [34,35]. However, SVR requires substantial computational time and significant memory usage to solve the QP problem. To overcome these limitations, Suykens and Vandewalle [36] proposed a variant of SVR known as least squares support vector regression (LSSVR).
This method extends traditional SVR by adopting equality constraints and a squared loss function, so that the solution follows from a system of linear equations rather than a quadratic program. LSSVR yields higher accuracy and requires fewer computational resources than standard SVR [37].
Research on the utilization of ML in geostatistical techniques remains notably scarce. In this work, we present a novel interpolation method in which the LSSVR method is used to compute non-linear trend functions within the context of KED. The proposed technique entails expressing the trend function in a structured form through explicit feature mapping. The purpose of our technique is to enhance the predictive capability of the KED model by exploiting the strength of LSSVR in capturing non-linear relationships between variables.
The remainder of this paper is outlined as follows. Section 2 reviews the theory regarding the KED methodology and LSSVR technique. A detailed description of the KED using the LSSVR for modeling the non-linear trend functions is provided in Section 3. In Section 4, we conduct a comparative simulation study using the conventional KED model and the proposed method for temperature and pressure estimation in Thailand. Conclusions and discussion are drawn in Section 5.

2. Mathematical Background

2.1. Kriging with External Drift

Kriging is a spatial interpolation method that uses variogram analysis to predict the variable of interest at an unmeasured location based on the values at surrounding measured locations. It is the best linear unbiased estimator (BLUE) for the random function $\{ Z(s) : s \in D \subset \mathbb{R}^d \}$, where $D$ is a defined spatial domain and $d$ is a positive integer representing the number of dimensions of the spatial domain. The value of $Z(s)$ can be decomposed as
$$Z(s) = \mu(s) + \epsilon(s), \tag{1}$$
where the deterministic component $\mu(s)$ indicates the underlying trend or drift, and $\epsilon(s)$ is a stochastic residual component with a mean of zero and a variogram that is a function of the lag vector [2].
In KED, the trend is modeled by a function of auxiliary variables which can be expressed as
$$\mu(s) = \sum_{l=0}^{L} a_l f_l(X(s)), \tag{2}$$
where $a_l \in \mathbb{R} \setminus \{0\}$ is the coefficient to be estimated, $X(s) = [X_1(s), \ldots, X_\eta(s)]^T$ is the vector of $\eta$ auxiliary variables at location $s$, $f_l$ is the prescribed function of the auxiliary variables, and $L+1$ is the number of terms used in the approximation. Additionally, $f_0$ is defined as the constant function with a value of 1 [2].
To determine the unknown coefficients in Equation (2), we can use the ordinary least squares (OLS) estimator or its extension, the generalized least squares (GLS) estimator, which accounts for the spatial correlation between individual observations [38].
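As an illustration of this step, the following sketch estimates the trend coefficients with the GLS formula $\hat{a} = (F^T C^{-1} F)^{-1} F^T C^{-1} z$, where $C$ is the residual covariance matrix; the function name and toy data are ours, and the OLS estimator is recovered by passing the identity matrix as $C$.

```python
import numpy as np

def gls_trend_coefficients(F, z, C):
    """Estimate trend coefficients a in mu = F a by generalized least
    squares, where C is the n x n residual covariance matrix between
    observations (OLS is recovered with C = I)."""
    Ci = np.linalg.inv(C)
    return np.linalg.solve(F.T @ Ci @ F, F.T @ Ci @ z)

# Toy example: linear trend mu(s) = a0 + a1 * X with one auxiliary variable.
rng = np.random.default_rng(0)
x = rng.uniform(0, 10, size=20)
z = 2.0 + 0.5 * x + rng.normal(scale=0.1, size=20)
F = np.column_stack([np.ones_like(x), x])         # f_0 = 1, f_1(X) = X
a_hat = gls_trend_coefficients(F, z, np.eye(20))  # identity C -> OLS
```

With uncorrelated residuals the estimate recovers the underlying coefficients closely; in the KED workflow, $C$ would be derived from the fitted residual variogram.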
Given $n$ observed values $Z(s_1), \ldots, Z(s_n)$ at sample points $s_1, s_2, \ldots, s_n$, the attribute $Z(s_0)$ at an ungauged site $s_0$ is estimated as a linear combination of the observed values:
$$Z^*(s_0) = \sum_{i=1}^{n} \omega_i Z(s_i), \tag{3}$$
where ω i is the kriging weight assigned to Z ( s i ) . The weights are computed by minimizing the estimation error variance subject to the unbiased constraint. This results in the following optimization problem:
$$\min \ \mathrm{Var}\left[ Z^*(s_0) - Z(s_0) \right] \quad \text{subject to} \quad \mathrm{E}\left[ Z^*(s_0) - Z(s_0) \right] = 0. \tag{4}$$
The optimal weights of the system (4) for the KED model can be solved by using the Lagrange multiplier method, which leads to the KED system
$$\begin{aligned} \sum_{j=1}^{n} \omega_j \gamma_\epsilon(s_i - s_j) + \sum_{l=0}^{L} \lambda_l f_l(X(s_i)) &= \gamma_\epsilon(s_i - s_0), \quad i = 1, \ldots, n, \\ \sum_{j=1}^{n} \omega_j f_l(X(s_j)) &= f_l(X(s_0)), \quad l = 0, 1, \ldots, L, \end{aligned} \tag{5}$$
where $\gamma_\epsilon(\cdot)$ denotes the residual variogram function of $Z(s)$ and $\lambda_l \in \mathbb{R}$ is a Lagrange multiplier.
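The KED system (5) is linear in the weights and Lagrange multipliers, so it can be assembled and solved directly. The sketch below is our own minimal implementation, assuming an exponential variogram and hypothetical helper names; it is not the authors' code.

```python
import numpy as np

def exp_variogram(h, nugget=0.0, sill=1.0, rng_=1.0):
    """Exponential variogram model gamma(h)."""
    return nugget + sill * (1.0 - np.exp(-h / rng_))

def ked_weights(S, F, s0, f0, variogram):
    """Solve the KED system for the kriging weights.
    S: (n, d) sample coordinates; F: (n, L+1) trend basis f_l(X(s_i));
    s0: (d,) target location; f0: (L+1,) trend basis at s0."""
    n, L1 = F.shape
    G = variogram(np.linalg.norm(S[:, None, :] - S[None, :, :], axis=-1))
    g0 = variogram(np.linalg.norm(S - s0, axis=-1))
    A = np.zeros((n + L1, n + L1))
    A[:n, :n] = G          # residual variogram block
    A[:n, n:] = F          # unbiasedness (trend) block
    A[n:, :n] = F.T
    rhs = np.concatenate([g0, f0])
    sol = np.linalg.solve(A, rhs)
    return sol[:n], sol[n:]   # weights omega, Lagrange multipliers lambda

# Tiny example: 4 samples on a line, constant-plus-linear trend basis.
S = np.array([[0.0], [1.0], [2.0], [3.0]])
F = np.column_stack([np.ones(4), S[:, 0]])    # f_0 = 1, f_1 = coordinate
w, lam = ked_weights(S, F, np.array([1.5]), np.array([1.0, 1.5]), exp_variogram)
```

By construction, the solved weights satisfy the unbiasedness constraints of (5): they sum to one and reproduce the trend basis at the target location.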
The variogram is a fundamental and important tool that quantifies the spatial correlation structure of the sample points. The variogram model is a smooth function that is reasonably well-fitted to the empirical variogram estimated from the data. In the present study, we use the empirical variogram estimator introduced by Matheron [39], and the parametric variogram is represented by an exponential model [38].
In general, both linear and quadratic functions are usually treated as a trend representation [9,18,40,41]. However, in certain scenarios, the relationship between the target and auxiliary variables is too complex to be captured by simple polynomial functions. In this work, least squares support vector regression (LSSVR) is used to model a non-linear trend function within the KED framework.

2.2. Least Squares Support Vector Regression

Given a dataset $\{(Y_i, Z_i)\}_{i=1}^{n}$, where $Y_i \in \mathbb{R}^{\eta}$ is an $\eta$-dimensional training data point and $Z_i \in \mathbb{R}$ represents a target output, the objective of least squares support vector regression (LSSVR) is to find a function that minimizes the squared error between the predicted and actual values. In LSSVR, the input data $Y_i$ are mapped into a higher-dimensional feature space $\mathbb{R}^{\eta_h}$, in which a linear model is adopted, so that the model function $\mu$ is formulated as
$$\mu(Y) = a^T \phi(Y) + b, \tag{6}$$
where $\phi(Y)$ is an $\eta_h$-dimensional feature mapping, $a$ is an $\eta_h \times 1$ weight vector, and $b \in \mathbb{R}$ indicates a bias term.
In Equation (6), the unknown vector a and parameter b can be calculated by solving the following optimization problem:
$$\min \ \frac{1}{2} a^T a + \frac{\nu}{2} \sum_{i=1}^{n} \zeta_i^2 \quad \text{subject to} \quad Z_i = a^T \phi(Y_i) + b + \zeta_i, \quad i = 1, \ldots, n, \tag{7}$$
where ν is a regularization constant that constitutes a trade-off between the model complexity and the empirical error, and ζ i is a regression error.
The problem (7) can be reformulated as an unconstrained problem through the Lagrange multiplier method [42]. A set of linear equations corresponding to optimality conditions is consequently obtained, which provides an expression of the weight vector a :
$$a = \sum_{i=1}^{n} \alpha_i \phi(Y_i), \tag{8}$$
where α i is a Lagrange multiplier. This system of equations can be reduced to the following form:
$$\begin{bmatrix} 0 & \mathbf{1}_n^T \\ \mathbf{1}_n & \Omega + \nu^{-1} I_n \end{bmatrix} \begin{bmatrix} b \\ \alpha \end{bmatrix} = \begin{bmatrix} 0 \\ Z \end{bmatrix}, \tag{9}$$
where $\Omega$ is referred to as the kernel matrix, whose elements are $\phi^T(Y_i)\phi(Y_j)$ for $i, j = 1, \ldots, n$, and $I_n$ is the identity matrix of size $n$. The matrix $\mathbf{1}_n$ is an $n \times 1$ vector of ones, $Z = [Z_1, \ldots, Z_n]^T$ is the $n \times 1$ vector of observed values, and $\alpha = [\alpha_1, \ldots, \alpha_n]^T$ is the $n \times 1$ vector of Lagrange multipliers.
The solutions of Equation (9) are
$$b = \frac{\mathbf{1}_n^T A^{-1} Z}{\mathbf{1}_n^T A^{-1} \mathbf{1}_n}, \tag{10}$$
$$\alpha = A^{-1}(Z - b \mathbf{1}_n), \tag{11}$$
where $A = \Omega + \nu^{-1} I_n$. Since $\Omega$ is symmetric positive semi-definite and $\nu^{-1} > 0$, the matrix $A$ is symmetric positive definite, thereby ensuring the existence of its inverse $A^{-1}$.
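Because Equations (10) and (11) are in closed form, LSSVR training reduces to a few linear solves. The following is a minimal sketch under our own naming, using an RBF kernel and a toy one-dimensional regression task.

```python
import numpy as np

def rbf_kernel(A_, B_, g=0.5):
    """K(y, y') = exp(-g ||y - y'||^2) evaluated between two point sets."""
    d2 = ((A_[:, None, :] - B_[None, :, :]) ** 2).sum(-1)
    return np.exp(-g * d2)

def lssvr_fit(Y, z, kernel, nu=100.0):
    """Closed-form LSSVR training:
    A = Omega + nu^{-1} I,  b = 1'A^{-1}z / 1'A^{-1}1,  alpha = A^{-1}(z - b 1)."""
    n = len(z)
    A = kernel(Y, Y) + np.eye(n) / nu
    Ai1 = np.linalg.solve(A, np.ones(n))
    b = (Ai1 @ z) / (Ai1 @ np.ones(n))   # uses symmetry of A
    alpha = np.linalg.solve(A, z - b)
    return alpha, b

def lssvr_predict(Ytest, Ytrain, alpha, b, kernel):
    """mu(Y) = sum_i alpha_i K(Y_i, Y) + b."""
    return kernel(Ytest, Ytrain) @ alpha + b

# Fit a smooth 1-D function from a few samples.
Ytr = np.linspace(0, 3, 15)[:, None]
ztr = np.sin(Ytr[:, 0])
alpha, b = lssvr_fit(Ytr, ztr, rbf_kernel, nu=100.0)
pred = lssvr_predict(np.array([[1.0]]), Ytr, alpha, b, rbf_kernel)
```

The regularization constant `nu` plays the role of $\nu$ in (7): larger values fit the data more tightly, smaller values smooth more.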
By substituting Equation (8) into Equation (6), the model for the LSSVR function therefore becomes
$$\mu(Y) = \sum_{i=1}^{n} \alpha_i \phi^T(Y_i)\phi(Y) + b = \sum_{i=1}^{n} \alpha_i K(Y_i, Y) + b, \tag{12}$$
where $K(\cdot, \cdot)$ is the kernel associated with the feature mapping $\phi$, defined as [43]
$$K(Y_i, Y) = \phi^T(Y_i)\phi(Y). \tag{13}$$
Numerous kernel functions are available for the construction of various models, such as:
Linear kernel: $K(Y_i, Y) = Y_i^T Y$.
Polynomial kernel: $K(Y_i, Y) = (k + Y_i^T Y)^p$, with $k > 0$ and $p \in \mathbb{N}$.
Radial basis function kernel: $K(Y_i, Y) = \exp(-g \| Y - Y_i \|^2)$, with $g > 0$, where $\| \cdot \|$ denotes the Euclidean norm.
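For concreteness, these three kernels can be written directly as functions of two input vectors; the function names below are ours.

```python
import numpy as np

def linear_kernel(Yi, Y):
    """K(Yi, Y) = Yi' Y."""
    return Yi @ Y

def polynomial_kernel(Yi, Y, k=1.0, p=2):
    """K(Yi, Y) = (k + Yi' Y)^p, k > 0, p a positive integer."""
    return (k + Yi @ Y) ** p

def rbf_kernel(Yi, Y, g=0.5):
    """K(Yi, Y) = exp(-g ||Y - Yi||^2), g > 0."""
    return np.exp(-g * np.sum((Yi - Y) ** 2))

y1, y2 = np.array([1.0, 2.0]), np.array([0.5, -1.0])
```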
Within the KED scheme, the LSSVR is used to characterize the underlying trend. The dataset at the sample points $s_i \in \mathbb{R}^d$ is denoted by $\{(X(s_i), Z(s_i))\}_{i=1}^{n}$, where $X(s_i) = [X_1(s_i), \ldots, X_\eta(s_i)]^T \in \mathbb{R}^\eta$ is the vector of $\eta$ auxiliary variables and $Z(s_i)$ denotes the observed value.

3. A Novel Trend Function of KED Based on LSSVR

This section introduces a method for constructing the trend function in KED using the LSSVR method. The approach involves identifying the fundamental functions of the trend through explicit feature mapping. Examples of explicit feature mappings derived from the corresponding kernel functions are also demonstrated.

3.1. Construction of the Trend Function

Let $\phi(X(s))$ be an $M$-dimensional feature mapping such that
$$\phi(X(s)) = [\phi_1(X(s)), \ldots, \phi_M(X(s))]^T, \tag{14}$$
where $\phi_m(X(s))$ is the $m$th component of the feature mapping.
According to Equation (13), the kernel function is in the following form:
$$K(X(s_i), X(s)) = [\phi_1(X(s_i)), \ldots, \phi_M(X(s_i))] \begin{bmatrix} \phi_1(X(s)) \\ \vdots \\ \phi_M(X(s)) \end{bmatrix}. \tag{15}$$
By substituting Equation (15) into Equation (12), μ ( s ) can be rewritten as
$$\begin{aligned} \mu(s) = \mu(X(s)) &= \sum_{i=1}^{n} \alpha_i [\phi_1(X(s_i)), \ldots, \phi_M(X(s_i))] \begin{bmatrix} \phi_1(X(s)) \\ \vdots \\ \phi_M(X(s)) \end{bmatrix} + b \\ &= \sum_{i=1}^{n} \alpha_i \sum_{l=1}^{M} \phi_l(X(s_i)) \phi_l(X(s)) + b \\ &= \sum_{l=1}^{M} \left( \sum_{i=1}^{n} \alpha_i \phi_l(X(s_i)) \right) \phi_l(X(s)) + b. \end{aligned} \tag{16}$$
The trend function can, hence, be written in the following form:
$$\mu(s) = \sum_{l=0}^{M} \tilde{a}_l \phi_l(X(s)), \tag{17}$$
where the coefficient $\tilde{a}_l = \sum_{i=1}^{n} \alpha_i \phi_l(X(s_i))$ and $\phi_l(X(s))$ is a known function for $l = 1, \ldots, M$, with $\tilde{a}_0 = b$ and $\phi_0(X(s)) = 1$.
By comparing Equation (2) with Equation (17), $\tilde{a}_l$ and $\phi_l$ can be treated as $a_l$ and $f_l$, respectively, and $\phi_l$ in (17) is then employed in the KED system (5). This justifies the use of the kernel function as a non-linear trend model for the KED. The process of the KED based on the LSSVR method is summarized in the flowchart shown in Figure 1.
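As a quick numerical check of this equivalence, the sketch below folds the bias $b$ into the coefficient of the constant feature and verifies that the explicit trend $\sum_l \tilde{a}_l \phi_l(X(s))$ matches the kernel form of Equation (12); all names and data are illustrative.

```python
import numpy as np

def poly2_features(X):
    """Explicit degree-2 polynomial feature map (k = 1, eta = 2).
    Its first component is the constant 1, which plays the role of phi_0."""
    x1, x2 = X
    return np.array([1.0, np.sqrt(2) * x1, np.sqrt(2) * x2,
                     x1 ** 2, x2 ** 2, np.sqrt(2) * x1 * x2])

def trend_coefficients(alpha, b, Phi):
    """a_tilde_l = sum_i alpha_i phi_l(X(s_i)); the bias b is folded into
    the coefficient of the constant feature."""
    a = Phi.T @ alpha
    a[0] += b
    return a

rng = np.random.default_rng(1)
Xs = rng.normal(size=(5, 2))          # auxiliary variables at 5 stations
alpha = rng.normal(size=5)            # stand-in Lagrange multipliers
b = 0.7
Phi = np.stack([poly2_features(x) for x in Xs])
a_t = trend_coefficients(alpha, b, Phi)

x_new = np.array([0.3, -0.2])
trend_explicit = a_t @ poly2_features(x_new)
trend_kernel = sum(alpha[i] * (1 + Xs[i] @ x_new) ** 2 for i in range(5)) + b
```

The two evaluations agree to machine precision, which is exactly the rewriting performed in Equations (16) and (17).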

3.2. Examples of Explicit Feature Mappings

Various kernel functions are available for the LSSVR method, namely linear, polynomial, and radial basis function kernels [44,45,46]. This section presents the last two, as they are widely used and relatively easy to tune. They will also be applied in our model to formulate the trend component.

3.2.1. Polynomial Kernel

The polynomial kernel function is defined as
$$K(X(s_i), X(s)) = (k + X^T(s_i) X(s))^p, \tag{18}$$
where $k > 0$ and $p \in \mathbb{N}$ is the degree of the polynomial.
The feature mapping for the polynomial kernel of degree $p$ is given by [47]
$$\phi(X(s)) = \left\{ \sqrt{\frac{p!}{j_1! \cdots j_{\eta+1}!}} \, X_1^{j_1}(s) \cdots X_\eta^{j_\eta}(s) \, \sqrt{k^{j_{\eta+1}}} \ \middle| \ j_i \geq 0 \ \text{with} \ \sum_{i=1}^{\eta+1} j_i = p \right\}, \tag{19}$$
where the dimensionality of $\phi(X(s))$ is $\frac{(\eta+p)(\eta+p-1)\cdots(\eta+1)}{p!}$ [48]. For example, when the degree of the polynomial kernel and the number of auxiliary variables are both equal to 2, with $k$ being 1, then
$$\phi(X(s)) = \left[ 1, \sqrt{2} X_1(s), \sqrt{2} X_2(s), X_1^2(s), X_2^2(s), \sqrt{2} X_1(s) X_2(s) \right]^T. \tag{20}$$
Compared with Equation (14), the components of the feature mapping are as follows: $\phi_1(X(s)) = 1$, $\phi_2(X(s)) = \sqrt{2} X_1(s)$, $\phi_3(X(s)) = \sqrt{2} X_2(s)$, $\phi_4(X(s)) = X_1^2(s)$, $\phi_5(X(s)) = X_2^2(s)$, $\phi_6(X(s)) = \sqrt{2} X_1(s) X_2(s)$.
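The identity $\phi^T(X(s_i))\,\phi(X(s)) = (1 + X^T(s_i)X(s))^2$ behind Equation (20) can be checked numerically; the snippet below is a small sketch with arbitrary test vectors of our choosing.

```python
import numpy as np

def phi_poly2(x):
    """Explicit feature map of Eq. (20): degree p = 2, k = 1, eta = 2."""
    return np.array([1.0,
                     np.sqrt(2) * x[0], np.sqrt(2) * x[1],
                     x[0] ** 2, x[1] ** 2,
                     np.sqrt(2) * x[0] * x[1]])

x = np.array([1.2, -0.7])
y = np.array([0.4, 2.1])
lhs = phi_poly2(x) @ phi_poly2(y)   # inner product in feature space
rhs = (1.0 + x @ y) ** 2            # kernel of Eq. (18) evaluated directly
```

The two quantities coincide, confirming that the six components listed above reproduce the degree-2 polynomial kernel exactly.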

3.2.2. Radial Basis Function Kernel

The implicit kernel function, exemplified by the radial basis function (RBF) kernel, assumes the following form:
$$K(X(s_i), X(s)) = \exp(-g \| X(s) - X(s_i) \|^2), \tag{21}$$
where $g > 0$ is an RBF kernel parameter.
The feature mapping for the RBF kernel function can be formulated as
$$\phi(X(s)) = \left\{ \exp(-g \| X(s) \|^2) \sqrt{\frac{(2g)^r}{r!}} \, \varphi_r(X(s)) \ \middle| \ r = 0, \ldots, \infty \right\}, \tag{22}$$
where
$$\varphi_r(X(s)) = \left\{ \sqrt{\frac{r!}{j_1! \cdots j_\eta!}} \, X_1^{j_1}(s) \cdots X_\eta^{j_\eta}(s) \ \middle| \ j_i \geq 0 \ \text{with} \ \sum_{i=1}^{\eta} j_i = r \right\}, \tag{23}$$
which is described in more detail in [49]. The RBF kernel function, which maps the auxiliary data to an infinite-dimensional space, can be approximated by Taylor polynomial-based monomial feature mapping (TPM feature mapping). In the work of [49], a finite-dimensional approximated feature mapping of the RBF kernel is obtained as follows:
$$\phi(X(s)) = \left\{ \exp(-g \| X(s) \|^2) \sqrt{\frac{(2g)^r}{r!}} \, \varphi_r(X(s)) \ \middle| \ r = 0, \ldots, r_u \right\}, \tag{24}$$
where $r_u$ is a selected approximation degree, and the TPM feature mapping of degree $r_u$ has $\frac{(\eta+r_u)(\eta+r_u-1)\cdots(\eta+1)}{r_u!}$ dimensions.
Although an increase in $r_u$ leads to improved estimation as $\phi(X(s))$ approaches the true mapping, a TPM feature mapping of low degree is often sufficient [49]. An example of the TPM feature mapping with a degree of two and two auxiliary variables is
$$\phi(X(s)) = \exp(-g \| X(s) \|^2) \left[ 1, \sqrt{2g}\, X_1(s), \sqrt{2g}\, X_2(s), \sqrt{2}\, g X_1^2(s), \sqrt{2}\, g X_2^2(s), 2g X_1(s) X_2(s) \right]^T. \tag{25}$$
By comparing Equation (25) with Equation (14), we obtain $\phi_1(X(s)) = \exp(-g \| X(s) \|^2)$, $\phi_2(X(s)) = \exp(-g \| X(s) \|^2) \sqrt{2g}\, X_1(s)$, $\phi_3(X(s)) = \exp(-g \| X(s) \|^2) \sqrt{2g}\, X_2(s)$, $\phi_4(X(s)) = \exp(-g \| X(s) \|^2) \sqrt{2}\, g X_1^2(s)$, $\phi_5(X(s)) = \exp(-g \| X(s) \|^2) \sqrt{2}\, g X_2^2(s)$, $\phi_6(X(s)) = \exp(-g \| X(s) \|^2) \, 2g X_1(s) X_2(s)$.
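The quality of the degree-two TPM approximation can be verified numerically: the inner product of the mapped features should reproduce the RBF kernel up to the truncated Taylor remainder. A small sketch of our own, with an illustrative value of $g$:

```python
import numpy as np

def tpm2_features(x, g):
    """Degree-2 TPM feature map of the RBF kernel (cf. Eq. (25)), eta = 2."""
    pre = np.exp(-g * (x @ x))
    return pre * np.array([1.0,
                           np.sqrt(2 * g) * x[0], np.sqrt(2 * g) * x[1],
                           np.sqrt(2) * g * x[0] ** 2,
                           np.sqrt(2) * g * x[1] ** 2,
                           2 * g * x[0] * x[1]])

g = 0.1
x = np.array([0.5, -0.3])
y = np.array([0.2, 0.4])
approx = tpm2_features(x, g) @ tpm2_features(y, g)   # truncated expansion
exact = np.exp(-g * np.sum((x - y) ** 2))            # RBF kernel, Eq. (21)
```

For small $g \, X^T(s_i)X(s)$ the truncation error behaves like the cubic Taylor remainder of the exponential, so the degree-two mapping is already very accurate here.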

4. Case Study: Estimations of Temperature and Pressure in Thailand

4.1. Study Area

The evaluation of the efficiency and accuracy of the proposed techniques was carried out to interpolate temperature and pressure in Thailand. The country is located between 5°37′ N and 20°27′ N latitude and 97°22′ E and 105°37′ E longitude, with a total area of 513,115 km² and a coastline of 3219 km [50,51]. The data used in this study consist of monthly averages of temperature, pressure, relative humidity, digital elevation model (DEM), and geographic locations (coordinates) spanning from January 2017 to December 2017. The summary statistics of the data, including the mean values and standard deviations, are presented in Table 1. These data were acquired from the National Hydroinformatics and Climate Data Center (NHC), developed by the Hydro-Informatics Institute (HII) [52]. Figure 2 displays the 213 meteorological stations retained after data preparation and cleaning.

4.2. Evaluation of Model Accuracy

In this study, we compare the accuracy of KED with three types of trend functions: the linear trend function estimated using the GLS estimator (KED-GLS), non-linear trend functions based on LSSVR with polynomial feature mappings of degrees one and two (KED-Poly1 and KED-Poly2), and non-linear trend functions based on LSSVR with TPM feature mappings of degrees one and two (KED-TPM1 and KED-TPM2).
The k-fold cross-validation technique was applied to examine the performance of the models. The data were randomly divided into 10 folds. In each iteration, one fold was used as the testing dataset for the model built on the remaining nine folds. After 10 iterations, in which each fold was selected once as testing data, the overall estimation accuracy is the average of the accuracy scores calculated from each iteration [53]. The root-mean-square error (RMSE) [54] and the mean absolute percentage error (MAPE) [55] were used as model performance indicators, which are formulated as follows:
$$\mathrm{RMSE} = \sqrt{\frac{1}{N} \sum_{i=1}^{N} \left( Z(s_i) - Z^*(s_i) \right)^2}, \tag{26}$$
$$\mathrm{MAPE} = \frac{1}{N} \sum_{i=1}^{N} \frac{\left| Z(s_i) - Z^*(s_i) \right|}{Z(s_i)} \times 100, \tag{27}$$
where $N$ is the number of observations, and $Z(s_i)$ and $Z^*(s_i)$ denote the observed data and the estimated value at coordinate $s_i$, respectively.
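These two indicators are straightforward to compute; the following is a minimal sketch with illustrative values (not data from the study):

```python
import numpy as np

def rmse(z_obs, z_est):
    """Root-mean-square error, Eq. (26)."""
    return np.sqrt(np.mean((z_obs - z_est) ** 2))

def mape(z_obs, z_est):
    """Mean absolute percentage error, Eq. (27)."""
    return np.mean(np.abs(z_obs - z_est) / z_obs) * 100.0

# Illustrative observed vs. estimated temperatures (degrees Celsius).
z_obs = np.array([25.0, 27.5, 30.0, 26.0])
z_est = np.array([24.5, 28.0, 29.0, 26.5])
```

RMSE penalizes large errors more heavily, while MAPE expresses the error relative to the magnitude of the observations, which is why both are reported together in Table 5.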

4.3. Results

Before proceeding to the KED estimation, a selection of auxiliary factors is required. Table 2 presents the statistical analysis of the interdependence between each selected variable and the target variables through the Pearson and Spearman correlation coefficients [56]. The results indicate a significant positive correlation between temperature and pressure (correlation coefficients greater than 0.5). The pressure is negatively correlated with both DEM and latitude, with correlation coefficients less than −0.5. This suggests that pressure can be chosen as an auxiliary variable for temperature estimation and vice versa. On the other hand, both DEM and latitude are additionally included as auxiliary factors for interpolating pressure. The scatter plots depicting the relationships between the target and auxiliary variables in March, July, and November are presented in Figure 3, Figure 4, and Figure 5, respectively.
The exponential variogram model employed in this study encompasses three parameters: nugget, sill, and range. The fitted residual variogram parameters for temperature data in fold 1 are presented in Table 3, while those for pressure data are shown in Table 4. Moreover, the empirical residual variograms and exponential variogram models of temperature and pressure in Thailand in March, July, and November 2017 are presented in Figure 6 and Figure 7, respectively.
Table 5 reports the estimation efficiency of the KED model with the different trend functions via the MAPE and RMSE measures. According to the accuracy statistics, the KED with a non-linear trend function based on LSSVR has superior estimation performance to the KED with a linear trend for both temperature and pressure. Specifically, the prediction errors for temperature generated by KED-TPM2 are smaller than those of all other methods, with an RMSE of 0.8123 and a MAPE of 2.2888. These correspond to improvements of 1.5633% and 2.1755%, respectively, over the KED-GLS method. The optimal pressure estimates are achieved by KED-Poly2, with RMSE and MAPE equal to 7.7541 and 0.5466, respectively. KED-Poly2 reduces both MAPE and RMSE values by over 10% with respect to the KED-GLS approach.
To further compare the estimation performance of all methods, spatial distribution maps of the monthly averages of temperature and pressure in Thailand in March, July, and November 2017 in fold 1 are presented. These maps were created using QGIS (Quantum Geographic Information System) software version 3.34.0, and the study area was partitioned into a grid of square cells of 0.05 degrees per side.
Figure 8 shows the spatial distribution patterns of the monthly mean temperature. The panels in the left column depict the results generated by KED-GLS, whereas the panels in the right column illustrate the results obtained from KED-TPM2. Both KED-TPM2 and KED-GLS produce roughly similar distribution patterns for the average July temperature. This may be because there is little variation in temperature across the country during the rainy season (July–October); the discrepancy in temperature between these two models is therefore not significant. On the contrary, clear differences can be observed in March and November, in which the area of high temperature is more broadly distributed in the central part of the country for KED-TPM2. The model also generates an overall lower temperature level concentrated in the northern region in November. Figure 9 displays spatial distribution maps of the monthly mean pressure, where the left column again corresponds to the estimates attained from KED-GLS and the right column to those derived from KED-Poly2. The results show a distinct difference between the two methods. In particular, lower pressure values estimated by KED-Poly2 are clearly marked in the northern and western parts of the study area.

5. Conclusions and Discussion

This paper presents the novel KED method that applies the LSSVR technique to improve spatial interpolation accuracy in the presence of non-linear trends. The method involves determining the drift component through explicit feature mapping which is expressed in terms of kernel functions. A comparison between our proposed method and the KED with the linear trend is demonstrated in the case of the temperature and pressure estimation in Thailand in 2017. The results show that the KED with LSSVR outperforms the KED approach with a linear trend function regarding estimation accuracy.
The advantage of the KED with LSSVR can be attributed to its ability to extract implicit non-linear relationships between the target and auxiliary variables. This gives rise to more accurate interpolation results. Furthermore, the LSSVR is a powerful machine learning algorithm that has been proven effective in a variety of regression tasks. This allows our method to adapt to various data types. However, the choice of kernel function in LSSVR can have a significant impact on the estimation accuracy. Although a higher-degree polynomial kernel or a higher-degree TPM feature mapping can model more complex relationships in the data, it can also result in several equations in the kriging system. This can lead to more time-intensive computation and an increase in the likelihood of model overfitting issues.

Author Contributions

Conceptualization, K.B., N.C., and S.M.; methodology, K.B., N.C., and S.M.; software, K.B. and S.M.; validation, K.B., N.C., and S.M.; formal analysis, K.B., N.C., and S.M.; investigation, K.B., N.C., and S.M.; resources, K.B., N.C., and S.M.; data curation, K.B. and S.M.; writing—original draft preparation, K.B., N.C., and S.M.; writing—review and editing, K.B., N.C., and S.M.; visualization, K.B.; supervision, N.C. and S.M. All authors have read and agreed to the published version of the manuscript.

Funding

This work was funded by Fundamental Fund 2023, Chiang Mai University.

Data Availability Statement

All data were acquired from the National Hydroinformatics and Climate Data Center (NHC), developed by Hydro-Informatics Institute (HII) [52].

Acknowledgments

This research project was supported by (i) Chiang Mai University and (ii) Fundamental Fund 2023, Chiang Mai University.

Conflicts of Interest

The authors declare no conflict of interest.

References

  1. Wackernagel, H. Multivariate Geostatistics: An introduction with Applications; Springer Science & Business Media: Berlin/Heidelberg, Germany, 2003. [Google Scholar]
  2. Webster, R.; Oliver, M.A. Geostatistics for Environmental Scientists; John Wiley & Sons: Hoboken, NJ, USA, 2007. [Google Scholar]
  3. Hudson, G.; Wackernagel, H. Mapping temperature using kriging with external drift: Theory and an example from Scotland. Int. J. Climatol. 1994, 14, 77–91. [Google Scholar] [CrossRef]
  4. Bostan, P.; Heuvelink, G.B.; Akyurek, S. Comparison of regression and kriging techniques for mapping the average annual precipitation of Turkey. Int. J. Appl. Earth Obs. Geoinf. 2012, 19, 115–126. [Google Scholar] [CrossRef]
  5. Varentsov, M.; Esau, I.; Wolf, T. High-resolution temperature mapping by geostatistical kriging with external drift from large-eddy simulations. Mon. Weather. Rev. 2020, 148, 1029–1048. [Google Scholar] [CrossRef]
  6. Cantet, P. Mapping the mean monthly precipitation of a small island using kriging with external drifts. Theor. Appl. Climatol. 2017, 127, 31–44. [Google Scholar] [CrossRef]
  7. Bourennane, H.; King, D.; Chery, P.; Bruand, A. Improving the kriging of a soil variable using slope gradient as external drift. Eur. J. Soil Sci. 1996, 47, 473–483. [Google Scholar] [CrossRef]
  8. Bourennane, H.; King, D.; Couturier, A. Comparison of kriging with external drift and simple linear regression for predicting soil horizon thickness with different sample densities. Geoderma 2000, 97, 255–271. [Google Scholar] [CrossRef]
  9. Bourennane, H.; King, D. Using multiple external drifts to estimate a soil variable. Geoderma 2003, 114, 1–18. [Google Scholar] [CrossRef]
  10. Béjar-Pizarro, M.; Guardiola-Albert, C.; García-Cárdenas, R.P.; Herrera, G.; Barra, A.; López Molina, A.; Tessitore, S.; Staller, A.; Ortega-Becerril, J.A.; García-García, R.P. Interpolation of GPS and geological data using InSAR deformation maps: Method and application to land subsidence in the alto guadalentín aquifer (SE Spain). Remote Sens. 2016, 8, 965. [Google Scholar] [CrossRef]
  11. Beauchamp, M.; de Fouquet, C.; Malherbe, L. Dealing with non-stationarity through explanatory variables in kriging-based air quality maps. Spat. Stat. 2017, 22, 18–46. [Google Scholar] [CrossRef]
Figure 1. Flowchart of interpolation using the KED method with the proposed trend function.
Figure 2. Spatial distributions of meteorological stations in the study area in 2017.
Figure 3. Scatter plots of target versus auxiliary variables in March 2017: (a) temperature and pressure; (b) pressure and DEM; (c) pressure and latitude; and (d) pressure and temperature.
Figure 4. Scatter plots of target versus auxiliary variables in July 2017: (a) temperature and pressure; (b) pressure and DEM; (c) pressure and latitude; and (d) pressure and temperature.
Figure 5. Scatter plots of target versus auxiliary variables in November 2017: (a) temperature and pressure; (b) pressure and DEM; (c) pressure and latitude; and (d) pressure and temperature.
Figure 6. Empirical residual variograms and exponential variogram models of temperature in Thailand in March, July, and November 2017, using: (a1), (b1), and (c1) GLS trend estimation (left panels); (a2), (b2), and (c2) TPM2 trend estimation (right panels).
Figure 7. Empirical residual variograms and exponential variogram models of pressure in Thailand in March, July, and November 2017, using: (a1), (b1), and (c1) GLS trend estimation (left panels); (a2), (b2), and (c2) Poly2 trend estimation (right panels).
Figure 8. Spatial distribution of temperature in Thailand in March, July, and November 2017, interpolated using: (a1), (b1), and (c1) KED−GLS (left panels); (a2), (b2), and (c2) KED−TPM2 (right panels).
Figure 9. Spatial distribution of pressure in Thailand in March, July, and November 2017, interpolated using: (a1), (b1), and (c1) KED−GLS (left panels); (a2), (b2), and (c2) KED−Poly2 (right panels).
Table 1. Descriptive statistics of the data.

Variables            Mean Values   Standard Deviations
Temperature          28.3146       1.2584
Pressure             989.5460      14.4254
Relative humidity    75.3258       4.4412
Latitude             15.1074       3.5582
Longitude            100.9769      1.8696
DEM                  159.7653      148.1880
Table 2. Correlation coefficients between the target and auxiliary variables.

Auxiliary Variables    Temperature (Pearson / Spearman)    Pressure (Pearson / Spearman)
Temperature            1.0000 / 1.0000                     0.5537 / 0.5092
Pressure               0.5537 / 0.5092                     1.0000 / 1.0000
Relative humidity      −0.2474 / −0.2389                   0.1540 / 0.1869
Latitude               −0.1322 / −0.2425                   −0.5338 / −0.5867
Longitude              0.0198 / 0.0345                     0.0283 / −0.0399
DEM                    −0.4501 / −0.4421                   −0.7350 / −0.7470
Table 3. The residual variogram parameters for temperature data.

            GLS Trend Estimation            TPM2 Trend Estimation
Months      Nugget    Sill      Range       Nugget    Sill      Range
March       0.6314    0.8036    575.4646    0.6282    0.8526    522.9156
July        0.2279    0.3905    178.4360    0.2347    0.4214    161.7477
November    0.1587    0.3813    138.0978    0.2608    0.5346    201.1555
Table 4. The residual variogram parameters for pressure data.

            GLS Trend Estimation             Poly2 Trend Estimation
Months      Nugget     Sill       Range      Nugget     Sill       Range
March       18.8195    68.2006    68.4513    15.9667    68.5886    55.4515
July        22.4502    72.1840    93.5990    19.8437    70.9177    76.0110
November    10.0162    32.9229    65.6270    10.2971    33.1893    71.1975
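The exponential variogram models fitted in Figures 6 and 7 can be evaluated from the parameters in Tables 3 and 4. The sketch below assumes the common parameterization γ(h) = nugget + (sill − nugget)(1 − exp(−h/range)) with "Sill" read as the total sill; some texts instead report the partial sill or use an effective range of 3 × range, so treat this form as an assumption.

```python
import math

def exp_variogram(h, nugget, sill, rng):
    # Exponential variogram model (assumed parameterization):
    #   gamma(h) = nugget + (sill - nugget) * (1 - exp(-h / range))
    # gamma(0) is 0 by definition; the nugget is the limit as h -> 0+.
    if h == 0:
        return 0.0
    return nugget + (sill - nugget) * (1.0 - math.exp(-h / rng))

# Example with the March temperature parameters under GLS trend estimation
# (Table 3): nugget 0.6314, sill 0.8036, range 575.4646.
gamma_100 = exp_variogram(100.0, 0.6314, 0.8036, 575.4646)
```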
Table 5. Prediction errors of kriging with external drift and three different trend functions for temperature and pressure data in 2017. (KED-GLS: KED with linear trend; KED-Poly1 and KED-Poly2: KED with non-linear LSSVR trend and polynomial feature mapping; KED-TPM1 and KED-TPM2: KED with non-linear LSSVR trend and TPM feature mapping.)

Target Variables   Auxiliary Variables           Errors   KED-GLS   KED-Poly1   KED-Poly2   KED-TPM1   KED-TPM2
Temperature        Pressure                      RMSE     0.8252    0.8275      0.8232      0.8512     0.8123
                                                 MAPE     2.3397    2.3486      2.3439      2.4041     2.2888
Pressure           DEM, Latitude, Temperature    RMSE     8.6212    8.6111      7.7541      9.3854     8.7372
                                                 MAPE     0.6170    0.6181      0.5466      0.6698     0.6329
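The two error metrics in Table 5 are standard and can be sketched as follows; this assumes MAPE is expressed in percent (e.g. 2.29 means roughly a 2.29% average relative error), which is consistent with the magnitude of the temperature values.

```python
import math

def rmse(actual, predicted):
    # Root-mean-square error.
    return math.sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual))

def mape(actual, predicted):
    # Mean absolute percentage error, in percent (assumed for Table 5).
    return 100.0 * sum(abs((a - p) / a) for a, p in zip(actual, predicted)) / len(actual)
```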

Baisad, K.; Chutsagulprom, N.; Moonchai, S. A Non-Linear Trend Function for Kriging with External Drift Using Least Squares Support Vector Regression. Mathematics 2023, 11, 4799. https://doi.org/10.3390/math11234799
