Article

Numerical Non-Linear Modelling Algorithm Using Radial Kernels on Local Mesh Support

by Francisco José Navarro-González 1, Yolanda Villacampa 1,*, Mónica Cortés-Molina 1 and Salvador Ivorra 2
1 Department of Applied Mathematics, University of Alicante, 03690 Alicante, Spain
2 Department of Civil Engineering, University of Alicante, 03690 Alicante, Spain
* Author to whom correspondence should be addressed.
Mathematics 2020, 8(9), 1600; https://doi.org/10.3390/math8091600
Submission received: 24 August 2020 / Revised: 12 September 2020 / Accepted: 15 September 2020 / Published: 17 September 2020
(This article belongs to the Section Computational and Applied Mathematics)

Abstract
Estimation problems are frequent in several fields, such as engineering, economics, and physics. Linear and non-linear regression are powerful techniques based on optimizing an error defined over a dataset. Although they have a strong theoretical background, the need to assume an analytical expression sometimes makes them impractical. Consequently, a variety of other approaches and methodologies are available, from neural networks to random forests. This work presents a new methodology that increases the number of available numerical techniques. It is a natural evolution of the previous finite-element-based regression algorithms developed by the authors, improving the computational behaviour and allowing the study of problems with a greater number of points. It also possesses an interesting characteristic: a direct and clear geometrical meaning. The modelling problem is presented from the point of view of the statistical analysis of the data noise, considered as a random field. The goodness of fit of the generated models has been tested and compared with other methodologies, validating the results with experimental campaigns obtained from the bibliography in the engineering field and showing good approximation. In addition, a small variation of the data estimation algorithm allows studying overfitting in a model, which is a problematic issue when numerical methods are used to model experimental values.

1. Introduction

Obtaining a numerical or analytical representation of a given unknown relationship is a frequent problem in the study and modelling of systems, with application in a wide variety of fields ranging from experimental and social sciences to engineering. In addition, in most cases, the only information available is a set of experimental data from the variables that define the relationship. In other words, without loss of generality, given
$y = f(x_1, \ldots, x_d)$ (1)
with
$\mathbf{x} = (x_1, \ldots, x_d) \in \Omega \subset \mathbb{R}^d$ (2)
usually only one set of experimental data is known for the variables involved in the relationship (1):
$\{(x[k]_1, x[k]_2, x[k]_3, \ldots, x[k]_d)\}_{k = 1, 2, \ldots, p}$ (3)
These values are usually affected by several factors that induce variations between the real and the observed quantities. That is, for any k:
$y[k] = f(x[k]_1 + \varepsilon[k]_1,\; x[k]_2 + \varepsilon[k]_2, \ldots,\; x[k]_d + \varepsilon[k]_d) + \epsilon[k] = f(\mathbf{x}[k] + \boldsymbol{\varepsilon}[k]) + \epsilon[k]$ (4)
If $\Omega$ is a bounded domain of $\mathbb{R}^d$ and $f \in C^1$, expanding the function $f(\mathbf{x})$ around $\mathbf{x}[k] = (x[k]_1, x[k]_2, x[k]_3, \ldots, x[k]_d)$, from (4):
$y[k] = f(x[k]_1, x[k]_2, \ldots, x[k]_d) + \sum_i \frac{\partial f}{\partial x_i}(\xi[k]_1, \xi[k]_2, \ldots, \xi[k]_d) \cdot \varepsilon[k]_i + \epsilon[k]$ (5)
where $\boldsymbol{\xi}[k] = (\xi[k]_1, \xi[k]_2, \ldots, \xi[k]_d) = \mathbf{x}[k] + \lambda \cdot \boldsymbol{\varepsilon}[k]$, with $\lambda \in [0, 1]$.
Defining a new value $e[k]$ that includes all perturbations:
$e[k] = \sum_i \frac{\partial f}{\partial x_i}(\xi[k]_1, \xi[k]_2, \ldots, \xi[k]_d) \cdot \varepsilon[k]_i + \epsilon[k]$ (6)
Therefore,
$y[k] = f(x[k]_1, x[k]_2, \ldots, x[k]_d) + e[k]$ (7)
Expression (7) is frequently assumed, implying that $y$ is the only variable affected by the error, while the variables $\{x_i\}_{i=1}^{d}$ correspond to the exact values. It is also frequent to include some assumptions on the probability distribution that generates the values of the perturbation, especially the assumption of normality.
Sometimes the form of the relation is known, as occurs in linear regression, where the equation corresponding to the studied system is:
$y = \alpha + \lambda_1 x_1 + \lambda_2 x_2 + \cdots + \lambda_d x_d$ (8)
The problem of determining the values of the lambda parameters that give the best fit between the linear expression and the data is then reduced to minimizing a previously defined error function.
In real cases, the effects of non-linearity dominate the dynamics of the system and, far from the equilibrium points, linear relations as in (8) are not useful. In the cases when the relation is not linear but known:
$y = z(x_1, x_2, \ldots, x_d, \lambda_1, \lambda_2, \ldots, \lambda_Q)$ (9)
the problem consists in the calculation of the unknown parameters $(\lambda_1, \lambda_2, \ldots, \lambda_Q)$ using the data.
In (8) and (9), determining the values of the parameters of the model can be done by minimizing an error function defined over the parameters. One common method is to consider the mean square error (MSE) of the estimated values on the experimental points:
$e(\lambda_1, \lambda_2, \ldots, \lambda_Q) = \sum_{k=1}^{p} \left( y[k] - z(x[k]_1, \ldots, x[k]_d, \lambda_1, \lambda_2, \ldots, \lambda_Q) \right)^2$ (10)
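As an illustration of the parameter-fitting problem (8)–(10), the following sketch (synthetic data and illustrative variable names, not taken from the paper) minimises the MSE of a linear model by ordinary least squares; for a known non-linear form (9), the same error function would simply be handed to a generic optimiser.

import numpy as np

# Synthetic sample: p observations of d = 2 explanatory variables (illustrative assumption).
rng = np.random.default_rng(0)
p, d = 200, 2
X = rng.uniform(0.0, 1.0, size=(p, d))
y = 1.5 + 2.0 * X[:, 0] - 0.7 * X[:, 1] + rng.normal(0.0, 0.1, size=p)  # noisy linear relation (8)

# Minimising the MSE (10) for the linear model (8) is an ordinary least-squares problem.
A = np.hstack([np.ones((p, 1)), X])           # design matrix [1, x1, ..., xd]
coef, *_ = np.linalg.lstsq(A, y, rcond=None)  # [alpha, lambda_1, ..., lambda_d]

y_hat = A @ coef
mse = np.mean((y - y_hat) ** 2)               # error function (10) at the optimum
print(coef, mse)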
However, in real practice, the expression for the relationship is unknown and the problem is to obtain an analytical expression or a numerical approximation to the relation (1).
The finite element method (FEM) is a numerical method used initially for solving structural problems in civil and aeronautical engineering that is applied in a wide variety of fields, see [1,2]. This method allows finding approximate solutions to differential equations and boundary problems in partial differential equations.
Given a differential equation defined through a differential operator $D$:
$D(f) = v$ (11)
where $f, v \in V$, with $V$ a function space, the finite element method replaces $V$ by a finite-dimensional subspace $V_h \subset V$ whose elements are continuous piecewise polynomial functions of degree $K$ (this is a Sobolev space). In addition, the domain of the problem, $\Omega$, is divided into $N_e$ parts called elements, $\Omega_i$:
$\Omega = \bigcup_{i=1}^{N_e} \Omega_i$ (12)
The original Equation (11) now changes to:
$D(f_h) = v_h$ (13)
where $f_h, v_h \in V_h$.
If $\dim V_h = Q$, taking a basis of $V_h$:
$V_h = \langle \varphi_1(\mathbf{x}), \varphi_2(\mathbf{x}), \ldots, \varphi_Q(\mathbf{x}) \rangle$ (14)
the solution can be approximated as:
$f_h(\mathbf{x}) = \sum_{i=1}^{Q} u_i \cdot \varphi_i(\mathbf{x})$ (15)
The values of u i must be determined to get a good approximation for the solution to the differential equation.
The basis (14) is selected considering the behaviour of its functions on a set of $Q$ points called nodes, $(\zeta_1, \zeta_2, \ldots, \zeta_Q)$, that satisfy the following conditions:
$\varphi_i(\zeta_j) = \delta_{ij}$ (16)
In addition, an important property is that $\varphi_i(\mathbf{x}) = 0$ if the node $\zeta_i$ does not form part of the element containing the point $\mathbf{x}$.
The functions of the basis are named shape functions and they are used to interpolate in points different from the nodes.
The selection of the nodes is a crucial point for the precision of the results. The operation that constructs the set of nodes and elements is called meshing and is determined by the set of elements and nodes $(\{\zeta_r\}_{r=1}^{Q}, \{\Omega_i\}_{i=1}^{N_e})$. The resulting mesh can be locally characterized by a parameter related to the size of the corresponding local element and the error of the approximated solution in the element.
In previous papers, the authors have studied regression methods based on the FEM using different methodologies. In [3], the regression algorithm is based on a least-squares approach comparing experimental data and the model generated using FEM. In [4], the model is built from a physical analogy that solves some problems of the linear system. An approach closer to the original FEM can be seen in [5] and applied to examples in [6,7]. Although different modifications of the initial algorithm have been developed to improve the computing speed of the regression model, see [8], the basic methodology can be summarised as follows.
The numerical regression model for the relation consists of the set formed by the nodal values { u i } , which is called a representation model for the system. Then, the value of the relationship in any point can be estimated using the expression:
$f_h(\mathbf{x}) = \sum_{i=1}^{Q} u_i \, \varphi_i(\mathbf{x})$ (17)
By (16), the sum is restricted to the nodes that determine the element which contains the point $\mathbf{x}$. For simplicity in the subsequent calculations, the authors have normalised the data in the domain $[0, 1]^d$ and used a mesh formed by regular hyper-cubic elements with edge length $h$. Introducing the complexity $c$ as the number of elements in each dimension, $h = 1/c$, the total number of elements is $c^d$ and the total number of nodes is $(c + 1)^d$.
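A minimal sketch of this mesh bookkeeping (illustrative function names, assuming data already normalised to $[0, 1]^d$): it locates the hyper-cubic element of edge $h = 1/c$ that contains a point and lists its $2^d$ nodes.

import numpy as np
from itertools import product

def containing_element(x, c):
    """Per-dimension index of the element of edge h = 1/c that contains x in [0, 1]^d."""
    return np.minimum((np.asarray(x) * c).astype(int), c - 1)   # clamp points on the upper boundary

def element_nodes(idx, c):
    """Coordinates of the 2^d nodes of the element with multi-index idx."""
    h = 1.0 / c
    corners = product(*[(i, i + 1) for i in idx])                # 2^d corner multi-indices
    return np.array([np.array(corner) * h for corner in corners])

x0 = np.array([0.37, 0.81])
c = 10                                                           # complexity: 10 elements per dimension
idx = containing_element(x0, c)
print(idx, element_nodes(idx, c))                                # total nodes in the mesh: (c + 1)^d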
The key point in FEM-based methods is the determination of the values $\{u_i\}$ through the definition of an error function and its minimization. That problem is equivalent to solving a linear system of size $(c + 1)^d$. The results and precision have been checked with good performance for low dimensions, but a practical problem appears when the dimension of the problem becomes greater, because of the order of the algorithms used to solve the linear system.
The problem with this approach is the conversion of a local problem, the determination of the function value at a point $\mathbf{x}_o$, into a global question: solving a system to obtain the $(c + 1)^d$ nodal values.
In conclusion, the direct geometric interpretation of FEM regression methods makes them very interesting compared to other approaches. The techniques related to finite elements are equivalent to the determination of the tangent subspace of the manifold defined by $y = f(\mathbf{x})$ at the objective point $\mathbf{x}_0$. The natural evolution of the methods previously considered in the introduction is the restriction of the calculations to the neighbourhood of $\mathbf{x}_0$ determined by the underlying FEM methodology. The only requirement is the determination of estimations at nodal points. These estimations can be as simple as the k-nearest neighbours (k-NN). However, k-NN presents the problem of determining the optimal value of k, so as to include as many representative values as possible while minimizing the influence of outliers. This can be solved by introducing the distance-based nearest neighbour.
To do this, the proposed algorithm is based on radial basis functions (and radial basis function networks) and the results on random fields, given that the regression problem on experimental data is equivalent to removing the influence of this stochastic term over the samples generated from the relationship under study.
After the introduction in Section 1, the paper is organised as follows. Section 2.1 considers the main characteristics and use of radial basis functions in problems of interpolation and regression, while Section 2.2 presents an introduction and some results on random fields. In Section 3, the method proposed by the authors is developed and applied to different problems. Section 4 corresponds to the conclusions and analysis of future investigations.

2. Materials and Methods

2.1. Radial Basis Functions

Radial functions are functions that depend only on the distance between their arguments:
$\theta(\mathbf{x}, \mathbf{y}) = \Phi(\|\mathbf{x} - \mathbf{y}\|)$ (18)
These kinds of functions can be used as a basis of a functional space (radial basis functions), see [9], introducing an interpolation method for any function $f$ from a set of pairs composed of points called centres and the values of the function at those points, $\{(\mathbf{c}_i, f_i)\}_{i=1}^{P}$, given by:
$\hat{f}(\mathbf{x}) = \sum_{i=1}^{P} \alpha_i \, \Phi(\|\mathbf{x} - \mathbf{c}_i\|)$ (19)
The values of the alpha coefficients are calculated by imposing the $P$ conditions given by:
$\hat{f}(\mathbf{c}_i) = f_i$ (20)
This expression can be used to approximate the solution of a differential equation, using, for example, a Galerkin approach [10]. The determination of the centres is a central part of the problem to consider, because they are not fixed points as in other methods such as finite elements, finite boundaries, etc., so these techniques receive the name of meshless methods. Several works have studied these approaches in different fields (fluid dynamics in [11]) or solved problems related with the corresponding systems, as in [12,13].
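A short sketch of exact RBF interpolation, (19)–(20), with Gaussian radial functions (the centres, the width, and the test function are illustrative assumptions): the $\alpha$ coefficients are obtained by solving the $P \times P$ linear system imposed by the interpolation conditions.

import numpy as np

def gaussian(r, w=0.2):
    return np.exp(-(r / w) ** 2)

rng = np.random.default_rng(1)
centres = rng.uniform(0.0, 1.0, size=(30, 2))                    # centres c_i
f_vals = np.sin(2 * np.pi * centres[:, 0]) * centres[:, 1]       # f_i = f(c_i) for a test function

# Interpolation matrix Phi_ij = Phi(||c_i - c_j||) and conditions (20): Phi @ alpha = f_vals.
dist = np.linalg.norm(centres[:, None, :] - centres[None, :, :], axis=-1)
alpha = np.linalg.solve(gaussian(dist), f_vals)

def f_hat(x):
    """Interpolant (19) evaluated at a point x."""
    r = np.linalg.norm(centres - x, axis=1)
    return gaussian(r) @ alpha

print(f_hat(np.array([0.5, 0.5])))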
Another frequent use of radial functions is in the problem of functional approximation, where an unknown function $f(\mathbf{x})$ must be approximated from a set of pairs $\{(\mathbf{x}_k, y_k)\}_{k=1}^{P}$ provided by discrete measurements, where $\mathbf{x}$ corresponds to a vector and the $y$ are values related to the value of the function at the x-points. In addition, there usually exist errors between $y$ and the value of $f(\mathbf{x})$, that is, $y_k = f(\mathbf{x}_k) + e_k$.
The direct use of radial basis functions (RBF) as an interpolation method is not adequate here because of the requirement of exact interpolation at the available points. To solve this issue, some neural network models have been developed [14] with the addition of radial functions [15,16], because they are faster than the classical sigmoid neurons. This better behaviour is due to improvements of the algorithm [17] or of the determination of the radial centres [18,19], and the approach is capable of obtaining arbitrarily good approximations to the functions [20,21]. An additional advantage of these radial basis function networks (RBFN) is the more transparent interpretation of the results compared to the multilayer perceptron (MLP), with the non-linearities associated with the sigmoidal functions [19,20].
Compared with other techniques, the merits depend on the problem, ranging from very good results [22] to other studies affirming an inferior behaviour [23]. Other authors have also cited the need for a greater number of points to obtain accuracies similar to sigmoid networks [20].
RBFN and variations on them have been used widely to study time series [24]; signal processing [25]; data fitting [26]; system modelling [16,27,28]; system control [29]; system identification [30]; fault analysis [31]; computer vision [32,33,34], and speech recognition [35].
In a neural network with n neurons, the output is:
$f_o(\mathbf{x}) = \sum_{i=1}^{n} f_i \, R_i(\mathbf{x})$ (21)
Following [24], a necessary and sufficient condition for a function to act as an activation function in an RBFN is not to be an even polynomial. A typical, but not the only (see [36]), selection for these radial functions is a set of Gaussian functions given by:
$R_i(\mathbf{x}) = e^{-\left( \frac{\|\mathbf{x} - \mathbf{c}_i\|}{\sigma_i} \right)^2}$ (22)
Figure 1 represents a diagram of this kind of network.
The use of these kinds of functions was initially applied in the study of some structures of the cerebral cortex [37].
The output is obtained from the parameters $\{\mathbf{c}_i, \sigma_i, f_i\}_{i=1}^{n}$, and several algorithms have been developed to obtain an optimum selection of those parameters (learning process). Some of them use a preliminary algorithm to determine the centres $\mathbf{c}_i$ (genetic algorithms [38,39] or classification trees [40]) and the widths $\sigma_i$ of each element of the network [19,31]. However, the widths are not usually considered as important as the rest of the parameters, so some authors have focused on the development of algorithms to optimize them [41]. Some more recent papers have used a variety of algorithms to improve the RBFN learning capabilities using hybrid approaches [42], including techniques to reduce the dimensionality, such as principal component analysis [43]. Among the modifications of the basic version of RBFN, the most developed is the addition of a linear term to improve the behaviour in the regions that lie outside the cloud of available points [26]. In addition, the use of a normalising operation defines the normalised radial basis function neural networks (NRBFN) [44,45], where the network output has the expression:
$f_o(\mathbf{x}) = \frac{\sum_{i=1}^{n} f_i \cdot R_i(\mathbf{x})}{\sum_{i=1}^{n} R_i(\mathbf{x})}$ (23)
NRBFN present some good properties, such as the partition of unity, covering the whole input space, and greater stability with respect to the selection of the network parameters [46].
The use of NRBFN has increased in recent years, involving problems such as electricity price forecasting [47], non-stationary time series [48], system control [49], system modelling [46], magnetic fields [50], and integral equations [51].
For a general study of the families of models related to (23), papers [52,53] can be of interest.
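The difference between the plain RBFN output (21)–(22) and the normalised output (23) can be written in a few lines (centres, widths, and weights are arbitrary illustrative values, not a trained network):

import numpy as np

def rbfn_outputs(x, centres, widths, weights):
    """Return the RBFN output (21) and the normalised NRBFN output (23) at x."""
    r = np.linalg.norm(centres - x, axis=1)
    R = np.exp(-(r / widths) ** 2)          # Gaussian activations (22)
    plain = weights @ R                     # (21)
    normalised = weights @ R / R.sum()      # (23): partition of unity over the activations
    return plain, normalised

centres = np.array([[0.2, 0.2], [0.8, 0.3], [0.5, 0.9]])
widths = np.array([0.3, 0.3, 0.3])
weights = np.array([1.0, -0.5, 2.0])
print(rbfn_outputs(np.array([0.4, 0.4]), centres, widths, weights))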

2.2. Random Fields

Given a domain Ω , a random field defined on it is a collection of random variables defined on each point of Ω .
Random fields are widely used in problems where local correlations exist in the data perturbations, such as speech recognition, computer vision [54,55], statistical analysis of MEG images [56], and magnetic resonance imaging [57]. Some of these treatments are related to the problem of determining a threshold value, which has been generalized as the determination of the frontier in a binary classification from a set of known points [58]. Additional applications are referenced in [59], including particle physics, agriculture, economics, etc.
An interesting further step is the use of random fields and neural networks as considered in [60].
An important part of the investigations is devoted to specific types of random fields: Gaussian random fields, see [61], Markovian random fields, and conditional random fields, used in structured learning problems where relationships exist between the regressed variables at different spatial-temporal points, for example, in the case of temporal data [62].
As pointed out in [63], some of the considered problems can be seen as an incomplete data problem where part of the relevant data is hidden and must be estimated. Frequently, this estimation is made using maximum-likelihood related algorithms or using the mean field theory approach.
Following [64], let us consider a set of labelled points $L = \{(x_1^1, \ldots, x_1^d, y_1), \ldots, (x_l^1, \ldots, x_l^d, y_l)\}$ and a set of unlabeled points $U = \{(x_1^1, \ldots, x_1^d), \ldots, (x_u^1, \ldots, x_u^d)\}$. The considered problem is to determine a real function defined on the set $V = \{(x_1^1, \ldots, x_1^d), \ldots, (x_l^1, \ldots, x_l^d), (x_{l+1}^1, \ldots, x_{l+1}^d), \ldots, (x_u^1, \ldots, x_u^d)\}$, $f: V \to \mathbb{R}$, that will be used to assign labels to the points in U, under the condition $f(x_i) = y_i$ for $x_i \in L$. The introduction of a weight function
$w_{ij} = \exp\left( -\sum_{k=1}^{d} \frac{(x_i^k - x_j^k)^2}{\sigma_k^2} \right)$ (24)
to preserve ‘local smoothness’ and the definition of an energy function
$E(f) = \frac{1}{2} \sum_{i,j} w_{ij} \cdot (f(x_i) - f(x_j))^2$ (25)
give as a result the harmonic property for the minimum-energy function calculated from E(f); that is, the value of f at any unlabeled point can be written as a weighted average of the rest of the points:
$f(x_j) = \frac{1}{\delta_j} \sum_{i \neq j} w_{ij} \cdot f(x_i)$ (26)
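As a sketch of (24)–(26) (coordinates, labels, and the $\sigma_k$ are illustrative assumptions), the value at each unlabeled point can be obtained by iterating the weighted average of the remaining points until the harmonic condition is approximately satisfied:

import numpy as np

rng = np.random.default_rng(2)
X = rng.uniform(0.0, 1.0, size=(50, 2))                      # labelled + unlabeled points
labelled = np.arange(50) < 20                                # the first 20 points carry labels
y = np.where(labelled, np.sin(4 * X[:, 0]), 0.0)

sigma = np.array([0.2, 0.2])
diff = (X[:, None, :] - X[None, :, :]) / sigma
W = np.exp(-np.sum(diff ** 2, axis=-1))                      # weights (24)
np.fill_diagonal(W, 0.0)

f = y.copy()
for _ in range(200):                                         # fixed-point iteration of (26)
    avg = W @ f / W.sum(axis=1)
    f = np.where(labelled, y, avg)                           # keep f(x_i) = y_i on labelled points
print(f[~labelled][:5])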
In [65], the problem of calculating the regression of a random field from a set of observations is considered. Under the conditions of a finite domain and distribution of the points over that domain, the authors suggest considering the problem as a continuous case.
Other studies of local regression of random fields are considered in [66], where, given a realization of a random field $\{(\mathbf{x}_i, y_i) : i \in \mathbb{N}\}$, the problem is the determination of the spatial regression function given by $f: \mathbf{x} \to f(\mathbf{x}) = E[y_i \,|\, \mathbf{x}_i = \mathbf{x}]$. An approach to nonparametric regression can be seen in [67,68], where the regression problem is determining the function $f$ defined on a lattice domain and given by the expression $y_i = f(\mathbf{x}_i) + e_i$. The estimation expression for the function $f$ on the domain $[0, 1]^d$ is related to the expression:
$f_N(\mathbf{x}) = \frac{\sum_i y_i \cdot K\left( \frac{\mathbf{x} - \mathbf{x}_i}{h_N} \right)}{\sum_i K\left( \frac{\mathbf{x} - \mathbf{x}_i}{h_N} \right)}$ (27)
where $K$ is a kernel function and $N$ is a parameter that determines the number of points in the lattice and the value of the parameter $h_N$.
Under some general conditions, the difference between the expected value of $f_N(\mathbf{x})$ and the value of the function, $|E[f_N(\mathbf{x})] - f(\mathbf{x})|$, is bounded by the value of $h_N$. Moreover, when $h_N$ is fixed as a determined expression of $N$, the upper bound for the difference $|f_N(\mathbf{x}) - f(\mathbf{x})|$ is related to $(\ln N)^{\alpha} / N^d$. These results are very interesting, but they are obtained using a dataset defined on a lattice.

3. Results

3.1. Radial Kernel Weighted Average

Definition 1.
A parameterized radial function is a function $\Phi: \mathbb{R}^+ \times \mathbb{R}^+ \to \mathbb{R}^+$ characterized by a parameter $\omega$ that satisfies the conditions:
$\left. \begin{array}{ll} \forall \omega \neq 0 & \lim_{r \to \infty} \Phi(r, \omega) = 0 \\ \forall r \neq 0 & \lim_{\omega \to 0} \Phi(r, \omega) = 0 \end{array} \right\}$ (28)
Following the definition, an example of a parameterized radial function is the Gaussian:
$\Phi(r, \omega) = e^{-\left( \frac{r}{\omega} \right)^2}$ (29)
Definition 2.
Given a function $f(\mathbf{x})$ defined on a domain $\Omega$, the $\Phi$-weighted average of $f(\mathbf{x})$ at a point $\mathbf{x}_o$, $c^*(\mathbf{x}_o)$, is a number implicitly defined as:
$\int_{\Omega} [c^*(\mathbf{x}_o) - f(\mathbf{x})] \, \Phi(\|\mathbf{x} - \mathbf{x}_o\|, \omega) \, d\mathbf{x} = 0$ (30)
where $\Phi(\cdot, \cdot)$ is a parameterized radial function.
Definition 3.
The J-th moment of the radial function $\Phi$ is defined as:
$W^J(\mathbf{x}_0) = \int_{\Omega} \|\mathbf{x} - \mathbf{x}_0\|^J \cdot \Phi(\|\mathbf{x} - \mathbf{x}_0\|, \omega) \, d\mathbf{x}$ (31)
Using this expression, Equation (30) can be written as:
$c^*(\mathbf{x}_o) \cdot W^0(\mathbf{x}_0) = \int_{\Omega} f(\mathbf{x}) \cdot \Phi(\|\mathbf{x} - \mathbf{x}_0\|, \omega) \, d\mathbf{x}$ (32)
Definition 4.
Given a domain $\Omega$, the J-th core of width $\chi$ (represented as $K_{\Omega}^{J}(\omega, \chi)$) is the subdomain $K_{\Omega}^{J}(\omega, \chi) \subset \Omega$ such that, $\forall \mathbf{x}_0 \in K_{\Omega}^{J}(\omega, \chi)$, the function $W^J(\mathbf{x}_0)$ satisfies:
$0 \leq W^J - W^J(\mathbf{x}_0) \leq \chi$ (33)
where $W^J = \int_{\mathbb{R}^d} \|\mathbf{x}\|^J \, \Phi(\|\mathbf{x}\|, \omega) \, d\mathbf{x}$.
Definition 5.
Under the same conditions of Definition 4, the axial core of width $\chi$ (represented as $AK_{\Omega}(\omega, \chi)$) is the subdomain $AK_{\Omega}(\omega, \chi) \subset \Omega$ such that, $\forall \mathbf{x}_0 \in AK_{\Omega}(\omega, \chi)$ and for every $i \in \{1, 2, \ldots, d\}$:
$\left| \int_{\Omega} (\mathbf{x} - \mathbf{x}_0)_i \cdot \Phi(\|\mathbf{x} - \mathbf{x}_0\|, \omega) \, d\mathbf{x} \right| \leq \chi$ (34)
For practical purposes, Expression (33) implies considering $W^J(\mathbf{x})$ as the constant value $W^J$ for points belonging to $K_{\Omega}^{J}(\omega, \chi)$, within the precision given by $\chi$, whereas (34) is equivalent to using a zero value for the integral inside $AK_{\Omega}(\omega, \chi)$ with the same consideration with respect to the precision. These properties will be used to obtain an upper bound for the error of the estimation calculated with the algorithm at points belonging to $K_{\Omega}^{0}(\omega, \chi) \cap AK_{\Omega}(\omega, \chi)$.
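For intuition on Definitions 2–4, a quick numerical check on $\Omega = [0, 1]^2$ with the Gaussian kernel (29) (the grid resolution and $\omega$ are arbitrary choices): the zeroth moment $W^0(\mathbf{x}_0)$ computed by a Riemann sum is nearly constant for points away from the boundary (the core), and $c^*(\mathbf{x}_0)$ from (30)–(32) behaves as a local kernel-weighted average of $f$.

import numpy as np

w = 0.1                                               # kernel parameter omega (illustrative)
g = np.linspace(0.0, 1.0, 101)                        # quadrature grid on [0, 1]^2
GX, GY = np.meshgrid(g, g, indexing="ij")
grid = np.column_stack([GX.ravel(), GY.ravel()])
dA = (g[1] - g[0]) ** 2                               # area element of the Riemann sum

def phi(r):
    return np.exp(-(r / w) ** 2)                      # parameterized radial function (29)

def W0(x0):
    return np.sum(phi(np.linalg.norm(grid - x0, axis=1))) * dA        # zeroth moment (31)

def c_star(x0, f):
    k = phi(np.linalg.norm(grid - x0, axis=1))
    return np.sum(f(grid) * k) * dA / W0(x0)          # weighted average from (30)/(32)

f = lambda p: np.sin(2 * np.pi * p[:, 0]) + p[:, 1]
print(W0(np.array([0.5, 0.5])), W0(np.array([0.02, 0.5])))            # interior vs near-boundary value
print(c_star(np.array([0.5, 0.5]), f), f(np.array([[0.5, 0.5]]))[0])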

3.2. Interpolation over a Mesh of Finite Elements

Let the decomposition of the domain into $N_e$ elements be $\Omega = \bigcup_{i=1}^{N_e} \Omega_i$, with $Q$ nodes $\{\zeta_r\}_{r=1}^{Q}$ and the associated shape functions $\{N_r(\mathbf{x})\}_{r=1}^{Q}$. Then, the following relations hold:
$\left. \begin{array}{l} \sum_{r=1}^{Q} N_r(\mathbf{x}) = 1 \\ N_r(\zeta_s) = \delta_{rs} \end{array} \right\}$ (35)
Given a point $\mathbf{x}_o \in \Omega_l \subset \Omega$ with nodes $\{\zeta_r\}_{r=1}^{Q}$ and a function $f(\mathbf{x})$, the interpolated value of the function at the point, $\hat{f}(\mathbf{x}_o)$, is defined as:
$\hat{f}(\mathbf{x}_o) = \sum_{r=1}^{Q} N_r(\mathbf{x}_o) \cdot f(\zeta_r)$ (36)
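A sketch of the interpolation (35)–(36) on a hyper-cubic element using multilinear shape functions (the multilinear choice is an assumption consistent with the regular hyper-cubic mesh used in this paper):

import numpy as np
from itertools import product

def shape_functions(x, lower, h):
    """Multilinear shape functions of the 2^d nodes of the hypercube [lower, lower + h]^d at x."""
    t = (np.asarray(x) - lower) / h                      # local coordinates in [0, 1]^d
    corners = list(product((0, 1), repeat=len(t)))
    N = np.array([np.prod(np.where(np.array(c) == 1, t, 1.0 - t)) for c in corners])
    nodes = np.array([lower + h * np.array(c) for c in corners])
    return N, nodes                                      # the N values sum to one, as in (35)

def interpolate(x, lower, h, f):
    N, nodes = shape_functions(x, lower, h)
    return N @ np.array([f(z) for z in nodes])           # interpolated value (36)

f = lambda z: z[0] ** 2 + z[1]
print(interpolate(np.array([0.43, 0.21]), np.array([0.4, 0.2]), 0.1, f))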
Definition 6.
Given a function $f(\mathbf{x})$ defined on $\Omega$ and a point $\mathbf{x}_o \in \Omega_l$, the regression estimation of $f(\mathbf{x})$ at $\mathbf{x}_o$ associated to the radial kernel $\Phi$ and the mesh $(\{\zeta_r\}_{r=1}^{Q}, \{\Omega_i\}_{i=1}^{N_e})$ is defined as the interpolation of the weighted averages of $f$ on the nodes of $\Omega_l$:
$\hat{z}(\mathbf{x}_o) = \sum_{r=1}^{Q} N_r(\mathbf{x}_o) \, c^*(\zeta_r)$ (37)
This expression is closely related with the approaches presented in (23)–(25) and (27).
Definition 7.
Given a function $f(\mathbf{x})$ defined on a domain $\Omega$ and a random field $e(\mathbf{x}) \sim N(0, \sigma(\mathbf{x}))$ defined on $\Omega$, an experimental realisation of $f$ associated with a sample $e_S(\mathbf{x})$ of $e(\mathbf{x})$ is a function $y_s: \Omega \to \mathbb{R}$ given by:
$y_s(\mathbf{x}) = f(\mathbf{x}) + e_S(\mathbf{x})$ (38)
Proposition 1.
The expected value (denoted as $E[\cdot]$) of the weighted average corresponding to the values $y_s$ at any point $\mathbf{x}_o$ is:
$E[c^{*s}(\mathbf{x}_o)] = c^*(\mathbf{x}_o)$ (39)
Proof. 
Following Expression (30), the regression of the experimental realisation on each node is:
$\int_{\Omega} [c^{*s}(\mathbf{x}_o) - y_s(\mathbf{x})] \cdot \Phi(\|\mathbf{x} - \mathbf{x}_o\|, \omega) \, d\mathbf{x} = 0$ (40)
Using (38), the last expression can be written as:
$\int_{\Omega} [c^{*s}(\mathbf{x}_o) - f(\mathbf{x})] \cdot \Phi(\|\mathbf{x} - \mathbf{x}_o\|, \omega) \, d\mathbf{x} = \int_{\Omega} e_s(\mathbf{x}) \cdot \Phi(\|\mathbf{x} - \mathbf{x}_o\|, \omega) \, d\mathbf{x}$ (41)
Therefore, by (30) and using the definition (31) of $W^0(\mathbf{x}_o)$:
$(c^{*s}(\mathbf{x}_o) - c^*(\mathbf{x}_o)) \cdot W^0(\mathbf{x}_o) = \int_{\Omega} e_s(\mathbf{x}) \cdot \Phi(\|\mathbf{x} - \mathbf{x}_o\|, \omega) \, d\mathbf{x}$ (42)
Under the conditions of Definition 7, $E[e_S(\mathbf{x})] = 0$, and the expected value of (42) is:
$E[c^{*s}(\mathbf{x}_o) - c^*(\mathbf{x}_o)] \cdot W^0(\mathbf{x}_o) = 0$ (43)
Therefore,
$E[c^{*s}(\mathbf{x}_o)] = c^*(\mathbf{x}_o)$ (44)
 □
Proposition 2.
Let $e(\mathbf{x})$ be a distribution uncorrelated at different points of $\Omega$, $E[e_S(\mathbf{x}) \cdot e_S(\mathbf{y})] = \sigma^2(\mathbf{x}) \cdot \delta(\mathbf{x} - \mathbf{y})$. Then, the variance of the regression corresponding to the values $y_s$ is given by:
$\mathrm{Var}(c^{*s}(\mathbf{x}_o) - c^*(\mathbf{x}_o)) \cdot (W^0(\mathbf{x}_o))^2 = \int_{\Omega} \sigma^2(\mathbf{x}) \, \Phi^2(\|\mathbf{x} - \mathbf{x}_o\|, \omega) \, d\mathbf{x}$ (45)
Proof. 
Taking (42) to the square and calculating the expected values:
$E[(c^{*s}(\mathbf{x}_o) - c^*(\mathbf{x}_o))^2] \, (W^0(\mathbf{x}_o))^2 = \int_{\Omega} \int_{\Omega} E[e_S(\mathbf{x}) e_S(\mathbf{y})] \, \Phi(\|\mathbf{x} - \mathbf{x}_o\|, \omega) \cdot \Phi(\|\mathbf{y} - \mathbf{x}_o\|, \omega) \, d\mathbf{x} \, d\mathbf{y} = \int_{\Omega} \sigma^2(\mathbf{x}) \, \Phi^2(\|\mathbf{x} - \mathbf{x}_o\|, \omega) \, d\mathbf{x}$ (46)
That is:
$\mathrm{Var}(c^{*s}(\mathbf{x}_o) - c^*(\mathbf{x}_o)) \cdot (W^0(\mathbf{x}_o))^2 = \int_{\Omega} \sigma^2(\mathbf{x}) \cdot \Phi^2(\|\mathbf{x} - \mathbf{x}_o\|, \omega) \, d\mathbf{x}$ (47)
 □
Proposition 3.
(Upper bound of the error on inner points). Let $f$ be a function $f \in C^2(\Omega)$ defined on $\Omega \subset \mathbb{R}^d$, $\mathbf{x}_o \in K_{\Omega}^{0}(\omega, \chi) \cap AK_{\Omega}(\omega, \chi)$, and let $y_S$ be a Gaussian experimental realisation of $f$. Then:
$E[|\hat{z}_S(\mathbf{x}_o) - f(\mathbf{x}_o)|] \leq M \cdot d \cdot h^2 + \sqrt{\tfrac{2}{\pi}} \cdot \sigma$ (48)
Proof. 
From Equation (37):
$W^0(\mathbf{x}_0) \cdot [\hat{z}_S(\mathbf{x}_o) - f(\mathbf{x}_o)] = W^0(\mathbf{x}_0) \cdot \left[ \sum_{r=1}^{Q} N_r(\mathbf{x}_o) \, c^{*s}(\zeta_r) - f(\mathbf{x}_o) \right]$ (49)
If the element containing $\mathbf{x}_o$ satisfies $\Omega_i \subset K_{\Omega}^{0}(\omega, \chi)$, then, up to precision $\chi$, $W^0(\mathbf{x}) \approx W^0(\zeta_r) \approx W^0$, and using (32) and (41):
$W^0(\mathbf{x}_0) [\hat{z}_S(\mathbf{x}_o) - f(\mathbf{x}_o)] = W^0(\zeta_r) \left[ \sum_{r=1}^{Q} N_r(\mathbf{x}_o) \, c^{*s}(\zeta_r) - f(\mathbf{x}_o) \right] = \sum_{r=1}^{Q} N_r(\mathbf{x}_o) \int_{\Omega} [y_s(\mathbf{x}) - f(\mathbf{x}_o)] \, \Phi(\|\mathbf{x} - \zeta_r\|, \omega) \, d\mathbf{x} = \sum_{r=1}^{Q} N_r(\mathbf{x}_o) \int_{\Omega} [f(\mathbf{x}) - f(\mathbf{x}_o)] \, \Phi(\|\mathbf{x} - \zeta_r\|, \omega) \, d\mathbf{x} + \sum_{r=1}^{Q} N_r(\mathbf{x}_o) \int_{\Omega} e_S(\mathbf{x}) \, \Phi(\|\mathbf{x} - \zeta_r\|, \omega) \, d\mathbf{x}$ (50)
Expanding now $f(\mathbf{x})$ around $\mathbf{x} = \mathbf{x}_o$, the last expression takes the form:
$W^0(\mathbf{x}_o) \cdot (\hat{z}_S(\mathbf{x}_o) - f(\mathbf{x}_o)) = \sum_{r=1}^{Q} N_r(\mathbf{x}_o) \int_{\Omega} \left( \nabla f(\mathbf{x}_o) \cdot (\mathbf{x} - \mathbf{x}_o) + \frac{1}{2} D^{\alpha} f(p_r(\mathbf{x})) (\mathbf{x} - \mathbf{x}_o)^{\alpha} \right) \cdot \Phi(\|\mathbf{x} - \zeta_r\|, \omega) \, d\mathbf{x} + \sum_{r=1}^{Q} N_r(\mathbf{x}_o) \int_{\Omega} e_S(\mathbf{x}) \cdot \Phi(\|\mathbf{x} - \zeta_r\|, \omega) \, d\mathbf{x}$ (51)
where $\alpha$ is a two-dimensional multi-index. Writing $\mathbf{x} - \mathbf{x}_o = \mathbf{x} - \zeta_r + \zeta_r - \mathbf{x}_o$, the first part of the integral takes the form:
$\int_{\Omega} \sum_i \partial_i f(\mathbf{x}_o) (\mathbf{x} - \mathbf{x}_o)_i \cdot \Phi(\|\mathbf{x} - \zeta_r\|, \omega) \, d\mathbf{x} = \sum_i \partial_i f(\mathbf{x}_o) \int_{\Omega} (\mathbf{x} - \zeta_r)_i \cdot \Phi(\|\mathbf{x} - \zeta_r\|, \omega) \, d\mathbf{x} + \sum_i \partial_i f(\mathbf{x}_o) \int_{\Omega} (\zeta_r - \mathbf{x}_o)_i \cdot \Phi(\|\mathbf{x} - \zeta_r\|, \omega) \, d\mathbf{x}$ (52)
If $\zeta_r \in AK_{\Omega}(\omega, \chi)$, the first integral is approximately zero, while for the second:
$\sum_{r=1}^{Q} N_r(\mathbf{x}_o) \cdot \sum_i \partial_i f(\mathbf{x}_o) (\zeta_r - \mathbf{x}_o)_i \cdot W^0(\zeta_r) = W^0(\mathbf{x}_o) \cdot \sum_i \partial_i f(\mathbf{x}_o) \cdot \sum_{r=1}^{Q} N_r(\mathbf{x}_o) (\zeta_r - \mathbf{x}_o)_i = 0$ (53)
That is:
$W^0(\mathbf{x}_o) \cdot (\hat{z}_S(\mathbf{x}_o) - f(\mathbf{x}_o)) = \sum_{r=1}^{Q} N_r(\mathbf{x}_o) \frac{1}{2} \int_{\Omega} D^{\alpha} f(p_r(\mathbf{x})) (\mathbf{x} - \mathbf{x}_o)^{\alpha} \cdot \Phi(\|\mathbf{x} - \zeta_r\|, \omega) \, d\mathbf{x} + \sum_{r=1}^{Q} N_r(\mathbf{x}_o) \int_{\Omega} e_S(\mathbf{x}) \cdot \Phi(\|\mathbf{x} - \zeta_r\|, \omega) \, d\mathbf{x}$ (54)
Taking the absolute value:
$|W^0(\mathbf{x}_o) \cdot (\hat{z}_S(\mathbf{x}_o) - f(\mathbf{x}_o))| \leq \frac{1}{2} \sum_{r=1}^{Q} N_r(\mathbf{x}_o) \int_{\Omega} |D^{\alpha} f(p_r(\mathbf{x})) \cdot (\mathbf{x} - \mathbf{x}_o)^{\alpha}| \cdot \Phi(\|\mathbf{x} - \zeta_r\|, \omega) \, d\mathbf{x} + \sum_{r=1}^{Q} N_r(\mathbf{x}_o) \int_{\Omega} |e_s(\mathbf{x})| \cdot \Phi(\|\mathbf{x} - \zeta_r\|, \omega) \, d\mathbf{x}$ (55)
If M is an upper bound for the norm of the Hessian matrix:
$W^0 \cdot |\hat{z}_S(\mathbf{x}_o) - f(\mathbf{x}_o)| \leq \frac{1}{2} \sum_{r=1}^{Q} N_r(\mathbf{x}_o) \int_{\Omega} M \cdot \|\mathbf{x} - \mathbf{x}_o\|^2 \cdot \Phi(\|\mathbf{x} - \zeta_r\|, \omega) \, d\mathbf{x} + \sum_{r=1}^{Q} N_r(\mathbf{x}_o) \int_{\Omega} |e_s(\mathbf{x})| \cdot \Phi(\|\mathbf{x} - \zeta_r\|, \omega) \, d\mathbf{x} \leq W^0 \cdot \frac{1}{2} M \cdot d \cdot h^2 + \sum_{r=1}^{Q} N_r(\mathbf{x}_o) \int_{\Omega} |e_s(\mathbf{x})| \cdot \Phi(\|\mathbf{x} - \zeta_r\|, \omega) \, d\mathbf{x}$ (56)
For the last integral, the expected value is:
$E\left[ \int_{\Omega} |e_s(\mathbf{x})| \cdot \Phi(\|\mathbf{x} - \zeta_r\|, \omega) \, d\mathbf{x} \right] = \int_{\Omega} \int_{0}^{\infty} |\epsilon| \cdot \pi(\mathbf{x}, |\epsilon|) \cdot \Phi(\|\mathbf{x} - \zeta_r\|, \omega) \, d\epsilon \, d\mathbf{x}$ (57)
where $\pi(\mathbf{x}, |\epsilon|)$ is the probability of an error $\epsilon$ at the point given by $\mathbf{x}$. In the case when this probability is independent of the point, $\pi(\mathbf{x}, |\epsilon|) = \pi(|\epsilon|)$:
$E\left[ \int_{\Omega} |e_s(\mathbf{x})| \cdot \Phi(\|\mathbf{x} - \zeta_r\|, \omega) \, d\mathbf{x} \right] = W^0(\zeta_r) \cdot \int_{0}^{\infty} |\epsilon| \cdot \pi(|\epsilon|) \, d\epsilon = W^0 \cdot \int_{0}^{\infty} |\epsilon| \cdot \pi(|\epsilon|) \, d\epsilon$ (58)
In the case of a Gaussian distribution of ϵ :
$E[|\hat{z}_S(\mathbf{x}_o) - f(\mathbf{x}_o)|] \leq \frac{1}{2} M \cdot d \cdot h^2 + \sqrt{\tfrac{2}{\pi}} \cdot \sigma$ (59)
 □

3.3. Computational Algorithm

A natural selection for the parameter ω of the radial function is the value corresponding to the element width h. In addition, in real cases, complete sets of values of y S ( x ) are not available, so one must use just a subset of P points. Then, from this subsample, the integrals are calculated using finite sums, so:
$\int_{\Omega} \Phi(\|\mathbf{x} - \mathbf{x}_o\|, \omega) \, d\mathbf{x} \approx \frac{\mu(\Omega)}{P} \cdot \sum_{k=1}^{P} \Phi(\|\mathbf{x}_k - \mathbf{x}_o\|, h)$ (60)
$\int_{\Omega} y_s(\mathbf{x}) \cdot \Phi(\|\mathbf{x} - \mathbf{x}_o\|, \omega) \, d\mathbf{x} \approx \frac{\mu(\Omega)}{P} \cdot \sum_{k=1}^{P} y_k^S \cdot \Phi(\|\mathbf{x}_k - \mathbf{x}_o\|, h)$ (61)
Now, the finite-sample version of (40) is:
$\sum_{k=1}^{P} [c^S(\mathbf{x}_o) - y_k^S] \cdot \Phi(\|\mathbf{x}_k - \mathbf{x}_o\|, h) = 0$ (62)
Following Expression (62), the proposed algorithm can be condensed in the next schema (Algorithm 1):
Algorithm 1.
1. Begin
2. for i = 1:P do
3.   Initialize the point Q_1 with X[i].
4.   Initialize estimation[i] with 0.
5.   for k = 1:2^dimension do
6.     Initialize the point Q_node with Node[k].
7.     Initialize dist and nodeEstim with 0.
8.     for j = 1:P do
9.       Initialize the point Q_2 with X[j].
10.      Increment dist with Radial_Kernel(Q_node, Q_2).
11.      Increment nodeEstim with Z[j] * Radial_Kernel(Q_node, Q_2).
12.    end for j
13.    Update the node estimate as nodeEstim = nodeEstim/dist.
14.    Increment estimation[i] with nodeEstim * ShapeFunction(k, Q_1).
15.  end for k
16. end for i
17. End.
The calculation of the radial kernels in lines 10 and 11, for a radial function such as that in Equation (29) and dimension $d$, implies summing the $d$ coordinate differences $r_i = x_i - x_k^i$. The loop from lines 8 to 12 iterates over the number of points $P$, so, given that these steps are repeated for the $2^d$ nodes of the finite element and for each objective point, the computational time of the algorithm is $t_c \sim O(P^2 \cdot 2^d \cdot d)$. The memory complexity corresponds to saving the experimental points, that is, $m \sim O(P)$. An analysis of the relation between execution times and number of points will be considered in the applications section.
The analysis developed above clearly shows the improvement of the current algorithm compared with those presented in [4,5], where the dominant computational orders were $O((c + 1)^{3d})$ and $O((c + 1)^d)$. Recalling that the complexity is related to the precision of the estimation (without considering overfitting problems), the previous methodologies were highly limited in their application by the dimensionality of the problem. However, this limit disappears in the new approach, with the number of points being the main limiting factor.
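A compact Python sketch of Algorithm 1 for data normalised to $[0, 1]^d$ (function and variable names are illustrative, not the authors' code): for every objective point, it computes the kernel-weighted average (62) at the $2^d$ nodes of the containing element and interpolates them with multilinear shape functions, as in (37).

import numpy as np
from itertools import product

def radial_kernel(a, b, h):
    return np.exp(-(np.linalg.norm(b - a, axis=-1) / h) ** 2)   # Gaussian kernel (29) with omega = h

def estimate(points, X, Z, c):
    """Radial-kernel regression estimate (37)/(62) at each row of `points`,
    given the sample (X, Z) and the complexity c (edge length h = 1/c)."""
    h = 1.0 / c
    d = X.shape[1]
    out = np.zeros(len(points))
    for i, q in enumerate(points):
        idx = np.minimum((q * c).astype(int), c - 1)             # element containing q
        t = (q - idx * h) / h                                    # local coordinates in the element
        for corner in product((0, 1), repeat=d):
            node = (idx + np.array(corner)) * h
            k = radial_kernel(node, X, h)                        # kernel of the node against all points
            node_estim = (Z * k).sum() / k.sum()                 # weighted average (62) at the node
            N = np.prod(np.where(np.array(corner) == 1, t, 1.0 - t))  # multilinear shape function
            out[i] += N * node_estim
    return out

# Illustrative use on synthetic data.
rng = np.random.default_rng(3)
X = rng.uniform(0.0, 1.0, size=(500, 2))
Z = np.sin(2 * np.pi * X[:, 0]) * X[:, 1] + rng.normal(0.0, 0.05, size=500)
print(estimate(np.array([[0.3, 0.7], [0.9, 0.2]]), X, Z, c=10))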
Let us consider a normalised dataset. Then, the models are determined by the parameter $h = 1/c$ that represents the length of the edge of the hypercubes that form the discretization of the domain $\Omega$. Each value of $c$ (complexity) determines a unique model but, following (37), when $h \to 0$, the regression estimation satisfies:
$\hat{z}(\mathbf{x}_o) \to c^{*s}(\mathbf{x}_o)$ (63)
In that limit, $\hat{z}(\mathbf{x}_o)$ corresponds to a weighted average of the K nearest points to $\mathbf{x}_o$, with K depending on the parameter $\omega$ in $\Phi$, which has been fixed as $\omega = h$.
Let us consider the case of an estimation at a point $(x_0^1, x_0^2, \ldots, x_0^d)$, based on the model defined by a dataset $\{(x_k^1, x_k^2, \ldots, x_k^d; y_k)\}_{k=0}^{P}$ containing $\mathbf{x}_o$. For convenience, let the dataset also be ordered with respect to the distance to $\mathbf{x}_o$, with $\Phi_i = \Phi(\|\mathbf{x}_i - \mathbf{x}_o\|, h)$ the parameterized radial function. The weighted average regression is then obtained as:
$c^{*s}(\mathbf{x}_o) \cdot \left( 1 + \sum_{k=1}^{P} \Phi_k \right) = y_0^S + \sum_{k=1}^{P} y_k^S \cdot \Phi_k$ (64)
However, if the estimation is calculated without using the point $\mathbf{x}_o$ itself, the restricted weighted average regression $c_r^{*s}$ takes the form:
$c_r^{*s}(\mathbf{x}_o) \cdot \sum_{k=1}^{P} \Phi_k = \sum_{k=1}^{P} y_k^S \cdot \Phi_k$ (65)
Therefore, there exists a relationship between both expressions:
$c^{*s}(\mathbf{x}_o) \cdot \left( 1 + \sum_{k=1}^{P} \Phi_k \right) = y_0^S + c_r^{*s}(\mathbf{x}_o) \cdot \sum_{k=1}^{P} \Phi_k$ (66)
If the model dataset has a low density, the sum vanishes compared with the constant terms, obtaining:
$c^{*s}(\mathbf{x}_o) \approx y_0^S$ (67)
Joining (63) and (67), in the limit $h \to 0$ the result of the estimation tends to $\hat{z}(\mathbf{x}_o) \approx y_0^S$, so there will be a trend towards overfitting as the complexity grows in the full model.
The restricted model obtained from Equation (65) does not incorporate the information about the experimental value, if present, at the objective point, so the model is not using all the available information and is obviously underfitted.
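A sketch of the full/restricted comparison used to monitor overfitting (it reuses the hypothetical `estimate` function sketched above): the restricted estimate of a sample point simply removes that point from the sums in (62), which reproduces (65) instead of (64).

import numpy as np

def full_and_restricted_mae(X, Z, c, estimate):
    """MAE of the full model (64) and of the restricted model (65) on the sample itself.
    `estimate` is the hypothetical radial-kernel estimator sketched above."""
    full = estimate(X, X, Z, c)
    restricted = np.empty(len(X))
    for i in range(len(X)):
        mask = np.arange(len(X)) != i                 # drop the objective point from the model data
        restricted[i] = estimate(X[i:i + 1], X[mask], Z[mask], c)[0]
    mae_full = np.mean(np.abs(Z - full))
    mae_restricted = np.mean(np.abs(Z - restricted))
    return mae_full, mae_restricted                   # a growing gap with c signals overfitting

This is only an illustrative leave-one-out formulation; in practice the two estimates can be obtained in a single pass by keeping or discarding the contribution of the objective point in (62).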

3.4. Applications

To test the algorithm behaviour, several problems have been selected from the UCI Machine Learning Datasets [69] to compare the results with those presented in other papers obtained using different techniques. UCI is a well-known repository and widely used data source for problems related to machine learning, regression, and time series prediction.
The fit of a model can be measured using different parameters, where $y_i$ is the observed value and $\tilde{y}_i$ is the estimated value (a minimal computation of these indicators is sketched after the list):
  • R2 coefficient: The strict use of R2 is limited to linear regression. Outside this case, the value of R2 can vary in the interval $(-\infty, 1]$. However, the expression that defines R2 is related to other families of quality indicators, such as Legates–McCabe, Nash–Sutcliffe, and Willmott:
    $R^2 = 1 - \frac{\sum_{i=1}^{P} (y_i - \tilde{y}_i)^2}{\sum_{i=1}^{P} (y_i - E[y])^2}$ (68)
  • Mean squared error (MSE): This indicator is associated with the second moment of the error, incorporating information about its variance and bias. Its principal problem is the overweighting of outliers, so other parameters, such as MAE, are usually preferred:
    $MSE = \frac{1}{P} \cdot \sum_{i=1}^{P} (y_i - \tilde{y}_i)^2$ (69)
  • Root mean squared error (RMSE): Square root of the MSE. Its main advantage with respect to the MSE is that it is measured in the same units as the predicted variable. However, it suffers from the same problems with respect to the influence of outliers:
    $RMSE = \sqrt{\frac{1}{P} \cdot \sum_{i=1}^{P} (y_i - \tilde{y}_i)^2}$ (70)
  • Mean absolute error (MAE): The best characteristic of the MAE compared to the RMSE is the equal weight of the deviations in the error. The ratio RMSE/MAE serves as an indication of the presence of outliers in the sample:
    $MAE = \frac{1}{P} \cdot \sum_{i=1}^{P} |y_i - \tilde{y}_i|$ (71)
  • Mean absolute percentage error (MAPE): This parameter is frequently used because of its simple interpretation as a relative error. However, it presents a bias that tends to select the model with smaller forecasts. One option could be the symmetric MAPE, but it is not so widely used:
    $MAPE = \frac{100}{P} \cdot \sum_{i=1}^{P} \left| \frac{y_i - \tilde{y}_i}{y_i} \right|$ (72)
  • Regression error characteristic (REC) curve: It is obtained by plotting the error tolerance on the X-axis versus the percentage of points predicted within the tolerance on the Y-axis.
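The indicators above can be computed directly; a minimal sketch follows (the `rec_curve` helper returns the points of the REC curve for a given grid of tolerances):

import numpy as np

def quality_indicators(y, y_est):
    err = y - y_est
    r2 = 1.0 - np.sum(err ** 2) / np.sum((y - y.mean()) ** 2)   # (68)
    mse = np.mean(err ** 2)                                     # (69)
    rmse = np.sqrt(mse)                                         # (70)
    mae = np.mean(np.abs(err))                                  # (71)
    mape = 100.0 * np.mean(np.abs(err / y))                     # (72), undefined if some y_i = 0
    return {"R2": r2, "MSE": mse, "RMSE": rmse, "MAE": mae, "MAPE": mape}

def rec_curve(y, y_est, tolerances):
    """Fraction of points whose absolute error is within each tolerance."""
    abs_err = np.abs(y - y_est)
    return np.array([(abs_err <= t).mean() for t in tolerances])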
The proposed methodology is a deterministic algorithm and the obtained model (and, in consequence, the quality parameters) depends only on the available experimental data, so there is no difference between several runs of the model.

3.4.1. Air foil Self Noise

The study of self-noise is an active topic in aerodynamics, with a great number of research papers devoted to its prediction or to improving the design of air foils and other structures. The phenomena related to this problem are complex, involving vortices, turbulence, and flows in the linear limit.
The NASA “Air foil Self Noise” dataset [70,71] was obtained from a series of aerodynamic and acoustic tests of two- and three-dimensional air foil sections conducted in an anechoic wind tunnel. The data set comprises NACA 0012 air foils of different sizes (from 2.5 to 61 cm) at various wind tunnel speeds (up to Mach 0.21) and angles of attack (from 0 to 25.2°). The span of the air foil and the observer position were the same in all of the experiments. The measurements were made in a wind tunnel.
The data is composed of 1503 rows with five independent variables to determine the sound pressure level as indicated in Table 1.
Table 2 shows the parameters of quality for the models obtained in previous research using this dataset. The techniques are linear regression (LR), decision tree (DT), support vector machine regression (SVMR), random forest regression (RF), AdaBoost regression (ABR), XGBoost regression (XGBR), several types of neural networks (NN), and Stochastic Gradient Tree Boosting (SGTB).
The first step is to study the behaviour of a cross validation (using 80% of the data as the train set and 20% as the test set) with 100 iterations. The result for different values of the complexity is shown in Figure 2.
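A sketch of this cross-validation protocol (repeated 80/20 random splits; it assumes the hypothetical `estimate` function sketched in Section 3.3):

import numpy as np

def cv_mae(X, Z, c, estimate, n_iter=100, test_frac=0.2, seed=0):
    """Mean MAE over random train/test splits for a given complexity c."""
    rng = np.random.default_rng(seed)
    maes = []
    for _ in range(n_iter):
        perm = rng.permutation(len(X))
        n_test = int(test_frac * len(X))
        test, train = perm[:n_test], perm[n_test:]
        z_hat = estimate(X[test], X[train], Z[train], c)
        maes.append(np.mean(np.abs(Z[test] - z_hat)))
    return float(np.mean(maes))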
Apart from the expected result of a smaller error in the train subsets, an important behaviour can be observed in the MAE corresponding to the test subsamples with a stable value further than complexity 90 (with a mean MAE of 1.958). This value corresponds to a complexity close to 30 on the train subsamples.
The next step is the study of the models generated by the full dataset to test if this trend is verified. To do this, for each complexity two estimations have been done for the points of the dataset, corresponding to the full and the restricted estimations introduced in Equations (64) and (65). The results of MAE, MAPE, and R2 coefficients are shown in Figure 3.
The behaviour is consistent with the one observed in Figure 2, with a stable value in MAE around complexity 100 and MAPE from complexity 80. In addition, it is clear how the full model has diminishing values for MAE and MAPE coefficients when the complexity grows (making evident in that case the existence of overfitting because improvements in the full model are not parallel to improvements in the values of the restricted model).
Let us then study the characteristics of the model with complexity 90.
Figure 4 shows the errors as pairs observed/estimated for the full and restricted models for complexity 90.
The parameters of quality for this model are shown in Table 3.
The results are comparable to those presented in Table 2 corresponding to previous papers.
The computation time over 10 iterations has been measured for different complexities, with the result shown in Figure 5. Given the expression of the computational order, $O(P^2 \cdot 2^d \cdot d)$, no dependence on the complexity is expected, a result that is confirmed by the plot.
To confirm the quadratic dependence on the number of points in $O(P^2 \cdot 2^d \cdot d)$, the computing time has been studied for different values of P in the dataset at a high complexity (complexity 100 has been selected in this study). The resulting computing time divided by the square of the number of points is shown in Figure 6.
The constant behaviour for greater numbers of points in the last figure confirms the hypothesised dependence $t_c \sim P^2$.

3.4.2. Combined Cycle Power

The problem of effective prediction of the power output of electric power plants is of great importance in order to optimize the economic profit per generated megawatt. For a gas turbine, the relationship between the power available under full load conditions and variables such as temperature, pressure, relative humidity, and the exhaust vacuum measured from the steam turbine is complex enough to be studied using machine learning methods. Specifically, given the diversity of the values in the different periods (summer and winter), regression and prediction methods based on locality will have a better performance. This makes this dataset a good benchmark for the current methodology.
The dataset contains 9568 data points collected from a combined Cycle Power Plant over six years (2006–2011), when the power plant was set to work with a full load. The variables that correspond to the hourly average values are presented in Table 4.
For additional details, consider the original papers [76,77]. The models considered in [76] are: (i) Least median square (LMS); (ii) support vector poly kernel regression (SMOReg); (iii) KStar (K *); (iv) bagging REP tree (BREP); (v) model trees rules (M5R); (vi) model trees regression (M5P) and; (vii) reduced error pruning (REP) trees. The results presented in Table 5 for [77] are: (i) k-nearest neighbour (k-NN) smoother + feedforward error backpropagating ANN and; (ii) the best of several combinations of K-means clustering + feedforward error backpropagating ANN.
Following the steps done in the first example, the cross validation results can be seen in Figure 7.
Now, the value of the stabilised MAE for the test subsamples (calculated from complexities above 70) is 2.479, corresponding to a complexity of 50 in the train subsample. Running the full and restricted models for complexities between 10 and 200, it can be seen that the stable behaviour for MAE and MAPE starts before that value and can be fixed around complexity 70. Figure 8 shows the comparison of the quality parameters (MAE, MAPE, and R2) for the full and restricted models on the dataset, for different complexities from 10 to 200.
To be conservative with respect to the possibility of overfitting, complexity 60 is selected to run the final model. The corresponding plot of estimated versus experimental values can be seen in Figure 9.
The parameters of quality for this model are shown in Table 6.
The computation time has also been obtained from 10 iterations for complexities between 10 and 200, showing its independence from the complexity in Figure 10.
Again, a representation of t c / P 2 for complexity 100 is shown in Figure 11.

4. Discussion

Several questions must be considered in this step. Firstly, the presented algorithm continues the line of work developed by the authors in previous papers, see [3,4,78], solving the common bottleneck caused by the dependence of the time of computation on the complexity of the model at orders O ( ( c + 1 ) 3 d ) and O ( ( c + 1 ) d ) . Now, the dependence on the dimension has an expression of d · 2 d . This represents an important advance, allowing the treatment of a greater number of problems.
Secondly, given that for complexities of less than 100 (for most of the studied problems this value is usually around 70–80) there exists a direct relationship between the complexity of the model and the precision of the estimation, the independence of the computing time of the current algorithm from this parameter is an important result. However, the problem has now been displaced to the number of points, whose contribution to the total computing time was hidden in the previous methodologies. Therefore, the two new challenges for future investigations will be the use of more efficient algorithms and data structures to improve the dependence on the number of points and on the dimensionality of the problem, which is present as $d \cdot 2^d$.
On the other hand, the characteristics of the methodology allow a cheap parallel calculation of the estimated values for points belonging to the sample that determines the model in two different ways: the full and the restricted models. The study of their behaviour for different complexities allows the inclusion of a benchmark to capture overfitting in the model. Overfitting is a very common problem that appears when the model tends to treat in-model and out-of-model points in different ways, making the estimations useless. Following this direction, the development of algorithms for the automatic detection of overfitting and the selection of the optimum complexity will be another pending task for future investigations. Considering that the process of selecting the most adequate complexity for the model is equivalent to optimising the error of a weighted combination of both models, its automation seems to be a feasible problem.
However, at the moment, obtaining an adequate combination of the full and restricted models introduced in Section 3.3 is a complex problem. This makes comparisons with other available methodologies diffuse, because the quality parameters of the optimal model would be an intermediate case between those corresponding to the full and restricted cases.
Another characteristic that could be considered a distinctive aspect of the methodology is its ‘liquidity’, in the sense of the absence of a model as such; that is, the methodology does not generate any additional structure beyond the estimations and the optimum value for the h parameter. This can also be viewed as a positive aspect.
Finally, the proposed methodology has been used on well-known problems in order to compare the results with those obtained in other papers and methods.
For the Air Foil Self Noise dataset, the quality parameters of the restricted model, considered as the worst prediction, are outperformed by decision trees, XGBoost regression, and some models of NN. However, the full model, considered as the most optimistic model (on occasions presenting overfitting), obtains results similar to the best available models.
In the case of the Combined Cycle Power dataset, both models (full and restricted) obtain better quality parameters than the methodologies previously considered. This very interesting result should be studied in more depth in order to understand it, especially in the case of the restricted model.
In addition, a wider use of the regression method must be pursued by increasing the number of studied datasets and comparing the results with other available methods, especially those based on radial basis functions, to compare their performance.

Author Contributions

Conceptualization, F.J.N.-G. and Y.V.; methodology, Y.V. and F.J.N.-G.; software, F.J.N.-G. and M.C.-M.; validation, Y.V. and F.J.N.-G.; formal analysis, F.J.N.-G., Y.V., M.C.-M. and S.I.; investigation, F.J.N.-G., Y.V., M.C.-M. and S.I.; data curation, F.J.N.-G. and Y.V.; writing—original draft preparation, F.J.N.-G., Y.V., M.C.-M. and S.I.; writing—review and editing, F.J.N.-G., Y.V., M.C.-M. and S.I.; supervision, Y.V. and S.I.; project administration, Y.V. and S.I.; funding acquisition, S.I. All authors have read and agreed to the published version of the manuscript.

Funding

This research has been partially funded by the Spanish Ministry of Science, Innovation and Universities, grant number RTI2018-101148-B-I00.

Conflicts of Interest

The authors declare no conflict of interest.

References

  1. Gallagher, R.H. Finite element analysis: Fundamentals. Int. J. Numer. Methods Eng. 1975, 9, 732.
  2. Brenner, S.; Scott, R. The Mathematical Theory of Finite Element Methods; Springer Science & Business Media: Berlin/Heidelberg, Germany, 2007.
  3. Navarro-González, F.J.; Villacampa, Y. A new methodology for complex systems using n-dimensional finite elements. Adv. Eng. Softw. 2012, 48, 52–57.
  4. Navarro-González, F.J.; Villacampa, Y. Generation of representation models for complex systems using Lagrangian functions. Adv. Eng. Softw. 2013, 64, 33–37.
  5. Navarro-González, F.J.; Villacampa, Y. A finite element numerical algorithm for modelling and data fitting in complex systems. Int. J. Comput. Methods Exp. Meas. 2016, 4, 100–113.
  6. Palazón, A.; López, I.; Aragonés, L.; Villacampa, Y.; Navarro-González, F.J. Modelling of Escherichia coli concentrations in bathing water at microtidal coasts. Sci. Total Environ. 2017, 593–594, 173–181.
  7. Aragonés, L.; Pagán, J.I.; López, I.; Navarro-González, F.J. Galerkin’s formulation of the finite elements method to obtain the depth of closure. Sci. Total Environ. 2019, 660, 1256–1263.
  8. Migallón, V.; Navarro-González, F.; Penadés, J.; Villacampa, Y. Parallel approach of a Galerkin-based methodology for predicting the compressive strength of the lightweight aggregate concrete. Constr. Build. Mater. 2019, 219, 56–68.
  9. Buhmann, M.D. Radial Basis Functions. Acta Numer. 2000, 9, 1–38.
  10. Mai-Duy, N.; Tran-Cong, T. Approximation of function and its derivatives using radial basis function networks. Appl. Math. Model. 2003, 27, 197–220.
  11. Kansa, E.J. Multiquadrics-A scattered data approximation scheme with applications to computational fluid-dynamics-II solutions to parabolic, hyperbolic and elliptic partial differential equations. Comput. Math. Appl. 1990, 19, 147–161.
  12. Kansa, E.J.; Hon, Y.C. Circumventing the ill-conditioning problem with multiquadric radial basis functions: Applications to elliptic partial differential equations. Comput. Math. Appl. 2000, 39, 123–137.
  13. Schaback, R.; Wendland, H. Adaptive greedy techniques for approximate solution of large RBF systems. Numer. Algorithms 1999, 24, 239–254.
  14. Broomhead, D.S.; Lowe, D. Multivariable Functional Interpolation and Adaptive Networks. Complex Syst. 1988, 2, 321–355.
  15. Orr, M.J.L. Introduction to Radial Basis Function Networks; Center for Cognitive Science, Univ. of Edinburgh: Edinburgh, UK, 1996.
  16. Elanayar, S.; Shin, Y.C. Radial Basis Function Neural Network for Approximation and Estimation of Nonlinear Stochastic Dynamic Systems. IEEE Trans. Neural Netw. 1994, 5, 594–603.
  17. Saha, A.; Keeler, J.D. Algorithms for better representation and faster learning in radial basis function networks. In Advances in Neural Information Processing Systems, 2nd ed.; Morgan Kaufmann Publishers Inc.: San Francisco, CA, USA, 1990; pp. 482–489.
  18. Wettschereck, D.; Dietterich, T. Improving the performance of radial basis function networks by learning center locations. In Advances in Neural Information Processing Systems; Morgan Kaufmann Publishers Inc.: San Francisco, CA, USA, 1992; pp. 1133–1140.
  19. Gomm, J.B.; Yu, D.L. Selecting radial basis function network centers with recursive orthogonal least squares training. IEEE Trans. Neural Netw. 2000, 11, 306–314.
  20. Park, J.; Sandberg, I.W. Universal Approximation Using Radial-Basis-Function Networks. Neural Comput. 1991, 3, 246–257.
  21. Poggio, T.; Girosi, F. Regularization algorithms for learning that are equivalent to multilayer networks. Science 1990, 247, 978–982.
  22. Howell, A.J.; Buxton, H. Learning identity with radial basis function networks. Neurocomputing 1998, 20, 15–34.
  23. Scholkopf, B.; Sung Kah-Kay Burges, C.J.; Girosi, F.; Niyogi, P.; Poggio, T.; Vapnik, V. Comparing Support Vector Machines with Gaussian Kernels to Radial Basis Function Classifiers. IEEE Trans. Signal Process. 1997, 45, 2758–2765.
  24. Chng, E.S.; Chen, S.; Mulgrew, B. Gradient radial basis function networks for nonlinear and nonstationary time series prediction. IEEE Trans. Neural Netw. 1996, 7, 190–194.
  25. Chen, S. Nonlinear time series modelling and prediction using Gaussian RBF networks with enhanced clustering and RLS learning. Electron. Lett. 1995, 31, 117–118.
  26. Li, M.M.; Verma, B. Nonlinear curve fitting to stopping power data using RBF neural networks. Expert Syst. Appl. 2016, 45, 161–171.
  27. Han, H.G.; Chen, Q.L.; Qiao, J.F. An efficient self-organizing RBF neural network for water quality prediction. Neural Netw. 2011, 24, 717–725.
  28. Yilmaz, I.; Kaynar, O. Multiple regression, ANN (RBF, MLP) and ANFIS models for prediction of swell potential of clayey soils. Expert Syst. Appl. 2011, 38, 5958–5966.
  29. Fabri, S.; Kadirkamanathan, V. Dynamic structure neural networks for stable adaptive control of nonlinear systems. IEEE Trans. Neural Netw. 1996, 7, 1151–1167.
  30. Chen, S.; Billings, S.A.; Grant, P.M. Recursive hybrid algorithm for non-linear system identification using radial basis function networks. Int. J. Control. 1992, 55, 1051–1070.
  31. Yu, D.L.; Gomm, J.B.; Williams, D. Sensor fault diagnosis in a chemical process via RBF neural networks. Control Eng. Pract. 1999, 7, 49–55.
  32. Yang, F.; Paindavoine, M. Implementation of an RBF Neural Network on Embedded Systems: Real-Time Face Tracking and Identity Verification. IEEE Trans. Neural Netw. 2003, 14, 1162–1175.
  33. Tsai, Y.-T.; Shih, Z.-C. All-frequency precomputed radiance transfer using spherical radial basis functions and clustered tensor approximation. ACM Trans. Graph. 2006, 25, 967–976.
  34. Cho, S.-Y.; Chow, T.W.S. Neural computation approach for developing a 3D shape reconstruction model. IEEE Trans. Neural Netw. 2001, 12, 1204–1214.
  35. Oglesby, J.; Mason, J.S. Radial basis function networks for speaker recognition. In Proceedings of the 1991 International Conference on Acoustics, Speech, and Signal Processing, Toronto, ON, Canada, 14–17 April 1991; pp. 393–396.
  36. Schilling, R.J.; Carroll, J.J.; Al-Ajlouni, A.F. Approximation of nonlinear systems with radial basis function neural networks. IEEE Trans. Neural Netw. 2001, 12, 1–15.
  37. Jang, J.-S.; Sun, C.-T. Functional equivalence between radial basis function networks and fuzzy inference systems. IEEE Trans. Neural Netw. 1993, 4, 156–159.
  38. Billings, S.A.; Zheng, G.L. Radial basis function network configuration using genetic algorithms. Neural Netw. 1995, 8, 877–890.
  39. Chen, S.; Wu, Y.; Luk, B.L. Combined genetic algorithm optimization and regularized orthogonal least squares learning for radial basis function networks. IEEE Trans. Neural Netw. 1999, 10, 1239–1243.
  40. Kubat, M. Decision trees can initialize radial-basis function networks. IEEE Trans. Neural Netw. 1998, 9, 813–821.
  41. Benoudjit, N.; Archambeau, C.; Lendasse, A.; Lee, J.A.; Verleysen, M. Width optimization of the Gaussian kernels in Radial basis function networks. In Proceedings of the ESANN 2002, 10th European Symposium on Artificial Neural Networks, Bruges, Belgium, 24–26 April 2002.
  42. González, J.; Rojas, I.; Ortega, J.; Pomares, H.; Fernandez, F.J.; Diaz, A.F. Multiobjective evolutionary optimization of the size, shape, and position parameters of radial basis function networks for function approximation. IEEE Trans. Neural Netw. 2003, 14, 1478–1495.
  43. Er, M.J.; Wu, S.; Lu, J.; Toh, H.L. Face recognition with radial basis function (RBF) neural networks. IEEE Trans. Neural Netw. 2002, 13, 697–710.
  44. Bugmann, G. Normalized Gaussian radial basis function networks. Neurocomputing 1998, 20, 97–110.
  45. Hofmann, S.; Treichl, T.; Schroder, D. Identification and observation of mechatronic systems including multidimensional nonlinear dynamic functions. In Proceedings of the 7th International Workshop on Advanced Motion Control, Maribor, Slovenia, 3–5 July 2002; pp. 285–290.
  46. Saha, P.; Tarafdar, D.; Pal, S.K.; Srivastava, A.K.; Das, K. Modelling of wire electro-discharge machining of TiC/Fe in situ metal matrix composite using normalized RBFN with enhanced k-means clustering technique. Int. J. Adv. Manuf. Technol. 2009, 43, 107–116.
  47. Mori, H.; Awata, A. Data mining of electricity price forecasting with regression tree and normalized radial basis function network. In Proceedings of the 2007 IEEE International Conference on Systems, Man and Cybernetics, Montreal, QC, Canada, 7–10 October 2007; pp. 3743–3748.
  48. Deco, G.; Neuneier, R.; Schümann, B. Non-parametric data selection for neural learning in non-stationary time series. Neural Netw. 1997, 10, 401–407.
  49. Deng, H.; Li, H.-X.; Wu, Y.-H. Feedback-linearization-based neural adaptive control for unknown nonaffine nonlinear discrete-time systems. IEEE Trans. Neural Netw. 2008, 19, 1615–1625.
  50. Ranković, V.; Radulović, J. Prediction of magnetic field near power lines by normalized radial basis function network. Adv. Eng. Softw. 2011, 42, 934–938.
  51. Golbabai, A.; Seifollahi, S.; Javidi, M. Normalized RBF networks: Application to a system of integral equations. Phys. Scr. 2008, 78, 15008.
  52. Nelles, O. Axes-oblique partitioning strategies for local model networks. In Proceedings of the 2006 IEEE Conference on Computer Aided Control System Design, 2006 IEEE International Conference on Control Applications, 2006 IEEE International Symposium on Intelligent Control, Munich, Germany, 4–6 October 2006; pp. 2378–2383.
  53. Hartmann, B.; Nelles, O. On the smoothness in local model networks. In Proceedings of the 2009 American Control Conference, St. Louis, MO, USA, 10–12 June 2009; pp. 3573–3578.
  54. Brett, M.; Penny, W.D.; Kiebel, S. Human Brain Function II, an Introduction to Random Field Theory; Academic Press: London, UK, 2003.
  55. Pieczynski, W.; Tebbache, A.-N. Pairwise Markov random fields and segmentation of textured images. Mach. Graph. Vis. 2000, 9, 705–718.
  56. Pantazis, D.; Nichols, T.E.; Baillet, S.; Leahy, R.M. A comparison of random field theory and permutation methods for the statistical analysis of MEG data. NeuroImage 2005, 25, 383–394.
  57. Zhang, Y.; Brady, M.; Smith, S. Segmentation of brain MR images through a hidden Markov random field model and the expectation-maximization algorithm. IEEE Trans. Med. Imaging 2001, 20, 45–57.
  58. De Oliveira, V. Bayesian prediction of clipped Gaussian random fields. Comput. Stat. Data Anal. 2000, 34, 299–314.
  59. Adler, R.J.; Taylor, J.E. Random Fields and Geometry; Springer Science & Business Media: Berlin/Heidelberg, Germany, 2009.
  60. Zheng, S.; Jayasumana, S.; Romera-Paredes, B.; Vineet, V.; Su, Z.; Du, D.; Huang, C.; Torr, P.H. Conditional random fields as recurrent neural networks. In Proceedings of the 2015 IEEE International Conference on Computer Vision (ICCV), Santiago, Chile, 7–13 December 2015; pp. 1529–1537.
  61. Abrahamsen, P. A Review of Gaussian Random Fields and Correlation Functions. 1997. Available online: https://www.researchgate.net/profile/Petter_Abrahamsen/publication/335490333_A_Review_of_Gaussian_Random_Fields_and_Correlation_Functions/links/5d68cc0692851c154cc5b9bd/A-Review-of-Gaussian-Random-Fields-and-Correlation-Functions.pdf (accessed on 17 September 2020).
  62. Radosavljevic, V.; Vucetic, S.; Obradovic, Z. Continuous Conditional Random Fields for Regression in Remote Sensing. In Proceedings of the ECAI 2010-19th European Conference on Artificial Intelligence, Lisbon, Portugal, 16–20 August 2010; pp. 809–814.
  63. Zhang, J. The mean field theory in EM procedures for Markov random fields. IEEE Trans. Signal Process. 1992, 40, 2570–2583.
  64. Zhu, X.; Ghahramani, Z.; Lafferty, J.D. Semi-supervised learning using gaussian fields and harmonic functions. In Proceedings of the Twentieth International Conference Machine Learning (ICML 2003), Washington, DC, USA, 21–24 August 2003; pp. 912–919.
  65. Cohen, A.; Jones, R.H. Regression on a random field. J. Am. Stat. Assoc. 1969, 64, 1172–1182.
  66. Hallin, M.; Lu, Z.; Tran, L.T. Local linear spatial regression. Ann. Stat. 2004, 32, 2469–2500.
  67. Carbon, M.; Francq, C.; Tran, L.T. Kernel regression estimation for random fields. J. Stat. Plan. Inference 2007, 137, 778–798.
  68. El Machkouri, M. Nonparametric regression estimation for random fields in a fixed-design. Stat. Inference Stoch. Process. 2007, 10, 29–47.
  69. Dua, D.; Graff, C. UCI Machine Learning Repository. 2019. Available online: https://archive.ics.uci.edu/ml/citation_policy.html (accessed on 19 December 2019).
  70. Brooks, T.F.; Pope, D.S.; Marcolini, M.A. Airfoil Self-Noise and Prediction; NASA: Washington, DC, USA, 1989.
  71. NASA-Airfoil Noise, Revisiting Machine Learning Datasets. 2018. Available online: https://www.simonwenkel.com/2018/11/06/revisiting-ml-NASA-airfoil-noise.html (accessed on 20 September 2006).
  72. Patri, A.; Patnaik, Y. Random forest and stochastic gradient tree boosting based approach for the prediction of airfoil self-noise. Procedia Comput. Sci. 2015, 46, 109–121. [Google Scholar] [CrossRef] [Green Version]
  73. The NASA Data Set, Machine Learning Examples. Predict the Noise Generated by Airfoil Blades. 2018. Available online: https://www.neuraldesigner.com/learning/examples/airfoil-self-noise-prediction (accessed on 19 December 2019).
  74. Gonzalez, R.L. Neural Networks for Variational Problems in Engineering. Ph.D. Dissertation, Universitat Politècnica de Catalunya (UPC), Barcelona, Spain, 2009. [Google Scholar]
  75. Errasquin, L. Airfoil Self-Noise Prediction Using Neural Networks for Wind Turbines. Ph.D. Dissertation, Virginia Tech, Blacksburg, VA, USA, 2009. [Google Scholar]
  76. Tüfekci, P. Prediction of full load electrical power output of a base load operated combined cycle power plant using machine learning methods. Int. J. Electr. Power Energy Syst. 2014, 60, 126–140. [Google Scholar] [CrossRef]
  77. Kaya, H.; Tüfekci, P.; Gürgen, S.F. Local and Global Learning Methods for Predicting Power of a Combined Gas & Steam Turbine. In Proceedings of the International Conference on Emerging Trends in Computer and Electronics Engineering (ICETCEE 2012), Dubai, UAE, 24–25 March 2012. [Google Scholar]
  78. Navarro-González, F.J.; Compañ, P.; Satorre, R.; Villacampa, Y. Numerical determination for solving the symmetric eigenvector problem using genetic algorithm. Appl. Math. Model. 2016, 40, 4935–4947. [Google Scholar] [CrossRef]
Figure 1. Radial basis function network as a set of Gaussian functions.
Figure 2. Mean absolute error (MAE) of train and test samples on cross-validations with a train size of 80%, for complexities between 10 and 300.
Figure 3. Quality parameters for the full and restricted models on the dataset, for complexities from 10 to 200. (a) MAE. (b) Mean absolute percentage error (MAPE). (c) R².
Figure 4. Results for the full (solid) and restricted (x-marks) models for complexity 90. (a) Estimated vs. experimental values. (b) Errors vs. experimental values. (c) Regression error characteristic (REC) curves of the full, restricted, and null (mean) models.
Figure 5. Computing time (in seconds) as a function of the complexity for the airfoil self-noise problem.
Figure 6. Relative computing time over the squared number of points (time over 1002 points as unit) as a function of the problem size for complexity 100.
Figure 7. MAE of train and test samples on cross-validations with a train size of 80%, for complexities between 10 and 300.
Figure 8. Quality parameters for the full and restricted models on the dataset, for complexities from 10 to 200. (a) MAE. (b) MAPE. (c) R².
Figure 9. Results for the full (solid) and restricted (x-marks) models for complexity 70. (a) Estimated vs. experimental values. (b) Errors vs. experimental values. (c) REC curves of the full, restricted, and null (mean) models.
Figure 10. Computing time (in seconds) as a function of the complexity, for complexities between 10 and 200.
Figure 11. Relative computing time over the squared number of points (time over 5002 points as unit) as a function of the problem size for complexity 100.
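The REC curves in Figures 4c and 9c show, for each error tolerance, the fraction of observations whose absolute error does not exceed that tolerance, with the null (mean) model as a baseline. The following Python sketch only illustrates this construction; it is not the code used to produce the figures, and the synthetic values and the chosen tolerance are assumptions for demonstration.

```python
import numpy as np

def rec_curve(y_true, y_pred, n_steps=100):
    """Return (tolerances, accuracies): fraction of points with |error| <= tolerance."""
    abs_err = np.abs(np.asarray(y_true, dtype=float) - np.asarray(y_pred, dtype=float))
    tolerances = np.linspace(0.0, abs_err.max(), n_steps)
    accuracies = np.array([(abs_err <= t).mean() for t in tolerances])
    return tolerances, accuracies

# Synthetic demonstration: a "full", a "restricted", and a null (mean) model
rng = np.random.default_rng(0)
y = rng.normal(120.0, 5.0, size=500)              # stands in for experimental values
models = {
    "full": y + rng.normal(0.0, 1.3, size=500),   # smaller errors
    "restricted": y + rng.normal(0.0, 2.3, size=500),
    "null": np.full_like(y, y.mean()),            # predicts the sample mean everywhere
}
for name, y_pred in models.items():
    tol, acc = rec_curve(y, y_pred)
    idx = min(np.searchsorted(tol, 2.0), len(tol) - 1)
    print(f"{name}: fraction of points with error <= 2.0 is {acc[idx]:.2f}")
```

A model whose REC curve rises faster (reaching high accuracy at small tolerances) fits the data better; the null model gives the slowest-rising reference curve.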
Table 1. Input and output variables for the airfoil self-noise dataset.

Input variables:
1. Frequency (Hz).
2. Angle of attack (deg).
3. Chord length (m).
4. Free-stream velocity (m/s).
5. Suction side displacement thickness (m).

Output variable:
Scaled sound pressure level (dB).
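As a reproducibility aid, the sketch below shows one way to load the airfoil self-noise dataset of Table 1 from the UCI repository [69,70] and to form the 80%/20% splits used in the cross-validations of Figure 2. The file name, separator, and column names are assumptions about the repository layout rather than details taken from this paper.

```python
import pandas as pd
from sklearn.model_selection import train_test_split

columns = [
    "frequency_hz",                            # 1. Frequency (Hz)
    "angle_of_attack_deg",                     # 2. Angle of attack (deg)
    "chord_length_m",                          # 3. Chord length (m)
    "free_stream_velocity_ms",                 # 4. Free-stream velocity (m/s)
    "suction_side_displacement_thickness_m",   # 5. Suction side displacement thickness (m)
    "scaled_sound_pressure_level_db",          # output: scaled sound pressure level (dB)
]
# File name and whitespace separator are assumptions about the UCI distribution
data = pd.read_csv("airfoil_self_noise.dat", sep=r"\s+", names=columns)

X = data[columns[:-1]].to_numpy()
y = data[columns[-1]].to_numpy()

# 80% train / 20% test split, as in the cross-validations of Figure 2
X_train, X_test, y_train, y_test = train_test_split(X, y, train_size=0.8, random_state=0)
print(X_train.shape, X_test.shape)
```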
Table 2. Parameters of quality for models of the airfoil self-noise dataset in previous papers.

Research | Model       | R²    | MSE    | RMSE  | MAE
[71]     | LR          | 0.558 | 22.129 | 4.704 | 3.672
[71]     | DT          | 0.871 | 6.457  | 2.541 | 1.788
[71]     | SVMR        | 0.751 | 12.483 | 3.533 | 2.674
[71]     | RF          | 0.935 | 3.283  | 1.812 | 1.313
[71]     | ABR         | 0.725 | 13.796 | 3.714 | 3.078
[71]     | XGBR        | 0.946 | 2.707  | 1.645 | 1.074
[71]     | Baseline NN | 0.676 | 16.236 | 4.029 | 3.078
[71]     | Deeper NN   | 0.802 | 9.916  | 3.149 | 2.474
[72]     | RF          | 0.929 |        |       |
[72]     | SGTB        | 0.969 |        |       |
[73]     | NN          | 0.952 (R² of linear regression predicted/real) | | |
[74]     | NN          | 0.894 |        |       |
[75]     | NN          | 0.943/0.838 (several models) | | |
Table 3. Parameters of the model with complexity 90 for the airfoil self-noise dataset.

Parameter | Full Model | Restricted Model
R²        | 0.964      | 0.887
MAE       | 0.755      | 1.679
MAPE      | 0.60%      | 1.35%
MSE       | 1.711      | 5.387
RMSE      | 1.308      | 2.321
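The quality parameters reported in Tables 3 and 6 follow their usual definitions. The sketch below is an illustrative computation from experimental values y and model estimates y_hat; it is not the authors' implementation, and the sample values at the end are invented for demonstration only.

```python
import numpy as np

def quality_parameters(y, y_hat):
    """MAE, MAPE, MSE, RMSE and R^2 for experimental values y and estimates y_hat."""
    y = np.asarray(y, dtype=float)
    y_hat = np.asarray(y_hat, dtype=float)
    residuals = y - y_hat
    mae = np.mean(np.abs(residuals))
    mape = 100.0 * np.mean(np.abs(residuals / y))   # assumes y contains no zeros
    mse = np.mean(residuals ** 2)
    rmse = np.sqrt(mse)
    r2 = 1.0 - np.sum(residuals ** 2) / np.sum((y - y.mean()) ** 2)
    return {"MAE": mae, "MAPE (%)": mape, "MSE": mse, "RMSE": rmse, "R2": r2}

# Invented sample values, for demonstration only
y_exp = np.array([118.2, 121.5, 125.1, 130.4, 127.3])
y_est = y_exp + np.array([0.5, -0.8, 0.3, -0.4, 0.6])
print(quality_parameters(y_exp, y_est))
```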
Table 4. Input and output variables for the combined cycle power dataset.

Input variables:
1. Temperature.
2. Ambient pressure.
3. Relative humidity.
4. Exhaust vacuum.

Output variable:
Net hourly electrical energy output.
Table 5. Parameters of quality for the models of the combined cycle power dataset in previous research.

Research | Model         | MSE    | RMSE  | MAE
[76]     | LMS           | 20.903 | 4.572 | 3.621
[76]     | SMOReg        | 20.821 | 4.563 | 3.620
[76]     | K*            | 14.907 | 3.861 | 2.882
[76]     | BREP          | 14.341 | 3.787 | 2.818
[76]     | M5R           | 17.040 | 4.128 | 3.172
[76]     | M5P           | 16.704 | 4.087 | 3.140
[76]     | REP           | 17.733 | 4.211 | 3.133
[77]     | k-NN + ANN    | 19.990 |       |
[77]     | k-means + ANN | 15.430 |       |
Table 6. Parameters of the model with complexity 70 for the combined cycle power dataset.

Parameter | Full Model | Restricted Model
R²        | 0.979      | 0.957
MAE       | 1.741      | 2.572
MAPE      | 0.39%      | 0.57%
MSE       | 6.102      | 12.592
RMSE      | 2.470      | 3.548
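Figures 2 and 7 compare train and test MAE over repeated splits with a train size of 80% as the complexity grows; a widening gap between the two curves indicates overfitting. The sketch below reproduces only this evaluation loop: DecisionTreeRegressor with max_depth acting as a complexity-like parameter is a stand-in estimator, not the radial-kernel algorithm presented in this paper, and X, y are assumed to come from a loading step such as the one shown after Table 1.

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_absolute_error

def cv_mae_curve(X, y, complexities, n_repeats=10, seed=0):
    """Mean train/test MAE over repeated 80%/20% splits for each complexity value."""
    results = {}
    for c in complexities:
        train_mae, test_mae = [], []
        for r in range(n_repeats):
            X_tr, X_te, y_tr, y_te = train_test_split(
                X, y, train_size=0.8, random_state=seed + r)
            # Stand-in estimator: tree depth plays the role of the model complexity
            model = DecisionTreeRegressor(max_depth=c, random_state=0).fit(X_tr, y_tr)
            train_mae.append(mean_absolute_error(y_tr, model.predict(X_tr)))
            test_mae.append(mean_absolute_error(y_te, model.predict(X_te)))
        results[c] = (float(np.mean(train_mae)), float(np.mean(test_mae)))
    return results

# Usage (with X, y already loaded): cv_mae_curve(X, y, complexities=range(2, 21))
```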
