Article

Approximation by the Extended Neural Network Operators of Kantorovich Type

Chenghao Xiang, Yi Zhao, Xu Wang and Peixin Ye
1 School of Mathematics, Hangzhou Normal University, Hangzhou 311121, China
2 Department of Mathematics and Statistics, Wilfrid Laurier University, Waterloo, ON N2L 3C5, Canada
3 School of Mathematics and LPMC, Nankai University, Tianjin 300071, China
* Author to whom correspondence should be addressed.
Mathematics 2023, 11(8), 1903; https://doi.org/10.3390/math11081903
Submission received: 9 March 2023 / Revised: 6 April 2023 / Accepted: 13 April 2023 / Published: 17 April 2023

Abstract

Based on the idea of integral averaging and function extension, an extended Kantorovich-type neural network operator is constructed, and its error estimate for approximating continuous functions is obtained by using the modulus of continuity. Furthermore, by introducing a normalization factor, the approximation property of the new version of the extended Kantorovich-type neural network operator (the normalized extended Kantorovich-type neural network operator) is obtained in $L^p[-1,1]$. Numerical examples show that this newly proposed neural network operator has a better approximation performance than the classical one, especially at the endpoints of a compact interval.

1. Introduction

Neural networks are broadly applied in various fields, such as visual recognition, healthcare, astrophysics, geology, and cybersecurity. As the most widely used neural networks, feedforward neural networks (FNNs) have been thoroughly studied because of their universal approximation capabilities. Theoretically, any continuous function on a compact set can be approximated by FNNs to an arbitrarily desired degree of accuracy, provided that the number of neurons is sufficiently large. Furthermore, upper bounds for the approximation error of FNNs in the uniform metric and the $L^p$ metric have been studied in [1,2,3,4,5], among others.
FNNs with one hidden layer can be mathematically expressed as
$$N_{n,\sigma}(x)=\sum_{k=1}^{n}c_k\,\sigma(a_k\cdot x+b_k),\qquad x=(x_1,x_2,\ldots,x_s)\in\mathbb{R}^s,\ s\in\mathbb{N},$$
where, for $1\le k\le n$, $b_k\in\mathbb{R}$ are the thresholds, $a_k=(a_{k1},a_{k2},\ldots,a_{ks})\in\mathbb{R}^s$ are the connection weights, $c_k\in\mathbb{R}$ are the coefficients, $a_k\cdot x$ is the inner product of $a_k$ and $x$, and $\sigma$ is the activation function.
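For readers who prefer code, the following minimal sketch (not taken from the paper; the weights, thresholds, and coefficients are arbitrary placeholders) evaluates such a one-hidden-layer network with the logistic activation.

```python
import numpy as np

def fnn_one_hidden_layer(x, a, b, c, sigma):
    """Evaluate N_{n,sigma}(x) = sum_k c_k * sigma(a_k . x + b_k).

    x : (s,) input vector
    a : (n, s) connection weights, one row per hidden neuron
    b : (n,) thresholds
    c : (n,) output coefficients
    sigma : activation function applied elementwise
    """
    return np.dot(c, sigma(a @ x + b))

# Example with arbitrary illustrative parameters and the logistic activation.
sigma = lambda t: 1.0 / (1.0 + np.exp(-t))
a = np.array([[1.0, -0.5], [0.3, 2.0]])   # n = 2 neurons, s = 2 inputs
b = np.array([0.1, -0.2])
c = np.array([0.7, -1.1])
x = np.array([0.4, 0.6])
print(fnn_one_hidden_layer(x, a, b, c, sigma))
```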
NN operators and their approximation properties have attracted a lot of attention since the 1990s. Cardaliaguet and Euvrard [6] first introduced NN operators to approximate the unit operator. Since then, different types of NN operators have been constructed, and their approximation properties have been widely investigated; many impressive results concerning the convergence of NN operators as well as the complexity of approximation have been achieved ([6,7,8,9,10,11,12,13,14,15,16,17], etc.). Traditional operators provide references for NN operators in many aspects, such as their construction, the way their approximation properties are discussed, and their practical applications; the two families are related yet different in approximation performance and practical use. Taking the classical Bernstein operator as an example, it is usually used to approximate continuous functions, while NN operators can be utilized to approximate a broader class of functions, such as integrable functions; Bernstein operators are valuable tools in computer-aided geometric design, while NN operators are used for a wider range of applications, such as machine learning. Moreover, unlike classical operators, one remarkable feature of NN operators is that they are nonlinear. It is worth mentioning that the main advantage of using NN operators as approximation tools is that they can be viewed as FNNs with multiple layers in which all the components, such as the coefficients, the weights, and the thresholds, are explicitly known, in order to approximate the target function. The constructions of NN operators and the related discussions form an important part of the fundamental theory of artificial neural networks, which were introduced in order to simulate human brain activity.
Next, we review some NN operators and their approximation properties. Let $C[a,b]$ be the space of continuous functions on $[a,b]$, equipped with the uniform norm $\|f\|=\max_{a\le x\le b}|f(x)|$. The approximation properties of the NN operator $G_n(f,\cdot):[a,b]\to\mathbb{R}$,
$$G_n(f,x)=\frac{\sum_{k=\lceil na\rceil}^{\lfloor nb\rfloor}f\!\left(\frac{k}{n}\right)\phi_\sigma(nx-k)}{\sum_{k=\lceil na\rceil}^{\lfloor nb\rfloor}\phi_\sigma(nx-k)},\qquad x\in[a,b],$$
have been studied in [12,18,19], where $\phi_\sigma$ will be defined in Section 2 and $f\in C[a,b]$. The symbol $\lfloor x\rfloor$ denotes the greatest integer not exceeding $x$, while $\lceil x\rceil$ denotes the smallest integer greater than or equal to $x$.
In applications, the sampled values $f(k/n)$ may contain errors due to time-jitter or offset of the input signals, while more information is usually known around a point than precisely at that point. In approximation theory, constructing a Kantorovich-type operator is a well-known way to reduce such "time-jitter or offset" errors; this kind of operator is also very useful in areas such as signal processing. To achieve this, we replace the single function value $f(k/n)$ by an average of $f$ on a small interval around $k/n$, namely the mean $n\int_{k/n}^{(k+1)/n}f(u)\,du$.
In [13], the authors used a similar idea and constructed a Kantorovich-type neural network operator $F_n(f,\cdot)$ in $\mathbb{R}^d$, $d\in\mathbb{N}$, and the related estimates are given therein. If we consider the one-dimensional case of $F_n(f)$, then the NN operator reduces to $N_n(f,\cdot):[-1,1]\to\mathbb{R}$,
$$N_n(f,x)=\frac{\sum_{k=-n}^{n-1}\left(n\int_{k/n}^{(k+1)/n}f(u)\,du\right)\phi_\sigma(nx-k)}{\sum_{k=-n}^{n-1}\phi_\sigma(nx-k)},\qquad x\in[-1,1].$$
It was proved in [13] that $N_n(f,x)$ converges to $f(x)$ in various function spaces.
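As an illustration only, here is a small numerical sketch of $N_n(f,x)$ under the assumption that $\sigma$ is the logistic function and that the cell averages $n\int_{k/n}^{(k+1)/n}f(u)\,du$ are approximated by a midpoint rule; the grid sizes are arbitrary choices.

```python
import numpy as np

def phi_sigma(x):
    """phi_sigma(x) = (sigma(x + 1) - sigma(x - 1)) / 2 with the logistic sigma."""
    sigma = lambda t: 1.0 / (1.0 + np.exp(-t))
    return 0.5 * (sigma(x + 1.0) - sigma(x - 1.0))

def cell_average(f, a, b, m=200):
    """Approximate (1/(b-a)) * integral_a^b f(u) du by a midpoint rule."""
    u = a + (np.arange(m) + 0.5) * (b - a) / m
    return np.mean(f(u))

def N_n(f, x, n):
    """Kantorovich-type NN operator N_n(f, x) on [-1, 1]."""
    ks = np.arange(-n, n)                                   # k = -n, ..., n - 1
    means = np.array([cell_average(f, k / n, (k + 1) / n) for k in ks])
    w = phi_sigma(n * x - ks)
    return np.dot(means, w) / np.sum(w)

f = lambda u: u ** 2
print(N_n(f, 0.9, 20), f(0.9))   # operator value vs. target value
```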
In [9], the authors extended the continuous function $f$ on $[-1,1]$ to the function
$$f_{1,e}(x)=\begin{cases}f(x),& x\in[-1,1],\\ f(\operatorname{sgn}x),& 1<|x|<2,\end{cases}$$
and discovered that, after constructing Kantorovich-type operators, the approximation rate of the constructed NN operator to the target function was significantly improved.
A natural question is, what happens if we combine the idea of function extension with that of integral averaging? In this paper, we investigate the approximation effects, including the convergence properties and quantitative estimates, of our newly constructed NN operator, called the extended Kantorovich-type NN operator, regarding this question. A numerical example shows that this type of neural network operator has a better approximation performance than N n ( f , x ) , especially at the endpoints of a compact interval.
The remaining part of this paper is organized as follows. In Section 2, we propose two new types of NN operators, the extended Kantorovich-type NN operator (EKNNO) and the normalized extended Kantorovich-type NN operator (NEKNNO), and investigate their basic properties and the operator-dependent activation function. In Section 3, we establish the approximation theorems of the EKNNO and NEKNNO for continuous functions, including the convergence theorems and quantitative estimates, using the modulus of continuity. This section also demonstrates a numerical example and its results, which verify the validity of the theoretical results and the potential superiority of these operators. In Section 4, we further establish and prove the approximation properties of the EKNNO in the Lebesgue space, as well as the convergence results and the approximation rate of the NEKNNO. The conclusions are included in Section 5.

2. Extended Neural Network Operators of Kantorovich Type

In this section, we explain how to construct two extended neural network operators of Kantorovich type: the EKNNO and the NEKNNO.
The activation function σ plays a significant role in the approximation properties of neural networks. In many fundamental NN models, the activation function σ is usually taken to be a sigmoid function ([20]), which is defined below.
A function $\sigma:\mathbb{R}\to\mathbb{R}$ is called sigmoid ([20]) if
$$\lim_{x\to-\infty}\sigma(x)=0,\qquad\lim_{x\to+\infty}\sigma(x)=1.$$
For example, the well-known Logistic function, defined by $\sigma(x)=\frac{1}{1+e^{-x}}$, is a typical sigmoid-type function.
Next, we write ϕ σ ( x ) , a combination of the translations of σ that was first proposed by Chen and Cao [8]:
$$\phi_\sigma(x):=\frac{1}{2}\left(\sigma(x+1)-\sigma(x-1)\right),\qquad x\in\mathbb{R}.$$
We assume in this paper that the sigmoid-type function $\sigma$ is nondecreasing on $\mathbb{R}$, satisfies $\sigma(4)>\sigma(2)$, and is centrosymmetric with respect to the point $(0,\sigma(0))$; the Logistic function, for instance, satisfies all of these conditions. In particular, for such $\sigma$,
$$\lim_{x\to+\infty}\phi_\sigma(x)=\lim_{x\to-\infty}\phi_\sigma(x)=0.$$
The function ϕ σ ( x ) , which is often called the “bell-shaped function” (a name given by Cardaliaguet and Euvrard [6]), has been discussed in many papers. It has some important properties, and we cite part of them from [11] here, which are related to the research presented in this paper.
(i) For all $x\in\mathbb{R}$, $\phi_\sigma(x)\ge0$; in particular, $\phi_\sigma(3)>0$;
(ii) $\phi_\sigma(x)$ is an even function on $\mathbb{R}$;
(iii) $\phi_\sigma(x)$ is nondecreasing for $x<0$ and nonincreasing for $x\ge0$; moreover, $\phi_\sigma(x)\le\phi_\sigma(0)$ for all $x\in\mathbb{R}$.
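The following quick check (an illustration, not part of the paper's argument) verifies these bell-shape properties numerically for the logistic-based $\phi_\sigma$ on a test grid; the grid and tolerances are arbitrary.

```python
import numpy as np

def sigma(t):
    return 1.0 / (1.0 + np.exp(-t))          # logistic sigmoid

def phi_sigma(x):
    return 0.5 * (sigma(x + 1.0) - sigma(x - 1.0))

x = np.linspace(-10.0, 10.0, 4001)
y = phi_sigma(x)

assert np.all(y >= 0.0)                        # (i)  nonnegativity
assert phi_sigma(3.0) > 0.0                    # (i)  phi_sigma(3) > 0
assert np.allclose(y, phi_sigma(-x))           # (ii) evenness
assert np.all(np.diff(y[x >= 0.0]) <= 1e-12)   # (iii) nonincreasing for x >= 0
assert np.all(y <= phi_sigma(0.0) + 1e-12)     # (iii) global maximum at 0
print("all bell-shape properties hold on the test grid")
```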
Using the definition of the Fourier transformation ([21]) and the Poisson summation formula ([9,22]), we have Theorem 1.
Theorem 1. 
Assume that $\sigma(x)$ is of sigmoid type and centrosymmetric with respect to the point $(0,\sigma(0))$, and that $\phi_\sigma(x)$ is defined by (2). If there exist constants $C>0$ and $\delta>1$ such that
$$|\phi_\sigma(x)|\le\frac{C}{(1+|x|)^{1+\delta}},\qquad|\hat{\phi}_\sigma(x)|\le\frac{C}{(1+|x|)^{1+\delta}},\qquad x\in\mathbb{R},\qquad\int_{-\infty}^{+\infty}\phi_\sigma(x)\,dx=1,$$
then we have
$$\sum_{k=-\infty}^{+\infty}\phi_\sigma(x-k)=1+2\sum_{k=1}^{+\infty}\hat{\phi}_\sigma(k)\cos2k\pi x,$$
where ϕ ^ σ ( x ) is the Fourier transformation of ϕ σ ( x ) .
Throughout the paper, C refers to a positive constant whose value may vary under different circumstances.
Remark 1. 
If $\sigma(x)$ is taken as the Logistic function in Theorem 1, it was proved in [8,22] that $\hat{\phi}_\sigma(k)=0$ for all $k\in\mathbb{Z}\setminus\{0\}$; that is, Equality (3) reduces to $\sum_{k=-\infty}^{+\infty}\phi_\sigma(x-k)=1$.
Inspired by the idea of integral averaging and function extension, we propose the first new operator—the extended Kantorovich-type NN operator (EKNNO)—as follows.
Definition 1. 
Let $A>0$ and let $\phi_\sigma(x)$ be defined by (2). Denote
$$\phi_{\sigma,A}(x)=\frac{1}{A}\,\phi_\sigma\!\left(\frac{x}{A}\right).$$
Assume $f\in C[-1,1]$, and let $f_{1,e}$ be defined as in (1). Then, the EKNNO is defined by
$$K_{n,A}(f,x)=\sum_{k=-2n}^{2n-1}\left(n\int_{k/n}^{(k+1)/n}f_{1,e}(u)\,du\right)\phi_{\sigma,A}(nx-k).$$
The EKNNO $K_{n,A}(f,x)$ has several key features. First, we replace $f(k/n)$ by the integral average $n\int_{k/n}^{(k+1)/n}f_{1,e}(u)\,du$ to reduce the effect of time-jitter. Second, we extend the function $f$ defined on $[-1,1]$ to $f_{1,e}$ on $(-2,2)$ to obtain better approximation, especially around the endpoints $-1$ and $1$. Further, we introduce a parameter $A$ into the activation function $\phi_{\sigma,A}(nx-k)$, which serves as a flexible quantity for fine-tuning the approximation ability of the operator.
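A minimal sketch of the EKNNO follows, under the same illustrative assumptions as before (logistic activation, midpoint quadrature for the cell averages); the helper `f_extended` realizes $f_{1,e}$ via clipping, which agrees with $f(\operatorname{sgn}x)$ for $1<|x|<2$.

```python
import numpy as np

def sigma(t):
    return 1.0 / (1.0 + np.exp(-t))                     # logistic sigmoid

def phi_sigma_A(x, A):
    phi = lambda u: 0.5 * (sigma(u + 1.0) - sigma(u - 1.0))
    return phi(x / A) / A                               # phi_{sigma,A}(x) = phi_sigma(x/A)/A

def f_extended(f):
    """Extension f_{1,e}: equals f on [-1,1] and f(sgn x) for 1 < |x| < 2."""
    return lambda u: f(np.clip(u, -1.0, 1.0))

def cell_average(g, a, b, m=200):
    u = a + (np.arange(m) + 0.5) * (b - a) / m          # midpoint rule
    return np.mean(g(u))

def K_nA(f, x, n, A):
    """EKNNO K_{n,A}(f, x), x in [-1, 1]."""
    fe = f_extended(f)
    ks = np.arange(-2 * n, 2 * n)                       # k = -2n, ..., 2n - 1
    means = np.array([cell_average(fe, k / n, (k + 1) / n) for k in ks])
    return np.dot(means, phi_sigma_A(n * x - ks, A))

f = lambda u: u ** 2
print(K_nA(f, 0.95, 30, 1.0), f(0.95))                  # value near the endpoint
```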
Based on Definition 1, by introducing the normalization factor, we further construct the normalized version of the extended NN operator (NEKNNO) as follows.
Definition 2. 
Assume that $f_{1,e}(u)$ is given by (1) and $\phi_{\sigma,A}(x)$ is defined by (4). Then, the NEKNNO is defined by
$$\overline{K}_{n,A}(f,x)=\frac{\sum_{k=-2n}^{2n-1}\left(n\int_{k/n}^{(k+1)/n}f_{1,e}(u)\,du\right)\phi_{\sigma,A}(nx-k)}{\sum_{k=-2n}^{2n-1}\phi_{\sigma,A}(nx-k)}.$$
Normalization allows us to further discuss the approximation performance of NN operators in the integrable function space.
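The normalized variant differs from the EKNNO sketch above only by the division by $\sum_k\phi_{\sigma,A}(nx-k)$; the short sketch below (same illustrative assumptions) also checks numerically that constant functions are reproduced, which is exactly property (II) of Theorem 2 below.

```python
import numpy as np

def sigma(t):
    return 1.0 / (1.0 + np.exp(-t))

def phi_sigma_A(x, A):
    phi = lambda u: 0.5 * (sigma(u + 1.0) - sigma(u - 1.0))
    return phi(x / A) / A

def cell_average(g, a, b, m=200):
    u = a + (np.arange(m) + 0.5) * (b - a) / m
    return np.mean(g(u))

def K_bar_nA(f, x, n, A):
    """NEKNNO: the EKNNO divided by the normalization factor sum_k phi_{sigma,A}(nx - k)."""
    fe = lambda u: f(np.clip(u, -1.0, 1.0))             # extension f_{1,e}
    ks = np.arange(-2 * n, 2 * n)
    means = np.array([cell_average(fe, k / n, (k + 1) / n) for k in ks])
    w = phi_sigma_A(n * x - ks, A)
    return np.dot(means, w) / np.sum(w)

one = lambda u: np.ones_like(u)
print(K_bar_nA(one, 0.7, 20, 2.0))                      # equals 1 up to quadrature error
```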
Compared with $K_{n,A}(f,x)$, the operator $\overline{K}_{n,A}(f,x)$ has the following properties.
Theorem 2. 
Assume $\overline{K}_{n,A}(f,x)$ is defined as in Definition 2. Then, for $f,g\in C[-1,1]$, we have:
(I) $\left|\overline{K}_{n,A}(f,x)-\overline{K}_{n,A}(g,x)\right|\le\overline{K}_{n,A}(|f-g|,x)$, $x\in[-1,1]$.
(II) Denote by $\mathbf{1}$ the unitary constant function on $[-1,1]$, i.e., $g=\mathbf{1}$ means $g(t)\equiv1$ for $t\in[-1,1]$. Then,
$$\overline{K}_{n,A}(\mathbf{1},x)\equiv1,\qquad x\in[-1,1].$$
Proof. 
For arbitrary $x\in[-1,1]$, we have
$$f(x)=f(x)-g(x)+g(x)\le|f(x)-g(x)|+g(x),$$
and similarly,
$$g(x)\le|g(x)-f(x)|+f(x).$$
Notice that, if $f(x)\le g(x)$ for all $x\in[-1,1]$, then
$$\overline{K}_{n,A}(f,x)\le\overline{K}_{n,A}(g,x),\qquad x\in[-1,1],$$
and, moreover,
$$\overline{K}_{n,A}(f+g,x)=\overline{K}_{n,A}(f,x)+\overline{K}_{n,A}(g,x).$$
By (5)–(8), we have
$$\overline{K}_{n,A}(f,x)\le\overline{K}_{n,A}(|f-g|,x)+\overline{K}_{n,A}(g,x),\qquad\overline{K}_{n,A}(g,x)\le\overline{K}_{n,A}(|g-f|,x)+\overline{K}_{n,A}(f,x),$$
and hence
$$\left|\overline{K}_{n,A}(f,x)-\overline{K}_{n,A}(g,x)\right|\le\overline{K}_{n,A}(|f-g|,x).$$
The proof of (II) is straightforward because
$$\overline{K}_{n,A}(\mathbf{1},x)=\frac{\sum_{k=-2n}^{2n-1}\left(n\int_{k/n}^{(k+1)/n}1\,du\right)\phi_{\sigma,A}(nx-k)}{\sum_{k=-2n}^{2n-1}\phi_{\sigma,A}(nx-k)}=1.$$
Theorem 2 is proved. □
To obtain the error estimates for the operators $K_{n,A}(f,x)$ and $\overline{K}_{n,A}(f,x)$, we need the following lemmas.
Lemma 1. 
Let $\phi_{\sigma,A}(x)$ be defined as in (4). Then, for any $\gamma>0$,
$$\lim_{n\to+\infty}\sum_{k\in\mathbb{Z},\ |x-k|>\gamma n}\phi_{\sigma,A}(x-k)=0$$
holds uniformly for x R .
Proof. 
For any $x\in\mathbb{R}$, set $x_0=x-\lfloor x\rfloor$, so that $0\le x_0<1$. Further, for any $k\in\mathbb{Z}$, denote $\bar{k}=k-\lfloor x\rfloor$. Then, for sufficiently large $n\in\mathbb{N}$, by the definition of $\phi_{\sigma,A}(x)$ and Theorem 1, we have
$$\sum_{|x-k|>\gamma n}\phi_{\sigma,A}(x-k)=\sum_{|x_0-\bar{k}|>\gamma n}\phi_{\sigma,A}(x_0-\bar{k})\le\sum_{|\bar{k}|>\gamma n-1}\phi_{\sigma,A}(x_0-\bar{k})\le C\sup_{x_0\in[0,1)}\sum_{|\bar{k}|>\gamma n-1}|x_0-\bar{k}|^{-1-\delta}\le C\left(\sum_{\bar{k}>\gamma n-1}|1-\bar{k}|^{-1-\delta}+\sum_{\bar{k}<1-\gamma n}|\bar{k}|^{-1-\delta}\right)\to0,\qquad n\to+\infty.\qquad\square$$
Lemma 2. 
Let $\phi_{\sigma,A}(x)$ be defined as in (4). For any $x\in[-1,1]$ and $n\in\mathbb{N}$, the following inequality holds:
$$\sum_{k=-2n}^{2n-1}\phi_{\sigma,A}(nx-k)\ge\phi_{\sigma,A}(3).$$
Proof. 
Since $x\in[-1,1]$, we can fix a $k_0\in[-2n,2n-1]\cap\mathbb{Z}$ such that $|nx-k_0|\le1$. Then it is easy to see that
$$\sum_{k=-2n}^{2n-1}\phi_{\sigma,A}(nx-k)=\sum_{k=-2n}^{2n-1}\phi_{\sigma,A}(|nx-k|)\ge\phi_{\sigma,A}(|nx-k_0|).$$
Therefore,
$$\phi_{\sigma,A}(|nx-k_0|)\ge\phi_{\sigma,A}(3)>0,$$
which completes the proof of Lemma 2. □

3. Degree of Approximation by $K_{n,A}(f,x)$ and $\overline{K}_{n,A}(f,x)$ in $C[-1,1]$

The main aim of this section is to prove the convergence theorem as well as the quantitative approximation theorem for the operators $K_{n,A}(f)$ and $\overline{K}_{n,A}(f)$ applied to functions $f\in C[-1,1]$. For the EKNNO $K_{n,A}(f)$, Theorem 3 can be established.
Theorem 3. 
Assume that the function $\sigma(x)$ satisfies the conditions of Theorem 1. Then, for $f\in C[-1,1]$, there exists a constant $C>0$ such that
$$\|f-K_{n,A}(f)\|\le C\left\{\|f\|\left(\frac{1}{\delta A^{1+\delta}}+\frac{1}{\delta}\left(\frac{A}{n}\right)^{\delta}+\frac{A^{\delta}}{\delta\,n^{\delta(1-\alpha)}}\right)+\left(2+\frac{1}{n^{1-\alpha}}\right)\left(1+\frac{1}{\delta A^{1+\delta}}\right)\omega\!\left(f,\frac{1}{n^{\alpha}}\right)\right\},$$
where $\alpha=\frac{1}{2}-\frac{1}{|\delta-2|+2}$ and $\delta>1$.
The symbol $\omega(f,t)$ in Theorem 3 denotes the modulus of continuity of $f$ [23], defined by
$$\omega(f,t):=\sup_{x,y\in[-1,1],\ |x-y|\le t}|f(x)-f(y)|,\qquad t>0.$$
Proof. 
In view of Theorem 1,
$$\sum_{k=-\infty}^{+\infty}\phi_{\sigma,A}(nx-k)=1+2\sum_{k=1}^{+\infty}\hat{\phi}_{\sigma}(Ak)\cos2kn\pi x;$$
then
$$\begin{aligned}f(x)-K_{n,A}(f,x)&=\sum_{k\in\mathbb{Z}}f(x)\,\phi_{\sigma,A}(nx-k)-K_{n,A}(f,x)-2f(x)\sum_{k=1}^{+\infty}\hat{\phi}_{\sigma}(Ak)\cos2kn\pi x\\&=\sum_{k=-\infty}^{-2n-1}f(x)\,\phi_{\sigma,A}(nx-k)+\sum_{k=2n}^{+\infty}f(x)\,\phi_{\sigma,A}(nx-k)\\&\quad+\sum_{k=-2n}^{2n-1}\left(f(x)-n\int_{k/n}^{(k+1)/n}f_{1,e}(u)\,du\right)\phi_{\sigma,A}(nx-k)-2f(x)\sum_{k=1}^{+\infty}\hat{\phi}_{\sigma}(Ak)\cos2kn\pi x\\&:=I_1+I_2+I_3+I_4.\end{aligned}$$
Now, we estimate I 1 , I 2 , I 3 , I 4 separately.
For $I_1$: for $x\in[-1,1]$ and $k\le-2n-1$, we have $nx-k\ge2n+1+nx\ge n+1$. By the definition of $\phi_{\sigma,A}(x)$ and the property $|\phi_\sigma(x)|\le C(1+|x|)^{-1-\delta}$, we have
$$|I_1|\le\|f\|\sum_{k=-\infty}^{-2n-1}\phi_{\sigma,A}(nx-k)\le C\|f\|\,\frac{1}{A}\sum_{k=-\infty}^{-2n-1}\frac{A^{1+\delta}}{(nx-k)^{1+\delta}}\le C\|f\|A^{\delta}\sum_{k=n+1}^{+\infty}\frac{1}{k^{1+\delta}}\le\frac{C\|f\|}{\delta}\left(\frac{A}{n}\right)^{\delta}.$$
Similarly,
$$|I_2|=\left|\sum_{k=2n}^{+\infty}f(x)\,\phi_{\sigma,A}(nx-k)\right|\le\frac{C\|f\|}{\delta}\left(\frac{A}{n}\right)^{\delta},$$
and
$$|I_4|=\left|2f(x)\sum_{k=1}^{+\infty}\hat{\phi}_{\sigma}(Ak)\cos2kn\pi x\right|\le C\|f\|\sum_{k=1}^{+\infty}\frac{1}{(Ak)^{1+\delta}}\le\frac{C}{\delta A^{1+\delta}}\,\|f\|.$$
For $I_3$,
$$|I_3|\le\sum_{k=-2n}^{2n-1}\left|f(x)-n\int_{k/n}^{(k+1)/n}f_{1,e}(u)\,du\right|\phi_{\sigma,A}(nx-k)\le\left(\sum_{\left|\frac{k}{n}-x\right|\le\frac{1}{n^{\alpha}}}+\sum_{\left|\frac{k}{n}-x\right|>\frac{1}{n^{\alpha}}}\right)\left|f(x)-n\int_{k/n}^{(k+1)/n}f_{1,e}(u)\,du\right|\phi_{\sigma,A}(nx-k):=I_3^1+I_3^2,$$
then
$$\begin{aligned}I_3^1&=\sum_{\left|\frac{k}{n}-x\right|\le\frac{1}{n^{\alpha}}}\left|n\int_{k/n}^{(k+1)/n}\left(f(x)-f_{1,e}(u)\right)du\right|\phi_{\sigma,A}(nx-k)\le\sum_{\left|\frac{k}{n}-x\right|\le\frac{1}{n^{\alpha}}}n\int_{k/n}^{(k+1)/n}\omega(f,|u-x|)\,du\ \phi_{\sigma,A}(nx-k)\\&\le\sum_{\left|\frac{k}{n}-x\right|\le\frac{1}{n^{\alpha}}}n\int_{k/n}^{(k+1)/n}\left(1+n^{\alpha}|u-x|\right)\omega\!\left(f,\frac{1}{n^{\alpha}}\right)du\ \phi_{\sigma,A}(nx-k)\\&\le\sum_{\left|\frac{k}{n}-x\right|\le\frac{1}{n^{\alpha}}}n\,\omega\!\left(f,\frac{1}{n^{\alpha}}\right)\int_{k/n}^{(k+1)/n}\left(1+n^{\alpha}\left|u-\frac{k}{n}\right|+n^{\alpha}\left|\frac{k}{n}-x\right|\right)du\ \phi_{\sigma,A}(nx-k)\\&\le\sum_{\left|\frac{k}{n}-x\right|\le\frac{1}{n^{\alpha}}}n\,\omega\!\left(f,\frac{1}{n^{\alpha}}\right)\int_{k/n}^{(k+1)/n}\left(1+n^{\alpha}\cdot\frac{1}{n}+n^{\alpha}\cdot\frac{1}{n^{\alpha}}\right)du\ \phi_{\sigma,A}(nx-k)\\&\le\omega\!\left(f,\frac{1}{n^{\alpha}}\right)\left(2+\frac{1}{n^{1-\alpha}}\right)\sum_{k=-\infty}^{+\infty}\phi_{\sigma,A}(nx-k).\end{aligned}$$
Furthermore, because
$$\sum_{k=-\infty}^{+\infty}\phi_{\sigma,A}(nx-k)\le1+\frac{C}{\delta A^{1+\delta}}\le C\left(1+\frac{1}{\delta A^{1+\delta}}\right),$$
thus
$$I_3^1\le C\left(2+\frac{1}{n^{1-\alpha}}\right)\left(1+\frac{1}{\delta A^{1+\delta}}\right)\omega\!\left(f,\frac{1}{n^{\alpha}}\right),$$
while
$$I_3^2=\sum_{\left|\frac{k}{n}-x\right|>\frac{1}{n^{\alpha}}}\left|f(x)-n\int_{k/n}^{(k+1)/n}f_{1,e}(u)\,du\right|\phi_{\sigma,A}(nx-k)\le2\|f\|\sum_{\left|\frac{k}{n}-x\right|>\frac{1}{n^{\alpha}}}\frac{1}{A}\cdot\frac{A^{1+\delta}}{|nx-k|^{1+\delta}}\le C\|f\|A^{\delta}\int_{n^{1-\alpha}}^{+\infty}\frac{dt}{t^{1+\delta}}\le C\|f\|A^{\delta}\,\frac{1}{\delta}\,\frac{1}{n^{\delta(1-\alpha)}}.$$
Therefore,
$$|I_3|\le C\left\{\left(2+\frac{1}{n^{1-\alpha}}\right)\left(1+\frac{1}{\delta A^{1+\delta}}\right)\omega\!\left(f,\frac{1}{n^{\alpha}}\right)+\frac{\|f\|A^{\delta}}{\delta\,n^{\delta(1-\alpha)}}\right\}.$$
Combining the above estimates together, we have
$$\|f-K_{n,A}(f)\|\le C\left\{\|f\|\left(\frac{1}{\delta A^{1+\delta}}+\frac{1}{\delta}\left(\frac{A}{n}\right)^{\delta}+\frac{A^{\delta}}{\delta\,n^{\delta(1-\alpha)}}\right)+\left(2+\frac{1}{n^{1-\alpha}}\right)\left(1+\frac{1}{\delta A^{1+\delta}}\right)\omega\!\left(f,\frac{1}{n^{\alpha}}\right)\right\}.\qquad\square$$
Remark 2. 
For a fixed $A$, the terms $\left(\frac{A}{n}\right)^{\delta}$, $\frac{A^{\delta}}{n^{\delta(1-\alpha)}}$, and $\omega\!\left(f,\frac{1}{n^{\alpha}}\right)$ tend to 0 as $n\to\infty$; consequently,
$$\lim_{n\to+\infty}\|f-K_{n,A}(f)\|\le\frac{C}{A^{1+\delta}}\,\|f\|$$
for some constant $C>0$. In particular, if we choose $A=1$, we have
$$\lim_{n\to+\infty}\|f-K_{n,A}(f)\|\le C\,\|f\|.$$
Remark 3. 
Take the activation function $\sigma(x)=\frac{1}{1+e^{-x}}$; then $\sum_{k=-\infty}^{+\infty}\phi_\sigma(x-k)=1$ in this case. Following a similar (but simpler) procedure, we have
$$\|f-K_{n,A}(f)\|\le C\left\{\|f\|\left(\frac{1}{\delta}\left(\frac{A}{n}\right)^{\delta}+\frac{A^{\delta}}{\delta\,n^{\delta(1-\alpha)}}\right)+\left(2+\frac{1}{n^{1-\alpha}}\right)\omega\!\left(f,\frac{1}{n^{\alpha}}\right)\right\}\to0$$
as $n\to+\infty$.
Remark 4. 
Notice that in Theorem 3, we set
$$\alpha=\frac{1}{2}-\frac{1}{|\delta-2|+2},\qquad\delta>1.$$
Then, it is easy to verify that
$$\delta(1-\alpha)=\frac{\delta|\delta-2|+4\delta}{2|\delta-2|+4}>\frac{\delta}{2},$$
as shown in Figure 1. Moreover, as $\delta\to+\infty$, $\alpha$ tends to $\frac{1}{2}$.
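A quick numerical check of this relation for a few values of $\delta$ (illustrative only; the chosen values are arbitrary):

```python
def alpha(delta):
    # alpha = 1/2 - 1/(|delta - 2| + 2), as in Theorem 3 / Remark 4
    return 0.5 - 1.0 / (abs(delta - 2.0) + 2.0)

for delta in [1.1, 1.5, 2.5, 4.0, 10.0, 100.0]:
    a = alpha(delta)
    rate = delta * (1.0 - a)            # exponent delta*(1 - alpha) in the bound
    print(f"delta={delta:7.1f}  alpha={a:.4f}  delta*(1-alpha)={rate:.4f}  delta/2={delta/2:.4f}")
# delta*(1 - alpha) exceeds delta/2 in every case, and alpha approaches 1/2 as delta grows.
```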
In approximation theory, Theorem 3 is called a direct theorem of approximation by the operators, as it gives an upper bound for the approximation error. Direct theorems for Kantorovich-type operators have been investigated extensively in the literature; see, for example, [24,25].
The results on the upper bound imply the convergence of the NN operators to the target function and also provide a quantitative measurement of how accurately the target function can be approximated.
As for the NEKNNO $\overline{K}_{n,A}(f)$, we have the following theorem.
Theorem 4. 
Assume that $f:[-1,1]\to\mathbb{R}$ is bounded on $[-1,1]$ and continuous at $x_0\in[-1,1]$; then
$$\lim_{n\to+\infty}\overline{K}_{n,A}(f,x_0)=f(x_0).$$
Furthermore, if $f\in C[-1,1]$, we have
$$\lim_{n\to+\infty}\left\|\overline{K}_{n,A}(f)-f\right\|=0.$$
Proof. 
Since $f$ is continuous at $x=x_0$, for any $\varepsilon>0$ there exists $\gamma>0$ such that $|f(x_0)-f(y)|<\varepsilon$ for every $y\in[-1,1]\cap[x_0-\gamma,x_0+\gamma]$. Moreover, for $u\in[k/n,(k+1)/n]$ and $|nx_0-k|\le n\gamma/2$, we have
$$|u-x_0|\le\left|u-\frac{k}{n}\right|+\left|\frac{k}{n}-x_0\right|\le\frac{1}{n}+\frac{\gamma}{2},$$
and then
$$\begin{aligned}\left|\overline{K}_{n,A}(f,x_0)-f(x_0)\right|&\le\overline{K}_{n,A}\left(|f-f(x_0)|,x_0\right)\le\frac{1}{\phi_{\sigma,A}(3)}\sum_{k=-2n}^{2n-1}n\int_{k/n}^{(k+1)/n}\left|f_{1,e}(u)-f(x_0)\right|du\ \phi_{\sigma,A}(nx_0-k)\\&\le\frac{1}{\phi_{\sigma,A}(3)}\left(\sum_{|nx_0-k|\le n\gamma/2}+\sum_{|nx_0-k|>n\gamma/2}\right)n\int_{k/n}^{(k+1)/n}\left|f_{1,e}(u)-f(x_0)\right|du\ \phi_{\sigma,A}(nx_0-k)\\&:=\frac{1}{\phi_{\sigma,A}(3)}\left(J_1+J_2\right).\end{aligned}$$
Notice that $|u-x_0|<\gamma$ for sufficiently large $n\in\mathbb{N}$.
Estimating J 1 , J 2 , respectively, we have:
$$J_1\le\sum_{|nx_0-k|\le n\gamma/2}n\int_{k/n}^{(k+1)/n}\left|f_{1,e}(u)-f(x_0)\right|du\ \phi_{\sigma,A}(nx_0-k)\le\varepsilon\sum_{k=-\infty}^{+\infty}\phi_{\sigma,A}(nx_0-k)\le C\left(1+\frac{1}{A^{1+\delta}}\right)\varepsilon;$$
$$J_2\le\sum_{|nx_0-k|>n\gamma/2}n\int_{k/n}^{(k+1)/n}2\|f\|\,du\ \phi_{\sigma,A}(nx_0-k)=2\|f\|\sum_{|nx_0-k|>n\gamma/2}\phi_{\sigma,A}(nx_0-k)\to0,\qquad n\to+\infty,$$
by Lemma 1. Therefore,
$$\lim_{n\to+\infty}\overline{K}_{n,A}(f,x_0)=f(x_0).$$
Relation (13) is proved, and (14) can be proved similarly. □
Remark 5. 
In Figure 2, we compare the approximation efficiency of the NEKNNO $\overline{K}_{n,A}(f,x)$ and of $N_n(f,x)$ when both approximate the quadratic function $f(x)=x^2$ on $[-1,1]$, as the parameters $n$ and $A$ vary. Figure 2a–c clearly show that the NEKNNO $\overline{K}_{n,A}(f,x)$ has a better approximation performance, especially at the endpoints. At the same time, Figure 2d shows that changing the parameter $A$ noticeably affects the approximation efficiency of the NEKNNO $\overline{K}_{n,A}(f,x)$ at the endpoints. Whether an optimal choice of $A$ exists is an open question worth discussing.
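The following sketch reproduces the flavor of this comparison for $f(x)=x^2$; the activation (logistic), quadrature, and the values of $n$ and $A$ are illustrative assumptions and not necessarily the exact configuration used for Figure 2.

```python
import numpy as np

def sigma(t):
    return 1.0 / (1.0 + np.exp(-t))

def phi(u):
    return 0.5 * (sigma(u + 1.0) - sigma(u - 1.0))

def cell_average(g, a, b, m=200):
    u = a + (np.arange(m) + 0.5) * (b - a) / m
    return np.mean(g(u))

def N_n(f, x, n):
    """Classical Kantorovich-type operator N_n on [-1, 1]."""
    ks = np.arange(-n, n)
    means = np.array([cell_average(f, k / n, (k + 1) / n) for k in ks])
    w = phi(n * x - ks)
    return np.dot(means, w) / np.sum(w)

def K_bar_nA(f, x, n, A):
    """NEKNNO on [-1, 1] with extension f_{1,e} and scaling parameter A."""
    fe = lambda u: f(np.clip(u, -1.0, 1.0))
    ks = np.arange(-2 * n, 2 * n)
    means = np.array([cell_average(fe, k / n, (k + 1) / n) for k in ks])
    w = phi((n * x - ks) / A) / A
    return np.dot(means, w) / np.sum(w)

f = lambda u: u ** 2
n, A = 20, 1.0
for x in [-1.0, -0.5, 0.0, 0.5, 1.0]:
    print(f"x={x:+.1f}  f={f(x):.4f}  N_n={N_n(f, x, n):.4f}  NEKNNO={K_bar_nA(f, x, n, A):.4f}")
```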

4. Degree of Approximation by the NEKNNO $\overline{K}_{n,A}(f)$ and the EKNNO $K_{n,A}(f)$ in $L^p[-1,1]$

Now, consider the efficiency of the approximation of $f$ by the proposed operators, the EKNNO $K_{n,A}(f)$ and the NEKNNO $\overline{K}_{n,A}(f)$, in the Lebesgue space $L^p[-1,1]$ ($1\le p<+\infty$), where
$$L^p[-1,1]=\left\{f:\int_{-1}^{1}|f(x)|^p\,dx<+\infty\right\},$$
equipped with the norm
$$\|f\|_p=\left(\int_{-1}^{1}|f(x)|^p\,dx\right)^{1/p},\qquad1\le p<+\infty.$$
We first give Lemma 3.
Lemma 3. 
For any functions $f,g\in L^p[-1,1]$, $1\le p<+\infty$, the following inequality holds:
$$\left\|\overline{K}_{n,A}(f)-\overline{K}_{n,A}(g)\right\|_p\le\frac{C}{\phi_{\sigma,A}^{1/p}(3)}\,\|f-g\|_p.$$
Proof. 
By the definition of $\|\cdot\|_p$, Theorem 2, Lemma 2, and Jensen's inequality (applied using the convexity of $|\cdot|^p$, $1\le p<+\infty$), we have
$$\begin{aligned}\left\|\overline{K}_{n,A}(f)-\overline{K}_{n,A}(g)\right\|_p^p&=\int_{-1}^{1}\left|\overline{K}_{n,A}(f,x)-\overline{K}_{n,A}(g,x)\right|^p dx\le\int_{-1}^{1}\left(\overline{K}_{n,A}(|f-g|,x)\right)^p dx\\&\le\int_{-1}^{1}\frac{\sum_{k=-2n}^{2n-1}\phi_{\sigma,A}(nx-k)\left(n\int_{k/n}^{(k+1)/n}\left|f_{1,e}(u)-g_{1,e}(u)\right|du\right)^p}{\sum_{k=-2n}^{2n-1}\phi_{\sigma,A}(nx-k)}\,dx\\&\le\frac{1}{\phi_{\sigma,A}(3)}\int_{-1}^{1}\sum_{k=-2n}^{2n-1}\phi_{\sigma,A}(nx-k)\left(n\int_{k/n}^{(k+1)/n}\left|f_{1,e}(u)-g_{1,e}(u)\right|du\right)^p dx\\&\le\frac{1}{\phi_{\sigma,A}(3)}\sum_{k=-2n}^{2n-1}\left(\int_{\mathbb{R}}n\,\phi_{\sigma,A}(nx-k)\,dx\right)\int_{k/n}^{(k+1)/n}\left|f_{1,e}(u)-g_{1,e}(u)\right|^p du.\end{aligned}$$
In view of Theorem 1, we obtain
$$\int_{\mathbb{R}}n\,\phi_{\sigma,A}(nx-k)\,dx=\int_{\mathbb{R}}\phi_{\sigma}\!\left(\frac{nx-k}{A}\right)d\!\left(\frac{nx-k}{A}\right)=1.$$
Therefore,
$$\left\|\overline{K}_{n,A}(f)-\overline{K}_{n,A}(g)\right\|_p^p\le\frac{1}{\phi_{\sigma,A}(3)}\sum_{k=-2n}^{2n-1}\int_{k/n}^{(k+1)/n}\left|f_{1,e}(u)-g_{1,e}(u)\right|^p du=\frac{1}{\phi_{\sigma,A}(3)}\int_{-2}^{2}\left|f_{1,e}(u)-g_{1,e}(u)\right|^p du\le\frac{C}{\phi_{\sigma,A}(3)}\,\|f-g\|_p^p.$$
This completes the proof of Lemma 3. □
Utilizing Lemma 3, we can prove the convergence of $\overline{K}_{n,A}(f)$ in $L^p[-1,1]$.
Theorem 5. 
Assume that $f\in L^p[-1,1]$, $1\le p<+\infty$; then
$$\lim_{n\to+\infty}\left\|\overline{K}_{n,A}(f)-f\right\|_p=0.$$
Proof. 
It is well known that $C[-1,1]$ is dense in $L^p[-1,1]$. Let $f\in L^p[-1,1]$. Then, for any $\varepsilon>0$, there exists a function $g\in C[-1,1]$ such that $\|f-g\|_p\le\varepsilon$. By (14), for sufficiently large $n\in\mathbb{N}$,
$$\left\|\overline{K}_{n,A}(g)-g\right\|_p=\left(\int_{-1}^{1}\left|\overline{K}_{n,A}(g,x)-g(x)\right|^p dx\right)^{1/p}\le\left(\int_{-1}^{1}\left\|\overline{K}_{n,A}(g)-g\right\|^p dx\right)^{1/p}\le2^{1/p}\varepsilon.$$
Therefore, for sufficiently large n N , we have
$$\left\|\overline{K}_{n,A}(f)-f\right\|_p\le\left\|\overline{K}_{n,A}(f)-\overline{K}_{n,A}(g)\right\|_p+\left\|\overline{K}_{n,A}(g)-g\right\|_p+\|f-g\|_p\le\left(\frac{C}{\phi_{\sigma,A}^{1/p}(3)}+1\right)\|f-g\|_p+\left\|\overline{K}_{n,A}(g)-g\right\|_p\le\left(\frac{C}{\phi_{\sigma,A}^{1/p}(3)}+1+2^{1/p}\right)\varepsilon.$$
Theorem 5 is proved. □
Next, we consider the EKNNO $K_{n,A}(f)$. We need Lemma 4, whose proof is similar to that of Lemma 3.
Lemma 4. 
For any functions $f,g\in L^p[-1,1]$, $1\le p<+\infty$, the following inequality holds:
$$\left\|K_{n,A}(f)-K_{n,A}(g)\right\|_p\le\|f-g\|_p.$$
Now, we can establish the approximation theorem for $K_{n,A}(f,x)$ in the $L^p[-1,1]$ space.
Theorem 6. 
Let $C^1[-1,1]$ denote the set of all functions that are differentiable with continuous derivative on $[-1,1]$. Then, for $f\in L^p[-1,1]$, $1\le p<+\infty$, the following inequality holds:
$$\left\|K_{n,A}(f)-f\right\|_p\le\inf_{g\in C^1[-1,1]}\left\{2\|f-g\|_p+\frac{M^1_{A,n,\delta}}{\phi_{\sigma,A}^{1/p}(3)}\left\|g_{1,e}\right\|_{L^p[-2,2]}+\frac{M^2_{A,n,\delta}}{n\,\phi_{\sigma,A}(3)}\,\|g'\|\right\},$$
where $\left\|g_{1,e}\right\|_{L^p[-2,2]}=\left(\int_{-2}^{2}\left|g_{1,e}(u)\right|^p du\right)^{1/p}$, $M^1_{A,n,\delta}=\frac{C}{\delta}\left(\frac{1}{A^{1+\delta}}+\left(\frac{A}{n}\right)^{\delta}\right)$, and $M^2_{A,n,\delta}=C\left\{\left(n^{\frac{1}{2(\delta-1)}}+1\right)\left(1+\frac{1}{\delta A^{1+\delta}}\right)+\frac{1}{(\delta-1)\sqrt{n}}\right\}$.
Proof. 
Let $f\in L^p[-1,1]$, $1\le p<+\infty$, and $g\in C^1[-1,1]$. By Minkowski's inequality,
$$\left\|K_{n,A}(f)-f\right\|_p\le\left\|K_{n,A}(f)-K_{n,A}(g)\right\|_p+\left\|K_{n,A}(g)-\overline{K}_{n,A}(g)\right\|_p+\left\|\overline{K}_{n,A}(g)-g\right\|_p+\|f-g\|_p.$$
In view of Lemma 4,
$$\left\|K_{n,A}(f)-K_{n,A}(g)\right\|_p\le\|f-g\|_p.$$
Therefore,
$$\left\|K_{n,A}(f)-f\right\|_p\le\left\|K_{n,A}(g)-\overline{K}_{n,A}(g)\right\|_p+\left\|\overline{K}_{n,A}(g)-g\right\|_p+2\|f-g\|_p:=S_1+S_2+2\|f-g\|_p.$$
According to the definitions of $K_{n,A}$ and $\overline{K}_{n,A}$, we have
$$S_1=\left\|K_{n,A}(g)-\overline{K}_{n,A}(g)\right\|_p\le\left\|1-\sum_{k=-2n}^{2n-1}\phi_{\sigma,A}(nx-k)\right\|\,\left\|\overline{K}_{n,A}(g)\right\|_p.$$
Then, in view of (10) and (11),
$$\left|1-\sum_{k=-2n}^{2n-1}\phi_{\sigma,A}(nx-k)\right|\le\sum_{k=-\infty}^{-2n-1}\phi_{\sigma,A}(nx-k)+\sum_{k=2n}^{+\infty}\phi_{\sigma,A}(nx-k)+2\left|\sum_{k=1}^{+\infty}\hat{\phi}_{\sigma}(Ak)\cos2kn\pi x\right|\le\frac{C}{\delta}\left(\frac{1}{A^{1+\delta}}+\left(\frac{A}{n}\right)^{\delta}\right).$$
By Jensen’s inequality, Hölder’s inequality, and Lemma 2,
$$\begin{aligned}\left\|\overline{K}_{n,A}(g)\right\|_p^p&=\int_{-1}^{1}\left|\frac{\sum_{k=-2n}^{2n-1}\left(n\int_{k/n}^{(k+1)/n}g_{1,e}(u)\,du\right)\phi_{\sigma,A}(nx-k)}{\sum_{k=-2n}^{2n-1}\phi_{\sigma,A}(nx-k)}\right|^p dx\le\int_{-1}^{1}\frac{\sum_{k=-2n}^{2n-1}\left|n\int_{k/n}^{(k+1)/n}g_{1,e}(u)\,du\right|^p\phi_{\sigma,A}(nx-k)}{\sum_{k=-2n}^{2n-1}\phi_{\sigma,A}(nx-k)}\,dx\\&\le\frac{1}{\phi_{\sigma,A}(3)}\int_{-1}^{1}\sum_{k=-2n}^{2n-1}\left|n\int_{k/n}^{(k+1)/n}g_{1,e}(u)\,du\right|^p\phi_{\sigma,A}(nx-k)\,dx\le\frac{1}{\phi_{\sigma,A}(3)}\sum_{k=-2n}^{2n-1}\left(\int_{\mathbb{R}}n\,\phi_{\sigma,A}(nx-k)\,dx\right)\int_{k/n}^{(k+1)/n}\left|g_{1,e}(u)\right|^p du\\&\le\frac{1}{\phi_{\sigma,A}(3)}\sum_{k=-2n}^{2n-1}\int_{k/n}^{(k+1)/n}\left|g_{1,e}(u)\right|^p du=\frac{1}{\phi_{\sigma,A}(3)}\left\|g_{1,e}\right\|_{L^p[-2,2]}^p.\end{aligned}$$
Therefore,
$$S_1=\left\|K_{n,A}(g)-\overline{K}_{n,A}(g)\right\|_p\le\frac{C}{\delta\,\phi_{\sigma,A}^{1/p}(3)}\left(\frac{1}{A^{1+\delta}}+\left(\frac{A}{n}\right)^{\delta}\right)\left\|g_{1,e}\right\|_{L^p[-2,2]}.$$
For $S_2$: for $x,t\in[-1,1]$, by the Lagrange mean value theorem, we have
$$|g(t)-g(x)|\le\|g'\|\,|t-x|.$$
Therefore,
$$\begin{aligned}\left|\overline{K}_{n,A}(g,x)-g(x)\right|&\le\|g'\|\,\overline{K}_{n,A}(|t-x|,x)\le\frac{\|g'\|}{\phi_{\sigma,A}(3)}\sum_{k=-2n}^{2n-1}n\int_{k/n}^{(k+1)/n}\left(\left|x-\frac{k}{n}\right|+\left|\frac{k}{n}-u\right|\right)du\ \phi_{\sigma,A}(nx-k)\\&\le\frac{\|g'\|}{n\,\phi_{\sigma,A}(3)}\left(\sum_{k=-2n}^{2n-1}|nx-k|\,\phi_{\sigma,A}(nx-k)+\sum_{k=-2n}^{2n-1}\phi_{\sigma,A}(nx-k)\right).\end{aligned}$$
Let $\beta\in(0,1)$. We have
$$\begin{aligned}\sum_{k=-2n}^{2n-1}|nx-k|\,\phi_{\sigma,A}(nx-k)&\le\sum_{\left|\frac{k}{n}-x\right|\le\frac{1}{n^{\beta}}}|nx-k|\,\phi_{\sigma,A}(nx-k)+\sum_{\left|\frac{k}{n}-x\right|>\frac{1}{n^{\beta}}}|nx-k|\,\phi_{\sigma,A}(nx-k)\\&\le n^{1-\beta}\sum_{k=-\infty}^{+\infty}\phi_{\sigma,A}(nx-k)+C\sum_{\left|\frac{k}{n}-x\right|>\frac{1}{n^{\beta}}}\frac{|nx-k|}{|nx-k|^{1+\delta}}\\&\le C\,n^{1-\beta}\left(1+\frac{1}{\delta A^{1+\delta}}\right)+C\int_{n^{1-\beta}}^{+\infty}\frac{dt}{t^{\delta}}=C\left\{n^{1-\beta}\left(1+\frac{1}{\delta A^{1+\delta}}\right)+\frac{1}{(\delta-1)\,n^{(1-\beta)(\delta-1)}}\right\}.\end{aligned}$$
Meanwhile, by (12),
$$\sum_{k=-2n}^{2n-1}\phi_{\sigma,A}(nx-k)\le\sum_{k=-\infty}^{+\infty}\phi_{\sigma,A}(nx-k)\le C\left(1+\frac{1}{\delta A^{1+\delta}}\right),$$
so, setting $\beta=1-\frac{1}{2(\delta-1)}$, we have
$$\left|\overline{K}_{n,A}(g,x)-g(x)\right|\le\frac{C\,\|g'\|}{n\,\phi_{\sigma,A}(3)}\left\{\left(n^{\frac{1}{2(\delta-1)}}+1\right)\left(1+\frac{1}{\delta A^{1+\delta}}\right)+\frac{1}{(\delta-1)\sqrt{n}}\right\}.$$
In summary, combined with Equations (15)–(17), Theorem 6 is proved. □
Remark 6. 
If we take $\sigma(x)=\frac{1}{1+e^{-x}}$ as the activation function, then we have
$$\sum_{k=-\infty}^{+\infty}\phi_\sigma(x-k)=1.$$
Consequently,
$$\left\|K_{n,A}(f)-f\right\|_p\le\inf_{g\in C^1[-1,1]}\left\{2\|f-g\|_p+\frac{C}{\delta\,\phi_{\sigma,A}^{1/p}(3)}\left(\frac{A}{n}\right)^{\delta}\left\|g_{1,e}\right\|_{L^p[-2,2]}+\frac{C\,\|g'\|}{n\,\phi_{\sigma,A}(3)}\left(n^{\frac{1}{2(\delta-1)}}+1+\frac{1}{(\delta-1)\sqrt{n}}\right)\right\}.$$
Choosing an appropriate $g\in C^1[-1,1]$, with $\delta>1$ and $n\in\mathbb{N}$, leads to
$$\lim_{n\to+\infty}\left\|f-K_{n,A}(f)\right\|_p=0.$$
If we define g as the Steklov mean function of f, that is,
$$g(x)=\begin{cases}\dfrac{1}{h}\displaystyle\int_{x}^{x+h}f(t)\,dt,& x\in[-1,1-h),\\[1mm]\dfrac{1}{h}\displaystyle\int_{1-h}^{1}f(t)\,dt,& x\in[1-h,1],\end{cases}$$
where $0<h<2$, then $g$ is absolutely continuous and closely related to $f$. The upper bound of $\left\|K_{n,A}(f)-f\right\|_p$ can then be estimated using the modulus of smoothness in the $L^p$ space. We will derive this type of estimate in future work.
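As a small illustration of this construction (not the authors' code; the window parameter and quadrature are arbitrary), the Steklov mean can be computed as follows.

```python
import numpy as np

def steklov_mean(f, h, m=400):
    """Return g(x) = (1/h) * integral_x^{x+h} f(t) dt for x in [-1, 1-h),
    frozen at its value on [1-h, 1], approximated by a midpoint rule."""
    def g(x):
        a = min(x, 1.0 - h)                     # freeze the window once x >= 1 - h
        t = a + (np.arange(m) + 0.5) * h / m    # midpoints of [a, a + h]
        return np.mean(f(t))
    return g

f = lambda t: np.abs(t)                         # an f in L^p[-1, 1] that is not C^1
g = steklov_mean(f, h=0.1)
print(g(-0.5), g(0.0), g(0.97))                 # smoothed values of f
```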

5. Conclusions

In this paper, we propose two types of NN operators, the EKNNO and the NEKNNO, which can be regarded as feedforward neural networks with multiple layers. We construct the EKNNO and NEKNNO using the following ideas: (1) integral averaging leads to a Kantorovich-type NN operator that reduces time-jitter errors; (2) a function extension improves the EKNNO's and NEKNNO's approximation abilities, especially at the endpoints of a compact interval; (3) the introduction of a flexible parameter A allows fine-tuning of the approximation ability of the operators; and (4) normalization allows us to further discuss the approximation performance of the NEKNNO in an integrable function space. All these features combined provide a better approximation performance. We further prove the convergence of these operators and obtain quantitative estimates; for the latter, important approximation tools such as the modulus of continuity and the idea of the K-functional are utilized. Numerical examples verify the validity of the theoretical results and some potential advantages of our NN operators.
However, in this paper, we only considered the direct theorems of the NN operators, and the target function is univariate. The converse results and the higher-dimensional case will be investigated in our future work. Moreover, we used sigmoid-type functions as the activation functions in this paper, while many other activation functions, such as ReLU and its variations (LeakyReLU, PReLU, ELU, SELU, etc.), are widely used in machine learning and deep learning. It is worth exploring whether a similar construction of NN operators works well with these activation functions.
In conclusion, we utilize methods and tools in approximation theory to obtain some interesting results in the field of neural networks, which may lead to more applications in neural networks.

Author Contributions

Conceptualization, writing—review and editing, visualization, C.X.; conceptualization, formal analysis, writing—review and editing, Y.Z.; conceptualization, writing—review and editing, X.W.; conceptualization, writing—review and editing, P.Y. All authors have read and agreed to the published version of the manuscript.

Funding

This research was funded by the Natural Science Foundation of China under grant numbers 11671213 and 11601110, and by the Natural Sciences and Engineering Research Council of Canada under grant number RGPIN-2019-05917.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

The study did not report any data.

Acknowledgments

The authors would like to thank the Anonymous Reviewers, Academic Editor, and the Journal Editor for their important comments.

Conflicts of Interest

The authors declare no conflict of interest. The funders had no role in the design of the study; in the collection, analyses, or interpretation of the data; in the writing of the manuscript; or in the decision to publish the results.

Abbreviations

The following abbreviations are used in this manuscript:
NN: neural network
FNNs: feedforward neural networks
EKNNO: extended Kantorovich-type neural network operator
NEKNNO: normalized extended Kantorovich-type neural network operator

References

1. Barron, A.R. Universal approximation bounds for superpositions of a sigmoidal function. IEEE Trans. Inform. Theory 1993, 39, 930–945.
2. Cao, F.L.; Xu, Z.B.; Li, Y.M. Pointwise approximation for neural networks. Lect. Notes Comput. Sci. 2005, 3496, 39–44.
3. Cao, F.L.; Xie, T.F.; Xu, Z.B. The estimate for approximation error of neural networks: A constructive approach. Neurocomputing 2008, 71, 626–630.
4. Cao, F.L.; Zhang, R. The errors of approximation for feedforward neural networks in the Lp metric. Math. Comput. Model. 2009, 49, 1563–1572.
5. Chui, C.K.; Li, X. Approximation by ridge functions and neural networks with one hidden layer. J. Approx. Theory 1992, 70, 131–141.
6. Cardaliaguet, P.; Euvrard, G. Approximation of a function and its derivative with a neural network. Neural Netw. 1992, 5, 207–220.
7. Cantarini, M.; Coroianu, L.; Costarelli, D.; Gal, S.G.; Vinti, G. Inverse result of approximation for the max-product neural network operators of the Kantorovich type and their saturation order. Mathematics 2022, 10, 63.
8. Chen, Z.X.; Cao, F.L. The approximation operators with sigmoidal functions. Comput. Math. Appl. 2009, 58, 758–765.
9. Chen, Z.X.; Cao, F.L.; Zhao, J.W. The construction and approximation of some neural networks operators. Appl. Math.-A J. Chin. Univ. 2012, 27, 69–77.
10. Chen, Z.X.; Cao, F.L. Scattered data approximation by neural network operators. Neurocomputing 2016, 190, 237–242.
11. Costarelli, D.; Spigler, R. Approximation results for neural network operators activated by sigmoidal functions. Neural Netw. 2013, 44, 101–106.
12. Costarelli, D.; Spigler, R. Multivariate neural network operators with sigmoidal activation functions. Neural Netw. 2013, 48, 72–77.
13. Costarelli, D.; Spigler, R. Convergence of a family of neural network operators of the Kantorovich type. J. Approx. Theory 2014, 185, 80–90.
14. Costarelli, D.; Vinti, G. Quantitative estimates involving K-functionals for neural network-type operators. Appl. Anal. 2019, 98, 2639–2647.
15. Qian, Y.Y.; Yu, D.S. Neural network interpolation operators activated by smooth ramp functions. Anal. Appl. 2022, 20, 791–813.
16. Yu, D.S.; Zhou, P. Approximation by neural network operators activated by smooth ramp functions. Acta Math. Sin. (Chin. Ed.) 2016, 59, 623–638.
17. Zhao, Y.; Yu, D.S. Learning rates of neural network estimators via the new FNNs operators. In Proceedings of the 2014 International Joint Conference on Neural Networks (IJCNN), Beijing, China, 6–11 July 2014.
18. Anastassiou, G.A. Univariate hyperbolic tangent neural network approximation. Math. Comput. Model. 2011, 53, 1111–1132.
19. Anastassiou, G.A. Intelligent Systems: Approximation by Artificial Neural Networks, 1st ed.; Springer: Berlin/Heidelberg, Germany, 2011.
20. Cybenko, G. Approximation by superpositions of a sigmoidal function. Math. Control Signals Syst. 1989, 27, 303–314.
21. Zygmund, A.; Fefferman, R. Trigonometric Series, 3rd ed.; Cambridge University Press: Cambridge, UK, 2003.
22. Zhang, Z.; Liu, K.; Zhu, L.; Chen, Y. The new approximation operators with sigmoidal functions. Appl. Math. Comput. 2013, 42, 455–468.
23. DeVore, R.A.; Lorentz, G.G. Constructive Approximation, 1st ed.; Springer: Berlin/Heidelberg, Germany, 1992.
24. Heshamuddin, M.; Rao, N.; Lamichhane, B.P.; Kiliçman, A.; Ayman-Mursaleen, M. On one- and two-dimensional α-Stancu-Schurer-Kantorovich operators and their approximation properties. Mathematics 2022, 10, 3227.
25. Rao, N.; Malik, P.; RaniPradeep, M. Blending type approximations by Kantorovich variant of α-Baskakov operators. Palest. J. Math. 2022, 11, 402–413.
Figure 1. Comparison of the order.
Figure 2. Comparison of the approximation effects of the operators $\overline{K}_{n,A}(f,x)$ and $N_n(f,x)$.