A Neural Network Approximation Based on a Parametric Sigmoidal Function

Yun, Beong In

doi:10.3390/math7030262


by

Beong In Yun

Department of Mathematics, Kunsan National University, Gunsan 54150, Korea

Mathematics 2019, 7(3), 262; https://doi.org/10.3390/math7030262

Submission received: 12 February 2019 / Revised: 5 March 2019 / Accepted: 11 March 2019 / Published: 14 March 2019


Abstract


It is well known that feed-forward neural networks with an appropriate activation function can be used to approximate functions. In this paper, employing a new sigmoidal function with a parameter as the activation function, we consider a constructive feed-forward neural network approximation on a closed interval. The developed approximation method takes the simple form of a superposition of the parametric sigmoidal function. It is shown that the proposed method is very effective in approximating discontinuous functions as well as continuous ones. For some examples, the effectiveness of the presented method is demonstrated by comparing its numerical results with those of an existing neural network approximation method. Furthermore, the efficiency of the method in an extended application to multivariate functions is also illustrated.

Keywords:

feed-forward neural network; activation function; parametric sigmoidal function; quasi-interpolation

MSC:

65D15; 92B20; 41A20

1. Introduction

Cybenko [1] and Funahashi [2] proved that any continuous function can be uniformly approximated on a compact set

I \subset R^{n}

by the feed-forward neural networks (FNN) in the form of

F_{N} (x) = \sum_{k = 0}^{N} α_{k} σ (ω_{k} \cdot x + θ_{k}), x \in I,

(1)

where

σ

is called an activation function,

ω_{k} \in R^{n}

are weights,

θ_{k} \in R

are thresholds, and

α_{k} \in R

are coefficients. This result is known as the universal approximation theorem. Moreover, Hornik et al. [3] showed that any measurable function can be approximated on a compact set by an FNN of this form. Some constructive approximation methods based on the FNN were developed in the literature [4,5,6,7]. Other examples of function approximation by the FNN can be found in the works of Cao et al. [8], Chui and Li [9], Ferrari and Stengel [10], and Suzuki [11]. In particular, the activation function

σ

is a basic component of neural networks because it introduces non-linearity into the network. This allows artificial neural networks to learn complicated non-linear mappings between inputs and outputs.
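
To make the form (1) concrete, the following minimal sketch evaluates a network of that shape for a point in R^n. The function name `fnn`, the logistic activation, and the two-neuron parameter values are illustrative assumptions and are not taken from the paper.

```python
import math

def logistic(t):
    # Logistic sigmoid, used here only as an example activation.
    return 1.0 / (1.0 + math.exp(-t))

def fnn(x, alphas, omegas, thetas, sigma=logistic):
    """Evaluate F_N(x) = sum_k alpha_k * sigma(omega_k . x + theta_k), as in (1)."""
    total = 0.0
    for alpha, omega, theta in zip(alphas, omegas, thetas):
        dot = sum(w * xi for w, xi in zip(omega, x))  # omega_k . x
        total += alpha * sigma(dot + theta)
    return total

# Hypothetical two-neuron network on R^2 (values chosen only for illustration).
alphas = [1.0, -1.0]
omegas = [(1.0, 0.0), (0.0, 1.0)]
thetas = [0.0, 0.0]
value = fnn((0.0, 0.0), alphas, omegas, thetas)
```

At the origin both neurons output 1/2, so the two terms cancel.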

In this paper, aiming at efficient approximation of data obtained from continuous or discontinuous functions on a closed interval, we develop a feed-forward neural network approximation method based on a sigmoidal activation function. First, in the following section, we propose a parametric sigmoidal function

σ^{[m]}

of the form (6) for an activation function. In Section 3 we construct an approximation formula

S_{N}^{[m]} f (x)

in (19) based on the proposed sigmoidal function

σ^{[m]}

. It is shown that

S_{N}^{[m]} f (x)

approximates each given data value with error

O (δ^{m})

,

0 < δ < 1

, for the parameter m large enough. This implies the so-called quasi-interpolation property of the presented FNN approximation. Furthermore, in order to reduce the interpolation errors near the end-points of the given interval, a correction formula (27) is introduced in Section 4. The efficiency of the presented FNN approximation is demonstrated by the numerical results for data sets extracted from continuous and discontinuous functions. Here, efficiency means that the proposed method requires fewer neurons to reach similar or lower error levels than the compared FNN approximation method using the conventional logistic function.

In addition, an extended FNN approximation formula for functions of two variables is proposed in Section 5, with some numerical examples showing the superiority of the presented FNN approximation method.

2. A Parametric Sigmoidal Function

The role of the activation function in artificial neural networks is to introduce non-linearity into the mapping from the input data to the output of the network. One of the useful activation functions commonly used in practice is the sigmoidal function

σ

having the property below.

σ (t) \to \{\begin{matrix} 0 & as t \to - \infty \\ 1 & as t \to \infty \end{matrix}

(2)

For example, two traditional sigmoidal functions are

(i): Heaviside function:

$σ_{H} (t) = \{\begin{matrix} 0, & t < 0 \\ 1, & t > 0 \end{matrix}$

(3)
(ii): Logistic function:

$σ_{L} (t) = \frac{1}{1 + e^{- t}} = \frac{1}{2} \{1 + tanh (t / 2)\}, - \infty < t < \infty$

(4)
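
As a quick numerical check of (3) and (4), the sketch below implements both functions and compares the exponential and tanh forms of the logistic function. The value 1/2 assigned to the Heaviside function at t = 0 is an assumption, since (3) leaves that point undefined.

```python
import math

def heaviside(t):
    """Heaviside function (3); the value at t = 0 is an assumption of this sketch."""
    if t < 0:
        return 0.0
    if t > 0:
        return 1.0
    return 0.5

def logistic_exp(t):
    # First form in (4): 1 / (1 + e^{-t}).
    return 1.0 / (1.0 + math.exp(-t))

def logistic(t):
    # Second form in (4): (1 + tanh(t / 2)) / 2; it cannot overflow for large |t|.
    return 0.5 * (1.0 + math.tanh(0.5 * t))

samples = [-5.0, -1.0, 0.0, 1.0, 5.0]
max_gap = max(abs(logistic(t) - logistic_exp(t)) for t in samples)
```

The tanh form is the more convenient one in code when the argument is scaled by a large weight.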

We recall the following approximation theorem shown in the literature [6].

Theorem 1.

(Costarelli and Spigler [6]) For a bounded sigmoidal function σ and a function

f \in C [a, b]

let

G_{N} f

be a neural network approximation to f of the form

G_{N} f (x) = f (x_{0}) σ (ω (x - x_{- 1})) + \sum_{k = 1}^{N} \{f (x_{k}) - f (x_{k - 1})\} σ (ω (x - x_{k}))

(5)

for

x \in [a, b]

,

h = (b - a) / N

, and

x_{k} = a + k h

(k = - 1, 0, 1, \dots, N)

. Then for every

ϵ > 0

there exists an integer

N > 0

and a real number

ω > 0

such that

{∥G_{N} f - f∥}_{\infty} < ϵ .
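
The formula (5) can be sketched as follows with the logistic function as σ. The test function cos on [0, 2], the value N = 50, and the sampling grid are illustrative assumptions; the weight ω = N²/(b − a) is the practical choice reported later in the paper.

```python
import math

def logistic(t):
    # Logistic function (4) in tanh form, so large |t| cannot overflow.
    return 0.5 * (1.0 + math.tanh(0.5 * t))

def g_approx(x, f, a, b, N, omega, sigma=logistic):
    """G_N f(x) of (5) on the nodes x_k = a + k*h, k = -1, 0, ..., N."""
    h = (b - a) / N
    total = f(a) * sigma(omega * (x - (a - h)))  # term with x_{-1} = a - h
    for k in range(1, N + 1):
        x_k = a + k * h
        x_prev = a + (k - 1) * h
        total += (f(x_k) - f(x_prev)) * sigma(omega * (x - x_k))
    return total

# Illustrative setup (not from the paper): f = cos on [0, 2].
a, b, N = 0.0, 2.0, 50
omega = N * N / (b - a)
grid = [a + (b - a) * i / 1000 for i in range(1001)]
sup_error = max(abs(g_approx(x, math.cos, a, b, N, omega) - math.cos(x)) for x in grid)
```

For a steep σ the sum behaves like a staircase through the node values, so the sampled error is of the order of the largest jump between neighboring nodes.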

Sigmoidal functions have been used in various applications, including artificial neural networks (see the literature [12,13,14,15,16,17]). In this work we employ an algebraic-type sigmoidal function, containing a parameter

m > 0

, as follows.

σ^{[m]} (t) = \{\begin{matrix} 0, & t < - L \\ \frac{{(L + t)}^{m}}{{(L + t)}^{m} + {(L - t)}^{m}}, & | t | \leq L \\ 1, & t > L \end{matrix}

(6)

for a fixed

L > 0

. This function has the following properties.

(A1): $σ^{[m]}$ is strictly increasing over $[- L, L]$ and $σ^{[m]} \in C^{\infty} (- L, L) \cap C^{m - 1} (R)$ for an integer $m \geq 1$ . In addition, referring to the literature [12], we can see that the Hausdorff distance d between the Heaviside function $σ_{H}$ and the presented sigmoidal function $σ^{[m]}$ satisfies

${(\frac{L + d}{L - d})}^{m} = \frac{1}{d} - 1, 0 < d < \min \{\frac{1}{2}, L\} .$

(7)

That is, $m = O (\frac{ln (1 / d)}{ln (1 + d)})$ for d small enough.
(A2): For m large enough $σ^{[m]}$ has the asymptotic behavior

$σ^{[m]} (t) = \{\begin{matrix} O (θ {(t)}^{m}), & - L \leq t < 0 \\ 1 + O (θ {(t)}^{m}), & 0 < t \leq L \end{matrix}$

(8)

where

$θ (t) = {(\frac{L - t}{L + t})}^{sgn (t)},$

(9)

satisfying $0 \leq θ (t) < 1$ for all $t \in [- L, L] \setminus \{0\}$ . In addition, for any integer $m \geq 2$

$\frac{d^{j}}{d x^{j}} σ^{[m]} (\pm L) = 0, j = 1, 2, \dots, m - 1 .$

(10)
(A3): For every $m > 0$

$σ^{[m]} (- t) + σ^{[m]} (t) = 1, t \in R$

(11)

with $σ^{[m]} (0) = \frac{1}{2}$ .
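
The definition (6) and the properties (A1) to (A3) can be checked numerically with a minimal sketch such as the one below. The names `sigma_m` and `theta`, the values L = 1 and m = 12, and the sample points are assumptions made for illustration. The quotient in (6) is rewritten with ratios not exceeding one, which is algebraically equivalent and avoids overflow for large m.

```python
def sigma_m(t, m, L):
    """Parametric sigmoid (6); ratios <= 1 are used to avoid overflow for large m."""
    if t <= -L:
        return 0.0
    if t >= L:
        return 1.0
    if t >= 0.0:
        r = ((L - t) / (L + t)) ** m
        return 1.0 / (1.0 + r)
    r = ((L + t) / (L - t)) ** m
    return r / (1.0 + r)

def theta(t, L):
    """theta(t) of (9) for 0 < |t| <= L: ((L - t) / (L + t)) ** sgn(t)."""
    return (L - t) / (L + t) if t > 0 else (L + t) / (L - t)

L, m = 1.0, 12
samples = [-1.0 + 0.1 * i for i in range(21)]
values = [sigma_m(t, m, L) for t in samples]

# (A1): monotonicity on the sample points.
is_increasing = all(v1 <= v2 for v1, v2 in zip(values, values[1:]))
# (A3): sigma(-t) + sigma(t) = 1.
symmetry_gap = max(abs(sigma_m(t, m, L) + sigma_m(-t, m, L) - 1.0) for t in samples)
# (A2): sigma(t) is bounded by theta(t)**m on the negative side.
tail_ok = all(sigma_m(t, m, L) <= theta(t, L) ** m for t in samples if t < 0)
```

The bound in the last check follows from σ(t) = θ(t)^m / (1 + θ(t)^m) for −L ≤ t < 0.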

3. Constructing a Neural Network Approximation

Suppose for a real valued function

f (x)

,

a \leq x \leq b

, a set of data

{f_{k} = f (x_{k}) | k = 0, 1, 2, \dots, N}

is given, where

N \geq 2

is an integer and

x_{k}

are nodes on the interval

[a, b]

. For simplicity, we assume equally spaced nodes as

x_{k} = a + k \cdot h, h = (b - a) / N .

(12)

We can observe that, for sufficiently large m, the function

σ^{[m]}

with

L = b - a

in (6) satisfies

σ^{[m]} (t) \approx \{\begin{matrix} 0, & t < 0 \\ 1, & t > 0 \end{matrix}

(13)

due to the property (A2).

Moreover, noting that

σ^{[m]}

is an increasing function as mentioned in (A1), we can see that

σ^{[m]} (t) < \frac{1}{N}, for all t < {(σ^{[m]})}^{- 1} (\frac{1}{N}) = - L \cdot \frac{{(N - 1)}^{1 / m} - 1}{{(N - 1)}^{1 / m} + 1}

(14)

and from the property (A3)

1 - σ^{[m]} (t) < \frac{1}{N}, for all t > L \cdot \frac{{(N - 1)}^{1 / m} - 1}{{(N - 1)}^{1 / m} + 1} .

(15)

To find a lower bound of the parameter m we set

L \cdot \frac{{(N - 1)}^{1 / m} - 1}{{(N - 1)}^{1 / m} + 1} = h (= L / N)

. Then we have the lower bound

m = m^{*}

, satisfying this equation, as

m^{*} = \frac{log (N - 1)}{log \frac{N + 1}{N - 1}} .

(16)

That is, for every

m > m^{*}

it follows that

σ^{[m]} (t) < \frac{1}{N}, for all t < - h

(17)

and

1 - σ^{[m]} (t) < \frac{1}{N}, for all t > h .

(18)

The lower bound

m^{*}

given in (16) is used later as a threshold for the parameter m in the numerical implementation of the proposed neural network approximation.
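
The threshold can be evaluated directly. The sketch below reads the numerator of (16) as log(N − 1), which is what solving the preceding equation for m gives: with r = (N − 1)^{1/m}, the condition (r − 1)/(r + 1) = 1/N yields r = (N + 1)/(N − 1). The values N = 10 and L = 1 are illustrative assumptions.

```python
import math

def sigma_m(t, m, L):
    """Parametric sigmoid (6); ratios <= 1 are used to avoid overflow for large m."""
    if t <= -L:
        return 0.0
    if t >= L:
        return 1.0
    if t >= 0.0:
        r = ((L - t) / (L + t)) ** m
        return 1.0 / (1.0 + r)
    r = ((L + t) / (L - t)) ** m
    return r / (1.0 + r)

def m_star(N):
    """Lower bound (16): m* = log(N - 1) / log((N + 1) / (N - 1))."""
    return math.log(N - 1) / math.log((N + 1) / (N - 1))

N, L = 10, 1.0
h = L / N
threshold = m_star(N)                      # roughly 10.95 for N = 10
at_threshold = sigma_m(-h, threshold, L)   # equals 1/N at m = m*
above_threshold = sigma_m(-h, 2 * N, L)    # below 1/N for m = 2N > m*
```

For N = 10 the choice m = 2N used in the numerical examples is therefore above the threshold.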

Referring to the above features of

σ^{[m]}

in (13), (17) and (18), we propose a superposition of

σ^{[m]}

to approximate the given data

{f_{k} = f (x_{k}) | k = 0, 1, 2, \dots, N}

as follows.

S_{N}^{[m]} f (x) = f_{0} + \sum_{k = 1}^{N} (f_{k} - f_{k - 1}) σ^{[m]} (x - {\bar{x}}_{k}), x_{0} \leq x \leq x_{N},

(19)

where

{\bar{x}}_{k} = (x_{k - 1} + x_{k}) / 2 = x_{k} - h / 2

.

We can see that

S_{N}^{[m]} f (x)

interpolates

f (x)

at

N + 1

nodes, approximately, as implied in the following theorem. Thus we call

S_{N}^{[m]} f (x)

a quasi-interpolation of

f (x)

.
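
A minimal sketch of the superposition (19) is given below, together with a check of the errors at the nodes. The function name `quasi_interp`, the test function sin on [0, 3], and the values N = 10 and m = 40, 80 are illustrative assumptions; L = b − a follows the text.

```python
import math

def sigma_m(t, m, L):
    """Parametric sigmoid (6); ratios <= 1 are used to avoid overflow for large m."""
    if t <= -L:
        return 0.0
    if t >= L:
        return 1.0
    if t >= 0.0:
        r = ((L - t) / (L + t)) ** m
        return 1.0 / (1.0 + r)
    r = ((L + t) / (L - t)) ** m
    return r / (1.0 + r)

def quasi_interp(x, a, b, values, m):
    """S_N^{[m]} f(x) of (19) with equally spaced nodes on [a, b] and L = b - a."""
    N = len(values) - 1
    L = b - a
    h = L / N
    total = values[0]
    for k in range(1, N + 1):
        x_bar = a + (k - 0.5) * h  # midpoint of [x_{k-1}, x_k]
        total += (values[k] - values[k - 1]) * sigma_m(x - x_bar, m, L)
    return total

a, b, N = 0.0, 3.0, 10
nodes = [a + k * (b - a) / N for k in range(N + 1)]
values = [math.sin(x) for x in nodes]

def max_nodal_error(m):
    return max(abs(quasi_interp(x, a, b, values, m) - v) for x, v in zip(nodes, values))

err_40 = max_nodal_error(40)
err_80 = max_nodal_error(80)
```

In this setup the largest nodal errors occur at the two end-points, and they shrink as m grows.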

Theorem 2.

The FNN

S_{N}^{[m]} f (x)

with m large enough as defined in

(19)

satisfies

S_{N}^{[m]} f (x_{j}) = f_{j} + C_{j} f^{″} (ξ_{j}) h^{2} δ^{m}, 1 \leq j \leq N - 1

(20)

for some

ξ_{j} \in (x_{j - 1}, x_{j + 1})

,

0 < δ < 1

and a constant

C_{j}

. Moreover,

S_{N}^{[m]} f (x_{0}) = f_{0} + C_{0} f^{'} (ξ_{0}) h δ^{m}, S_{N}^{[m]} f (x_{N}) = f_{N} + C_{N} f^{'} (ξ_{N}) h δ^{m},

for some

ξ_{0} \in (x_{0}, x_{1})

,

ξ_{N} \in (x_{N - 1}, x_{N})

and constants

C_{0}, C_{N}

.

Proof.

Since

σ^{[m]}

is an increasing function and satisfies the asymptotic behavior in (8), for each

1 \leq j \leq N - 1

with m large enough, we have

\begin{matrix} S_{N}^{[m]} f (x_{j}) & \sim f_{j - 1} + (f_{j} - f_{j - 1}) σ^{[m]} (\frac{h}{2}) + (f_{j + 1} - f_{j}) σ^{[m]} (- \frac{h}{2}) \\ = f_{j} \{1 - σ^{[m]} (- \frac{h}{2})\} + f_{j - 1} σ^{[m]} (- \frac{h}{2}) + (f_{j + 1} - f_{j}) σ^{[m]} (- \frac{h}{2}) \\ = f_{j} + \{f_{j - 1} - 2 f_{j} + f_{j + 1}\} σ^{[m]} (- \frac{h}{2}) . \end{matrix}

The second equality above results from the relation

σ^{[m]} (\frac{h}{2}) = \{1 - σ^{[m]} (- \frac{h}{2})\}

based on the property (A3). Denoting by

Δ

and

Δ^{2}

the first and the second forward difference operators, respectively, and using the function

θ (t)

defined in (9), we have

S_{N}^{[m]} f (x_{j}) = f_{j} + Δ^{2} f_{j - 1} O (θ {(- \frac{h}{2})}^{m}) .

Since

Δ^{2} f_{j - 1} = f^{″} (ξ_{j}) h^{2}

for some

ξ_{j} \in (x_{j - 1}, x_{j + 1})

, setting

δ = θ (- \frac{h}{2})

, we have the formula (20).

On the other hand, for

x = x_{0}

and m large enough

S_{N}^{[m]} f (x_{0}) \sim f_{0} + (f_{1} - f_{0}) σ^{[m]} (- \frac{h}{2}) = f_{0} + Δ f_{0} O (θ {(- \frac{h}{2})}^{m}) .

Since

Δ f_{0} = f^{'} (ξ_{0}) h

for some

ξ_{0} \in (x_{0}, x_{1})

, we have

S_{N}^{[m]} f (x_{0}) = f_{0} + C_{0} f^{'} (ξ_{0}) h δ^{m}

for a constant

C_{0}

. For

x = x_{N}

and m large enough

\begin{matrix} S_{N}^{[m]} f (x_{N}) & \sim f_{N - 1} + (f_{N} - f_{N - 1}) σ^{[m]} (\frac{h}{2}) \\ = f_{N - 1} + (f_{N} - f_{N - 1}) \{1 - σ^{[m]} (- \frac{h}{2})\} \\ = f_{N} - (f_{N} - f_{N - 1}) σ^{[m]} (- \frac{h}{2}) \\ = f_{N} + Δ f_{N - 1} O (θ {(- \frac{h}{2})}^{m}) . \end{matrix}

Since

Δ f_{N - 1} = f^{'} (ξ_{N}) h

for some

ξ_{N} \in (x_{N - 1}, x_{N})

, we have

S_{N}^{[m]} f (x_{N}) = f_{N} + C_{N} f^{'} (ξ_{N}) h δ^{m}

for a constant

C_{N}

. Thus the proof is completed. □

Theorem 2 implies that, for fixed N (i.e., fixed h), the approximation errors of

S_{N}^{[m]} f (x)

at every node can be reduced by increasing the value of the parameter m.
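
The leading term in the proof, S(x_j) − f_j ≈ Δ²f_{j−1} σ^{[m]}(−h/2), can be compared with the actual nodal error. The sketch below does this at one interior node; the test function exp on [0, 1], the value N = 8, and the chosen values of m are illustrative assumptions.

```python
import math

def sigma_m(t, m, L):
    """Parametric sigmoid (6); ratios <= 1 are used to avoid overflow for large m."""
    if t <= -L:
        return 0.0
    if t >= L:
        return 1.0
    if t >= 0.0:
        r = ((L - t) / (L + t)) ** m
        return 1.0 / (1.0 + r)
    r = ((L + t) / (L - t)) ** m
    return r / (1.0 + r)

def quasi_interp(x, a, b, values, m):
    """S_N^{[m]} f(x) of (19) with L = b - a."""
    N = len(values) - 1
    L = b - a
    h = L / N
    total = values[0]
    for k in range(1, N + 1):
        total += (values[k] - values[k - 1]) * sigma_m(x - (a + (k - 0.5) * h), m, L)
    return total

a, b, N = 0.0, 1.0, 8
L = b - a
h = L / N
values = [math.exp(a + k * h) for k in range(N + 1)]
j = N // 2                                   # an interior node
second_diff = values[j - 1] - 2.0 * values[j] + values[j + 1]
delta = (L - 0.5 * h) / (L + 0.5 * h)        # delta = theta(-h/2) from (9)

results = {}
for m in (20, 40, 80):
    actual = quasi_interp(a + j * h, a, b, values, m) - values[j]
    predicted = second_diff * sigma_m(-0.5 * h, m, L)
    results[m] = (actual, predicted)
```

Doubling m from 40 to 80 multiplies the error by roughly δ^40, which is the geometric decay stated in (20).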

The sum

S_{N}^{[m]} f (x)

in (19) can be written as

\begin{matrix} S_{N}^{[m]} f (x) & = f_{0} \{1 - σ^{[m]} (x - {\bar{x}}_{1})\} + f_{N} σ^{[m]} (x - {\bar{x}}_{N}) \\ + \sum_{k = 1}^{N - 1} f_{k} \{σ^{[m]} (x - {\bar{x}}_{k}) - σ^{[m]} (x - {\bar{x}}_{k + 1})\} . \end{matrix}

(21)

Using a function

ψ^{[m]}

defined as

ψ^{[m]} (t) = σ^{[m]} (t + h / 2) - σ^{[m]} (t - h / 2), - L \leq t \leq L,

(22)

with

L = b - a

, satisfying

0 \leq ψ^{[m]} (t) \leq 1

for all t, we may rewrite

S_{N}^{[m]} f (x)

as

S_{N}^{[m]} f (x) = f_{0} \{1 - σ^{[m]} (x - {\bar{x}}_{1})\} + f_{N} σ^{[m]} (x - {\bar{x}}_{N}) + \sum_{k = 1}^{N - 1} f_{k} ψ^{[m]} (x - x_{k})

(23)

for

x_{0} \leq x \leq x_{N}

. In fact, it follows that

ψ^{[m]} (x - x_{k}) = σ^{[m]} (x - {\bar{x}}_{k}) - σ^{[m]} (x - {\bar{x}}_{k + 1}), 1 \leq k \leq N - 1 .

(24)

The formula (23) is a form of the feed-forward neural networks based on the activation function

ψ^{[m]}

with constant weights

w_{k} = 1

and thresholds

x_{k}

.
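
The equivalence of the two forms (19) and (23) can be confirmed numerically. In the sketch below the function names, the test function sin(3x) on [0, 2], and the values N = 12 and m = 30 are illustrative assumptions.

```python
import math

def sigma_m(t, m, L):
    """Parametric sigmoid (6); ratios <= 1 are used to avoid overflow for large m."""
    if t <= -L:
        return 0.0
    if t >= L:
        return 1.0
    if t >= 0.0:
        r = ((L - t) / (L + t)) ** m
        return 1.0 / (1.0 + r)
    r = ((L + t) / (L - t)) ** m
    return r / (1.0 + r)

def psi_m(t, m, L, h):
    """psi^{[m]}(t) of (22)."""
    return sigma_m(t + 0.5 * h, m, L) - sigma_m(t - 0.5 * h, m, L)

def s_form19(x, a, b, values, m):
    N = len(values) - 1
    L = b - a
    h = L / N
    total = values[0]
    for k in range(1, N + 1):
        total += (values[k] - values[k - 1]) * sigma_m(x - (a + (k - 0.5) * h), m, L)
    return total

def s_form23(x, a, b, values, m):
    N = len(values) - 1
    L = b - a
    h = L / N
    total = values[0] * (1.0 - sigma_m(x - (a + 0.5 * h), m, L))  # f_0 term
    total += values[N] * sigma_m(x - (b - 0.5 * h), m, L)         # f_N term
    for k in range(1, N):
        total += values[k] * psi_m(x - (a + k * h), m, L, h)      # interior neurons
    return total

a, b, N, m = 0.0, 2.0, 12, 30
values = [math.sin(3.0 * (a + k * (b - a) / N)) for k in range(N + 1)]
grid = [a + (b - a) * i / 400 for i in range(401)]
max_gap = max(abs(s_form19(x, a, b, values, m) - s_form23(x, a, b, values, m)) for x in grid)
```

The two evaluations agree up to rounding, as expected from the summation by parts that leads to (21).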

Under the assumption that m is large enough, the proposed quasi-interpolation

S_{N}^{[m]} f (x)

in (23) has the following properties:

(B1): Since $1 - σ^{[m]} (x - {\bar{x}}_{1}) \approx ψ^{[m]} (x - x_{0})$ and $σ^{[m]} (x - {\bar{x}}_{N}) \approx ψ^{[m]} (x - x_{N})$ over the interval $[a, b] = [x_{0}, x_{N}]$ , it follows that

$S_{N}^{[m]} f (x) \approx \sum_{k = 0}^{N} f_{k} ψ^{[m]} (x - x_{k}), x_{0} \leq x \leq x_{N} .$

(25)
(B2): For each $k = 0, 1, 2, \dots, N$ ,

$ψ^{[m]} (x_{j} - x_{k}) \approx \{\begin{matrix} 1, & j = k \\ 0, & j \neq k \end{matrix}$

and

$ψ^{[m]} ({\bar{x}}_{k} - x_{k}) = ψ^{[m]} ({\bar{x}}_{k + 1} - x_{k}) = ψ^{[m]} (\pm \frac{h}{2}) \approx \frac{1}{2} .$
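
Property (B2) can be illustrated with a few evaluations of ψ^{[m]}. The interval [0, 1] and the values N = 10 and m = 80 in this sketch are assumptions chosen for illustration.

```python
def sigma_m(t, m, L):
    """Parametric sigmoid (6); ratios <= 1 are used to avoid overflow for large m."""
    if t <= -L:
        return 0.0
    if t >= L:
        return 1.0
    if t >= 0.0:
        r = ((L - t) / (L + t)) ** m
        return 1.0 / (1.0 + r)
    r = ((L + t) / (L - t)) ** m
    return r / (1.0 + r)

def psi_m(t, m, L, h):
    """psi^{[m]}(t) of (22)."""
    return sigma_m(t + 0.5 * h, m, L) - sigma_m(t - 0.5 * h, m, L)

a, b, N, m = 0.0, 1.0, 10, 80
L = b - a
h = L / N
psi_at_node = psi_m(0.0, m, L, h)          # j = k: close to 1
psi_at_neighbor = psi_m(h, m, L, h)        # |j - k| = 1: close to 0
psi_at_midpoint = psi_m(0.5 * h, m, L, h)  # half-way to the next node: close to 1/2
```

The midpoint value is σ^{[m]}(h) − 1/2, which differs from 1/2 only by the small tail σ^{[m]}(−h).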

Graphs of the activation functions,

σ_{k}^{[m]} (x) : = σ^{[m]} (x - {\bar{x}}_{k})

and

ψ_{k}^{[m]} (x) : = ψ^{[m]} (x - x_{k})

shown in Figure 1 illustrate the intuition of the construction of the presented quasi-interpolation

S_{N}^{[m]} f (x)

. In addition, Figure 2 includes the graphs of

ψ_{k}^{[m]} (x)

with respect to the values

m = 1, 2, 4, 16

, which shows that

ψ_{k}^{[m]} (x)

becomes flatter near the node

x_{k}

and far from the node as the parameter m increases.

It is well known that the interpolants for continuous functions are guaranteed to be good if and only if the Lebesgue constants are small [15]. Regarding the formula (25) as an interpolation with equispaced points

{x_{k}}_{k = 0}^{N}

, its Lebesgue function satisfies

λ_{N} (x) : = \sum_{k = 0}^{N} |ψ^{[m]} (x - x_{k})| = \sum_{k = 0}^{N} ψ^{[m]} (x - x_{k}) \approx 1

for all x, and thus the corresponding Lebesgue constant becomes

Λ_{N} = {∥λ_{N} (x)∥}_{\infty} \approx 1

. Noting that, for polynomial interpolation at equispaced points, the Lebesgue constant grows exponentially as

Λ_{N} \sim \frac{2^{N + 1}}{e N log N}

as

N \to \infty

, we may expect that

S_{N}^{[m]} f (x)

will be better than polynomial interpolation, at least for approximating continuous functions.
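
The Lebesgue function of (25) can be sampled and set against the asymptotic estimate for polynomial interpolation quoted above. The interval [0, 1], the values N = 10 and m = 80, and the sampling grid in this sketch are illustrative assumptions.

```python
import math

def sigma_m(t, m, L):
    """Parametric sigmoid (6); ratios <= 1 are used to avoid overflow for large m."""
    if t <= -L:
        return 0.0
    if t >= L:
        return 1.0
    if t >= 0.0:
        r = ((L - t) / (L + t)) ** m
        return 1.0 / (1.0 + r)
    r = ((L + t) / (L - t)) ** m
    return r / (1.0 + r)

def psi_m(t, m, L, h):
    """psi^{[m]}(t) of (22)."""
    return sigma_m(t + 0.5 * h, m, L) - sigma_m(t - 0.5 * h, m, L)

def lebesgue_function(x, a, b, N, m):
    """lambda_N(x) = sum_k |psi^{[m]}(x - x_k)| for the form (25)."""
    L = b - a
    h = L / N
    return sum(abs(psi_m(x - (a + k * h), m, L, h)) for k in range(N + 1))

a, b, N, m = 0.0, 1.0, 10, 80
grid = [a + (b - a) * i / 500 for i in range(501)]
lebesgue_const = max(lebesgue_function(x, a, b, N, m) for x in grid)

# Asymptotic estimate 2^{N+1} / (e N log N) for equispaced polynomial interpolation.
poly_estimate = 2.0 ** (N + 1) / (math.e * N * math.log(N))
```

The sum telescopes to σ^{[m]}(x − x_0 + h/2) − σ^{[m]}(x − x_N − h/2), which explains why it stays close to one on [a, b], while the polynomial estimate is already above 30 for N = 10.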

4. Correction Formula

In order to reduce the interpolation errors near the end-points of the given interval, that is, to make the formula (20) in Theorem 2 hold for all

0 \leq j \leq N

, we employ two values at the points

x_{- 1} : = x_{0} - h = a - h

and

x_{N + 1} : = x_{N} + h = b + h

defined as

f_{- 1} = 2 f_{0} - f_{1}, f_{N + 1} = 2 f_{N} - f_{N - 1} .

(26)

Using these additional data, we define a correction formula of (23) as

S_{N}^{[m]} f (x) = f_{- 1} \{1 - σ^{[m]} (x - {\bar{x}}_{0})\} + f_{N + 1} σ^{[m]} (x - {\bar{x}}_{N + 1}) + \sum_{k = 0}^{N} f_{k} ψ^{[m]} (x - x_{k}) .

(27)
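
The effect of the correction can be sketched by comparing the end-point error of (19) with that of (27), treating f_{N+1} in (27) as the constant defined in (26). With the extrapolated values (26), the second difference f_{−1} − 2 f_0 + f_1 vanishes, so the leading error term at x_0 drops out. The function names, the test function sin on [0, 3], and the values N = 10 and m = 40 are illustrative assumptions.

```python
import math

def sigma_m(t, m, L):
    """Parametric sigmoid (6); ratios <= 1 are used to avoid overflow for large m."""
    if t <= -L:
        return 0.0
    if t >= L:
        return 1.0
    if t >= 0.0:
        r = ((L - t) / (L + t)) ** m
        return 1.0 / (1.0 + r)
    r = ((L + t) / (L - t)) ** m
    return r / (1.0 + r)

def psi_m(t, m, L, h):
    return sigma_m(t + 0.5 * h, m, L) - sigma_m(t - 0.5 * h, m, L)

def s_plain(x, a, b, values, m):
    """Uncorrected form (19)."""
    N = len(values) - 1
    L = b - a
    h = L / N
    total = values[0]
    for k in range(1, N + 1):
        total += (values[k] - values[k - 1]) * sigma_m(x - (a + (k - 0.5) * h), m, L)
    return total

def s_corrected(x, a, b, values, m):
    """Correction formula (27) with the extrapolated end values (26)."""
    N = len(values) - 1
    L = b - a
    h = L / N
    f_minus = 2.0 * values[0] - values[1]        # f_{-1}
    f_plus = 2.0 * values[N] - values[N - 1]     # f_{N+1}
    total = f_minus * (1.0 - sigma_m(x - (a - 0.5 * h), m, L))  # uses x-bar_0
    total += f_plus * sigma_m(x - (b + 0.5 * h), m, L)          # uses x-bar_{N+1}
    for k in range(N + 1):
        total += values[k] * psi_m(x - (a + k * h), m, L, h)
    return total

a, b, N, m = 0.0, 3.0, 10, 40
values = [math.sin(a + k * (b - a) / N) for k in range(N + 1)]
left_plain = abs(s_plain(a, a, b, values, m) - values[0])
left_corrected = abs(s_corrected(a, a, b, values, m) - values[0])
right_plain = abs(s_plain(b, a, b, values, m) - values[N])
right_corrected = abs(s_corrected(b, a, b, values, m) - values[N])
```

In this example the corrected end-point errors are smaller than the uncorrected ones at both ends.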

To examine the effectiveness of the proposed approximation method (27), we consider the following examples, which were employed in the literature [6].

Example 1.

A smooth function on the interval

[- 5, 5]

.

f_{1} (x) = (2 + {cos}^{2} x) sin x + 2 x + \frac{x^{2}}{8} + 4, - 5 \leq x \leq 5,

Example 2.

A function with jump-discontinuities.

f_{2} (x) = \{\begin{matrix} \frac{4}{x^{2} - 2}, & x < - 2 \\ - 3, & - 2 \leq x < 0 \\ \frac{5}{2}, & 0 \leq x < 2 \\ \frac{3 x + 2}{x^{3} - 1}, & x \geq 2, \end{matrix}

We compare the results of the presented method with those of the existing neural network approximation method (5) using the activation function

σ = σ_{L}

in (4). In the literature [6], it was proved that Theorem 1 holds if the weight

ω

is chosen such as

ω > \frac{N}{L} log (N - 1), L = b - a .

(28)

In practice, we have used

ω = N^{2} / L

in implementation of the existing FNN

G_{N} f (x)

in (5) for the examples above. Mathematica (V.10) was used as the programming tool for all numerical computations in these examples.
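
The two test functions and the weight condition (28) can be written down as follows. This sketch is in Python rather than Mathematica, and the function names are assumptions. The check uses L = 10 from the interval of Example 1 and confirms that ω = N²/L exceeds the bound in (28), which amounts to N > log(N − 1).

```python
import math

def f1(x):
    """Smooth test function of Example 1."""
    return (2.0 + math.cos(x) ** 2) * math.sin(x) + 2.0 * x + x * x / 8.0 + 4.0

def f2(x):
    """Test function of Example 2 with jump-discontinuities at -2, 0 and 2."""
    if x < -2.0:
        return 4.0 / (x * x - 2.0)
    if x < 0.0:
        return -3.0
    if x < 2.0:
        return 2.5
    return (3.0 * x + 2.0) / (x ** 3 - 1.0)

L = 10.0  # b - a for the interval [-5, 5] of Example 1
# For each N: (practical weight N^2 / L, lower bound (N / L) * log(N - 1) from (28)).
weights = {N: (N * N / L, N / L * math.log(N - 1)) for N in range(10, 90, 10)}
condition_holds = all(omega > bound for omega, bound in weights.values())
```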

For the smooth function

f_{1}

in Example 1, approximations of the proposed FNN

S_{N}^{[m]} f_{1} (x)

, with a small number of neurons (

N = 10

) are shown in Figure 3 with respect to each parameter

m = 10, 15, 20, 30

. The higher the value of m is, the more clearly

S_{N}^{[m]} f_{1} (x)

reveals the so-called quasi-interpolation property as shown in Theorem 2. Moreover, Figure 4 shows errors of

S_{N}^{[m]} f_{1} (x)

with

m = 2 N > m^{*}

, for

m^{*}

the lower bound of m as given in (16), compared with errors of

G_{N} f_{1} (x)

for

N = 10, 20, 30, \dots, 80

. Therein the errors are defined as

{∥S_{N}^{[m]} f_{1} (x) - f_{1}∥}_{\infty} / {∥f_{1}∥}_{\infty}

and

{∥G_{N} f_{1} (x) - f_{1}∥}_{\infty} / {∥f_{1}∥}_{\infty}

. The figure illustrates that the presented FNN is superior to the existing FNN

G_{N} f_{1} (x)

for the continuous test function

f_{1}

.
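
The relative error measure can be sketched as below for a single value of N. This is not a reproduction of Figure 4: it uses the uncorrected form (19) instead of (27), replaces the sup-norm by a maximum over a sampling grid, and fixes N = 40 with m = 2N; all of these are assumptions of the sketch.

```python
import math

def logistic(t):
    return 0.5 * (1.0 + math.tanh(0.5 * t))

def sigma_m(t, m, L):
    """Parametric sigmoid (6); ratios <= 1 are used to avoid overflow for large m."""
    if t <= -L:
        return 0.0
    if t >= L:
        return 1.0
    if t >= 0.0:
        r = ((L - t) / (L + t)) ** m
        return 1.0 / (1.0 + r)
    r = ((L + t) / (L - t)) ** m
    return r / (1.0 + r)

def f1(x):
    return (2.0 + math.cos(x) ** 2) * math.sin(x) + 2.0 * x + x * x / 8.0 + 4.0

def g_approx(x, a, b, values, omega):
    """Existing FNN (5) with the logistic activation."""
    N = len(values) - 1
    h = (b - a) / N
    total = values[0] * logistic(omega * (x - (a - h)))
    for k in range(1, N + 1):
        total += (values[k] - values[k - 1]) * logistic(omega * (x - (a + k * h)))
    return total

def s_approx(x, a, b, values, m):
    """Proposed FNN in the uncorrected form (19)."""
    N = len(values) - 1
    L = b - a
    h = L / N
    total = values[0]
    for k in range(1, N + 1):
        total += (values[k] - values[k - 1]) * sigma_m(x - (a + (k - 0.5) * h), m, L)
    return total

a, b, N = -5.0, 5.0, 40
values = [f1(a + k * (b - a) / N) for k in range(N + 1)]
grid = [a + (b - a) * i / 2000 for i in range(2001)]
norm_f = max(abs(f1(x)) for x in grid)
err_s = max(abs(s_approx(x, a, b, values, 2 * N) - f1(x)) for x in grid) / norm_f
err_g = max(abs(g_approx(x, a, b, values, N * N / (b - a)) - f1(x)) for x in grid) / norm_f
```

With these settings the sampled relative error of the proposed form is smaller than that of the logistic-based form.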

For the discontinuous function

f_{2}

in Example 2, approximations of the proposed FNN

S_{N}^{[m]} f_{2} (x)

, with a small number of neurons (

N = 10

), are shown in Figure 5 with respect to each

m = 10, 20, 40, 80

. In addition, approximations of

S_{N}^{[m]} f_{2} (x)

for various values

N = 10, 20, 40, 80

, with

m = 4 N

, are also given in Figure 6. One can see that the results of the presented method

S_{N}^{[m]} f_{2} (x)

are better than those of

G_{N} f_{2} (x)

shown in Figure 7. It is also noted that the FNN approximations are free from the so-called Gibbs phenomenon, that is, wiggles (overshoots and undershoots) near a jump-discontinuity, which in general appear inevitably in partial-sum approximations built from polynomial or trigonometric basis functions.
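
For the uncorrected form (19), the absence of overshoot can be checked directly: by (21) the approximation is a weighted sum of the data values, the weights are nonnegative because σ^{[m]} is increasing, and they sum to one, so the result stays within the range of the data. The sketch below verifies this for f_2. The interval [−4, 4] is an assumption, since the text does not state the interval for Example 2; N = 16 with m = 4N is likewise an illustrative choice.

```python
def sigma_m(t, m, L):
    """Parametric sigmoid (6); ratios <= 1 are used to avoid overflow for large m."""
    if t <= -L:
        return 0.0
    if t >= L:
        return 1.0
    if t >= 0.0:
        r = ((L - t) / (L + t)) ** m
        return 1.0 / (1.0 + r)
    r = ((L + t) / (L - t)) ** m
    return r / (1.0 + r)

def f2(x):
    """Test function of Example 2."""
    if x < -2.0:
        return 4.0 / (x * x - 2.0)
    if x < 0.0:
        return -3.0
    if x < 2.0:
        return 2.5
    return (3.0 * x + 2.0) / (x ** 3 - 1.0)

def s_approx(x, a, b, values, m):
    """Uncorrected form (19) with L = b - a."""
    N = len(values) - 1
    L = b - a
    h = L / N
    total = values[0]
    for k in range(1, N + 1):
        total += (values[k] - values[k - 1]) * sigma_m(x - (a + (k - 0.5) * h), m, L)
    return total

a, b, N = -4.0, 4.0, 16          # assumed interval; nodes include -2, 0 and 2
values = [f2(a + k * (b - a) / N) for k in range(N + 1)]
grid = [a + (b - a) * i / 1600 for i in range(1601)]
approx = [s_approx(x, a, b, values, 4 * N) for x in grid]
lowest, highest = min(approx), max(approx)
```

The sampled approximation never leaves the interval spanned by the smallest and largest data values, here −3 and 5/2.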

5. Multivariate Approximation

For simplicity we consider a function of two variables

g (x, y)

on a region

[a, b] \times [c, d] \subset R^{2}

, and assume that a set of data

{\{g_{i j} = g (x_{i}, y_{j})\}}_{0 \leq i, j \leq N}

is given for the nodes

x_{i} = a + i \cdot h_{x}, y_{j} = c + j \cdot h_{y},

where

h_{x} = (b - a) / N

,

h_{y} = (d - c) / N

. Set activation functions

ψ_{i, j}^{[m]} (x, y) = σ^{[m]} (χ_{i} (x) | x - {\bar{x}}_{i} |) \cdot σ^{[m]} (χ_{j} (y) | y - {\bar{y}}_{j} |)

(29)

for

0 \leq i, j \leq N

, where

{\bar{ξ}}_{k} = (ξ_{k - 1} + ξ_{k}) / 2

,

χ_{k} (ξ) = \{\begin{matrix} 1, & ξ_{k - 1} < ξ \leq ξ_{k} \\ - 1, & otherwise \end{matrix}

(30)

and

σ^{[m]}

is the parametric sigmoidal function in (6). Then, referring to the formula (25) under the assumption that m is large enough, we define an extended version of the FNN approximation to g as

\begin{matrix} S_{N}^{[m]} g (x, y) = \sum_{i = 0}^{N} \sum_{j = 0}^{N} g_{i j} ψ_{i, j}^{[m]} (x, y) . \end{matrix}

(31)

To test the efficiency of the presented method (31), we choose the functions of two variables below. In the numerical implementation for these examples, gnuplot (V.5) was used, as it is rather fast for evaluating and graphing on a two-dimensional region.

Example 3.

g_{1} (x, y) = \frac{sin (x^{2} + y^{2} + 1)}{x^{2} + y^{2} + 1}, - π \leq x, y \leq π .

(32)

Example 4.

g_{2} (x, y) = \frac{x^{5} + y^{4}}{x^{2} + 1}, - 2 \leq x, y \leq 2 .

(33)
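
The data sets for the two examples can be generated as in the following sketch, which only samples g_1 and g_2 on the node grid and does not implement (31). It assumes that the nodes start at the lower corner of the region, that is, x_i = a + i h_x and y_j = c + j h_y; the function names are also assumptions.

```python
import math

def g1(x, y):
    """Test function (32)."""
    r2 = x * x + y * y + 1.0
    return math.sin(r2) / r2

def g2(x, y):
    """Test function (33)."""
    return (x ** 5 + y ** 4) / (x * x + 1.0)

def sample_grid(g, a, b, c, d, N):
    """Return the (N + 1) x (N + 1) table g_ij = g(x_i, y_j) on [a, b] x [c, d]."""
    hx = (b - a) / N
    hy = (d - c) / N
    return [[g(a + i * hx, c + j * hy) for j in range(N + 1)] for i in range(N + 1)]

N = 30
data1 = sample_grid(g1, -math.pi, math.pi, -math.pi, math.pi, N)
data2 = sample_grid(g2, -2.0, 2.0, -2.0, 2.0, N)
```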

Figure 8 shows the approximations of the presented method

S_{N}^{[m]} g_{i} (x, y)

to the test functions

g_{i} (x, y)

,

i = 1, 2

, for

N = 30

with

m = 120

. We can see that

S_{N}^{[m]} g_{i} (x, y)

approximates

g_{i} (x, y)

properly over the whole region, while the existing method

G_{N} g_{i} (x, y)

given in the literature [6] produces considerable errors as shown in Figure 9.

6. Conclusions

In this work we proposed an FNN approximation method based on a new parametric sigmoidal activation function

σ^{[m]}

. It has been shown that the presented method, with the parameter m large enough, has the quasi-interpolation property at the given nodes. The numerical results for several examples of univariate continuous and discontinuous functions demonstrate that the presented method is better than the existing FNN approximation method. Additionally, the applicability of the method in an extended application to multivariate functions was illustrated.

Funding

This research was supported by Basic Science Research Program through the National Research Foundation of Korea (NRF) funded by the Ministry of Science and ICT (NRF-2017 R1A2B4007682).

Conflicts of Interest

The author declares no conflict of interest.

References

1. Cybenko, G. Approximation by superpositions of a sigmoidal function. Math. Control Signal. 1989, 2, 303–314.
2. Funahashi, K.I. On the approximate realization of continuous mappings by neural networks. Neural Netw. 1989, 2, 183–192.
3. Hornik, K.; Stinchcombe, M.; White, H. Universal approximation of an unknown mapping and its derivatives using multilayer feedforward networks. Neural Netw. 1990, 3, 551–560.
4. Barron, A.R. Universal approximation bounds for superpositions of a sigmoidal function. IEEE Trans. Inform. Theory 1993, 39, 930–945.
5. Chen, Z.; Cao, F. The approximation operators with sigmoidal functions. Comput. Math. Appl. 2009, 58, 758–765.
6. Costarelli, D.; Spigler, R. Constructive approximation by superposition of sigmoidal functions. Anal. Theory Appl. 2013, 29, 169–196.
7. Hahm, N.; Hong, B.I. An approximation by neural networks with a fixed weight. Comput. Math. Appl. 2004, 47, 1897–1903.
8. Cao, F.L.; Xie, T.F.; Xu, Z.B. The estimate for approximation error of neural networks: A constructive approach. Neurocomputing 2008, 71, 626–630.
9. Chui, C.K.; Li, X. Approximation by ridge functions and neural networks with one hidden layer. J. Approx. Theory 1992, 70, 131–141.
10. Ferrari, S.; Stengel, R.F. Smooth Function Approximation Using Neural Networks. IEEE Trans. Neural Netw. 2005, 16, 24–38.
11. Suzuki, S. Constructive functions-approximation by three-layer artificial neural networks. Neural Netw. 1998, 11, 1049–1058.
12. Kyurkchiev, N.; Markov, S. Sigmoidal functions: Some computational and modelling aspects. Biomath Commun. 2014, 1.
13. Markov, S. Cell growth models using reaction schemes: Batch cultivation. Biomath 2013, 2.
14. Prössdorf, S.; Rathsfeld, A. On an integral equation of the first kind arising from a cruciform crack problem. In Integral Equations and Inverse Problems; Petkov, V., Lazarov, R., Eds.; Longman: Coventry, UK, 1991; pp. 210–219.
15. Trefethen, L.N. Approximation Theory and Approximation Practice; SIAM: Oxford, UK, 2013; pp. 107–115.
16. Yun, B.I. An extended sigmoidal transformation technique for evaluating weakly singular integrals without splitting the integration interval. SIAM J. Sci. Comput. 2003, 25, 284–301.
17. Yun, B.I. A smoothening method for the piecewise linear interpolation. J. Appl. Math. 2015, 2015, 376362.

Figure 1. Graphs of the sigmoidal functions

σ_{k}^{[m]} (x)

in (a) and those of

ψ_{k}^{[m]} (x)

in (b).

Figure 2. Graphs of

ψ_{k}^{[m]} (x)

for each

m = 1

, 2, 4, and 16.

Figure 3. Approximations to

f_{1} (x)

by the presented feed-forward neural networks (FNN)

S_{N}^{[m]} f_{1} (x)

with

N = 10

for each

m = 10, 15, 20, 30

.

Figure 4. Errors of the presented FNN approximations

S_{N}^{[m]} f_{1} (x)

with

m = 2 N

and the existing FNN approximations

G_{N} f_{1} (x)

for

N = 10, 20, 30, \dots, 80

.

Figure 5. Approximations to

f_{2} (x)

by the presented FNN

S_{N}^{[m]} f_{2} (x)

with

N = 10

for each

m = 10, 20, 40, 80

.

Figure 6. Approximations to

f_{2} (x)

by the presented FNN

S_{N}^{[m]} f_{2} (x)

for each

N = 10, 20, 40, 80

with

m = 4 N

.

Figure 7. Approximations to

f_{2} (x)

by the existing FNN

G_{N} f_{2} (x)

for each

N = 10, 20, 40, 80

.

Figure 8. Test functions

g_{i} (x, y)

(upper row),

i = 1, 2

, and their approximations by the presented FNN

S_{N}^{[m]} g_{i} (x, y)

(lower row) for

N = 30

with

m = 120

.

Figure 9. Test functions

g_{i} (x, y)

(upper row),

i = 1, 2

, and their approximations by the existing FNN

G_{N} g_{i} (x, y)

(lower row) for

N = 30

.

© 2019 by the author. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).

