Article

On the Computability of Primitive Recursive Functions by Feedforward Artificial Neural Networks

by
Vladimir A. Kulyukin
Department of Computer Science, Utah State University, Logan, UT 84322, USA
Mathematics 2023, 11(20), 4309; https://doi.org/10.3390/math11204309
Submission received: 30 August 2023 / Revised: 29 September 2023 / Accepted: 9 October 2023 / Published: 16 October 2023
(This article belongs to the Section Mathematics and Computer Science)

Abstract

We show that, for a primitive recursive function h(x, t), where x is an n-tuple of natural numbers and t is a natural number, there exists a feedforward artificial neural network N(x, t) such that, for any n-tuple of natural numbers z and any positive natural number m, the first m + 1 terms of the sequence {h(z, t)} are the same as the terms of the tuple (N(z, 0), …, N(z, m)).

1. Introduction

Primitive recursive functions describe, albeit incompletely, the intuitive notion of a number-theoretic algorithm, a deterministic procedure to transform numerical inputs to numerical outputs in finitely many steps. This perception of primitive recursive functions as intuitive counterparts of number-theoretic algorithms may be rooted in the fact that any primitive recursive function can be mechanically constructed from a set of initial functions with finitely many applications of simple, well-defined operations of composition and primitive recursion. These functions and some of their properties have been investigated by Gödel [1], Péter [2,3], Kleene [4], Davis [5], and Rogers [6] in their studies of formal systems, foundations of mathematics, and computability theory. Although the confinement of the construction procedure to two operations may at first seem restrictive, many functions on natural numbers ordinarily encountered in mathematics and computer science are, in fact, primitive recursive (cf., e.g., Ch. 3 in [7]). Primitive recursive functions have been used to investigate the foundations of functional programming. Colson [8] presents a computational model in which a primitive recursive function is viewed as a rewriting system and gives a non-trivial necessary condition for an algorithm to be representable in the system. Paolini et al. [9] define a class of recursive permutations, which they call Reversible Primitive Permutations (RPP), and formalize it as a language that is sufficiently expressive to represent all primitive recursive functions. Petersen [10] uses induction and primitive recursion to develop resource conscious logics where the repeated recycling of assumptions, e.g., repeated applications of the successor function f ( n ) = n + 1 to enumerate natural numbers, has costs.
Feedforward artificial neural networks have their origin in the research by McCulloch and Pitts [11], which describes neural events with propositional logic. McCulloch and Pitts assume that the human nervous system is a finite set of neurons, each of which has an excitation threshold. When a neuron’s threshold is exceeded, the neuron generates an impulse that propagates to other neurons across synapses connecting them to the origin of the impulse. A fundamental insight by McCulloch and Pitts is that if the response of a neuron can be formalized as a logical proposition specifying its stimulus, then behaviors of complex networks of neurons can, in principle, be described with symbolic logic. Artificial neural networks entered mainstream computer science almost half a century after the research by McCulloch and Pitts when Rumelhart, Hinton, and Williams [12] discovered backpropagation, a method for training networks to modify synapse weights by minimizing error between the output and the ground truth. Different types of such networks have been shown to be universal approximators of some classes of functions (e.g., [13,14,15]). Artificial neural networks are increasingly used in embedded artificial intelligence (AI) systems, i.e., systems that run on computational devices with finite amounts of computer memory (e.g., [16]). We will refer to embedded AI as finite AI to emphasize the fact that finite AI systems are realized on computational devices with finite amounts of computational memory.
In this investigation, we seek to relate, in a formal way, primitive recursive functions and feedforward artificial neural networks by investigating whether it is possible, for a given primitive recursive function, to construct a feedforward artificial neural network that computes arbitrarily many values of the function’s co-domain from the corresponding values of the function’s domain. We hope that our investigation contributes to the knowledge of the classes of functions that can be not only approximated, but provably computed by feedforward artificial neural networks. In particular, we formalize feedforward artificial neural networks with recurrence equations, propose a formal definition of the concept of N-computability, i.e., the property of a function to be computed by a feedforward artificial neural network N, and prove several lemmas and theorems to show how feedforward artificial neural networks can be constructed to compute arbitrarily many consecutive values of any primitive recursive function. Since these networks consist of finite sets of neurons and are used in some finite AI systems [17,18], our investigation will be of interest to mathematicians and computer scientists interested in the computability theory of finite AI.
The remainder of our article is organized as follows. In Section 2, we review several definitions of primitive recursive functions, starting with the original definition by Gödel [1] and proceeding to the later definitions by Kleene [4], Davis [5], Rogers [6], Davis et al. [7], and Meyer and Ritchie [19]. This section gives the reader a historical bird’s-eye view of how the concept of a primitive recursive function and its formalization have co-evolved over time. In Section 3, we state the notational conventions and give the definition of a primitive recursive function used in this article. This section is intended for reference. In Section 4, we offer a formalization of feedforward artificial neural networks in terms of recurrence equations. In Section 5, we prove several lemmas and theorems that form the bulk of our theoretical investigation. In Section 6, we present some perspectives on the obtained results, and we summarize our conclusions in Section 7.

2. Recursive Functions

Gödel [1] (Sec. 2, p. 157) describes the class of number-theoretic functions as the class of functions whose domains are non-negative integers or n-tuples thereof and whose values are non-negative integers. Gödel [1] (Sec. 2, pp. 157–159) states that a number-theoretic function ϕ(x_1, x_2, …, x_n) is recursively defined in terms of the number-theoretic functions ψ(x_1, x_2, …, x_{n−1}) and μ(x_1, x_2, …, x_{n+1}) if ϕ is obtained from ψ and μ by the following schema:
ϕ(0, x_2, …, x_n) = ψ(x_2, …, x_n),
ϕ(k + 1, x_2, …, x_n) = μ(k, ϕ(k, x_2, …, x_n), x_2, …, x_n), (1)
where the equalities hold for all k, x_2, …, x_n. Gödel [1] (Sec. 2, p. 159) defines a number-theoretic function ϕ to be recursive if there exists a finite sequence of number-theoretic functions ϕ_1, ϕ_2, …, ϕ_n = ϕ, where each function ϕ_i, 1 ≤ i ≤ n, is a natural number constant, the successor function x + 1, or is defined from two preceding functions with Schema (1) or from one preceding function by substitution, i.e., the replacement of the arguments of a preceding function with some other preceding functions.
Kleene [4] (Chap. IX, § 43, p. 219) defines a number-theoretic function to be primitive recursive if it is definable by a finite number of applications of the six schemata in (2), where m and n are positive integers, i is an integer such that 1 ≤ i ≤ n, q is a natural number, and ψ, χ_1, …, χ_m, and χ are number-theoretic functions with the indicated numbers of arguments.
(I) ϕ(x) = x + 1;
(II) ϕ(x_1, …, x_n) = q;
(III) ϕ(x_1, …, x_n) = x_i;
(IV) ϕ(x_1, …, x_n) = ψ(χ_1(x_1, …, x_n), …, χ_m(x_1, …, x_n));
(Va) ϕ(0) = q, ϕ(y + 1) = χ(y, ϕ(y));
(Vb) ϕ(0, x_2, …, x_n) = ψ(x_2, …, x_n), ϕ(y + 1, x_2, …, x_n) = χ(y, ϕ(y, x_2, …, x_n), x_2, …, x_n). (2)
Schema (I) defines the successor function, Schema (II) defines constant functions, and Schema (III) defines identity functions, which Kleene denotes with the symbol U_i^n. Kleene defines the functions satisfying Schemata (I), (II), and (III) in (2) as initial functions. Schema (IV) in (2) obtains ϕ from ψ, χ_1, …, χ_m by substitution. Schemata (Va) and (Vb) obtain ϕ from χ or from χ and ψ, respectively, by primitive recursion. Kleene [4] (Chap. XI, § 55, p. 275) defines a function ϕ to be general recursive in functions ψ_1, …, ψ_l if there is a system E of equations which defines ϕ recursively from ψ_1, …, ψ_l.
Davis [5] (Chap. 2, Sec. 2, p. 36) defines the operation of composition as the operation that obtains the function h(x^(n)) from the functions f(y^(m)), g_1(x^(n)), …, g_m(x^(n)) with Schema (3),
h(x^(n)) = f(g_1(x^(n)), …, g_m(x^(n))), (3)
where y^(m) and x^(n) are tuples of natural numbers with m and n elements, respectively. Davis [5] (Chap. 3, Sec. 4, p. 48) defines the operation of primitive recursion as the operation that uses Schema (4) to construct the function h(x^(n+1)) from the total functions f(x^(n)) and g(x^(n+2)), where x^(n), x^(n+1), and x^(n+2) are tuples of natural numbers with n, n + 1, and n + 2 elements, respectively.
h(0, x^(n)) = f(x^(n)),
h(z + 1, x^(n)) = g(z, h(z, x^(n)), x^(n)). (4)
For a set of natural numbers A, Davis [5] (Chap. 3, Sec. 4, p. 49) defines an A-primitive recursive function, or a function primitive recursive in A, as a function that can be obtained by a finite number of applications of composition (cf. Schema (3)) and primitive recursion (cf. Schema (4)) from the following functions:
(1) C_A(x); (2) S(x) = x + 1; (3) N(x) = 0; (4) U_i^n(x_1, …, x_n) = x_i, 1 ≤ i ≤ n, (5)
where C_A(x) is the characteristic function of A (i.e., C_A(x) is a total function such that C_A(x) = 1 if x ∈ A and C_A(x) = 0 if x ∉ A), and S(x) and U_i^n are identical to Kleene’s Schemata (I) and (III) in (2). Davis [5] (Chap. 3, Sec. 4, p. 49) defines a function f to be primitive recursive if it is ∅-primitive recursive, where ∅ denotes the empty set.
Rogers [6] (Chap. 1, § 1.2, p. 6) defines the class C of primitive recursive functions as the smallest class of functions such that
(1)
All constant functions λx_1 x_2 … x_k [m] are in C, 1 ≤ k, 0 ≤ m;
(2)
The successor function λx [x + 1] is in C;
(3)
All identity functions λx_1 … x_k [x_i] are in C, 1 ≤ i ≤ k;
(4)
If f is a function of k variables in C and g_1, …, g_k are functions in C of m variables each, then the function λx_1 … x_m [f(g_1(x_1, …, x_m), …, g_k(x_1, …, x_m))] is in C, 1 ≤ k, m;
(5)
If h is a function of k + 1 variables in C, and g is a function of k − 1 variables in C, then the unique function f of k variables satisfying Schema (6) is also in C, 1 ≤ k.
f(0, x_2, …, x_k) = g(x_2, …, x_k),
f(y + 1, x_2, …, x_k) = h(y, f(y, x_2, …, x_k), x_2, …, x_k). (6)
Davis et al. [7] (Chap. 3, Sec. 3, p. 42) define as initial the functions s(x) = x + 1, n(x) = 0, and u_i^n(x_1, …, x_n) = x_i, 1 ≤ i ≤ n, and define a function to be primitive recursive if it can be obtained from the initial functions by a finite number of applications of composition or primitive recursion, where primitive recursion is defined by Schema (7) (Chap. 3, Sec. 2, p. 40 in [7]) and Schema (8) (Chap. 3, Sec. 2, p. 41 in [7]). In Schema (7), k is a natural number and g is a total function of two variables. In Schema (8), f and g are total functions of n and n + 2 variables, respectively.
h(0) = k,
h(t + 1) = g(t, h(t)). (7)
h(x_1, …, x_n, 0) = f(x_1, …, x_n),
h(x_1, …, x_n, t + 1) = g(t, h(x_1, …, x_n, t), x_1, …, x_n). (8)

2.1. Computability and Turing Machines

Davis [5] (Chap. 1, Sec. 2, p. 10) gives the following definition of partially computable and computable functions.
Definition 1.
An n-ary function f(x_1, …, x_n) is partially computable if there exists a Turing machine Z such that
f(x_1, …, x_n) = Ψ_Z^(n)(x_1, …, x_n).
In this case, we say that Z computes f. If, in addition, f(x_1, …, x_n) is a total function, then it is called computable.
In subsequent chapters of his monograph (cf. Chap. 2 and Chap. 3 in [5]), Davis separates the notion of computability from Turing machines to make it possible “to demonstrate the computability of quite complex functions without referring back to the original definition of computability in terms of Turing machines” (cf. Ch. 3, Sec. 1, p. 41 in [5]).
Davis et al. [7] (Chap. 2) continue this treatment of computability by designing the programming language L and then defining partially computable and computable functions in terms of L programs, viz., finite sequences of L instructions. In L, the unique variable Y is designated as the output variable that stores the output of an L program P on a given input. X_1, X_2, … are input variables and Z_1, Z_2, … are internal variables. All variables refer to natural numbers. L has conditional dispatch instructions, line labels, elementary arithmetic operations, comparisons of natural numbers, and macros.
Davis et al. [7] (Chap. 2, Sec. 3, p. 27) define a computation of an L program P on some inputs x_1, …, x_m, m > 0, as a finite sequence of snapshots (s_1, …, s_k), where each snapshot s_i, 1 ≤ i ≤ k, k > 0, specifies the number of the instruction in P to be executed and the value of each variable in P, and where each subsequent snapshot is uniquely determined by the previous snapshot (Theorem 3.2, Chap. 4, Sec. 3, pp. 74–75 in [7]). The snapshot s_1 is the initial snapshot, where the values of all input variables are set to their initial values, the program instruction counter is set to 1, i.e., the number of the first instruction in P, and the values of all the other variables in P are set to 0. The snapshot s_k in (s_1, …, s_k) is a terminal snapshot, where the instruction counter is set to the number of the instructions in P plus 1. If some program P in L takes m inputs X_1 = x_1, X_2 = x_2, …, X_m = x_m, then
Ψ_P^(m)(x_1, x_2, …, x_m) = the value of Y in s_k, if (s_1, …, s_k) is a computation, k ≥ 1;
↑, otherwise.
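To make the snapshot semantics concrete, the following minimal Python sketch (ours) steps an L-like program through its snapshots. It covers only the three basic instruction types of L (increment a variable, decrement a variable, and jump if a variable is nonzero); the tuple encoding, the function names, and the example program are our own illustration, not the notation of [7].

def run(program, inputs):
    """A snapshot is (i, env): the number i of the next instruction to execute
    and the values of all variables; the terminal snapshot has i = len(program) + 1."""
    env = {f"X{j + 1}": v for j, v in enumerate(inputs)}
    env["Y"] = 0
    i, snapshots = 1, []
    while i <= len(program):
        snapshots.append((i, dict(env)))             # record the current snapshot
        op, var, target = program[i - 1]
        if op == "inc":
            env[var] = env.get(var, 0) + 1
        elif op == "dec":
            env[var] = max(env.get(var, 0) - 1, 0)    # natural-number decrement
        elif op == "goto_if_nonzero" and env.get(var, 0) != 0:
            i = target
            continue
        i += 1
    snapshots.append((i, dict(env)))                  # terminal snapshot
    return env["Y"], snapshots

# Y <- X1: a loop that moves X1 into Y; a jump past the last instruction halts the program.
copy_x1 = [
    ("goto_if_nonzero", "X1", 4),   # if X1 != 0, go to the loop body
    ("inc", "Z1", None),
    ("goto_if_nonzero", "Z1", 8),   # unconditional jump to the exit
    ("dec", "X1", None),
    ("inc", "Y", None),
    ("inc", "Z1", None),
    ("goto_if_nonzero", "Z1", 1),   # unconditional jump back to the test
]
print(run(copy_x1, [4])[0])         # prints 4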
The definitions of partially computable and computable functions are made by Davis et al. [7] (Chap. 2, Sec. 4, p. 30) in terms of L programs as follows.
Definition 2.
An n-ary function f is partially computable if f is a partial function and there is an L program P such that Equation (10) holds for all x_1, …, x_n.
f(x_1, …, x_n) = Ψ_P^(n)(x_1, …, x_n). (10)
Definition 3.
An n-ary function f is computable if it is total and partially computable.
Equation (10) in Definition 2 is interpreted so that f(x_1, …, x_n) is defined if and only if Ψ_P^(n)(x_1, …, x_n) is defined. This treatment of computable functions in terms of programs in a formal language is by no means the only one in the literature. For example, as early as 1967, Meyer and Ritchie [19] formalize primitive recursive functions as loop programs consisting of assignment and iteration statements similar to the DO statements of the programming language FORTRAN.

2.2. Computability of Primitive Recursive Functions

Davis et al. [7] (Chap. 3, Sec. 3, p. 42) introduce the concept of a primitive recursively closed (PRC) class of functions, which is a class of total functions that contains the initial functions and any functions obtained from the initial functions by a finite number of applications of composition or primitive recursion. Davis et al. [7] (Chap. 3, Sec. 3, pp. 42–43) show that (1) the class of computable functions is PRC; (2) the class of primitive recursive functions is PRC; and (3) a function is primitive recursive if and only if it belongs to every PRC class. A corollary of (3) is that every primitive recursive function is computable.
Péter [2,3] shows it is possible to define functions in terms of recursive equations that are not primitive recursive. In particular, Péter demonstrates that all unary primitive recursive functions are enumerable, i.e., ϕ_0(x), ϕ_1(x), ϕ_2(x), … is an enumeration, with repetitions, of all unary primitive recursive functions. By Cantor’s diagonalization (cf., e.g., pp. 6–8 in [4]), the unary function f(x) = ϕ_x(x) + 1 is not in the enumeration and, hence, not primitive recursive. While f is not primitive recursive, it is computable (cf. Definition 3). Thus, the class of primitive recursive functions is a proper subset of the computable functions and, in and of itself, cannot completely capture the intuitive notion of a number-theoretic algorithm. Péter’s argument suffers no loss of generality, insomuch as any n-ary primitive recursive function, n > 1, can be reduced to an equivalent unary primitive recursive function (cf. Theorems 9.1 and 9.2, Chap. 4, Sec. 9, p. 108 in [7]). Kleene’s separation of recursive functions into general recursive and primitive recursive may have been influenced by Péter’s discovery (cited by Kleene [4] in Chap. XI, § 55, p. 272).
Rogers [6] (Chap. 1, § 1.2, p. 8) defines the Ackermann generalized exponential, a function for which there is no primitive recursive derivation, and formalizes it with the following recursive equations:
f(0, 0, y) = y,
f(0, x + 1, y) = f(0, x, y) + 1,
f(1, 0, y) = 0,
f(z + 2, 0, y) = 1,
f(z + 1, x + 1, y) = f(z, f(z + 1, x, y), y).
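The equations transcribe directly into a recursive procedure. The following Python sketch is our own transcription (the function name is ours); it uses unrestricted recursion, which is fitting, since the function has no primitive recursive derivation.

from functools import lru_cache

@lru_cache(maxsize=None)
def gen_exp(z: int, x: int, y: int) -> int:
    """Rogers's generalized exponential transcribed from the equations above:
    level z = 0 is addition, z = 1 multiplication, z = 2 exponentiation, etc."""
    if z == 0:
        return y if x == 0 else gen_exp(0, x - 1, y) + 1   # f(0,0,y)=y, f(0,x+1,y)=f(0,x,y)+1
    if x == 0:
        return 0 if z == 1 else 1                          # f(1,0,y)=0, f(z+2,0,y)=1
    return gen_exp(z - 1, gen_exp(z, x - 1, y), y)         # f(z+1,x+1,y)=f(z,f(z+1,x,y),y)

assert gen_exp(0, 3, 4) == 7     # addition
assert gen_exp(1, 3, 4) == 12    # multiplication
assert gen_exp(2, 3, 4) == 64    # exponentiation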

3. Notational Conventions and Definitions

If f is a function, dom(f) and codom(f) are the domain and the co-domain of f. The expression f : A → B abbreviates the logical conjunction dom(f) = A ∧ codom(f) = B, for some sets A and B. A function f is partial on A if dom(f) is a proper subset of A, i.e., dom(f) ⊂ A. If f is partial on A and a ∈ A, the following statements are equivalent: (1) a ∈ dom(f); (2) f is defined on a; (3) f(a) is defined; (4) f(a)↓. The following statements are also equivalent: (1) a ∉ dom(f); (2) f is undefined on a; (3) f(a) is undefined; (4) f(a)↑. If dom(f) = A, then f is total on A.
The notation (a_1, …, a_n) is used to denote ordered n-tuples or, simply, n-tuples over some set of numbers A. We will use bold lower-case variables, e.g., a, x, y, to refer to n-tuples. Thus, a = (13, 17, 19) is a 3-tuple over the set of natural numbers N = {0, 1, 2, …}. We will use the symbol N+ to denote the set of positive natural numbers. If x = (x_1, …, x_n) is an n-tuple over A, then x[j], 1 ≤ j ≤ n, refers to individual elements of x. Thus, if x = (2, 3, 5, 7, 11), then x[1] = 2, x[2] = 3, x[3] = 5, x[4] = 7, x[5] = 11. The individual elements of an n-tuple are not required to be distinct. If x is an n-tuple, then dim(x) = n, i.e., the number of elements in x. The 0-tuple is denoted as (). In calculus, a sequence is an ordered set of numbers in a one-to-one correspondence with N or N+ (cf., e.g., Taylor [20], § 1.62, p. 67). Thus, if f : N → N, then {f(n)} denotes the sequence f(0), f(1), …, f(m), …, with countably many elements or terms. In computability theory, the term sequence sometimes refers to an n-tuple (cf., e.g., Ch. 3, p. 60 in [7]). Thus, in order to avoid confusion, when we want to emphasize the fact that we are dealing with a finite number of ordered elements, we refer to the collection of these elements as a finite sequence, a tuple, or an m-tuple, where m is the number of the elements.
For n > 0, A^n is the n-th Cartesian power of A, i.e., A^n = {(a_1, …, a_n) | a_i ∈ A, 1 ≤ i ≤ n}. Thus, if f : R^2 → N, then dom(f) = {(x_1, x_2) | x_1, x_2 ∈ R}, where R is the set of real numbers. We use statements like a ∈ A^n to mean that a is an n-tuple over A. We do not distinguish between 1-tuples and individual elements, e.g., a = (a), a ∈ A, and h(a) = h((a)) for some function h.
In formalizing feedforward artificial neural networks, it is sometimes convenient to treat n-tuples as vectors. Therefore, we occasionally use vector symbols, e.g., x⃗, y⃗, z⃗, to denote n-tuples. If x⃗ ∈ A^n, then dim(x⃗) = n and x⃗[j], 1 ≤ j ≤ n, is the j-th element of x⃗. E.g., if x⃗ = (1, 1, 11) ∈ N^3, then x⃗[1] = x⃗[2] = 1 and x⃗[3] = 11. If a ∈ A^n and a⃗ ∈ A^n, and a[j] = a⃗[j], 1 ≤ j ≤ n, then a = a⃗. If f : A^n → B^m, 0 < n, m, then f(x_1, …, x_n) = f(x) = f(x⃗) = f(x[1], …, x[n]) = f(x⃗[1], …, x⃗[n]) = y = y⃗ = (y[1], …, y[m]) = (y⃗[1], …, y⃗[m]). The empty tuple is discarded in function arguments. E.g., if h : N → N, then h((), t) = h(t, ()) = h(t), t ∈ N. We occasionally separate individual arguments of functions from the remaining arguments combined into tuples. E.g., if f : N^(n+2) → N, 0 < n, then, for z ∈ N^n, x ∈ N, y ∈ N, f(z, x, y) = f(z[1], …, z[n], x, y) = f(w), where z[i] = w[i], 1 ≤ i ≤ n, and w[n+1] = x, w[n+2] = y. If f is a function that maps a_1 ∈ A_1^(n_1), …, a_k ∈ A_k^(n_k) to c ∈ C^m, for some sets A_1, …, A_k, 0 < n_j, m, 1 ≤ j ≤ k, then f : A_1^(n_1), …, A_k^(n_k) → C^m.
A total function P : A^n → {0, 1} is a predicate, where 1 arbitrarily designates logical truth and 0 logical falsehood. The symbols ¬, ∧, ∨, → respectively refer to logical not, logical and, logical or, and logical implication. P(x) is a shorthand for P(x) = 1, and ¬P(x) is a shorthand for P(x) = 0. If P and Q are predicates, then ¬P ∨ Q is logically equivalent to P → Q, i.e., ¬P ∨ Q ≡ P → Q. The symbols ∃ and ∀ refer to the logical existential (there exists) and universal (for all) quantifiers, respectively. Thus, the statement (∃x)P(x) is logically equivalent to the statement that P(x) holds for at least one x in dom(P), while the statement (∀x)P(x) is logically equivalent to the statement that P(x) holds for every x in dom(P).
Let, for 0 < k, n, f : N^k → N, g_j : N^n → N, 1 ≤ j ≤ k, and x ∈ N^n. We use the following definitions of composition and primitive recursion in our article. A function h : N^n → N is obtained from f, g_j by composition if h is obtained from f, g_j by Schema (11).
h(x) = f(g_1(x), …, g_k(x)). (11)
Let k ∈ N and ϕ : N^2 → N be total. A function h : N → N is obtained from ϕ by primitive recursion if it is obtained from ϕ by Schema (12).
h(0) = k,
h(t + 1) = ϕ(t, h(t)). (12)
Let f : N^n → N and g : N^(n+2) → N be total. Then h : N^(n+1) → N is obtained from f and g by primitive recursion if h is obtained from f and g by Schema (13), where x ∈ N^n.
h(x, 0) = f(x),
h(x, t + 1) = g(t, h(x, t), x). (13)
If x⃗ ∈ N^n, Schema (13) can be expressed with the vector notation as
h(x⃗, 0) = f(x⃗),
h(x⃗, t + 1) = g(t, h(x⃗, t), x⃗).
Let the set of initial functions consist of
s(x) = x + 1, x ∈ N;
n(x) = 0, x ∈ N;
u_i^n(x_1, …, x_n) = u_i^n(x) = x[i] = u_i^n(x⃗) = x⃗[i] = x_i, 1 ≤ i ≤ n, x = x⃗ ∈ N^n.
Definition 4.
A function is primitive recursive if it can be obtained from the initial functions by a finite number of applications of Schemata (11)–(13).
A corollary of Definition 4 is that if f is primitive recursive, then there is a finite sequence of functions ϕ_1, …, ϕ_n = f such that each ϕ_i, 1 ≤ i ≤ n, is either an initial function or is obtained from the previous functions in the sequence by composition or primitive recursion.
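As an illustration of Definition 4, the following minimal Python sketch (ours, with our own helper names) builds addition and multiplication from the initial functions using only composition (Schema (11)) and primitive recursion (Schema (13)).

def s(x): return x + 1                    # successor
def n(x): return 0                        # zero
def u(i):                                 # projection u_i^n
    return lambda *xs: xs[i - 1]

def compose(f, *gs):                      # Schema (11): h(x) = f(g_1(x), ..., g_k(x))
    return lambda *xs: f(*(g(*xs) for g in gs))

def prim_rec(f, g):                       # Schema (13): h(x, 0) = f(x),
    def h(*args):                         #              h(x, t + 1) = g(t, h(x, t), x)
        *x, t = args
        acc = f(*x)
        for i in range(t):
            acc = g(i, acc, *x)
        return acc
    return h

add = prim_rec(u(1), compose(s, u(2)))                        # add(x, t) = x + t
mul = prim_rec(compose(n, u(1)), compose(add, u(2), u(3)))    # mul(x, t) = x * t
assert add(3, 4) == 7 and mul(3, 4) == 12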

4. Feedforward Artificial Neural Networks

A feedforward artificial neural network N_z is a finite set of neurons, each of which is connected to a finite number of the neurons in the same set through synapses, i.e., directed weighted edges (cf. Figure 1). The neurons are organized into l layers E_1, …, E_l, where E_1 is the input layer, E_l is the output layer, and E_e, 1 < e < l, are the hidden layers. We use the term network synonymously with the term feedforward artificial neural network.
Let z_z denote the number of layers in N_z and n_i^e refer to the i-th neuron in layer E_e, 1 ≤ e ≤ z_z. The function nn_z(e) : N+ → N+ specifies the number of neurons in layer E_e of N_z. We assume that N_z is fully connected, i.e., there is a synapse from every neuron in layer E_e to every neuron in layer E_{e+1}, 1 ≤ e < z_z. Each synaptic weight w_{i,j}^e (cf. Figure 1) is a real number. The vector w^e is the vector of all synaptic weights in N_z from E_e to E_{e+1}. Thus,
w^e = (w_{1,1}^e, …, w_{1,nn_z(e+1)}^e, …, w_{nn_z(e),1}^e, …, w_{nn_z(e),nn_z(e+1)}^e).
We let w^0 = () and assume, without loss of generality, that, for any synaptic weight w_{i,j}^e, 0 ≤ w_{i,j}^e ≤ 1, because, if that is not the case, w_{i,j}^e can be so scaled. No loss of generality is introduced with the assumption of full connectivity, because if full connectivity is not required, the appropriate synaptic weights are set to zero. If, on the other hand, a given network is not fully connected, synapses with zero weights can be added as needed to make the network fully connected.
Each n_i^e, e > 1, computes an activation function
α_i^e(a^{e−1}, w^{e−1}) : R^{dim(a^{e−1})}, R^{dim(w^{e−1})} → R,
where a^{e−1} is the vector of the activations of the neurons in layer E_{e−1}, dim(a^{e−1}) = nn_z(e−1), and dim(w^{e−1}) = nn_z(e−1) · nn_z(e). If x⃗ is the input to N_z, then a^1 = x⃗. For the input layer, we have
α_i^1(x⃗, w^0) = α_i^1(x⃗, ()) = x⃗[i], 1 ≤ i ≤ nn_z(1).
The term feedforward means that the activations of the neurons are computed layer by layer from the input layer to the output layer, because the activation functions of the neurons in the next layer require only the weights of the synapses connecting the current layer with the next one and the activation values, i.e., the outputs of the activation functions of the neurons in the current layer. If x⃗ is the input vector, then
a^1 = (α_1^1(x⃗, ()), …, α_{nn_z(1)}^1(x⃗, ())) = x⃗,
a^e = (α_1^e(a^{e−1}, w^{e−1}), …, α_{nn_z(e)}^e(a^{e−1}, w^{e−1})), 1 < e < z_z.
The feedforward activation function f z that computes the activations of N z layer by layer can be defined as
f_z(x⃗, 0) = x⃗,
f_z(x⃗, e + 1) = (α_1^{e+1}(f_z(x⃗, e), w^e), …, α_{nn_z(e+1)}^{e+1}(f_z(x⃗, e), w^e)).
Thus, f_z(x⃗, 0) = f_z(x⃗, 1) = a^1 = x = x⃗ and f_z(x⃗, e) = a^e, 1 ≤ e ≤ z_z. If N_z maps a^1 ∈ A^n to b^{z_z} ∈ B^m, for some sets A and B, we define the function ζ_z : A^n → B^m computed by N_z as
ζ_z(x⃗) = f_z(x⃗, z_z).
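The recurrence above amounts to a layer-by-layer evaluation loop. The following minimal Python sketch (ours) mirrors that loop; the summing activation used in the example is only one admissible choice of α_i^e, and all names are our own.

from typing import Callable, List, Sequence

Activation = Callable[[Sequence[float], Sequence[float]], float]

def feedforward(x: Sequence[float],
                layers: List[List[Activation]],
                weights: List[Sequence[float]]) -> List[float]:
    """Mirrors the recurrence for f_z: f_z(x, 0) = x, and f_z(x, e + 1) applies
    the activation functions of layer e + 1 to f_z(x, e) and the weight vector w^e."""
    a = list(x)                                   # a^1 = x: the input layer is the identity
    for alphas, w in zip(layers, weights):
        a = [alpha(a, w) for alpha in alphas]     # a^{e+1} from a^e and w^e
    return a                                      # zeta_z(x) = f_z(x, z_z)

# Example: a single hidden-to-output step whose one neuron computes sum_j a[j] * w[j].
summing: Activation = lambda a, w: sum(ai * wi for ai, wi in zip(a, w))
print(feedforward([2.0, 3.0, 5.0], [[summing]], [[1.0, 1.0, 1.0]]))   # [10.0]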
Definition 5.
A function f : A^n → B^m, for some sets A and B, is N-computable if there is a network N_z such that, for all x = x⃗ ∈ A^n,
ζ_z(x) = f(x) = y = ζ_z(x⃗) = f(x⃗) = y⃗ ∈ B^m.
If N_z computes f, we refer to N_z as N_{f(·)} and use the expression N_{f(·)} : A^n → B^m as a shorthand for ζ_z : A^n → B^m. Furthermore, if N_z computes f, then, for x = x⃗ ∈ A^n, the expressions ζ_z(x), ζ_z(x⃗), N_z(x), N_z(x⃗) are equivalent in that
ζ_z(x) = N_z(x) = y = ζ_z(x⃗) = N_z(x⃗) = y⃗ ∈ B^m.
A network N_z can include other networks. Let N_j and N_k be two networks such that ζ_j : A^m → B^n and ζ_k : B^n → C^k, for some sets A, B, C, and 0 < m, n, k. Then we can construct a new network N_l by feeding the output of N_j to N_k so that ζ_l : A^m → C^k (cf. Figure 2). We can generalize this case to a network that includes arbitrarily many networks whose outputs are the inputs to another network whose output is the output of the entire complex network (cf. Figure 3). Formally, let N_{z_1}, …, N_{z_l} be networks such that ζ_{z_1} : I^{n_{z_1}} → O^{k_{z_1}}, …, ζ_{z_l} : I^{n_{z_l}} → O^{k_{z_l}}, for some sets I and O, 0 < n_{z_i}, k_{z_i}, and 1 ≤ i ≤ l. Let, for some set S, a network N_j compute the function
ζ_j : O^{k_{z_1}}, O^{k_{z_2}}, …, O^{k_{z_l}} → S^m
so that
ζ_z(x_{z_1}, …, x_{z_l}) = ζ_j(ζ_{z_1}(x_{z_1}), …, ζ_{z_l}(x_{z_l})) = s ∈ S^m, x_{z_i} ∈ I^{n_{z_i}}, 1 ≤ i ≤ l.
Then, for x⃗_{z_i} ∈ I^{n_{z_i}} such that x⃗_{z_i} = x_{z_i}, 1 ≤ i ≤ l,
N_z(x_{z_1}, …, x_{z_l}) = N_z(y) = N_j(N_{z_1}(x⃗_{z_1}), …, N_{z_l}(x⃗_{z_l})) = s⃗ ∈ S^m,
where y = (x_{z_1}[1], …, x_{z_1}[n_{z_1}], …, x_{z_l}[1], …, x_{z_l}[n_{z_l}]), and s⃗ = s.
We use the symbol N_{id} to denote an identity network such that, for a = a⃗ ∈ A^n, 0 < n, ζ_{id}(a) = a = ζ_{id}(a⃗) = a⃗. One can think of N_{id} as a single-layer network of n neurons, where α_i^1(a, ()) = a[i] = α_i^1(a⃗, ()) = a⃗[i], 1 ≤ i ≤ n.
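The chaining and combination of networks just described (cf. Figures 2 and 3), together with the identity network N_id, can be sketched as higher-order functions. The following Python sketch is our own illustration, with hypothetical names; it is not part of the formalism.

from typing import Callable, List, Sequence

Network = Callable[[Sequence[float]], List[float]]

def chain(nj: Network, nk: Network) -> Network:
    """Chain network (Figure 2): the output of N_j is the input of N_k."""
    return lambda x: nk(nj(x))

def combine(inner: List[Network], arities: List[int], outer: Network) -> Network:
    """Composite network (Figure 3): split the concatenated input among the inner
    networks and feed their concatenated outputs to the outer network N_j."""
    def net(y: Sequence[float]) -> List[float]:
        outs, pos = [], 0
        for n, k in zip(inner, arities):
            outs.extend(n(y[pos:pos + k]))
            pos += k
        return outer(outs)
    return net

identity: Network = lambda a: list(a)             # the identity network N_id
doubler: Network = lambda a: [2 * v for v in a]
net = combine([identity, doubler], [2, 1], lambda outs: [sum(outs)])
print(net([1.0, 2.0, 3.0]))                       # [9.0]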
Our formalization of feedforward artificial neural networks as finite sets of neurons and synapses organized in finitely many layers is in compliance with the original definition by McCulloch and Pitts (Sec. 2, p. 103 in [11]), who state that the neurons of a given network may be assigned designations c_1, c_2, …, c_n. It is also in compliance with the subsequent definition by Rumelhart, Hinton, and Williams [12], as well as with modern treatments of neural networks by Nielsen [17] and Goodfellow, Bengio, and Courville [18], which continue to describe neural networks as finite sets of neurons and synapses.

5. N -Computability of Primitive Recursive Functions

Lemma 1.
The initial functions are N -computable.
Proof. 
Let N_{n(·)} : N → N be a network with a single input node n_1^1 and a single output node n_1^2 such that w_{1,1}^1 = 0 and α_1^2(a^1, w^1) = a^1[1] · w^1[1]. Then, ζ_{n(·)}(x) = α_1^2((x), (0)) = x · 0 = 0 = n(x), x ∈ N. Let N_{s(·)} : N → N be a network with a single input node n_1^1 and a single output node n_1^2 such that w_{1,1}^1 = 1 and α_1^2(a^1, w^1) = a^1[1] · w^1[1] + 1. Then, ζ_{s(·)}(x) = α_1^2((x), (1)) = x · w_{1,1}^1 + 1 = s(x), x ∈ N. Let N_{u_i^n(·)} : N^n → N, 1 ≤ i ≤ n, n > 0, be a network with n input nodes n_1^1, …, n_n^1 and one output node n_1^2. Let w_{i,1}^1 = 1, w_{j,1}^1 = 0, i ≠ j, 1 ≤ j ≤ n, and
α_1^2(a^1, w^1) = Σ_{j=1}^{n} a^1[j] · w^1[j].
Then, if a = a⃗ ∈ N^n,
ζ_{u_i^n(·)}(a) = α_1^2(a^1, w^1) = a[i] = α_1^2(a⃗, w^1) = a⃗[i] = u_i^n(a[1], …, a[n]). □
We abbreviate N_{u_i^n(·)} as N_{u(·)}, because n and i are always evident from the context.
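For concreteness, the two-layer networks constructed in the proof of Lemma 1 can be sketched as ordinary functions; the following Python sketch is our own illustration of N_{n(·)}, N_{s(·)}, and N_{u(·)}, not additional formal apparatus.

def net_zero(x: int) -> int:            # N_n: alpha_1^2(a, w) = a[1] * w[1] with w = (0,)
    return x * 0

def net_succ(x: int) -> int:            # N_s: alpha_1^2(a, w) = a[1] * w[1] + 1 with w = (1,)
    return x * 1 + 1

def net_proj(i: int, xs) -> int:        # N_{u_i^n}: w_i = 1, all other weights 0,
    weights = [1 if j == i else 0       # alpha_1^2(a, w) = sum_j a[j] * w[j]
               for j in range(1, len(xs) + 1)]
    return sum(a * w for a, w in zip(xs, weights))

assert net_zero(7) == 0 and net_succ(7) == 8 and net_proj(2, (2, 3, 5)) == 3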
Lemma 2.
Let x ∈ N^n, n > 0. Let c_i^n(x), 1 ≤ i ≤ n, be defined as
c_i^n(x) = u_1^n(x) if i = 1, u_2^n(x) if i = 2, …, u_n^n(x) if i = n.
Then, c_i^n is N-computable.
Proof. 
Since u_i^n is primitive recursive, c_i^n is primitive recursive by the definition-by-cases theorem and its corollary (cf. Theorem 5.4, Chap. 3, Sec. 5, pp. 50–51 in [7]). Let N_{c_i^n(·)} be a network with n + 1 input nodes n_1^1, …, n_{n+1}^1, where the first n nodes receive the n corresponding values of x ∈ N^n, and the last node n_{n+1}^1 receives the index i, 1 ≤ i ≤ n. Let N_{c_i^n(·)} have one output node n_1^2 and let w_{j,1}^1 = 1, 1 ≤ j ≤ n. Let the activation function of n_1^2 be defined as
α_1^2(a^1, w^1) = a^1[1] · w^1[1] if a^1[n+1] = 1, a^1[2] · w^1[2] if a^1[n+1] = 2, …, a^1[n] · w^1[n] if a^1[n+1] = n.
Then,
ζ_{c_i^n}(x, j) = x[j] = u_j^n(x) = c_j^n(x). □
We abbreviate N_{c_i^n(·)} as N_{c(·)}.
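The selector network of Lemma 2 can be sketched in the same spirit: n data inputs plus one extra input carrying the index, with the output neuron passing through the input selected by that index. The sketch below is our own illustration, not the formal construction.

def net_select(xs, i: int):
    a = list(xs) + [i]            # a^1: the n values of x followed by the index i
    w = [1] * len(xs)             # w_{j,1}^1 = 1 for the n data synapses
    return a[i - 1] * w[i - 1]    # the case of alpha_1^2 selected by a^1[n+1] = i

assert net_select((2, 3, 5), 3) == 5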
Lemma 3.
Let f be an N-computable function of k arguments, k > 0, and g_1, …, g_k be N-computable functions of n arguments each, n > 0. Let a function h of n arguments be obtained from f, g_1, …, g_k by Schema (11). Then, h is N-computable.
Proof. 
Let f, g_1, …, g_k be computable by N_{f(·)} : N^k → N, N_{g_1(·)} : N^n → N, …, N_{g_k(·)} : N^n → N, respectively. Then let N_j : N^n → N be a network such that, for x ∈ N^n,
N_j(x) = N_{f(·)}(N_{g_1(·)}(x), …, N_{g_k(·)}(x)).
Then, for z ∈ N^n, we have
N_j(z) = N_{f(·)}(N_{g_1(·)}(z), …, N_{g_k(·)}(z)) = f(g_1(z), …, g_k(z)) = h(z),
whence
ζ_j(z) = h(z), z ∈ N^n. □
Lemma 4.
Let k ∈ N. Then k is N-computable.
Proof. 
Let N_{n(·)} and N_{s(·)} be as constructed in Lemma 1. Let {N_{s(·)}}^k, k ≥ 0, denote a network that consists of a finite sequence of k networks N_{s(·)}, where the first N_{s(·)} receives its input from N_{n(·)} and each subsequent N_{s(·)} receives its input from the previous N_{s(·)} (cf. Figure 2). Let {N_{s(·)}}^0 = N_{n(·)}. Let N_{J_k}(0) = {N_{s(·)}}^k(N_{n(·)}(0)). Let s^k(x) denote k compositions of s(x) with itself, i.e., s^1(x) = s(x), s^2(x) = s(s(x)), etc. Then,
N_{J_0}(0) = {N_{s(·)}}^0(N_{n(·)}(0)) = 0;
N_{J_1}(0) = {N_{s(·)}}^1(N_{n(·)}(0)) = s(0) = 1;
N_{J_2}(0) = {N_{s(·)}}^2(N_{n(·)}(0)) = s^2(0) = 2;
…
N_{J_k}(0) = {N_{s(·)}}^k(N_{n(·)}(0)) = s^k(0) = k.
By induction on k, ζ_{J_k}(0) = k. By construction, ζ_{J_k}(n) = k, n ∈ N. □
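A Python rendering of the Lemma 4 construction simply chains k copies of the successor network behind the zero network; the sketch below (ours) restates the Lemma 1 sketches inline so it is self-contained.

def net_zero(x: int) -> int: return 0        # N_n from Lemma 1
def net_succ(x: int) -> int: return x + 1    # N_s from Lemma 1

def net_const(k: int):
    """The network N_{J_k}: N_n followed by a chain of k copies of N_s (cf. Figure 2)."""
    def net(x: int) -> int:
        v = net_zero(x)                      # erase the input
        for _ in range(k):                   # k chained successor networks
            v = net_succ(v)
        return v
    return net

assert net_const(5)(123) == 5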
The next lemma, Lemma 5, is a technical result for Lemma 6. The function x ∸ y is primitive recursive (cf. Chap. 3, Sec. 4, p. 46 in [7]).
Lemma 5.
Let the function x ∸ y : N^2 → N be defined as
x ∸ y = x − y if x ≥ y, and x ∸ y = 0 if x < y.
Then, x ∸ y is N-computable.
Proof. 
Let N_{∸(·)} have two input nodes n_1^1, n_2^1 and one output node n_1^2. Let w_{1,1}^1 = w_{2,1}^1 = 1 and let
α_1^2(a^1, w^1) = a^1[1] · w^1[1] − a^1[2] · w^1[2] if a^1[1] ≥ a^1[2], and α_1^2(a^1, w^1) = 0 if a^1[1] < a^1[2].
Then, for a = a⃗ ∈ N^2, we have
ζ_{∸(·)}(a) = α_1^2(a^1, w^1) = a[1] ∸ a[2] = α_1^2(a⃗, w^1) = a⃗[1] ∸ a⃗[2]. □
Definition 6 confines the notion of N-computability of some function f(x, t) to the N-computability of the first k elements of the sequence {f(x, t)}, t ∈ N.
Definition 6.
A function f : A^n × N → B^m, for some sets A and B, is N-computable elementwise for any k > 0 if there is a network N_z such that, for any z ∈ A^n, the first k + 1 terms of the sequence
{f(z, j)} = f(z, 0), f(z, 1), …, f(z, k), …
are the same as the terms of the tuple
(N(z, 0), N(z, 1), …, N(z, k)),
i.e., f(z, i) = N(z, i), 0 ≤ i ≤ k.
Thus, if a function f ( x , t ) is N -computable, it is N -computable elementwise for any positive k.
Lemma 6.
Let ϕ : N^2 → N be N-computable elementwise and h(t) be a function obtained from ϕ by Schema (12). Then, h is N-computable elementwise.
Proof. 
Let ϕ be computable elementwise by N_{ϕ(·)}. Let N_{h̃_0}(0) = N_{J_k}(0) = k, as constructed in Lemma 4. In the equations below, we abbreviate N_{n(·)}(0) as 0, N_{J_k}(0) as k, N_{J_t}(0) as N_{J_t}, N_{∸(·)}(x, y) as x ∸ y, and N_{h̃_i}(i) as N_{h̃_i}. Let
N_{h̃_0} = k,
N_{h̃_{t+1}} = N_{ϕ(·)}(N_{J_t}, N_{h̃_t}). (24)
By induction on t, h(t) = N_{h̃(·)}(t) (cf. Figure 4). Let
N_{h(·)}(t) = N_{c(·)}(N_{h̃_0}, …, N_{h̃_m}, t + 1), 0 ≤ t ≤ m, m > 0. (25)
Then, the first m + 1 terms of the sequence {h(t)} are the same as the terms of the tuple (N_{h(·)}(0), …, N_{h(·)}(m)) (cf. Figure 5 for m = 3). □
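To make the unrolling in the proof concrete, the following minimal Python sketch (ours, with our own names) fixes a bound m, chains m copies of a network computing ϕ behind the constant network, and selects the t-th value, in the manner of Schema (24) and Equation (25).

def unrolled_net(phi, k: int, m: int):
    """h(0) = k, h(t + 1) = phi(t, h(t)), unrolled for 0 <= t <= m."""
    values = [k]                            # N_{h~_0} = k
    for t in range(m):                      # N_{h~_{t+1}} = N_phi(N_{J_t}, N_{h~_t})
        values.append(phi(t, values[-1]))
    return lambda t: values[t]              # the selector N_c picks the t-th value

# Example: h(0) = 1, h(t + 1) = (t + 1) * h(t), i.e., h(t) = t!.
fact_net = unrolled_net(lambda t, a: (t + 1) * a, 1, 5)
assert [fact_net(t) for t in range(6)] == [1, 1, 2, 6, 24, 120]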
Lemma 7.
Let f : N^n → N and g : N^{n+2} → N be N-computable elementwise and h : N^{n+1} → N be a function obtained from f and g by Schema (13). Then h is N-computable elementwise.
Proof. 
Let x ∈ N^n and y ∈ N^{n+2}, n > 0, such that y = (y_1, y_2, x[1], …, x[n]). Let f and g be N-computable elementwise by N_{f(·)} and N_{g(·)}, respectively. Let us abbreviate N_{h̃_{x,t}}(x, t) as N_{h̃_{x,t}} and let
N_{h̃_{x,0}} = N_{f(·)}(x),
N_{h̃_{x,t+1}} = N_{g(·)}(t, N_{h̃_{x,t}}, x). (26)
By induction on t, h(x, t) = N_{h̃_{x,t}}. Let
N_{h(·)}(x, t) = N_{c(·)}(N_{h̃_{x,0}}, …, N_{h̃_{x,m}}, t + 1), 0 ≤ t ≤ m, m > 0.
Then the first m + 1 terms of the sequence {h(x, t)}, i.e., h(x, 0), …, h(x, m), agree elementwise with the tuple (N_{h(·)}(x, 0), …, N_{h(·)}(x, m)). □
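The construction of Lemma 7 differs from that of Lemma 6 only in carrying the parameter tuple x through the chain; the following Python sketch (ours, with our own names) unrolls Schema (26) for a fixed bound m.

def unrolled_net_params(f, g, m: int):
    """h(x, 0) = f(x), h(x, t + 1) = g(t, h(x, t), x), unrolled for 0 <= t <= m."""
    def net(x, t: int):
        values = [f(x)]                          # N_{h~_{x,0}} = N_f(x)
        for i in range(m):                       # N_{h~_{x,t+1}} = N_g(t, N_{h~_{x,t}}, x)
            values.append(g(i, values[-1], x))
        return values[t]                         # the selector N_c picks the t-th value
    return net

# Example: h(x, t) = x + t via f(x) = x and g(t, a, x) = a + 1.
add_net = unrolled_net_params(lambda x: x, lambda t, a, x: a + 1, 4)
assert [add_net(3, t) for t in range(5)] == [3, 4, 5, 6, 7]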
Figure 6 and Figure 7 illustrate sample constructions of Lemma 7. If we treat h ( t ) as a shorthand for h ( ( ) , t ) , then Lemmas 6 and 7 give us the following theorem.
Theorem 1.
Let h(x, t) be a primitive recursive function, x ∈ N^n, n ≥ 0. Then h(x, t) is N-computable elementwise.
We can ask if the elementwise N-computability of h(x, t) (cf. Definition 6) can be generalized to N-computability. In other words, is it possible to have the sequences {h(x, t)} and {N(x, t)} agree term by term, i.e., h(x, t) = N(x, t), t ∈ N? Since N has a finite set of neurons organized into a finite number of layers, N can compute, per Lemmas 6 and 7, only the first m + 1 values of h(x, t), i.e., h(x, t), 0 ≤ t ≤ m, although m can be an arbitrarily large natural number. Thus, the answer to this question is negative.
Let us assume that N_{h(·)}(x, t) in Theorem 1 is allowed to have countably many neurons so that the number of neurons in the hidden layers of N_{h(·)}(x, t) is countable. Let ζ_N(x, t) be the function computed by N_{h(·)}(x, t). Since countably many neurons can be added to N_{h(·)}(x, t) to compute h(x, t) for any t, we have the sequence {ζ_N(x, t)} = {N(x, t)}, on the one hand, and the sequence {h(x, t)}, on the other hand. Let f(x, t) = h(x, t) − ζ_N(x, t). Since h(x, t) = ζ_N(x, t) for any t ∈ N, {f(x, t)} is vacuously convergent, i.e., lim_{t→∞} f(x, t) = 0. Hence, we have the following theorem.
Theorem  2.
Let h(x, t) be a primitive recursive function, x ∈ N^n, n ≥ 0. Then there is a network N(x, t) with countably many neurons such that, for any z ∈ N^n, the sequences {h(z, t)} and {ζ_N(z, t)} agree term by term, i.e., h(z, t) = ζ_N(z, t), t ∈ N.

6. Discussion

As mathematical objects, feedforward artificial neural networks are more computationally powerful than primitive recursive functions inasmuch as the former can compute functions over real numbers, whereas the latter, by definition, cannot. E.g., one can define a network that computes the sum of n real numbers, which no primitive recursive function can compute. However, the situation changes when networks cease to be mathematical objects and become computational objects by being realized on finite memory devices. A finite memory device is a computational device with a finite amount of memory available for numerical computation [21]. Such a device is analogous to a human scribe with a pencil and an eraser who is to carry out a numerical computation by writing and erasing symbols from a finite alphabet on a finite number of paper sheets. Finite memory devices are different from the finite state automata of classical computability theory (e.g., a deterministic finite state machine (Chap. 2, Sec. 2.2 in [22]), a non-deterministic finite state machine (Chap. 2, Sec. 2.3 in [22]), a Mealy or Moore machine (Chap. 2, Sec. 2.7 in [22]), a pushdown automaton (Chap. 5 in [22]), or a Turing machine (Chap. 6 in [7])), because the latter do not put any bounds on the number of cells in their tapes available for computation. A finite state automaton of classical computability becomes a finite memory device only when the number of its tape cells available for computation is bounded by a natural number.
A real number x is signifiable on a finite memory device D_j if and only if the finite amount of memory on D_j can hold its sign, where a sign is a sequence of arbitrary symbols from a finite alphabet [21]. Thus, if the alphabet is {“.”, “0”, “1”, “2”, “3”, “4”, “5”, “6”, “7”, “8”, “9”} and D_j has 8 memory cells to represent a real number, then the real numbers 1.41, 1.414, 1.4142, 1.41421, 1.414213 are signifiable on D_j as “1.41”, “1.414”, “1.4142”, “1.41421”, “1.414213”, respectively, whereas the real numbers 1.4142135, 1.41421356, 1.414213562, 1.4142135623, and 1.41421356237 are not. A consequence of the finite amount of memory is that the set of real numbers signifiable on D_j is finite and, hence, vacuously countable. To put it differently, Cantor’s theorem (§ 2 in [23]) does not apply insomuch as the number of signifiable reals on D_j in any interval (α, β), α, β ∈ R, α < β, is finite. Consequently, all computation of a feedforward artificial neural network N_z : R^n → R^m, 0 < n, m, realized on D_j can be packed into a unique natural number Ω_z, and there exists a primitive recursive function f̃ : N → N such that ζ_z(x) = a if and only if f̃(x̃) = ã, where x uniquely corresponds to x̃ and a to ã (cf. Theorem 1, pp. 15–17 in [21]). Theorem 1 is, after a fashion, the converse of Theorem 1 in [21] in the sense that it shows how one can construct a network from a primitive recursive function.
Theorem 2 shows that all values of a primitive recursive function can be computed exactly by a feedforward artificial neural network if the network is allowed to have countably many neurons. This purely theoretical result contributes to the growing collection of universality theorems on feedforward neural networks and various classes of functions (cf. Ch. 4 in [17]). Thus, Hornik et al. [13] show that multilayer feedforward networks with a single hidden layer of neurons with arbitrary squashing activation functions can approximate any Borel measurable function from one finite-dimensional space to another to any desired degree of accuracy so long as the number of the neurons in the hidden layer is unbounded. Gripenberg [14] shows that the general approximation property of feedforward perceptron networks is achievable when the number of perceptrons in each layer is bounded but the number of layers is allowed to grow to infinity and the perceptron activation functions are continuously differentiable and not linear. Guliyev and Ismailov [15] show that single hidden layer feedforward neural networks with fixed weights and one or two neurons in the hidden layer can approximate any continuous function on a compact subset of the real line, and proceed to demonstrate that single hidden layer feedforward networks with fixed weights cannot approximate all continuous multivariate functions.
We conclude our discussion with a caveat about universality results of feedforward neural networks with unbounded numbers of neurons. While these results provide valuable theoretical insights, they may not hold much sway with computer scientists interested in computability properties of finite AI, because networks with unbounded numbers of neurons cannot be realized on computational devices with finite amounts of computational memory.

7. Conclusions

We have formalized feedforward artificial neural networks with recurrence equations and proposed a formal definition of the concept of N-computability, i.e., the property of a function to be computed by a feedforward artificial neural network N. We have shown that, for a primitive recursive function h(x, t), where x is an n-tuple of natural numbers and t is a natural number, there exists a feedforward artificial neural network N(x, t) such that, for any n-tuple of natural numbers z and any positive natural number m, the first m + 1 terms of the sequence {h(z, t)} agree elementwise with the tuple (N(z, 0), …, N(z, m)). Our investigation contributes to the knowledge of the classes of functions that can be computed by feedforward artificial neural networks. Since such networks are used in some finite AI systems, our investigation may be of interest to mathematicians and computer scientists interested in the computability theory of finite AI.

Funding

This research received no external funding.

Data Availability Statement

No new data were created.

Acknowledgments

The author is grateful to the four anonymous reviewers for their feedback.

Conflicts of Interest

The author declares no conflict of interest.

References

  1. Gödel, K. On formally undecidable propositions of Principia Mathematica and related systems I. In Kurt Gödel Collected Works Volume I Publications 1929–1936; Feferman, S., Dawson, J.W., Kleene, S.C., Moore, G.H., Solovay, R.M., van Heijenoort, J., Eds.; Oxford University Press: Oxford, UK, 1986. [Google Scholar]
  2. Péter, R. Konstruktion nichtrekursiver Funktionen. Math. Ann. 1935, 111, 42–60. [Google Scholar] [CrossRef]
  3. Péter, R. Rekursive Funktionen; Akadémiai Kiadó: Budapest, Hungary, 1951. [Google Scholar]
  4. Kleene, S.C. Introduction to Metamathematics; D. Van Nostrand: New York, NY, USA, 1952. [Google Scholar]
  5. Davis, M. Computability and Unsolvability; Dover Publications, Inc.: New York, NY, USA, 1982. [Google Scholar]
  6. Rogers, H., Jr. Theory of Recursive Functions and Effective Computability; The MIT Press: Cambridge, MA, USA, 1988. [Google Scholar]
  7. Davis, M.; Sigal, R.; Weyuker, E. Computability, Complexity, and Languages: Fundamentals of Theoretical Computer Science, 2nd ed.; Harcourt, Brace & Company: Boston, MA, USA, 1994. [Google Scholar]
  8. Colson, L. About primitive recursive algorithms. Theor. Comput. Sci. 1991, 83, 57–69. [Google Scholar] [CrossRef]
  9. Paolini, L.; Piccolo, M.; Roversi, L. A class of recursive permutations which is primitive recursive complete. Theor. Comput. Sci. 2020, 813, 218–233. [Google Scholar] [CrossRef]
  10. Petersen, U. Induction and primitive recursion in a resource conscious logic—With a new suggestion of how to assign a measure of complexity to primitive recursive functions. Dilemmata Jahrb. ASFPG 2008, 3, 49–106. [Google Scholar]
  11. McCulloch, W.S.; Pitts, W. A logical calculus of the ideas immanent in nervous activity. Bull. Math. Biophys. 1943, 5, 115–133. [Google Scholar] [CrossRef]
  12. Rumelhart, D.E.; Hinton, G.E.; Williams, R.J. Learning representations by back-propagating errors. Nature 1986, 323, 533–536. [Google Scholar] [CrossRef]
  13. Hornik, K.; Stinchcombe, M.; White, H. Multilayer feedforward networks are universal approximators. Neural Netw. 1989, 2, 359–366. [Google Scholar] [CrossRef]
  14. Gripenberg, G. Approximation by neural networks with a bounded number of nodes at each level. J. Approx. Theory 2003, 122, 260–266. [Google Scholar] [CrossRef]
  15. Guliyev, N.; Ismailov, V. On the approximation by single hidden layer feedforward neural networks with fixed weights. Neural Netw. 2019, 98, 296–304. [Google Scholar] [CrossRef]
  16. Zhang, Z.; Li, J. A review of artificial intelligence in embedded systems. Micromachines 2023, 14, 897. [Google Scholar] [CrossRef]
  17. Nielsen, M. Neural Networks and Deep Learning; Determination Press: San Francisco, CA, USA, 2015. [Google Scholar]
  18. Goodfellow, I.; Bengio, Y.; Courville, A. Deep Learning; MIT Press: Cambridge, MA, USA, 2016. [Google Scholar]
  19. Meyer, A.R.; Ritchie, D.M. The complexity of loop programs. In Proceedings of the ACM National Meeting, Washington, DC, USA, 30 August 1967; pp. 465–469. [Google Scholar]
  20. Taylor, A.E. Advanced Calculus; Ginn & Company: Boston, MA, USA, 1955. [Google Scholar]
  21. Kulyukin, V.A. On correspondences between feedforward artificial neural networks on finite memory automata and classes of primitive recursive functions. Mathematics 2023, 11, 2620. [Google Scholar] [CrossRef]
  22. Hopcroft, J.E.; Ullman, J.D. Introduction to Automata Theory, Languages, and Computation; Narosa Publishing House: New Delhi, India, 2002. [Google Scholar]
  23. Cantor, G. On a property of the class of all real algebraic numbers. Crelle’s J. Math. 1874, 77, 258–262. [Google Scholar]
Figure 1. A 3-layer feedforward artificial neural network. Layer 1 includes the input neurons n_1^1 and n_2^1. Layer 2 includes the neurons n_1^2, n_2^2, n_3^2. Layer 3 includes the neurons n_1^3, n_2^3. The two arrows incoming into n_1^1 and n_2^1 signify that layer 1 is the input layer. The two arrows going out of n_1^3 and n_2^3 signify that layer 3 is the output layer. The weight of the synapse from n_i^e to n_j^{e+1} is w_{i,j}^e, 1 ≤ e < 3. E.g., w_{1,1}^1 is the weight of the synapse from n_1^1 to n_1^2 and w_{3,1}^2 is the weight of the synapse from n_3^2 to n_1^3.
Figure 2. A chain network N_l that consists of two networks N_j (top) and N_k (second from the top). The two bottom networks are functionally identical pictogrammatic renderings of the same network N_l. In the third network from the top, the output y of N_j is made explicit. In the bottom rendering of N_l, y is implicit in the arrow from N_j to N_k. In sum, the output of N_j is given to N_k, and the output of N_k is the output of N_l. Thus, N_l maps x to z.
Figure 3. A network N_z that includes networks N_{z_1}, …, N_{z_l} that take x_{z_1}, …, x_{z_l} as inputs and give their outputs to network N_j (cf. Equation (22)). Thus, N_z maps x_{z_1}, …, x_{z_l} to s.
Figure 4. Networks N_{h̃}(1), N_{h̃}(2), N_{h̃}(3) constructed with Schema (24) in Lemma 6. Note that 0 and k denote N_{n(·)}(0) and N_{J_k}(0), respectively.
Figure 5. Network N_{h(·)}(t), 0 ≤ t ≤ 3, constructed with Equation (25) in Lemma 6. Since h(0) = N_{h(·)}(0), h(1) = N_{h(·)}(1), h(2) = N_{h(·)}(2), h(3) = N_{h(·)}(3), the first four terms of the sequence {h(t)} are the same as the terms of the 4-tuple (N_{h(·)}(0), N_{h(·)}(1), N_{h(·)}(2), N_{h(·)}(3)).
Figure 6. Network N_{h̃}(x, 3), constructed with Schema (26) in Lemma 7. Note that 0 denotes N_{n(·)}(0), and N_{id} is the identity network.
Figure 7. Network N_{h(·)}(x, t), 0 ≤ t ≤ 3, constructed with Equation (25) in Lemma 7. Since h(x, 0) = N_{h(·)}(x, 0), h(x, 1) = N_{h(·)}(x, 1), h(x, 2) = N_{h(·)}(x, 2), h(x, 3) = N_{h(·)}(x, 3), the first four terms of the sequence {h(x, t)}, i.e., h(x, 0), h(x, 1), h(x, 2), h(x, 3), are the same as the terms of the tuple (N_{h(·)}(x, 0), N_{h(·)}(x, 1), N_{h(·)}(x, 2), N_{h(·)}(x, 3)).
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
