1. Introduction
Primitive recursive functions describe, albeit incompletely, the intuitive notion of a number-theoretic algorithm, a deterministic procedure to transform numerical inputs to numerical outputs in finitely many steps. This perception of primitive recursive functions as intuitive counterparts of number-theoretic algorithms may be rooted in the fact that any primitive recursive function can be mechanically constructed from a set of initial functions with finitely many applications of simple, well-defined operations of composition and primitive recursion. These functions and some of their properties have been investigated by Gödel [1], Péter [2,3], Kleene [4], Davis [5], and Rogers [6] in their studies of formal systems, foundations of mathematics, and computability theory. Although the confinement of the construction procedure to two operations may at first seem restrictive, many functions on natural numbers ordinarily encountered in mathematics and computer science are, in fact, primitive recursive (cf., e.g., Ch. 3 in [7]). Primitive recursive functions have been used to investigate the foundations of functional programming. Colson [8] presents a computational model in which a primitive recursive function is viewed as a rewriting system and gives a non-trivial necessary condition for an algorithm to be representable in the system. Paolini et al. [9] define a class of recursive permutations, which they call Reversible Primitive Permutations (RPP), and formalize it as a language that is sufficiently expressive to represent all primitive recursive functions. Petersen [10] uses induction and primitive recursion to develop resource-conscious logics where the repeated recycling of assumptions, e.g., repeated applications of the successor function to enumerate natural numbers, has costs.
Feedforward artificial neural networks have their origin in the research by McCulloch and Pitts [11], which describes neural events with propositional logic. McCulloch and Pitts assume that the human nervous system is a finite set of neurons, each of which has an excitation threshold. When a neuron’s threshold is exceeded, the neuron generates an impulse that propagates to other neurons across synapses connecting them to the origin of the impulse. A fundamental insight by McCulloch and Pitts is that if the response of a neuron can be formalized as a logical proposition specifying its stimulus, then behaviors of complex networks of neurons can, in principle, be described with symbolic logic. Artificial neural networks entered mainstream computer science almost half a century after the research by McCulloch and Pitts when Rumelhart, Hinton, and Williams [12] discovered backpropagation, a method for training networks to modify synapse weights by minimizing error between the output and the ground truth. Different types of such networks have been shown to be universal approximators of some classes of functions (e.g., [13,14,15]). Artificial neural networks are increasingly used in embedded artificial intelligence (AI) systems, i.e., systems that run on computational devices with finite amounts of computer memory (e.g., [16]). We will refer to embedded AI as finite AI to emphasize the fact that finite AI systems are realized on computational devices with finite amounts of computational memory.
In this investigation, we seek to relate, in a formal way, primitive recursive functions and feedforward artificial neural networks by investigating whether it is possible, for a given primitive recursive function, to construct a feedforward artificial neural network that computes arbitrarily many values of the function’s co-domain from the corresponding values of the function’s domain. We hope that our investigation contributes to the knowledge of the classes of functions that can be not only approximated, but provably computed by feedforward artificial neural networks. In particular, we formalize feedforward artificial neural networks with recurrence equations, propose a formal definition of the concept of 𝒩-computability, i.e., the property of a function to be computed by a feedforward artificial neural network 𝒩, and prove several lemmas and theorems to show how feedforward artificial neural networks can be constructed to compute arbitrarily many consecutive values of any primitive recursive function. Since these networks consist of finite sets of neurons and are used in some finite AI systems [17,18], our investigation will be of interest to mathematicians and computer scientists interested in the computability theory of finite AI.
The remainder of our article is organized as follows. In Section 2, we review several definitions of primitive recursive functions, starting with the original definition by Gödel [1] and proceeding to the later definitions by Kleene [4], Davis [5], Rogers [6], Davis et al. [7], and Meyer and Ritchie [19]. This section gives the reader a historical bird’s-eye view of how the concept of primitive recursive function and its formalization have co-evolved over time. In Section 3, we state the notational conventions and give the definition of a primitive recursive function we use in this article. This section is intended for reference. In Section 4, we offer a formalization of feedforward artificial neural networks in terms of recursive equations. In Section 5, we prove several lemmas and theorems that form the bulk of our theoretical investigation. In Section 6, we present some perspectives on the obtained results, and we summarize our conclusions in Section 7.
2. Recursive Functions
Gödel [1] (Sec. 2, p. 157) describes the class of number-theoretic functions as the class of functions whose domains are non-negative integers or n-tuples thereof and whose values are non-negative integers. Gödel [1] (Sec. 2, pp. 157–159) states that a number-theoretic function φ is recursively defined in terms of the number-theoretic functions ψ and μ if φ is obtained from ψ and μ by the following schema:

φ(0, x₂, …, xₙ) = ψ(x₂, …, xₙ),
φ(k + 1, x₂, …, xₙ) = μ(k, φ(k, x₂, …, xₙ), x₂, …, xₙ),     (1)

where the equalities hold for all k, x₂, …, xₙ. Gödel [1] (Sec. 2, p. 159) defines a number-theoretic function φ to be recursive if there exists a finite sequence of number-theoretic functions φ₁, φ₂, …, φₘ ending with φ, where each function φᵢ, 1 ≤ i ≤ m, is a natural number constant, the successor function x + 1, or is defined from two preceding functions with Schema (1) or from one preceding function by substitution, i.e., the replacement of the arguments of a preceding function with some other preceding functions.
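Gödel’s recursion schema can be illustrated with a short executable sketch (our own illustration, not from the original text; the function names `recurse`, `psi`, and `mu` are ours). The schema determines φ uniquely because φ(k + 1, …) depends only on k, the previously computed value, and the remaining arguments, so the recursion can be unfolded iteratively:

```python
def recurse(psi, mu):
    """Return the function phi defined from psi and mu by Godel's schema:
    phi(0, *xs) = psi(*xs); phi(k + 1, *xs) = mu(k, phi(k, *xs), *xs)."""
    def phi(k, *xs):
        acc = psi(*xs)           # base case: phi(0, *xs)
        for i in range(k):       # unfold the recursion step by step
            acc = mu(i, acc, *xs)
        return acc
    return phi

# Addition obtained by the schema: psi(y) = y, mu(k, prev, y) = prev + 1.
add = recurse(lambda y: y, lambda k, prev, y: prev + 1)
```

With this instance, `add(3, 4)` unfolds three applications of the step function μ starting from 4 and yields 7.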
Kleene [4] (Chap. IX, § 43, p. 219) defines a number-theoretic function to be primitive recursive if it is definable by a finite number of applications of the six schemata in (2), where m and n are positive integers, i is an integer such that 1 ≤ i ≤ n, q is a natural number, and ψ, χ, χ₁, …, χₘ are number-theoretic functions with the indicated numbers of arguments. Schema (I) defines the successor function, Schema (II) defines constant functions, and Schema (III) defines identity functions, which Kleene denotes with the symbol Uᵢⁿ. Kleene defines the functions satisfying Schemata (I), (II), and (III) in (2) as initial functions. Schema (IV) in (2) obtains φ from ψ, χ₁, …, χₘ by substitution. Schemata (Va) and (Vb) obtain φ from ψ or from ψ and χ, respectively, by primitive recursion. Kleene [4] (Chap. XI, § 55, p. 275) defines a function φ to be general recursive in functions ψ₁, …, ψₖ if there is a system E of equations which defines φ recursively from ψ₁, …, ψₖ.
Davis [5] (Chap. 2, Sec. 2, p. 36) defines the operation of composition as the operation to obtain the function h from the functions f, g₁, …, gₘ with Schema (3), where the argument tuples of natural numbers have n and m elements, respectively. Davis [5] (Chap. 3, Sec. 4, p. 48) defines the operation of primitive recursion as the operation that uses Schema (4) to construct the function h from the total functions f and g, where the argument tuples of natural numbers have the numbers of elements indicated in the schema. For a set of natural numbers A, Davis [5] (Chap. 3, Sec. 4, p. 49) defines an A-primitive recursive function, or a function primitive recursive in A, as a function that can be obtained by a finite number of applications of composition (cf. Schema (3)) and primitive recursion (cf. Schema (4)) from the successor function, the identity functions, and the characteristic function C_A of A, where C_A is a total function such that C_A(x) = 0 if x ∈ A and C_A(x) = 1 if x ∉ A, and where the successor and identity functions are identical to Kleene’s Schemata (I) and (III) in (2). Davis [5] (Chap. 3, Sec. 4, p. 49) defines a function f to be primitive recursive if it is ∅-primitive recursive, where ∅ denotes the empty set.
Rogers [6] (Chap. 1, § 1.2, p. 6) defines the class C of primitive recursive functions as the smallest class of functions such that
- (1) All constant functions λx₁ ⋯ xₖ[m] are in C, k ≥ 1, m ≥ 0;
- (2) The successor function λx[x + 1] is in C;
- (3) All identity functions λx₁ ⋯ xₖ[xᵢ] are in C, 1 ≤ i ≤ k;
- (4) If f is a function of k variables in C and g₁, …, gₖ are functions in C of m variables each, then the function λx₁ ⋯ xₘ[f(g₁(x₁, …, xₘ), …, gₖ(x₁, …, xₘ))] is in C, k, m ≥ 1;
- (5) If h is a function of k + 1 variables in C, and g is a function of k − 1 variables in C, then the unique function f of k variables satisfying Schema (6) is also in C, k ≥ 1.
Davis et al. [7] (Chap. 3, Sec. 3, p. 42) define as initial the functions s(x) = x + 1, n(x) = 0, and uᵢⁿ(x₁, …, xₙ) = xᵢ, 1 ≤ i ≤ n, and define a function to be primitive recursive if it can be obtained from the initial functions by a finite number of applications of composition or primitive recursion, where primitive recursion is defined by Schema (7) (Chap. 3, Sec. 2, p. 40 in [7]) and Schema (8) (Chap. 3, Sec. 2, p. 41 in [7]). In Schema (7), k is a natural number and g is a total function of two variables. In Schema (8), f and g are total functions of n and n + 2 variables, respectively.
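The two recursion schemata can be made concrete with a small sketch (our own example, with the standard reading of the schemata): the predecessor function is an instance of Schema (7) with k = 0 and g(t, y) = t, and truncated subtraction (monus) is an instance of Schema (8) built on top of it:

```python
def pred(t):
    """Predecessor by Schema (7): pred(0) = 0; pred(t + 1) = g(t, pred(t))
    with g(t, y) = t, so pred(t + 1) = t."""
    acc = 0
    for i in range(t):
        acc = i
    return acc

def monus(x, t):
    """Truncated subtraction by Schema (8): monus(x, 0) = x;
    monus(x, t + 1) = pred(monus(x, t))."""
    acc = x
    for _ in range(t):
        acc = pred(acc)
    return acc
```

Note that `monus(2, 5)` bottoms out at 0 rather than going negative, as required of a function on the natural numbers.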
2.1. Computability and Turing Machines
Davis [5] (Chap. 1, Sec. 2, p. 10) gives the following definition of partially computable and computable functions.

Definition 1. An n-ary function f(x₁, …, xₙ) is partially computable if there exists a Turing machine Z such that f(x₁, …, xₙ) = Ψ_Z⁽ⁿ⁾(x₁, …, xₙ). In this case, we say that Z computes f. If, in addition, f is a total function, then it is called computable.

In subsequent chapters of his monograph (cf. Chap. 2 and Chap. 3 in [5]), Davis separates the notion of computability from Turing machines to make it possible “to demonstrate the computability of quite complex functions without referring back to the original definition of computability in terms of Turing machines” (cf. Ch. 3, Sec. 1, p. 41 in [5]).
Davis et al. [7] (Chap. 2) continue this treatment of computability by designing the programming language 𝒮 and then defining partially computable and computable functions in terms of 𝒮 programs, viz., finite sequences of 𝒮 instructions. In 𝒮, the unique variable Y is designated as the output variable to store the output of an 𝒮 program 𝒫 on a given input. X₁, X₂, … are input variables and Z₁, Z₂, … are internal variables. All variables refer to natural numbers. 𝒮 has conditional dispatch instructions, line labels, elementary arithmetic operations, comparisons of natural numbers, and macros.

Davis et al. [7] (Chap. 2, Sec. 3, p. 27) define a computation of an 𝒮 program 𝒫 on some inputs as a finite sequence of snapshots s₁, s₂, …, sₖ, where each snapshot sᵢ, 1 ≤ i ≤ k, specifies the number of the instruction in 𝒫 to be executed and the value of each variable in 𝒫, and where each subsequent snapshot is uniquely determined by the previous snapshot (Theorem 3.2, Chap. 4, Sec. 3, pp. 74–75 in [7]). The snapshot s₁ is the initial snapshot, where the values of all input variables are set to their initial values, the program instruction counter is set to 1, i.e., the number of the first instruction in 𝒫, and the values of all the other variables in 𝒫 are set to 0. The snapshot sₖ in a computation is a terminal snapshot, where the instruction counter is set to the number of the instructions in 𝒫 plus 1. If some program 𝒫 in 𝒮 takes m inputs x₁, x₂, …, xₘ, then ψ_𝒫⁽ᵐ⁾(x₁, …, xₘ) denotes the value of Y in the terminal snapshot of the computation of 𝒫 on these inputs if such a computation exists, and is undefined otherwise.
Definition 2. An n-ary partial function f is partially computable if there is a program 𝒫 such that Equation (10) holds for all (x₁, …, xₙ). Definition 3. An n-ary function f is computable if it is total and partially computable.

Equation (10) in Definition 2 is interpreted so that f(x₁, …, xₙ) is defined if and only if ψ_𝒫⁽ⁿ⁾(x₁, …, xₙ) is defined. This treatment of computable functions in terms of programs in a formal language is by no means the only one in the literature. For example, as early as 1967, Meyer and Ritchie [19] formalize primitive recursive functions as loop programs consisting of assignment and iteration statements similar to DO statements of the programming language FORTRAN.
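The defining feature of a Meyer–Ritchie loop program is that the body of a LOOP statement runs exactly as many times as the value its controlling register held on entry, which guarantees termination and confines loop programs to the primitive recursive functions. A loop program for addition, transcribed into Python (our own transcription; the variable names are hypothetical):

```python
def loop_add(x1, x2):
    """Loop-program addition: Y := X1; LOOP X2; Y := Y + 1; END.
    The loop body executes exactly x2 times, regardless of what the
    body does, so the program always halts."""
    y = x1
    for _ in range(x2):  # LOOP X2 ... END
        y = y + 1        #   Y := Y + 1
    return y
```

The bounded `for` loop mirrors the LOOP…END construct: unlike a `while` loop, its trip count is fixed before the body first runs.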
2.2. Computability of Primitive Recursive Functions
Davis et al. [7] (Chap. 3, Sec. 3, p. 42) introduce the concept of a primitive recursively closed (PRC) class of functions, which is a class of total functions that contains the initial functions and any functions obtained from the initial functions by a finite number of applications of composition or primitive recursion. Davis et al. [7] (Chap. 3, Sec. 3, pp. 42–43) show that (1) the class of computable functions is PRC; (2) the class of primitive recursive functions is PRC; and (3) a function is primitive recursive if and only if it belongs to every PRC class. A corollary of (3) is that every primitive recursive function is computable.
Péter [2,3] shows it is possible to define functions in terms of recursive equations that are not primitive recursive. In particular, Péter demonstrates that all unary primitive recursive functions are enumerable, i.e., ψ₀, ψ₁, ψ₂, … is an enumeration, with repetitions, of all unary primitive recursive functions. By Cantor’s diagonalization (cf., e.g., pp. 6–8 in [4]), the unary function f(n) = ψₙ(n) + 1 is not in the enumeration and, hence, not primitive recursive. While f is not primitive recursive, it is computable (cf. Definition 3). Thus, the class of primitive recursive functions is a proper subset of computable functions and, in and of itself, cannot completely capture the intuitive notion of a number-theoretic algorithm. Péter’s argument suffers no loss of generality, insomuch as any n-ary primitive recursive function, n > 1, can be reduced to an equivalent unary primitive recursive function (cf. Theorems 9.1 and 9.2, Chap. 4, Sec. 9, p. 108 in [7]). Kleene’s separation of recursive functions into general recursive and primitive recursive may have been influenced by Péter’s discovery (cited by Kleene [4] in Chap. XI, § 55, p. 272).
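The diagonal step deserves to be written out, since it carries the whole argument (our reconstruction of the standard presentation):

```latex
% Enumerate the unary primitive recursive functions, with repetitions:
%   \psi_0, \psi_1, \psi_2, \ldots
% and define f by diagonalization:
f(n) \;=\; \psi_n(n) + 1 .
% For every index n we have f(n) \neq \psi_n(n), so f differs from each
% \psi_n at the argument n and cannot occur anywhere in the enumeration.
% Yet f is computable: to evaluate f(n), generate the n-th derivation in
% the (effective) enumeration, evaluate \psi_n at n, and add 1.
```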
Rogers [6] (Chap. 1, § 1.2, p. 8) defines the Ackermann generalized exponential, a function for which there is no primitive recursive derivation, and formalizes it with recursive equations.
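Rogers’ exact equations are not reproduced here; as an illustration of the same phenomenon, the commonly used two-argument Ackermann–Péter variant can be sketched as follows (our own example, not Rogers’ formulation). It is defined by nested recursion rather than primitive recursion, and it eventually dominates every primitive recursive function:

```python
def ackermann(m, n):
    """Ackermann-Peter function. Total and computable, but it grows
    faster than any primitive recursive function, so it admits no
    primitive recursive derivation."""
    if m == 0:
        return n + 1
    if n == 0:
        return ackermann(m - 1, 1)
    # The recursion is nested: the second argument of the outer call
    # is itself a recursive call, which no single application of
    # primitive recursion can express.
    return ackermann(m - 1, ackermann(m, n - 1))
```

Even tiny arguments produce rapid growth: `ackermann(3, n)` already behaves like 2^(n+3) − 3, and `ackermann(4, 2)` has 19,729 decimal digits.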
3. Notational Conventions and Definitions
If f is a function, dom(f) and codom(f) are the domain and the co-domain of f. The expression f: A → B abbreviates the logical conjunction dom(f) ⊆ A ∧ codom(f) ⊆ B, for some sets A and B. A function f is partial on A if dom(f) is a proper subset of A, i.e., dom(f) ⊂ A. If f is partial on A and a ∈ A, the following statements are equivalent: (1) a ∈ dom(f); (2) f is defined on a; (3) f(a) is defined; (4) f(a)↓. The following statements are also equivalent: (1) a ∉ dom(f); (2) f is undefined on a; (3) f(a) is undefined; (4) f(a)↑. If dom(f) = A, then f is total on A.
The notation (a₁, a₂, …, aₙ) is used to denote ordered n-tuples or, simply, n-tuples over some set of numbers A. We will use bold lower-case variables, e.g., x, to refer to n-tuples. Thus, x = (x₁, x₂, x₃) is a 3-tuple over the set of natural numbers ℕ. We will use the symbol ℕ⁺ to denote the set of positive natural numbers. If x is an n-tuple over A, then xᵢ, 1 ≤ i ≤ n, refers to individual elements of x. Thus, if x is a 5-tuple, its elements are x₁, x₂, x₃, x₄, x₅. The individual elements of an n-tuple are not required to be distinct. If x is an n-tuple, then |x| = n, i.e., the number of elements in x. The 0-tuple is denoted as (). In calculus, a sequence is an ordered set of numbers in a one-to-one correspondence with ℕ or ℕ⁺ (cf., e.g., Taylor [20], § 1.62, p. 67). Thus, if aₙ ∈ ℝ, then (aₙ) denotes the sequence (a₁, a₂, …) with countably many elements or terms. In computability theory, the term sequence sometimes refers to an n-tuple (cf., e.g., Ch. 3, p. 60 in [7]). Thus, in order to avoid confusion, when we want to emphasize the fact that we are dealing with a finite number of ordered elements, we refer to the collection of these elements as a finite sequence, a tuple, or an m-tuple, where m is the number of the elements.
For n ∈ ℕ⁺, Aⁿ is the n-th Cartesian power of A, i.e., the set of all n-tuples over A. Thus, if x ∈ ℝ³, then x is a 3-tuple over ℝ, where ℝ is the set of real numbers. We use statements like x ∈ Aⁿ to mean that x is an n-tuple over A. We do not distinguish between 1-tuples and individual elements, e.g., (a) = a, A¹ = A, and h((a)) = h(a) for some function h.
In formalizing feedforward artificial neural networks, it is sometimes convenient to treat n-tuples as vectors. Therefore, we occasionally use symbols like x̄ and ȳ to denote n-tuples. If x̄ ∈ Aⁿ, then |x̄| = n and x̄ⱼ, 1 ≤ j ≤ n, is the j-th element of x̄. Juxtaposition of tuples inside parentheses denotes concatenation, e.g., (x̄, ȳ) is the tuple whose elements are the elements of x̄ followed by the elements of ȳ, so that |(x̄, ȳ)| = |x̄| + |ȳ|. The empty tuple is discarded in function arguments, e.g., h(x̄, ()) = h(x̄). We occasionally separate individual arguments of functions from the remaining arguments combined into tuples, e.g., for x̄ ∈ ℕⁿ⁺¹, we may write h(x̄) = h(x₁, …, xₙ, xₙ₊₁) = h(ȳ, xₙ₊₁), where ȳ = (x₁, …, xₙ). If f is a function that maps A₁, …, Aₙ to B, for some sets A₁, …, Aₙ, B, then f: A₁ × ⋯ × Aₙ → B.
A total function P: ℕⁿ → {0, 1} is a predicate, where 1 arbitrarily designates logical truth and 0 logical falsehood. The symbols ¬, ∧, ∨, → respectively refer to logical not, logical and, logical or, and logical implication. P(x) is a shorthand for P(x) = 1, and ¬P(x) is a shorthand for P(x) = 0. If P and Q are predicates, then ¬P ∨ Q is logically equivalent to P → Q, i.e., (¬P ∨ Q) ≡ (P → Q). The symbols ∃ and ∀ refer to the logical existential (there exists) and universal (for all) quantifiers, respectively. Thus, the statement ∃x ∈ A : P(x) is logically equivalent to the statement that P(x) holds for at least one x in A, while the statement ∀x ∈ A : P(x) is logically equivalent to the statement that P(x) holds for every x in A.
Let, for n, k ∈ ℕ⁺, f: ℕᵏ → ℕ, gᵢ: ℕⁿ → ℕ, 1 ≤ i ≤ k, and h: ℕⁿ → ℕ. We use the following definitions of composition and primitive recursion in our article. The function h is obtained from f, g₁, …, gₖ by composition if h is obtained from f, g₁, …, gₖ by Schema (11):

h(x₁, …, xₙ) = f(g₁(x₁, …, xₙ), …, gₖ(x₁, …, xₙ)).     (11)

Let k ∈ ℕ and g: ℕ² → ℕ be total. A function h: ℕ → ℕ is obtained from k and g by primitive recursion if it is obtained from k and g by Schema (12):

h(0) = k,
h(t + 1) = g(t, h(t)).     (12)

Let f: ℕⁿ → ℕ and g: ℕⁿ⁺² → ℕ be total, then h: ℕⁿ⁺¹ → ℕ is obtained from f and g by primitive recursion if h is obtained from f and g by Schema (13), where n ∈ ℕ⁺:

h(x₁, …, xₙ, 0) = f(x₁, …, xₙ),
h(x₁, …, xₙ, t + 1) = g(t, h(x₁, …, xₙ, t), x₁, …, xₙ).     (13)

If x = (x₁, …, xₙ), Schema (13) can be expressed with the vector notation as

h(x, 0) = f(x),
h(x, t + 1) = g(t, h(x, t), x).

Let the set of initial functions consist of the successor function s(x) = x + 1, the zero function z(x) = 0, and the projection functions uᵢⁿ(x₁, …, xₙ) = xᵢ, 1 ≤ i ≤ n.

Definition 4. A function is primitive recursive if it can be obtained from the initial functions by a finite number of applications of Schemata (11)–(13). A corollary of Definition 4 is that if f is primitive recursive, then there is a sequence of functions f₁, …, fₘ ending with f such that each fᵢ, 1 ≤ i ≤ m, is either an initial function or is obtained from the previous functions in the sequence by composition or primitive recursion.
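The corollary of Definition 4 can be made concrete with a small sketch (our own example): the sequence s, z, u, add, mult below is such a derivation sequence, in which every function is either initial or built from earlier functions in the sequence by composition (Schema (11)) or primitive recursion (Schema (13)):

```python
# Initial functions.
def s(x):
    return x + 1           # successor

def z(x):
    return 0               # zero

def u(i, *xs):
    return xs[i - 1]       # projection u_i^n

# add(x, t): add(x, 0) = u_1^1(x); add(x, t + 1) = s(add(x, t))
# (an instance of Schema (13) with f = u_1^1 and g built from s).
def add(x, t):
    acc = u(1, x)
    for _ in range(t):
        acc = s(acc)
    return acc

# mult(x, t): mult(x, 0) = z(x); mult(x, t + 1) = add(mult(x, t), x)
# (Schema (13) again, with g obtained by composing add with projections).
def mult(x, t):
    acc = z(x)
    for _ in range(t):
        acc = add(acc, x)
    return acc
```

Each later entry calls only functions that appear earlier in the sequence, exactly as the corollary requires.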
4. Feedforward Artificial Neural Networks
A feedforward artificial neural network is a finite set of neurons, each of which is connected to a finite number of the neurons in the same set through synapses, i.e., directed weighted edges (cf. Figure 1). The neurons are organized into l layers λ₀, λ₁, …, λₗ₋₁, where λ₀ is the input layer, λₗ₋₁ is the output layer, and λ₁, …, λₗ₋₂ are the hidden layers. We use the term network synonymously with the term feedforward artificial neural network.
Let l(𝒩) denote the number of layers in 𝒩 and let nⱼ,ᵢ refer to the i-th neuron in layer j, 0 ≤ j ≤ l(𝒩) − 1. The function η(j) specifies the number of neurons in layer j of 𝒩. We assume that 𝒩 is fully connected, i.e., there is a synapse from every neuron in layer j to every neuron in layer j + 1, 0 ≤ j < l(𝒩) − 1. Each synaptic weight (cf. Figure 1) is a real number. The vector w̄ⱼ,ᵢ is the vector of all synaptic weights in 𝒩 from the neurons in layer j − 1 to the neuron nⱼ,ᵢ. We assume, without loss of generality, that, for any synaptic weight w, |w| ≤ 1, because, if that is not the case, the weights can be so scaled. No loss of generality is introduced with the assumption of full connectivity, because if full connectivity is not required, appropriate synaptic weights are set to zero. If, on the other hand, a given network is not fully connected, synapses with zero weights can be added as needed to make the network fully connected.
Each neuron nⱼ,ᵢ, j > 0, computes an activation function aⱼ,ᵢ whose arguments are the synaptic weights w̄ⱼ,ᵢ and the vector of the activations of the neurons in layer j − 1. If x is the input to 𝒩, then the activations of the neurons in the input layer are the corresponding elements of x, i.e., the i-th neuron of layer 0 outputs xᵢ, 1 ≤ i ≤ η(0).
The term feedforward means that the activations of the neurons are computed layer by layer from the input layer to the output layer, because the activation functions of the neurons in the next layer require only the weights of the synapses connecting the current layer with the next one and the activation values, i.e., the outputs of the activation functions of the neurons in the current layer. If x is the input vector, the feedforward activation function that computes the activations of 𝒩 layer by layer is defined by recurrence: the activations of layer 0 are the elements of x, and the activations of layer j, 1 ≤ j ≤ l(𝒩) − 1, are computed by applying the activation functions aⱼ,ᵢ to the weights w̄ⱼ,ᵢ and the activations of layer j − 1. The output of 𝒩 on x, written 𝒩(x), is the vector of the activations of the output layer. If 𝒩 maps A to B, for some sets A and B, we define the function computed by 𝒩 as the function that maps each x ∈ A to 𝒩(x) ∈ B.
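The layer-by-layer recurrence can be sketched in a few lines (a minimal sketch of ours; the weight layout, the shared activation function, and the toy network below are illustrative assumptions, not the article’s construction):

```python
def feedforward(weights, activation, x):
    """Compute the activations of each layer from the previous one.
    weights[j][i] is the weight vector from all neurons of layer j
    to neuron i of layer j + 1; the result is the activation vector
    of the output layer."""
    acts = list(x)  # layer 0: the activations are the inputs
    for layer in weights:
        # each neuron applies its activation function to the weighted
        # sum of the previous layer's activations
        acts = [activation(sum(w * a for w, a in zip(row, acts)))
                for row in layer]
    return acts

# A toy two-layer network: one output neuron summing two inputs,
# with the identity as its activation function.
out = feedforward([[[1.0, 1.0]]], lambda v: v, [2.0, 3.0])
```

Only the previous layer’s activations and the incoming weights are needed at each step, which is exactly the feedforward property described above; here `out` is `[5.0]`.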
Definition 5. A function f: A → B, for some sets A and B, is 𝒩-computable if there is a network 𝒩 such that, for all x ∈ A, 𝒩(x) = f(x).

If 𝒩 computes f, we refer to 𝒩 as 𝒩_f and use the expression 𝒩_f as a shorthand for the statement that 𝒩 computes f. Furthermore, if 𝒩 computes f, then, for x ∈ dom(f), the expressions f(x), 𝒩(x), and 𝒩_f(x) are equivalent in that they denote the same value.
A network can include other networks. Let 𝒩₁ and 𝒩₂ be two networks such that 𝒩₁: A → B and 𝒩₂: B → C, for some sets A, B, and C. Then we can construct a new network 𝒩 by feeding the output of 𝒩₁ to 𝒩₂ so that 𝒩(x) = 𝒩₂(𝒩₁(x)) (cf. Figure 2). We can generalize this case to a network that includes arbitrarily many networks whose outputs are the inputs to another network whose output is the output of the entire complex network (cf. Figure 3). Formally, let 𝒩₁, …, 𝒩ₖ be networks such that 𝒩ᵢ: I → O, 1 ≤ i ≤ k, for some sets I and O. Let, for some set S, a network ℳ compute a function of k arguments so that ℳ: Oᵏ → S. Then, for x ∈ I, the network 𝒩 that feeds the outputs of 𝒩₁, …, 𝒩ₖ to ℳ computes 𝒩(x) = ℳ(𝒩₁(x), …, 𝒩ₖ(x)), where 𝒩: I → S.
We use the symbol ℐ to denote an identity network such that, for x ∈ ℕⁿ, ℐ(x) = x. One can think of ℐ as a single layer network of n neurons, where the activation of the i-th neuron is its input xᵢ, 1 ≤ i ≤ n.
Our formalization of feedforward artificial neural networks as finite sets of neurons and synapses organized in finitely many layers is in compliance with the original definition by McCulloch and Pitts (Sec. 2, p. 103 in [11]), who state that the neurons of a given network may be assigned designations c₁, c₂, …, cₙ. It is also in compliance with the subsequent definition by Rumelhart, Hinton, and Williams [12] as well as with modern treatments of neural networks by Nielsen [17] and Goodfellow, Bengio, and Courville [18] that continue to describe neural networks as finite sets of neurons and synapses.
5. 𝒩-Computability of Primitive Recursive Functions

Lemma 1. The initial functions are 𝒩-computable.
Proof. Let 𝒩ₛ be a network with a single input node n₀,₁ and a single output node n₁,₁ such that the synaptic weight from n₀,₁ to n₁,₁ is 1 and the activation function of n₁,₁ maps its input v to v + 1. Then, 𝒩ₛ(x) = x + 1 = s(x), x ∈ ℕ. Let 𝒩_z be a network with a single input node n₀,₁ and a single output node n₁,₁ such that the synaptic weight from n₀,₁ to n₁,₁ is 1 and the activation function of n₁,₁ maps every input to 0. Then, 𝒩_z(x) = 0 = z(x), x ∈ ℕ. Let 𝒩ᵤ, for n ∈ ℕ⁺ and 1 ≤ i ≤ n, be a network with n input nodes n₀,₁, …, n₀,ₙ and one output node n₁,₁. Let the synaptic weight from n₀,ᵢ to n₁,₁ be 1, let all other synaptic weights be 0, and let the activation function of n₁,₁ be the identity. Then, if x = (x₁, …, xₙ) ∈ ℕⁿ, 𝒩ᵤ(x) = xᵢ = uᵢⁿ(x). □
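The three constructions of Lemma 1 can be mimicked directly in code (a sketch under the assumption, as in the proof, that a neuron may use v + 1, the constant 0, or the identity as its activation function; the function names are ours):

```python
def successor_net(x):
    # single input, single output; weight 1, activation a(v) = v + 1
    return 1 * x + 1

def zero_net(x):
    # single input, single output; the activation is constantly 0
    return 0

def projection_net(n, i):
    # n inputs, one output; weight 1 on input i, weight 0 elsewhere,
    # identity activation -- the weighted sum picks out the i-th input
    def net(*xs):
        return sum((1 if j == i else 0) * xs[j - 1]
                   for j in range(1, n + 1))
    return net
```

Each network uses a single synapse layer, so the weighted sum plus activation suffices to reproduce the corresponding initial function exactly.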
We abbreviate the network computing uᵢⁿ as 𝒩ᵤ, because n and i are always evident from the context.
Lemma 2. Let f be primitive recursive and let g be defined from f by cases. Then g is 𝒩-computable.

Proof. Since f is primitive recursive, g is primitive recursive, by the definition by cases theorem and its corollary (cf. Theorem 5.4, Chap. 3, Sec. 5, pp. 50–51 in [7]). Let 𝒩 be a network with n + 1 input nodes, where the first n nodes receive the n corresponding values of x and the last node receives t. Let 𝒩 have one output node, let all synaptic weights be 1, and let the activation function of the output node compute g from its inputs. Then 𝒩 computes g. □

We abbreviate the resulting network as 𝒩_g.
Lemma 3. Let f be an 𝒩-computable function of k arguments, k ∈ ℕ⁺, and g₁, …, gₖ be 𝒩-computable functions of n arguments each, n ∈ ℕ⁺. Let a function h of n arguments be obtained from f, g₁, …, gₖ by Schema (11). Then, h is 𝒩-computable.

Proof. Let gᵢ be computable by 𝒩_{gᵢ}, 1 ≤ i ≤ k. Then let 𝒩 be a network such that, for x ∈ ℕⁿ, 𝒩(x) = 𝒩_f(𝒩_{g₁}(x), …, 𝒩_{gₖ}(x)). Then, for x ∈ ℕⁿ, we have 𝒩(x) = f(g₁(x), …, gₖ(x)) = h(x), whence h is 𝒩-computable. □
Lemma 4. Let k ∈ ℕ. Then k is 𝒩-computable.

Proof. Let 𝒩_z and 𝒩ₛ be as constructed in Lemma 1. Let 𝒩ₛᵏ, k ∈ ℕ⁺, denote a network that consists of a finite sequence of k networks 𝒩ₛ, where the first 𝒩ₛ receives its input from 𝒩_z and each subsequent 𝒩ₛ receives its input from the previous 𝒩ₛ (cf. Figure 2). Let 𝒩ₛ⁰ = 𝒩_z. Let sᵏ denote k compositions of s with itself, i.e., s¹(x) = s(x), s²(x) = s(s(x)), etc. Then, the network 𝒩ₛᵏ computes sᵏ(z(x)). By induction on k, sᵏ(0) = k. By construction, 𝒩ₛᵏ(x) = k, x ∈ ℕ. □
The next lemma, Lemma 5, is a technical result for Lemma 6. The function it concerns is primitive recursive (cf. Chap. 3, Sec. 4, p. 46 in [7]).
Lemma 5. Let σ: ℕ² → ℕ be the primitive recursive function of two arguments defined above. Then, σ is 𝒩-computable.

Proof. Let 𝒩 have two input nodes n₀,₁, n₀,₂ and one output node n₁,₁. Let both synaptic weights be 1 and let the activation function of n₁,₁ compute σ from the two inputs. Then, for (x, t) ∈ ℕ², we have 𝒩(x, t) = σ(x, t). □
Definition 6 confines the notion of 𝒩-computability of a function h(x, t) to the 𝒩-computability of the first k elements of the sequence (h(x, 0), h(x, 1), …), k ∈ ℕ⁺.

Definition 6. A function h: A → B, for some sets A and B, is 𝒩-computable elementwise for any k ∈ ℕ⁺ if there is a network 𝒩 such that, for any x, the first k terms of the sequence (h(x, 0), h(x, 1), …) are the same as the terms of the tuple computed by 𝒩, i.e., 𝒩(x) = (h(x, 0), …, h(x, k − 1)).

Thus, if a function is 𝒩-computable, it is 𝒩-computable elementwise for any positive k.
Lemma 6. Let ϕ be 𝒩-computable elementwise and let h be a function obtained from ϕ by Schema (12). Then, h is 𝒩-computable elementwise.

Proof. Let ϕ be computable elementwise by 𝒩_ϕ. Let the constant network 𝒩ₛᵏ be as constructed in Lemma 4. Construct the network 𝒩 by chaining copies of 𝒩_ϕ, the first of which receives the initial value k from 𝒩ₛᵏ and each subsequent copy of which receives the output of the previous copy. By induction on t, the t-th output of 𝒩 equals h(t) (cf. Figure 4). Then, the first terms of the sequence (h(0), h(1), …) are the same as the terms of the tuple computed by 𝒩 (cf. Figure 5). □
Lemma 7. Let f and g be 𝒩-computable elementwise and let h be a function obtained from f and g by Schema (13). Then h is 𝒩-computable elementwise.

Proof. Let f and g be 𝒩-computable elementwise by 𝒩_f and 𝒩_g, respectively. Construct the network 𝒩 by chaining copies of 𝒩_g, the first of which receives the value computed by 𝒩_f on x and each subsequent copy of which receives the output of the previous copy together with the parameters x. By induction on t, the t-th output of 𝒩 equals h(x, t). Then the first terms of the sequence (h(x, 0), h(x, 1), …) agree elementwise with the tuple computed by 𝒩. □
Figure 6 and Figure 7 illustrate sample constructions of Lemma 7. If we treat h(t) as a shorthand for h((), t), then Lemmas 6 and 7 give us the following theorem.
Theorem 1. Let h(x, t) be a primitive recursive function, x ∈ ℕⁿ, t ∈ ℕ. Then h is 𝒩-computable elementwise.
We can ask if the elementwise 𝒩-computability of h (cf. Definition 6) can be generalized to 𝒩-computability. In other words, is it possible to have the sequence (h(x, t)) and the sequence of values computed by 𝒩 agree term by term for all t ∈ ℕ? Since 𝒩 has a finite set of neurons organized into a finite number of layers, 𝒩 can compute, per Lemmas 6 and 7, only the first m values of (h(x, t)), i.e., h(x, 0), …, h(x, m − 1), although m can be an arbitrarily large natural number. Thus, the answer to this question is negative.
Let us assume that 𝒩 in Theorem 1 is allowed to have countably many neurons so that the number of neurons in the hidden layers of 𝒩 is countable. Let h̃ be the function computed by 𝒩. Since countably many neurons can be added to 𝒩 to compute h(x, t), for any t, we have the sequence (h(x, 0), h(x, 1), …), on the one hand, and the sequence (h̃(x, 0), h̃(x, 1), …), on the other hand. Let εₜ = |h(x, t) − h̃(x, t)|. Since εₜ = 0, for any t ∈ ℕ, the sequence (εₜ) is vacuously convergent, i.e., limₜ→∞ εₜ = 0. Hence, we have the following theorem.

Theorem 2. Let h(x, t) be a primitive recursive function, x ∈ ℕⁿ, t ∈ ℕ. Then there is a network 𝒩 with countably many neurons such that, for any x, the sequences (h(x, t)) and (h̃(x, t)) agree term by term, i.e., h(x, t) = h̃(x, t), t ∈ ℕ.
6. Discussion
As mathematical objects, feedforward artificial neural networks are more computationally powerful than primitive recursive functions inasmuch as the former can compute functions over real numbers whereas the latter, by definition, cannot. E.g., one can define a network that computes the sum of n real numbers, which no primitive recursive function can compute. However, the situation changes when networks cease to be mathematical objects and become computational objects by being realized on finite memory devices. A finite memory device is a computational device with a finite amount of memory available for numerical computation [21]. Such a device is analogous to a human scribe with a pencil and an eraser who is to carry out a numerical computation by writing and erasing symbols from a finite alphabet on a finite number of paper sheets. Finite memory devices are different from finite state automata of classical computability theory (e.g., a deterministic finite state machine (Chap. 2, Sec. 2.2 in [22]), a non-deterministic finite state machine (Chap. 2, Sec. 2.3 in [22]), a Mealy or Moore machine (Chap. 2, Sec. 2.7 in [22]), a push down automaton (Chap. 5 in [22]), or a Turing machine (Chap. 6 in [7])), because the latter do not put any bounds on the number of cells in their tapes available for computation. A finite state automaton of classical computability becomes a finite memory device only when the number of its tape cells available for computation is bounded by a natural number.
A real number x is signifiable on a finite memory device 𝒟 if and only if the finite amount of memory on 𝒟 can hold its sign, where a sign is a sequence of arbitrary symbols from a finite alphabet [21]. Thus, if the alphabet is { “.”, “0”, “1”, “2”, “3”, “4”, “5”, “6”, “7”, “8”, “9” } and 𝒟 has 8 memory cells to represent a real number, then the real numbers 1.41, 1.414, 1.4142, 1.41421, 1.414213 are signifiable on 𝒟 as “1.41”, “1.414”, “1.4142”, “1.41421”, “1.414213”, respectively, whereas the real numbers 1.4142135, 1.41421356, 1.414213562, 1.4142135623, and 1.41421356237 are not. A consequence of the finite amount of memory is that the set of real numbers signifiable on 𝒟 is finite and, hence, vacuously countable. To put it differently, Cantor’s theorem (§ 2 in [23]) does not apply insomuch as the number of signifiable reals on 𝒟 in any interval [a, b], a < b, is finite. Consequently, all computation of a feedforward artificial neural network 𝒩 realized on 𝒟 can be packed into a unique natural number in such a way that there exists a primitive recursive function mapping the number encoding the input to the number encoding the output (cf. Theorem 1, pp. 15–17 in [21]). Theorem 1 is, after a fashion, the converse of Theorem 1 in [21] in the sense that it shows how one can construct a network from a primitive recursive function.
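Signifiability on the 8-cell device from the example above can be checked mechanically (a sketch of ours, using the decimal-plus-point alphabet from the text; the function name is hypothetical):

```python
ALPHABET = set(".0123456789")
CELLS = 8  # memory cells available on the device for one real number

def signifiable(sign, cells=CELLS):
    """A real is signifiable on the device if its sign fits into the
    memory cells and uses only symbols from the finite alphabet."""
    return len(sign) <= cells and all(ch in ALPHABET for ch in sign)
```

The sign “1.414213” occupies exactly 8 cells and is accepted, while “1.4142135” needs 9 cells and is rejected, matching the example; since at most 11⁸ signs exist, the set of signifiable reals is finite.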
Theorem 2 shows that all values of a primitive recursive function can be computed exactly by a feedforward artificial neural network if the network is allowed to have countably many neurons. This purely theoretical result contributes to the growing collection of universality theorems on feedforward neural networks and various classes of functions (cf. Ch. 4 in [17]). Thus, Hornik et al. [13] show that multilayer feedforward networks with a single hidden layer of neurons with arbitrary squashing activation functions can approximate any Borel measurable function from one finite-dimensional space to another to any desired degree of accuracy so long as the number of the neurons in the hidden layer is unbounded. Gripenberg [14] shows that the general approximation property of feedforward perceptron networks is achievable when the number of perceptrons in each layer is bounded but the number of layers is allowed to grow to infinity and the perceptron activation functions are continuously differentiable and not linear. Guliyev and Ismailov [15] show that single hidden layer feedforward neural networks with the fixed weights of one and two neurons in the hidden layer can approximate any continuous function on a compact subset of the real line and proceed to demonstrate that single hidden layer feedforward networks with fixed weights cannot approximate all continuous multivariate functions.
We conclude our discussion with a caveat about universality results of feedforward neural networks with unbounded numbers of neurons. While these results provide valuable theoretical insights, they may not hold much sway with computer scientists interested in computability properties of finite AI, because networks with unbounded numbers of neurons cannot be realized on computational devices with finite amounts of computational memory.
7. Conclusions
We have formalized feedforward artificial neural networks with recurrence equations and proposed a formal definition of the concept of 𝒩-computability, i.e., the property of a function to be computed by a feedforward artificial neural network 𝒩. We have shown that, for a primitive recursive function h(x, t), where x is an n-tuple of natural numbers and t is a natural number, there exists a feedforward artificial neural network 𝒩 such that, for any n-tuple of natural numbers x and any positive natural number m, the first m terms of the sequence (h(x, 0), h(x, 1), …) agree elementwise with the tuple computed by 𝒩. Our investigation contributes to the knowledge of the classes of functions that can be computed by feedforward artificial neural networks. Since such networks are used in some finite AI systems, our investigation may be of interest to mathematicians and computer scientists interested in the computability theory of finite AI.