Article

On the Computability of Primitive Recursive Functions by Feedforward Artificial Neural Networks

by
Vladimir A. Kulyukin
Department of Computer Science, Utah State University, Logan, UT 84322, USA
Mathematics 2023, 11(20), 4309; https://doi.org/10.3390/math11204309
Submission received: 30 August 2023 / Revised: 29 September 2023 / Accepted: 9 October 2023 / Published: 16 October 2023
(This article belongs to the Section Mathematics and Computer Science)

Abstract

We show that, for a primitive recursive function h(x, t), where x is an n-tuple of natural numbers and t is a natural number, there exists a feedforward artificial neural network N(x, t) such that, for any n-tuple of natural numbers z and any positive natural number m, the first m + 1 terms of the sequence {h(z, t)} are the same as the terms of the tuple (N(z, 0), …, N(z, m)).

1. Introduction

Primitive recursive functions describe, albeit incompletely, the intuitive notion of a number-theoretic algorithm, a deterministic procedure to transform numerical inputs to numerical outputs in finitely many steps. This perception of primitive recursive functions as intuitive counterparts of number-theoretic algorithms may be rooted in the fact that any primitive recursive function can be mechanically constructed from a set of initial functions with finitely many applications of simple, well-defined operations of composition and primitive recursion. These functions and some of their properties have been investigated by Gödel [1], Péter [2,3], Kleene [4], Davis [5], and Rogers [6] in their studies of formal systems, foundations of mathematics, and computability theory. Although the confinement of the construction procedure to two operations may at first seem restrictive, many functions on natural numbers ordinarily encountered in mathematics and computer science are, in fact, primitive recursive (cf., e.g., Ch. 3 in [7]). Primitive recursive functions have been used to investigate the foundations of functional programming. Colson [8] presents a computational model in which a primitive recursive function is viewed as a rewriting system and gives a non-trivial necessary condition for an algorithm to be representable in the system. Paolini et al. [9] define a class of recursive permutations, which they call Reversible Primitive Permutations (RPP), and formalize it as a language that is sufficiently expressive to represent all primitive recursive functions. Petersen [10] uses induction and primitive recursion to develop resource conscious logics where the repeated recycling of assumptions, e.g., repeated applications of the successor function f ( n ) = n + 1 to enumerate natural numbers, has costs.
Feedforward artificial neural networks have their origin in the research by McCulloch and Pitts [11], which describes neural events with propositional logic. McCulloch and Pitts assume that the human nervous system is a finite set of neurons, each of which has an excitation threshold. When a neuron’s threshold is exceeded, the neuron generates an impulse that propagates to other neurons across synapses connecting them to the origin of the impulse. A fundamental insight by McCulloch and Pitts is that if the response of a neuron can be formalized as a logical proposition specifying its stimulus, then behaviors of complex networks of neurons can, in principle, be described with symbolic logic. Artificial neural networks entered mainstream computer science almost half a century after the research by McCulloch and Pitts when Rumelhart, Hinton, and Williams [12] discovered backpropagation, a method for training networks to modify synapse weights by minimizing error between the output and the ground truth. Different types of such networks have been shown to be universal approximators of some classes of functions (e.g., [13,14,15]). Artificial neural networks are increasingly used in embedded artificial intelligence (AI) systems, i.e., systems that run on computational devices with finite amounts of computer memory (e.g., [16]). We will refer to embedded AI as finite AI to emphasize the fact that finite AI systems are realized on computational devices with finite amounts of computational memory.
In this investigation, we seek to relate, in a formal way, primitive recursive functions and feedforward artificial neural networks by investigating whether it is possible, for a given primitive recursive function, to construct a feedforward artificial neural network that computes arbitrarily many values of the function’s co-domain from the corresponding values of the function’s domain. We hope that our investigation contributes to the knowledge of the classes of functions that can be not only approximated, but provably computed by feedforward artificial neural networks. In particular, we formalize feedforward artificial neural networks with recurrence equations, propose a formal definition of the concept of N-computability, i.e., the property of a function to be computed by a feedforward artificial neural network N, and prove several lemmas and theorems to show how feedforward artificial neural networks can be constructed to compute arbitrarily many consecutive values of any primitive recursive function. Since these networks consist of finite sets of neurons and are used in some finite AI systems [17,18], our investigation will be of interest to mathematicians and computer scientists interested in the computability theory of finite AI.
The remainder of our article is organized as follows. In Section 2, we review several definitions of primitive recursive functions, starting with the original definition by Gödel [1] and proceeding to the later definitions by Kleene [4], Davis [5], Rogers [6], Davis et al. [7], and Meyer and Ritchie [19]. This section gives the reader a historical bird’s-eye view of how the concept of a primitive recursive function and its formalization have co-evolved over time. In Section 3, we state the notational conventions and give the definition of a primitive recursive function used in this article. This section is intended for reference. In Section 4, we offer a formalization of feedforward artificial neural networks in terms of recurrence equations. In Section 5, we prove several lemmas and theorems that form the bulk of our theoretical investigation. In Section 6, we present some perspectives on the obtained results, and we summarize our conclusions in Section 7.

2. Recursive Functions

Gödel [1] (Sec. 2, p. 157) describes the class of number-theoretic functions as the class of functions whose domains are non-negative integers or n-tuples thereof and whose values are non-negative integers. Gödel [1] (Sec. 2, pp. 157–159) states that a number-theoretic function ϕ(x_1, x_2, …, x_n) is recursively defined in terms of the number-theoretic functions ψ(x_1, x_2, …, x_{n−1}) and μ(x_1, x_2, …, x_{n+1}) if ϕ is obtained from ψ and μ by the following schema:
ϕ(0, x_2, …, x_n) = ψ(x_2, …, x_n),
ϕ(k + 1, x_2, …, x_n) = μ(k, ϕ(k, x_2, …, x_n), x_2, …, x_n), (1)
where the equalities hold for all k, x_2, …, x_n. Gödel [1] (Sec. 2, p. 159) defines a number-theoretic function ϕ to be recursive if there exists a finite sequence of number-theoretic functions ϕ_1, ϕ_2, …, ϕ_n = ϕ, where each function ϕ_i, 1 ≤ i ≤ n, is a natural number constant, the successor function x + 1, or is defined from two preceding functions with Schema (1) or from one preceding function by substitution, i.e., the replacement of the arguments of a preceding function with some other preceding functions.
Kleene [4] (Chap. IX, § 43, p. 219) defines a number-theoretic function to be primitive recursive if it is definable by a finite number of applications of the six schemata in (2), where m and n are positive integers, i is an integer such that 1 ≤ i ≤ n, q is a natural number, and ψ, χ_1, …, χ_m, and χ are number-theoretic functions with the indicated numbers of arguments.
(I) ϕ(x) = x + 1;
(II) ϕ(x_1, …, x_n) = q;
(III) ϕ(x_1, …, x_n) = x_i;
(IV) ϕ(x_1, …, x_n) = ψ(χ_1(x_1, …, x_n), …, χ_m(x_1, …, x_n));
(Va) ϕ(0) = q, ϕ(y + 1) = χ(y, ϕ(y));
(Vb) ϕ(0, x_2, …, x_n) = ψ(x_2, …, x_n), ϕ(y + 1, x_2, …, x_n) = χ(y, ϕ(y, x_2, …, x_n), x_2, …, x_n). (2)
Schema (I) defines the successor function, Schema (II) defines constant functions, and Schema (III) defines identity functions, which Kleene denotes with the symbol U_i^n. Kleene defines the functions satisfying Schemata (I), (II), and (III) in (2) as initial functions. Schema (IV) in (2) obtains ϕ from ψ, χ_1, …, χ_m by substitution. Schemata (Va) and (Vb) obtain ϕ from χ or from χ and ψ, respectively, by primitive recursion. Kleene [4] (Chap. XI, § 55, p. 275) defines a function ϕ to be general recursive in functions ψ_1, …, ψ_l if there is a system E of equations which defines ϕ recursively from ψ_1, …, ψ_l.
Davis [5] (Chap. 2, Sec. 2, p. 36) defines the operation of composition as the operation that obtains the function h(x^(n)) from the functions f(y^(m)), g_1(x^(n)), …, g_m(x^(n)) with Schema (3),
h(x^(n)) = f(g_1(x^(n)), …, g_m(x^(n))), (3)
where y^(m) and x^(n) are tuples of natural numbers with m and n elements, respectively. Davis [5] (Chap. 3, Sec. 4, p. 48) defines the operation of primitive recursion as the operation that uses Schema (4) to construct the function h(x^(n+1)) from the total functions f(x^(n)) and g(x^(n+2)), where x^(n), x^(n+1), and x^(n+2) are tuples of natural numbers with n, n + 1, and n + 2 elements, respectively.
h(0, x^(n)) = f(x^(n)),
h(z + 1, x^(n)) = g(z, h(z, x^(n)), x^(n)). (4)
For a set of natural numbers A, Davis [5] (Chap. 3, Sec. 4, p. 49) defines an A-primitive recursive function, or a function primitive recursive in A, as a function that can be obtained by a finite number of applications of composition (cf. Schema (3)) and primitive recursion (cf. Schema (4)) from the following functions:
(1) C_A(x); (2) S(x) = x + 1; (3) N(x) = 0; (4) U_i^n(x_1, …, x_n) = x_i, 1 ≤ i ≤ n, (5)
where C_A(x) is the characteristic function of A (i.e., C_A(x) is a total function such that C_A(x) = 1 if x ∈ A and C_A(x) = 0 if x ∉ A), and S(x) and U_i^n are identical to Kleene’s Schemata (I) and (III) in (2). Davis [5] (Chap. 3, Sec. 4, p. 49) defines a function f to be primitive recursive if it is ∅-primitive recursive, where ∅ denotes the empty set.
Rogers [6] (Chap. 1, § 1.2, p. 6) defines the class C of primitive recursive functions as the smallest class of functions such that
(1)
All constant functions λx_1 x_2 … x_k [m] are in C, 1 ≤ k, 0 ≤ m;
(2)
The successor function λx [x + 1] is in C;
(3)
All identity functions λx_1 … x_k [x_i] are in C, 1 ≤ i ≤ k;
(4)
If f is a function of k variables in C and g_1, …, g_k are functions in C of m variables each, then the function λx_1 … x_m [f(g_1(x_1, …, x_m), …, g_k(x_1, …, x_m))] is in C, 1 ≤ k, m;
(5)
If h is a function of k + 1 variables in C, and g is a function of k − 1 variables in C, then the unique function f of k variables satisfying Schema (6) is also in C, 1 ≤ k.
f(0, x_2, …, x_k) = g(x_2, …, x_k),
f(y + 1, x_2, …, x_k) = h(y, f(y, x_2, …, x_k), x_2, …, x_k). (6)
Davis et al. [7] (Chap. 3, Sec. 3, p. 42) define as initial the functions s(x) = x + 1, n(x) = 0, and u_i^n(x_1, …, x_n) = x_i, 1 ≤ i ≤ n, and define a function to be primitive recursive if it can be obtained from the initial functions by a finite number of applications of composition or primitive recursion, where primitive recursion is defined by Schema (7) (Chap. 3, Sec. 2, p. 40 in [7]) and Schema (8) (Chap. 3, Sec. 2, p. 41 in [7]). In Schema (7), k is a natural number and g is a total function of two variables. In Schema (8), f and g are total functions of n and n + 2 variables, respectively.
h(0) = k,
h(t + 1) = g(t, h(t)). (7)
h(x_1, …, x_n, 0) = f(x_1, …, x_n),
h(x_1, …, x_n, t + 1) = g(t, h(x_1, …, x_n, t), x_1, …, x_n). (8)

2.1. Computability and Turing Machines

Davis [5] (Chap. 1, Sec. 2, p. 10) gives the following definition of partially computable and computable functions.
Definition 1.
An n-ary function f(x_1, …, x_n) is partially computable if there exists a Turing machine Z such that
f(x_1, …, x_n) = Ψ_Z^(n)(x_1, …, x_n).
In this case, we say that Z computes f. If, in addition, f(x_1, …, x_n) is a total function, then it is called computable.
In subsequent chapters of his monograph (cf. Chap. 2 and Chap. 3 in [5]), Davis separates the notion of computability from Turing machines to make it possible “to demonstrate the computability of quite complex functions without referring back to the original definition of computability in terms of Turing machines” (cf. Ch. 3, Sec. 1, p. 41 in [5]).
Davis et al. [7] (Chap. 2) continue this treatment of computability by designing the programming language L and then defining partially computable and computable functions in terms of L programs, viz., finite sequences of L instructions. In L, the unique variable Y is designated as the output variable that stores the output of an L program P on a given input. X_1, X_2, … are input variables and Z_1, Z_2, … are internal variables. All variables refer to natural numbers. L has conditional dispatch instructions, line labels, elementary arithmetic operations, comparisons of natural numbers, and macros.
Davis et al. [7] (Chap. 2, Sec. 3, p. 27) define a computation of an L program P on some inputs x_1, …, x_m, m > 0, as a finite sequence of snapshots (s_1, …, s_k), where each snapshot s_i, 1 ≤ i ≤ k, k > 0, specifies the number of the instruction in P to be executed and the value of each variable in P, and where each subsequent snapshot is uniquely determined by the previous snapshot (Theorem 3.2, Chap. 4, Sec. 3, pp. 74–75 in [7]). The snapshot s_1 is the initial snapshot, where the values of all input variables are set to their initial values, the program instruction counter is set to 1, i.e., the number of the first instruction in P, and the values of all the other variables in P are set to 0. The snapshot s_k in (s_1, …, s_k) is a terminal snapshot, where the instruction counter is set to the number of the instructions in P plus 1. If some program P in L takes m inputs X_1 = x_1, X_2 = x_2, …, X_m = x_m, then
Ψ_P^(m)(x_1, x_2, …, x_m) = the value of Y in s_k, if (s_1, …, s_k) is a computation, k ≥ 1;
↑, otherwise.
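To make the snapshot semantics concrete, the following minimal Python sketch (ours) steps an L-like program through its snapshots. It covers only the three basic instruction types of L (increment a variable, decrement a variable, and jump if a variable is nonzero); the tuple encoding, the function names, and the example program are our own illustration, not the notation of [7].

def run(program, inputs):
    """A snapshot is (i, env): the number i of the next instruction to execute
    and the values of all variables; the terminal snapshot has i = len(program) + 1."""
    env = {f"X{j + 1}": v for j, v in enumerate(inputs)}
    env["Y"] = 0
    i, snapshots = 1, []
    while i <= len(program):
        snapshots.append((i, dict(env)))             # record the current snapshot
        op, var, target = program[i - 1]
        if op == "inc":
            env[var] = env.get(var, 0) + 1
        elif op == "dec":
            env[var] = max(env.get(var, 0) - 1, 0)    # natural-number decrement
        elif op == "goto_if_nonzero" and env.get(var, 0) != 0:
            i = target
            continue
        i += 1
    snapshots.append((i, dict(env)))                  # terminal snapshot
    return env["Y"], snapshots

# Y <- X1: a loop that moves X1 into Y; a jump past the last instruction halts the program.
copy_x1 = [
    ("goto_if_nonzero", "X1", 4),   # if X1 != 0, go to the loop body
    ("inc", "Z1", None),
    ("goto_if_nonzero", "Z1", 8),   # unconditional jump to the exit
    ("dec", "X1", None),
    ("inc", "Y", None),
    ("inc", "Z1", None),
    ("goto_if_nonzero", "Z1", 1),   # unconditional jump back to the test
]
print(run(copy_x1, [4])[0])         # prints 4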
The definitions of partially computable and computable functions are made by Davis et al. [7] (Chap. 2, Sec. 4, p. 30) in terms of L programs as follows.
Definition 2.
An n-ary function f is partially computable if f is a partial function and there is an L program P such that Equation (10) holds for all x_1, …, x_n.
f(x_1, …, x_n) = Ψ_P^(n)(x_1, …, x_n). (10)
Definition 3.
An n-ary function f is computable if it is total and partially computable.
Equation (10) in Definition 2 is interpreted so that f(x_1, …, x_n) is defined if and only if Ψ_P^(n)(x_1, …, x_n) is defined. This treatment of computable functions in terms of programs in a formal language is by no means the only one in the literature. For example, as early as 1967, Meyer and Ritchie [19] formalize primitive recursive functions as loop programs consisting of assignment and iteration statements similar to the DO statements of the programming language FORTRAN.

2.2. Computability of Primitive Recursive Functions

Davis et al. [7] (Chap. 3, Sec. 3, p. 42) introduce the concept of a primitive recursively closed (PRC) class of functions, which is a class of total functions that contains the initial functions and any functions obtained from the initial functions by a finite number of applications of composition or primitive recursion. Davis et al. [7] (Chap. 3, Sec. 3, pp. 42–43) show that (1) the class of computable functions is PRC; (2) the class of primitive recursive functions is PRC; and (3) a function is primitive recursive if and only if it belongs to every PRC class. A corollary of (3) is that every primitive recursive function is computable.
Péter [2,3] shows it is possible to define functions in terms of recursive equations that are not primitive recursive. In particular, Péter demonstrates that all unary primitive recursive functions are enumerable, i.e., ϕ_0(x), ϕ_1(x), ϕ_2(x), … is an enumeration, with repetitions, of all unary primitive recursive functions. By Cantor’s diagonalization (cf., e.g., pp. 6–8 in [4]), the unary function f(x) = ϕ_x(x) + 1 is not in the enumeration and, hence, not primitive recursive. While f is not primitive recursive, it is computable (cf. Definition 3). Thus, the class of primitive recursive functions is a proper subset of the computable functions and, in and of itself, cannot completely capture the intuitive notion of a number-theoretic algorithm. Péter’s argument suffers no loss of generality, insomuch as any n-ary primitive recursive function, n > 1, can be reduced to an equivalent unary primitive recursive function (cf. Theorems 9.1 and 9.2, Chap. 4, Sec. 9, p. 108 in [7]). Kleene’s separation of recursive functions into general recursive and primitive recursive may have been influenced by Péter’s discovery (cited by Kleene [4] in Chap. XI, § 55, p. 272).
Rogers [6] (Chap. 1, § 1.2, p. 8) defines the Ackermann generalized exponential, a function for which there is no primitive recursive derivation, and formalizes it with the following recursive equations:
f(0, 0, y) = y,
f(0, x + 1, y) = f(0, x, y) + 1,
f(1, 0, y) = 0,
f(z + 2, 0, y) = 1,
f(z + 1, x + 1, y) = f(z, f(z + 1, x, y), y).
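The equations transcribe directly into a recursive procedure. The following Python sketch is our own transcription (the function name is ours); it uses unrestricted recursion, which is fitting, since the function has no primitive recursive derivation.

from functools import lru_cache

@lru_cache(maxsize=None)
def gen_exp(z: int, x: int, y: int) -> int:
    """Rogers's generalized exponential transcribed from the equations above:
    level z = 0 is addition, z = 1 multiplication, z = 2 exponentiation, etc."""
    if z == 0:
        return y if x == 0 else gen_exp(0, x - 1, y) + 1   # f(0,0,y)=y, f(0,x+1,y)=f(0,x,y)+1
    if x == 0:
        return 0 if z == 1 else 1                          # f(1,0,y)=0, f(z+2,0,y)=1
    return gen_exp(z - 1, gen_exp(z, x - 1, y), y)         # f(z+1,x+1,y)=f(z,f(z+1,x,y),y)

assert gen_exp(0, 3, 4) == 7     # addition
assert gen_exp(1, 3, 4) == 12    # multiplication
assert gen_exp(2, 3, 4) == 64    # exponentiation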

3. Notational Conventions and Definitions

If f is a function, dom(f) and codom(f) are the domain and the co-domain of f. The expression f : A → B abbreviates the logical conjunction dom(f) = A ∧ codom(f) = B, for some sets A and B. A function f is partial on A if dom(f) is a proper subset of A, i.e., dom(f) ⊂ A. If f is partial on A and a ∈ A, the following statements are equivalent: (1) a ∈ dom(f); (2) f is defined on a; (3) f(a) is defined; (4) f(a)↓. The following statements are also equivalent: (1) a ∉ dom(f); (2) f is undefined on a; (3) f(a) is undefined; (4) f(a)↑. If dom(f) = A, then f is total on A.
The notation (a_1, …, a_n) is used to denote ordered n-tuples or, simply, n-tuples over some set of numbers A. We will use bold lower-case variables, e.g., a, x, y, to refer to n-tuples. Thus, a = (13, 17, 19) is a 3-tuple over the set of natural numbers N = {0, 1, 2, …}. We will use the symbol N+ to denote the set of positive natural numbers. If x = (x_1, …, x_n) is an n-tuple over A, then x[j], 1 ≤ j ≤ n, refers to individual elements of x. Thus, if x = (2, 3, 5, 7, 11), then x[1] = 2, x[2] = 3, x[3] = 5, x[4] = 7, x[5] = 11. The individual elements of an n-tuple are not required to be distinct. If x is an n-tuple, then dim(x) = n, i.e., the number of elements in x. The 0-tuple is denoted as (). In calculus, a sequence is an ordered set of numbers in a one-to-one correspondence with N or N+ (cf., e.g., Taylor [20], § 1.62, p. 67). Thus, if f : N → N, then {f(n)} denotes the sequence f(0), f(1), …, f(m), …, with countably many elements or terms. In computability theory, the term sequence sometimes refers to an n-tuple (cf., e.g., Ch. 3, p. 60 in [7]). Thus, in order to avoid confusion, when we want to emphasize the fact that we are dealing with a finite number of ordered elements, we refer to the collection of these elements as a finite sequence, a tuple, or an m-tuple, where m is the number of the elements.
For n > 0, A^n is the n-th Cartesian power of A, i.e., A^n = {(a_1, …, a_n) | a_i ∈ A, 1 ≤ i ≤ n}. Thus, if f : R^2 → N, then dom(f) = {(x_1, x_2) | x_1, x_2 ∈ R}, where R is the set of real numbers. We use statements like a ∈ A^n to mean that a is an n-tuple over A. We do not distinguish between 1-tuples and individual elements, e.g., a = (a), a ∈ A, and h(a) = h((a)) for some function h.
In formalizing feedforward artificial neural networks, it is sometimes convenient to treat n-tuples as vectors. Therefore, we occasionally use vector symbols, e.g., x⃗, y⃗, z⃗, to denote n-tuples. If x⃗ ∈ A^n, then dim(x⃗) = n and x⃗[j], 1 ≤ j ≤ n, is the j-th element of x⃗. E.g., if x⃗ = (1, 1, 11) ∈ N^3, then x⃗[1] = x⃗[2] = 1 and x⃗[3] = 11. If a ∈ A^n and a⃗ ∈ A^n, and a[j] = a⃗[j], 1 ≤ j ≤ n, then a = a⃗. If f : A^n → B^m, 0 < n, m, then f(x_1, …, x_n) = f(x) = f(x⃗) = f(x[1], …, x[n]) = f(x⃗[1], …, x⃗[n]) = y = y⃗ = (y[1], …, y[m]) = (y⃗[1], …, y⃗[m]). The empty tuple is discarded in function arguments. E.g., if h : N → N, then h((), t) = h(t, ()) = h(t), t ∈ N. We occasionally separate individual arguments of functions from the remaining arguments combined into tuples. E.g., if f : N^(n+2) → N, 0 < n, then, for z ∈ N^n, x ∈ N, y ∈ N, f(z, x, y) = f(z[1], …, z[n], x, y) = f(w), where z[i] = w[i], 1 ≤ i ≤ n, and w[n+1] = x, w[n+2] = y. If f is a function that maps a_1 ∈ A_1^(n_1), …, a_k ∈ A_k^(n_k) to c ∈ C^m, for some sets A_1, …, A_k, 0 < n_j, m, 1 ≤ j ≤ k, then f : A_1^(n_1), …, A_k^(n_k) → C^m.
A total function P : A^n → {0, 1} is a predicate, where 1 arbitrarily designates logical truth and 0 logical falsehood. The symbols ¬, ∧, ∨, → respectively refer to logical not, logical and, logical or, and logical implication. P(x) is a shorthand for P(x) = 1, and ¬P(x) is a shorthand for P(x) = 0. If P and Q are predicates, then ¬P ∨ Q is logically equivalent to P → Q, i.e., ¬P ∨ Q ≡ P → Q. The symbols ∃ and ∀ refer to the logical existential (there exists) and universal (for all) quantifiers, respectively. Thus, the statement (∃x)P(x) is logically equivalent to the statement that P(x) holds for at least one x in dom(P), while the statement (∀x)P(x) is logically equivalent to the statement that P(x) holds for every x in dom(P).
Let, for 0 < k, n, f : N^k → N, g_j : N^n → N, 1 ≤ j ≤ k, and x ∈ N^n. We use the following definitions of composition and primitive recursion in our article. A function h : N^n → N is obtained from f, g_j by composition if h is obtained from f, g_j by Schema (11).
h(x) = f(g_1(x), …, g_k(x)). (11)
Let k ∈ N and ϕ : N^2 → N be total. A function h : N → N is obtained from ϕ by primitive recursion if it is obtained from ϕ by Schema (12).
h(0) = k,
h(t + 1) = ϕ(t, h(t)). (12)
Let f : N^n → N and g : N^(n+2) → N be total. Then h : N^(n+1) → N is obtained from f and g by primitive recursion if h is obtained from f and g by Schema (13), where x ∈ N^n.
h(x, 0) = f(x),
h(x, t + 1) = g(t, h(x, t), x). (13)
If x⃗ ∈ N^n, Schema (13) can be expressed with the vector notation as
h(x⃗, 0) = f(x⃗),
h(x⃗, t + 1) = g(t, h(x⃗, t), x⃗).
Let the set of initial functions consist of
s(x) = x + 1, x ∈ N;
n(x) = 0, x ∈ N;
u_i^n(x_1, …, x_n) = u_i^n(x) = x[i] = u_i^n(x⃗) = x⃗[i] = x_i, 1 ≤ i ≤ n, x = x⃗ ∈ N^n.
Definition 4.
A function is primitive recursive if it can be obtained from the initial functions by a finite number of applications of Schemata (11)–(13).
A corollary of Definition 4 is that if f is primitive recursive, then there is a finite sequence of functions ϕ_1, …, ϕ_n = f such that each ϕ_i, 1 ≤ i ≤ n, is either an initial function or is obtained from the previous functions in the sequence by composition or primitive recursion.
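As an illustration of Definition 4, the following minimal Python sketch (ours, with our own helper names) builds addition and multiplication from the initial functions using only composition (Schema (11)) and primitive recursion (Schema (13)).

def s(x): return x + 1                    # successor
def n(x): return 0                        # zero
def u(i):                                 # projection u_i^n
    return lambda *xs: xs[i - 1]

def compose(f, *gs):                      # Schema (11): h(x) = f(g_1(x), ..., g_k(x))
    return lambda *xs: f(*(g(*xs) for g in gs))

def prim_rec(f, g):                       # Schema (13): h(x, 0) = f(x),
    def h(*args):                         #              h(x, t + 1) = g(t, h(x, t), x)
        *x, t = args
        acc = f(*x)
        for i in range(t):
            acc = g(i, acc, *x)
        return acc
    return h

add = prim_rec(u(1), compose(s, u(2)))                        # add(x, t) = x + t
mul = prim_rec(compose(n, u(1)), compose(add, u(2), u(3)))    # mul(x, t) = x * t
assert add(3, 4) == 7 and mul(3, 4) == 12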

4. Feedforward Artificial Neural Networks

A feedforward artificial neural network N_z is a finite set of neurons, each of which is connected to a finite number of the neurons in the same set through synapses, i.e., directed weighted edges (cf. Figure 1). The neurons are organized into l layers E_1, …, E_l, where E_1 is the input layer, E_l is the output layer, and E_e, 1 < e < l, are the hidden layers. We use the term network synonymously with the term feedforward artificial neural network.
Let z_z denote the number of layers in N_z and n_i^e refer to the i-th neuron in layer E_e, 1 ≤ e ≤ z_z. The function nn_z(e) : N+ → N+ specifies the number of neurons in layer E_e of N_z. We assume that N_z is fully connected, i.e., there is a synapse from every neuron in layer E_e to every neuron in layer E_{e+1}, 1 ≤ e < z_z. Each synaptic weight w_{i,j}^e (cf. Figure 1) is a real number. The vector w^e is the vector of all synaptic weights in N_z from E_e to E_{e+1}. Thus,
w^e = (w_{1,1}^e, …, w_{1,nn_z(e+1)}^e, …, w_{nn_z(e),1}^e, …, w_{nn_z(e),nn_z(e+1)}^e).
We let w^0 = () and assume, without loss of generality, that, for any synaptic weight w_{i,j}^e, 0 ≤ w_{i,j}^e ≤ 1, because, if that is not the case, w_{i,j}^e can be so scaled. No loss of generality is introduced with the assumption of full connectivity, because if full connectivity is not required, the appropriate synaptic weights are set to zero. If, on the other hand, a given network is not fully connected, synapses with zero weights can be added as needed to make the network fully connected.
Each n_i^e, e > 1, computes an activation function
α_i^e(a^{e−1}, w^{e−1}) : R^{dim(a^{e−1})}, R^{dim(w^{e−1})} → R,
where a^{e−1} is the vector of the activations of the neurons in layer E_{e−1}, dim(a^{e−1}) = nn_z(e−1), and dim(w^{e−1}) = nn_z(e−1) · nn_z(e). If x⃗ is the input to N_z, then a^1 = x⃗. For the input layer, we have
α_i^1(x⃗, w^0) = α_i^1(x⃗, ()) = x⃗[i], 1 ≤ i ≤ nn_z(1).
The term feedforward means that the activations of the neurons are computed layer by layer from the input layer to the output layer, because the activation functions of the neurons in the next layer require only the weights of the synapses connecting the current layer with the next one and the activation values, i.e., the outputs of the activation functions of the neurons in the current layer. If x⃗ is the input vector, then
a^1 = (α_1^1(x⃗, ()), …, α_{nn_z(1)}^1(x⃗, ())) = x⃗,
a^e = (α_1^e(a^{e−1}, w^{e−1}), …, α_{nn_z(e)}^e(a^{e−1}, w^{e−1})), 1 < e < z_z.
The feedforward activation function f z that computes the activations of N z layer by layer can be defined as
f_z(x⃗, 0) = x⃗,
f_z(x⃗, e + 1) = (α_1^{e+1}(f_z(x⃗, e), w^e), …, α_{nn_z(e+1)}^{e+1}(f_z(x⃗, e), w^e)).
Thus, f_z(x⃗, 0) = f_z(x⃗, 1) = a^1 = x = x⃗ and f_z(x⃗, e) = a^e, 1 ≤ e ≤ z_z. If N_z maps a^1 ∈ A^n to b^{z_z} ∈ B^m, for some sets A and B, we define the function ζ_z : A^n → B^m computed by N_z as
ζ_z(x⃗) = f_z(x⃗, z_z).
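The recurrence above amounts to a layer-by-layer evaluation loop. The following minimal Python sketch (ours) mirrors that loop; the summing activation used in the example is only one admissible choice of α_i^e, and all names are our own.

from typing import Callable, List, Sequence

Activation = Callable[[Sequence[float], Sequence[float]], float]

def feedforward(x: Sequence[float],
                layers: List[List[Activation]],
                weights: List[Sequence[float]]) -> List[float]:
    """Mirrors the recurrence for f_z: f_z(x, 0) = x, and f_z(x, e + 1) applies
    the activation functions of layer e + 1 to f_z(x, e) and the weight vector w^e."""
    a = list(x)                                   # a^1 = x: the input layer is the identity
    for alphas, w in zip(layers, weights):
        a = [alpha(a, w) for alpha in alphas]     # a^{e+1} from a^e and w^e
    return a                                      # zeta_z(x) = f_z(x, z_z)

# Example: a single hidden-to-output step whose one neuron computes sum_j a[j] * w[j].
summing: Activation = lambda a, w: sum(ai * wi for ai, wi in zip(a, w))
print(feedforward([2.0, 3.0, 5.0], [[summing]], [[1.0, 1.0, 1.0]]))   # [10.0]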
Definition 5.
A function f : A^n → B^m, for some sets A and B, is N-computable if there is a network N_z such that, for all x = x⃗ ∈ A^n,
ζ_z(x) = f(x) = y = ζ_z(x⃗) = f(x⃗) = y⃗ ∈ B^m.
If N_z computes f, we refer to N_z as N_{f(·)} and use the expression N_{f(·)} : A^n → B^m as a shorthand for ζ_z : A^n → B^m. Furthermore, if N_z computes f, then, for x = x⃗ ∈ A^n, the expressions ζ_z(x), ζ_z(x⃗), N_z(x), N_z(x⃗) are equivalent in that
ζ_z(x) = N_z(x) = y = ζ_z(x⃗) = N_z(x⃗) = y⃗ ∈ B^m.
A network N_z can include other networks. Let N_j and N_k be two networks such that ζ_j : A^m → B^n and ζ_k : B^n → C^k, for some sets A, B, C, and 0 < m, n, k. Then we can construct a new network N_l by feeding the output of N_j to N_k so that ζ_l : A^m → C^k (cf. Figure 2). We can generalize this case to a network that includes arbitrarily many networks whose outputs are the inputs to another network whose output is the output of the entire complex network (cf. Figure 3). Formally, let N_{z_1}, …, N_{z_l} be networks such that ζ_{z_1} : I^{n_{z_1}} → O^{k_{z_1}}, …, ζ_{z_l} : I^{n_{z_l}} → O^{k_{z_l}}, for some sets I and O, 0 < n_{z_i}, k_{z_i}, and 1 ≤ i ≤ l. Let, for some set S, a network N_j compute the function
ζ_j : O^{k_{z_1}}, O^{k_{z_2}}, …, O^{k_{z_l}} → S^m
so that
ζ_z(x_{z_1}, …, x_{z_l}) = ζ_j(ζ_{z_1}(x_{z_1}), …, ζ_{z_l}(x_{z_l})) = s ∈ S^m, x_{z_i} ∈ I^{n_{z_i}}, 1 ≤ i ≤ l.
Then, for x⃗_{z_i} ∈ I^{n_{z_i}} such that x⃗_{z_i} = x_{z_i}, 1 ≤ i ≤ l,
N_z(x_{z_1}, …, x_{z_l}) = N_z(y) = N_j(N_{z_1}(x⃗_{z_1}), …, N_{z_l}(x⃗_{z_l})) = s⃗ ∈ S^m,
where y = (x_{z_1}[1], …, x_{z_1}[n_{z_1}], …, x_{z_l}[1], …, x_{z_l}[n_{z_l}]), and s⃗ = s.
We use the symbol N_{id} to denote an identity network such that, for a = a⃗ ∈ A^n, 0 < n, ζ_{id}(a) = a = ζ_{id}(a⃗) = a⃗. One can think of N_{id} as a single-layer network of n neurons, where α_i^1(a, ()) = a[i] = α_i^1(a⃗, ()) = a⃗[i], 1 ≤ i ≤ n.
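The chaining and combination of networks just described (cf. Figures 2 and 3), together with the identity network N_id, can be sketched as higher-order functions. The following Python sketch is our own illustration, with hypothetical names; it is not part of the formalism.

from typing import Callable, List, Sequence

Network = Callable[[Sequence[float]], List[float]]

def chain(nj: Network, nk: Network) -> Network:
    """Chain network (Figure 2): the output of N_j is the input of N_k."""
    return lambda x: nk(nj(x))

def combine(inner: List[Network], arities: List[int], outer: Network) -> Network:
    """Composite network (Figure 3): split the concatenated input among the inner
    networks and feed their concatenated outputs to the outer network N_j."""
    def net(y: Sequence[float]) -> List[float]:
        outs, pos = [], 0
        for n, k in zip(inner, arities):
            outs.extend(n(y[pos:pos + k]))
            pos += k
        return outer(outs)
    return net

identity: Network = lambda a: list(a)             # the identity network N_id
doubler: Network = lambda a: [2 * v for v in a]
net = combine([identity, doubler], [2, 1], lambda outs: [sum(outs)])
print(net([1.0, 2.0, 3.0]))                       # [9.0]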
Our formalization of feedforward artificial neural networks as finite sets of neurons and synapses organized in finitely many layers is in compliance with the original definition by McCulloch and Pitts (Sec. 2, p. 103 in [11]), who state that the neurons of a given network may be assigned designations c_1, c_2, …, c_n. It is also in compliance with the subsequent definition by Rumelhart, Hinton, and Williams [12], as well as with modern treatments of neural networks by Nielsen [17] and Goodfellow, Bengio, and Courville [18], which continue to describe neural networks as finite sets of neurons and synapses.

5. N -Computability of Primitive Recursive Functions

Lemma 1.
The initial functions are N -computable.
Proof. 
Let N_{n(·)} : N → N be a network with a single input node n_1^1 and a single output node n_1^2 such that w_{1,1}^1 = 0 and α_1^2(a^1, w^1) = a^1[1] · w^1[1]. Then, ζ_{n(·)}(x) = α_1^2((x), (0)) = x · 0 = 0 = n(x), x ∈ N. Let N_{s(·)} : N → N be a network with a single input node n_1^1 and a single output node n_1^2 such that w_{1,1}^1 = 1 and α_1^2(a^1, w^1) = a^1[1] · w^1[1] + 1. Then, ζ_{s(·)}(x) = α_1^2((x), (1)) = x · w_{1,1}^1 + 1 = s(x), x ∈ N. Let N_{u_i^n(·)} : N^n → N, 1 ≤ i ≤ n, n > 0, be a network with n input nodes n_1^1, …, n_n^1 and one output node n_1^2. Let w_{i,1}^1 = 1, w_{j,1}^1 = 0, i ≠ j, 1 ≤ j ≤ n, and
α_1^2(a^1, w^1) = Σ_{j=1}^{n} a^1[j] · w^1[j].
Then, if a = a⃗ ∈ N^n,
ζ_{u_i^n(·)}(a) = α_1^2(a^1, w^1) = a[i] = α_1^2(a⃗, w^1) = a⃗[i] = u_i^n(a[1], …, a[n]). □
We abbreviate N_{u_i^n(·)} as N_{u(·)}, because n and i are always evident from the context.
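For concreteness, the two-layer networks constructed in the proof of Lemma 1 can be sketched as ordinary functions; the following Python sketch is our own illustration of N_{n(·)}, N_{s(·)}, and N_{u(·)}, not additional formal apparatus.

def net_zero(x: int) -> int:            # N_n: alpha_1^2(a, w) = a[1] * w[1] with w = (0,)
    return x * 0

def net_succ(x: int) -> int:            # N_s: alpha_1^2(a, w) = a[1] * w[1] + 1 with w = (1,)
    return x * 1 + 1

def net_proj(i: int, xs) -> int:        # N_{u_i^n}: w_i = 1, all other weights 0,
    weights = [1 if j == i else 0       # alpha_1^2(a, w) = sum_j a[j] * w[j]
               for j in range(1, len(xs) + 1)]
    return sum(a * w for a, w in zip(xs, weights))

assert net_zero(7) == 0 and net_succ(7) == 8 and net_proj(2, (2, 3, 5)) == 3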
Lemma 2.
Let x ∈ N^n, n > 0. Let c_i^n(x), 1 ≤ i ≤ n, be defined as
c_i^n(x) = u_1^n(x) if i = 1, u_2^n(x) if i = 2, …, u_n^n(x) if i = n.
Then, c_i^n is N-computable.
Proof. 
Since u_i^n is primitive recursive, c_i^n is primitive recursive by the definition-by-cases theorem and its corollary (cf. Theorem 5.4, Chap. 3, Sec. 5, pp. 50–51 in [7]). Let N_{c_i^n(·)} be a network with n + 1 input nodes n_1^1, …, n_{n+1}^1, where the first n nodes receive the n corresponding values of x ∈ N^n, and the last node n_{n+1}^1 receives the index i, 1 ≤ i ≤ n. Let N_{c_i^n(·)} have one output node n_1^2 and let w_{j,1}^1 = 1, 1 ≤ j ≤ n. Let the activation function of n_1^2 be defined as
α_1^2(a^1, w^1) = a^1[1] · w^1[1] if a^1[n+1] = 1, a^1[2] · w^1[2] if a^1[n+1] = 2, …, a^1[n] · w^1[n] if a^1[n+1] = n.
Then,
ζ_{c_i^n}(x, j) = x[j] = u_j^n(x) = c_j^n(x). □
We abbreviate N_{c_i^n(·)} as N_{c(·)}.
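The selector network of Lemma 2 can be sketched in the same spirit: n data inputs plus one extra input carrying the index, with the output neuron passing through the input selected by that index. The sketch below is our own illustration, not the formal construction.

def net_select(xs, i: int):
    a = list(xs) + [i]            # a^1: the n values of x followed by the index i
    w = [1] * len(xs)             # w_{j,1}^1 = 1 for the n data synapses
    return a[i - 1] * w[i - 1]    # the case of alpha_1^2 selected by a^1[n+1] = i

assert net_select((2, 3, 5), 3) == 5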
Lemma 3.
Let f be an N-computable function of k arguments, k > 0, and g_1, …, g_k be N-computable functions of n arguments each, n > 0. Let a function h of n arguments be obtained from f, g_1, …, g_k by Schema (11). Then, h is N-computable.
Proof. 
Let f, g_1, …, g_k be computable by N_{f(·)} : N^k → N, N_{g_1(·)} : N^n → N, …, N_{g_k(·)} : N^n → N, respectively. Then let N_j : N^n → N be a network such that, for x ∈ N^n,
N_j(x) = N_{f(·)}(N_{g_1(·)}(x), …, N_{g_k(·)}(x)).
Then, for z ∈ N^n, we have
N_j(z) = N_{f(·)}(N_{g_1(·)}(z), …, N_{g_k(·)}(z)) = f(g_1(z), …, g_k(z)) = h(z),
whence
ζ_j(z) = h(z), z ∈ N^n. □
Lemma 4.
Let k ∈ N. Then k is N-computable.
Proof. 
Let N_{n(·)} and N_{s(·)} be as constructed in Lemma 1. Let {N_{s(·)}}^k, k ≥ 0, denote a network that consists of a finite sequence of k networks N_{s(·)}, where the first N_{s(·)} receives its input from N_{n(·)} and each subsequent N_{s(·)} receives its input from the previous N_{s(·)} (cf. Figure 2). Let {N_{s(·)}}^0 = N_{n(·)}. Let N_{J_k}(0) = {N_{s(·)}}^k(N_{n(·)}(0)). Let s^k(x) denote k compositions of s(x) with itself, i.e., s^1(x) = s(x), s^2(x) = s(s(x)), etc. Then,
N_{J_0}(0) = {N_{s(·)}}^0(N_{n(·)}(0)) = 0;
N_{J_1}(0) = {N_{s(·)}}^1(N_{n(·)}(0)) = s(0) = 1;
N_{J_2}(0) = {N_{s(·)}}^2(N_{n(·)}(0)) = s^2(0) = 2;
…
N_{J_k}(0) = {N_{s(·)}}^k(N_{n(·)}(0)) = s^k(0) = k.
By induction on k, ζ_{J_k}(0) = k. By construction, ζ_{J_k}(n) = k, n ∈ N. □
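A Python rendering of the Lemma 4 construction simply chains k copies of the successor network behind the zero network; the sketch below (ours) restates the Lemma 1 sketches inline so it is self-contained.

def net_zero(x: int) -> int: return 0        # N_n from Lemma 1
def net_succ(x: int) -> int: return x + 1    # N_s from Lemma 1

def net_const(k: int):
    """The network N_{J_k}: N_n followed by a chain of k copies of N_s (cf. Figure 2)."""
    def net(x: int) -> int:
        v = net_zero(x)                      # erase the input
        for _ in range(k):                   # k chained successor networks
            v = net_succ(v)
        return v
    return net

assert net_const(5)(123) == 5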
The next lemma, Lemma 5, is a technical result for Lemma 6. The function x ∸ y is primitive recursive (cf. Chap. 3, Sec. 4, p. 46 in [7]).
Lemma 5.
Let the function x ∸ y : N^2 → N be defined as
x ∸ y = x − y if x ≥ y, and x ∸ y = 0 if x < y.
Then, x ∸ y is N-computable.
Proof. 
Let N_{∸(·)} have two input nodes n_1^1, n_2^1 and one output node n_1^2. Let w_{1,1}^1 = w_{2,1}^1 = 1 and let
α_1^2(a^1, w^1) = a^1[1] · w^1[1] − a^1[2] · w^1[2] if a^1[1] ≥ a^1[2], and α_1^2(a^1, w^1) = 0 if a^1[1] < a^1[2].
Then, for a = a⃗ ∈ N^2, we have
ζ_{∸(·)}(a) = α_1^2(a^1, w^1) = a[1] ∸ a[2] = α_1^2(a⃗, w^1) = a⃗[1] ∸ a⃗[2]. □
Definition 6 confines the notion of N-computability of some function f(x, t) to the N-computability of the first k elements of the sequence {f(x, t)}, t ∈ N.
Definition 6.
A function f : A^n × N → B^m, for some sets A and B, is N-computable elementwise for any k > 0 if there is a network N_z such that, for any z ∈ A^n, the first k + 1 terms of the sequence
{f(z, j)} = f(z, 0), f(z, 1), …, f(z, k), …
are the same as the terms of the tuple
(N(z, 0), N(z, 1), …, N(z, k)),
i.e., f(z, i) = N(z, i), 0 ≤ i ≤ k.
Thus, if a function f ( x , t ) is N -computable, it is N -computable elementwise for any positive k.
Lemma 6.
Let ϕ : N^2 → N be N-computable elementwise and h(t) be a function obtained from ϕ by Schema (12). Then, h is N-computable elementwise.
Proof. 
Let ϕ be computable elementwise by N_{ϕ(·)}. Let N_{h̃_0}(0) = N_{J_k}(0) = k, as constructed in Lemma 4. In the equations below, we abbreviate N_{n(·)}(0) as 0, N_{J_k}(0) as k, N_{J_t}(0) as N_{J_t}, N_{∸(·)}(x, y) as x ∸ y, and N_{h̃_i}(i) as N_{h̃_i}. Let
N_{h̃_0} = k,
N_{h̃_{t+1}} = N_{ϕ(·)}(N_{J_t}, N_{h̃_t}). (24)
By induction on t, h(t) = N_{h̃(·)}(t) (cf. Figure 4). Let
N_{h(·)}(t) = N_{c(·)}(N_{h̃_0}, …, N_{h̃_m}, t + 1), 0 ≤ t ≤ m, m > 0. (25)
Then, the first m + 1 terms of the sequence {h(t)} are the same as the terms of the tuple (N_{h(·)}(0), …, N_{h(·)}(m)) (cf. Figure 5 for m = 3). □
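To make the unrolling in the proof concrete, the following minimal Python sketch (ours, with our own names) fixes a bound m, chains m copies of a network computing ϕ behind the constant network, and selects the t-th value, in the manner of Schema (24) and Equation (25).

def unrolled_net(phi, k: int, m: int):
    """h(0) = k, h(t + 1) = phi(t, h(t)), unrolled for 0 <= t <= m."""
    values = [k]                            # N_{h~_0} = k
    for t in range(m):                      # N_{h~_{t+1}} = N_phi(N_{J_t}, N_{h~_t})
        values.append(phi(t, values[-1]))
    return lambda t: values[t]              # the selector N_c picks the t-th value

# Example: h(0) = 1, h(t + 1) = (t + 1) * h(t), i.e., h(t) = t!.
fact_net = unrolled_net(lambda t, a: (t + 1) * a, 1, 5)
assert [fact_net(t) for t in range(6)] == [1, 1, 2, 6, 24, 120]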
Lemma 7.
Let f : N^n → N and g : N^{n+2} → N be N-computable elementwise and h : N^{n+1} → N be a function obtained from f and g by Schema (13). Then h is N-computable elementwise.
Proof. 
Let x ∈ N^n and y ∈ N^{n+2}, n > 0, such that y = (y_1, y_2, x[1], …, x[n]). Let f and g be N-computable elementwise by N_{f(·)} and N_{g(·)}, respectively. Let us abbreviate N_{h̃_{x,t}}(x, t) as N_{h̃_{x,t}} and let
N_{h̃_{x,0}} = N_{f(·)}(x),
N_{h̃_{x,t+1}} = N_{g(·)}(t, N_{h̃_{x,t}}, x). (26)
By induction on t, h(x, t) = N_{h̃_{x,t}}. Let
N_{h(·)}(x, t) = N_{c(·)}(N_{h̃_{x,0}}, …, N_{h̃_{x,m}}, t + 1), 0 ≤ t ≤ m, m > 0.
Then the first m + 1 terms of the sequence {h(x, t)}, i.e., h(x, 0), …, h(x, m), agree elementwise with the tuple (N_{h(·)}(x, 0), …, N_{h(·)}(x, m)). □
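The construction of Lemma 7 differs from that of Lemma 6 only in carrying the parameter tuple x through the chain; the following Python sketch (ours, with our own names) unrolls Schema (26) for a fixed bound m.

def unrolled_net_params(f, g, m: int):
    """h(x, 0) = f(x), h(x, t + 1) = g(t, h(x, t), x), unrolled for 0 <= t <= m."""
    def net(x, t: int):
        values = [f(x)]                          # N_{h~_{x,0}} = N_f(x)
        for i in range(m):                       # N_{h~_{x,t+1}} = N_g(t, N_{h~_{x,t}}, x)
            values.append(g(i, values[-1], x))
        return values[t]                         # the selector N_c picks the t-th value
    return net

# Example: h(x, t) = x + t via f(x) = x and g(t, a, x) = a + 1.
add_net = unrolled_net_params(lambda x: x, lambda t, a, x: a + 1, 4)
assert [add_net(3, t) for t in range(5)] == [3, 4, 5, 6, 7]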
Figure 6 and Figure 7 illustrate sample constructions of Lemma 7. If we treat h ( t ) as a shorthand for h ( ( ) , t ) , then Lemmas 6 and 7 give us the following theorem.
Theorem 1.
Let h(x, t) be a primitive recursive function, x ∈ N^n, n ≥ 0. Then h(x, t) is N-computable elementwise.
We can ask if the elementwise N-computability of h(x, t) (cf. Definition 6) can be generalized to N-computability. In other words, is it possible to have the sequences {h(x, t)} and {N(x, t)} agree term by term, i.e., h(x, t) = N(x, t), t ∈ N? Since N has a finite set of neurons organized into a finite number of layers, N can compute, per Lemmas 6 and 7, only the first m + 1 values of h(x, t), i.e., h(x, t), 0 ≤ t ≤ m, although m can be an arbitrarily large natural number. Thus, the answer to this question is negative.
Let us assume that N_{h(·)}(x, t) in Theorem 1 is allowed to have countably many neurons so that the number of neurons in the hidden layers of N_{h(·)}(x, t) is countable. Let ζ_N(x, t) be the function computed by N_{h(·)}(x, t). Since countably many neurons can be added to N_{h(·)}(x, t) to compute h(x, t) for any t, we have the sequence {ζ_N(x, t)} = {N(x, t)}, on the one hand, and the sequence {h(x, t)}, on the other hand. Let f(x, t) = h(x, t) − ζ_N(x, t). Since h(x, t) = ζ_N(x, t) for any t ∈ N, {f(x, t)} is vacuously convergent, i.e., lim_{t→∞} f(x, t) = 0. Hence, we have the following theorem.
Theorem  2.
Let h(x, t) be a primitive recursive function, x ∈ N^n, n ≥ 0. Then there is a network N(x, t) with countably many neurons such that, for any z ∈ N^n, the sequences {h(z, t)} and {ζ_N(z, t)} agree term by term, i.e., h(z, t) = ζ_N(z, t), t ∈ N.

6. Discussion

As mathematical objects, feedforward artificial neural networks are more computationally powerful than primitive recursive functions inasmuch as the former can compute functions over real numbers, whereas the latter, by definition, cannot. E.g., one can define a network that computes the sum of n real numbers, which no primitive recursive function can compute. However, the situation changes when networks cease to be mathematical objects and become computational objects by being realized on finite memory devices. A finite memory device is a computational device with a finite amount of memory available for numerical computation [21]. Such a device is analogous to a human scribe with a pencil and an eraser who is to carry out a numerical computation by writing and erasing symbols from a finite alphabet on a finite number of paper sheets. Finite memory devices are different from the finite state automata of classical computability theory (e.g., a deterministic finite state machine (Chap. 2, Sec. 2.2 in [22]), a non-deterministic finite state machine (Chap. 2, Sec. 2.3 in [22]), a Mealy or Moore machine (Chap. 2, Sec. 2.7 in [22]), a pushdown automaton (Chap. 5 in [22]), or a Turing machine (Chap. 6 in [7])), because the latter do not put any bounds on the number of cells in their tapes available for computation. A finite state automaton of classical computability becomes a finite memory device only when the number of its tape cells available for computation is bounded by a natural number.
A real number x is signifiable on a finite memory device D_j if and only if the finite amount of memory on D_j can hold its sign, where a sign is a sequence of arbitrary symbols from a finite alphabet [21]. Thus, if the alphabet is {“.”, “0”, “1”, “2”, “3”, “4”, “5”, “6”, “7”, “8”, “9”} and D_j has 8 memory cells to represent a real number, then the real numbers 1.41, 1.414, 1.4142, 1.41421, 1.414213 are signifiable on D_j as “1.41”, “1.414”, “1.4142”, “1.41421”, “1.414213”, respectively, whereas the real numbers 1.4142135, 1.41421356, 1.414213562, 1.4142135623, and 1.41421356237 are not. A consequence of the finite amount of memory is that the set of real numbers signifiable on D_j is finite and, hence, vacuously countable. To put it differently, Cantor’s theorem (§ 2 in [23]) does not apply insomuch as the number of signifiable reals on D_j in any interval (α, β), α, β ∈ R, α < β, is finite. Consequently, all computation of a feedforward artificial neural network N_z : R^n → R^m, 0 < n, m, realized on D_j can be packed into a unique natural number Ω_z, and there exists a primitive recursive function f̃ : N → N such that ζ_z(x) = a if and only if f̃(x̃) = ã, where x uniquely corresponds to x̃ and a to ã (cf. Theorem 1, pp. 15–17 in [21]). Theorem 1 is, after a fashion, the converse of Theorem 1 in [21] in the sense that it shows how one can construct a network from a primitive recursive function.
Theorem 2 shows that all values of a primitive recursive function can be computed exactly by a feedforward artificial neural network if the network is allowed to have countably many neurons. This purely theoretical result contributes to the growing collection of universality theorems on feedforward neural networks and various classes of functions (cf. Ch. 4 in [17]). Thus, Hornik et al. [13] show that multilayer feedforward networks with a single hidden layer of neurons with arbitrary squashing activation functions can approximate any Borel measurable function from one finite-dimensional space to another to any desired degree of accuracy so long as the number of the neurons in the hidden layer is unbounded. Gripenberg [14] shows that the general approximation property of feedforward perceptron networks is achievable when the number of perceptrons in each layer is bounded but the number of layers is allowed to grow to infinity and the perceptron activation functions are continuously differentiable and not linear. Guliyev and Ismailov [15] show that single hidden layer feedforward neural networks with fixed weights and one or two neurons in the hidden layer can approximate any continuous function on a compact subset of the real line, and proceed to demonstrate that single hidden layer feedforward networks with fixed weights cannot approximate all continuous multivariate functions.
We conclude our discussion with a caveat about universality results of feedforward neural networks with unbounded numbers of neurons. While these results provide valuable theoretical insights, they may not hold much sway with computer scientists interested in computability properties of finite AI, because networks with unbounded numbers of neurons cannot be realized on computational devices with finite amounts of computational memory.

7. Conclusions

We have formalized feedforward artificial neural networks with recurrence equations and proposed a formal definition of the concept of N-computability, i.e., the property of a function to be computed by a feedforward artificial neural network N. We have shown that, for a primitive recursive function h(x, t), where x is an n-tuple of natural numbers and t is a natural number, there exists a feedforward artificial neural network N(x, t) such that, for any n-tuple of natural numbers z and any positive natural number m, the first m + 1 terms of the sequence {h(z, t)} agree elementwise with the tuple (N(z, 0), …, N(z, m)). Our investigation contributes to the knowledge of the classes of functions that can be computed by feedforward artificial neural networks. Since such networks are used in some finite AI systems, our investigation may be of interest to mathematicians and computer scientists interested in the computability theory of finite AI.

Funding

This research received no external funding.

Data Availability Statement

No new data were created.

Acknowledgments

The author is grateful to the four anonymous reviewers for their feedback.

Conflicts of Interest

The author declares no conflict of interest.

References

  1. Gödel, K. On formally undecidable propositions of Principia Mathematica and related systems I. In Kurt Gödel Collected Works Volume I Publications 1929–1936; Feferman, S., Dawson, J.W., Kleene, S.C., Moore, G.H., Solovay, R.M., van Heijenoort, J., Eds.; Oxford University Press: Oxford, UK, 1986. [Google Scholar]
  2. Péter, R. Konstruktion nichtrekursiver Funktionen. Math. Ann. 1935, 111, 42–60. [Google Scholar] [CrossRef]
  3. Péter, R. Rekursive Funktionen; Akadémiai Kiadó: Budapest, Hungary, 1951. [Google Scholar]
  4. Kleene, S.C. Introduction to Metamathematics; D. Van Nostrand: New York, NY, USA, 1952. [Google Scholar]
  5. Davis, M. Computability and Unsolvability; Dover Publications, Inc.: New York, NY, USA, 1982. [Google Scholar]
  6. Rogers, H., Jr. Theory of Recursive Functions and Effective Computability; The MIT Press: Cambridge, MA, USA, 1988. [Google Scholar]
  7. Davis, M.; Sigal, R.; Weyuker, E. Computability, Complexity, and Languages: Fundamentals of Theoretical Computer Science, 2nd ed.; Harcourt, Brace & Company: Boston, MA, USA, 1994. [Google Scholar]
  8. Colson, L. About primitive recursive algorithms. Theor. Comput. Sci. 1991, 83, 57–69. [Google Scholar] [CrossRef]
  9. Paolini, L.; Piccolo, M.; Roversi, L. A class of recursive permutations which is primitive recursive complete. Theor. Comput. Sci. 2020, 813, 218–233. [Google Scholar] [CrossRef]
  10. Petersen, U. Induction and primitive recursion in a resource conscious logic—With a new suggestion of how to assign a measure of complexity to primitive recursive functions. Dilemmata Jahrb. ASFPG 2008, 3, 49–106. [Google Scholar]
  11. McCulloch, W.S.; Pitts, W. A logical calculus of the ideas immanent in nervous activity. Bull. Math. Biophys. 1943, 5, 115–133. [Google Scholar] [CrossRef]
  12. Rumelhart, D.E.; Hinton, G.E.; Williams, R.J. Learning representations by back-propagating errors. Nature 1986, 323, 533–536. [Google Scholar] [CrossRef]
  13. Hornik, K.; Stinchcombe, M.; White, H. Multilayer feedforward networks are universal approximators. Neural Netw. 1989, 2, 359–366. [Google Scholar] [CrossRef]
  14. Gripenberg, G. Approximation by neural networks with a bounded number of nodes at each level. J. Approx. Theory 2003, 122, 260–266. [Google Scholar] [CrossRef]
  15. Guliyev, N.; Ismailov, V. On the approximation by single hidden layer feedforward neural networks with fixed weights. Neural Netw. 2019, 98, 296–304. [Google Scholar] [CrossRef]
  16. Zhang, Z.; Li, J. A review of artificial intelligence in embedded systems. Micromachines 2023, 14, 897. [Google Scholar] [CrossRef]
  17. Nielsen, M. Neural Networks and Deep Learning; Determination Press: San Francisco, CA, USA, 2015. [Google Scholar]
  18. Goodfellow, I.; Bengio, Y.; Courville, A. Deep Learning; MIT Press: Cambridge, MA, USA, 2016. [Google Scholar]
  19. Meyer, A.R.; Ritchie, D.M. The complexity of loop programs. In Proceedings of the ACM National Meeting, Washington, DC, USA, 30 August 1967; pp. 465–469. [Google Scholar]
  20. Taylor, A.E. Advanced Calculus; Ginn & Company: Boston, MA, USA, 1955. [Google Scholar]
  21. Kulyukin, V.A. On correspondences between feedforward artificial neural networks on finite memory automata and classes of primitive recursive functions. Mathematics 2023, 11, 2620. [Google Scholar] [CrossRef]
  22. Hopcroft, J.E.; Ullman, J.D. Introduction to Automata Theory, Languages, and Computation; Narosa Publishing House: New Delhi, India, 2002. [Google Scholar]
  23. Cantor, G. On a property of the class of all real algebraic numbers. Crelle’s J. Math. 1874, 77, 258–262. [Google Scholar]
Figure 1. A 3-layer feedforward artificial neural network. Layer 1 includes the input neurons n_1^1 and n_2^1. Layer 2 includes the neurons n_1^2, n_2^2, n_3^2. Layer 3 includes the neurons n_1^3, n_2^3. The two arrows incoming into n_1^1 and n_2^1 signify that layer 1 is the input layer. The two arrows going out of n_1^3 and n_2^3 signify that layer 3 is the output layer. The weight of the synapse from n_i^e to n_j^{e+1} is w_{i,j}^e, 1 ≤ e < 3. E.g., w_{1,1}^1 is the weight of the synapse from n_1^1 to n_1^2 and w_{3,1}^2 is the weight of the synapse from n_3^2 to n_1^3.
Figure 2. A chain network N_l that consists of two networks N_j (top) and N_k (second from the top). The two bottom networks are functionally identical pictogrammatic renderings of the same network N_l. In the third network from the top, the output y of N_j is made explicit. In the bottom rendering of N_l, y is implicit in the arrow from N_j to N_k. In sum, the output of N_j is given to N_k, and the output of N_k is the output of N_l. Thus, N_l maps x to z.
Figure 3. A network N_z that includes networks N_{z_1}, …, N_{z_l} that take x_{z_1}, …, x_{z_l} as inputs and give their outputs to network N_j (cf. Equation (22)). Thus, N_z maps x_{z_1}, …, x_{z_l} to s.
Figure 4. Networks N_{h̃}(1), N_{h̃}(2), N_{h̃}(3) constructed with Schema (24) in Lemma 6. Note that 0 and k denote N_{n(·)}(0) and N_{J_k}(0), respectively.
Figure 5. Network N_{h(·)}(t), 0 ≤ t ≤ 3, constructed with Equation (25) in Lemma 6. Since h(0) = N_{h(·)}(0), h(1) = N_{h(·)}(1), h(2) = N_{h(·)}(2), h(3) = N_{h(·)}(3), the first four terms of the sequence {h(t)} are the same as the terms of the 4-tuple (N_{h(·)}(0), N_{h(·)}(1), N_{h(·)}(2), N_{h(·)}(3)).
Figure 6. Network N_{h̃}(x, 3), constructed with Schema (26) in Lemma 7. Note that 0 denotes N_{n(·)}(0), and N_{id} is the identity network.
Figure 7. Network N_{h(·)}(x, t), 0 ≤ t ≤ 3, constructed with Equation (25) in Lemma 7. Since h(x, 0) = N_{h(·)}(x, 0), h(x, 1) = N_{h(·)}(x, 1), h(x, 2) = N_{h(·)}(x, 2), h(x, 3) = N_{h(·)}(x, 3), the first four terms of the sequence {h(x, t)}, i.e., h(x, 0), h(x, 1), h(x, 2), h(x, 3), are the same as the terms of the tuple (N_{h(·)}(x, 0), N_{h(·)}(x, 1), N_{h(·)}(x, 2), N_{h(·)}(x, 3)).
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
