Article

On Correspondences between Feedforward Artificial Neural Networks on Finite Memory Automata and Classes of Primitive Recursive Functions

by
Vladimir A. Kulyukin
Department of Computer Science, Utah State University, Logan, UT 84322, USA
Mathematics 2023, 11(12), 2620; https://doi.org/10.3390/math11122620
Submission received: 6 May 2023 / Revised: 26 May 2023 / Accepted: 5 June 2023 / Published: 8 June 2023
(This article belongs to the Special Issue Theory of Algorithms and Recursion Theory)

Abstract:
When realized on computational devices with finite quantities of memory, feedforward artificial neural networks and the functions they compute cease being abstract mathematical objects and turn into executable programs generating concrete computations. To differentiate between feedforward artificial neural networks and their functions as abstract mathematical objects and the realizations of these networks and functions on finite memory devices, we introduce the categories of general and actual computabilities and show that there exist correspondences, i.e., bijections, between functions computable by trained feedforward artificial neural networks on finite memory automata and classes of primitive recursive functions.

1. Introduction

An offspring of McCulloch and Pitts’ research on the foundations of cybernetics [1], artificial neural networks (ANNs) entered mainstream machine learning after the discovery of backpropagation by Rumelhart, Hinton, and Williams [2]. ANNs proved to be universal approximators of different classes of functions when no limits are imposed on the number of artificial neurons in any layer (arbitrary width) or on the number of hidden layers (arbitrary depth), and even with bounded widths and depths (e.g., [3,4,5]). ANNs cease being abstract mathematical objects when implemented in specific programming languages on computational devices with finite quantities of internal and external memory, to which we interchangeably refer in our article as finite memory devices (FMDs) and finite memory automata (FMA). To differentiate between functions computable by ANNs in principle and functions computable by ANNs realized on FMA, we introduce the categories of general and actual computabilities and show that there exist correspondences, i.e., bijections, between functions computable by trained feedforward ANNs (FANNs) on FMA and classes of primitive recursive functions.
Our article is organized as follows. In Section 2, we expound the terms, definitions, and notational conventions for functions and predicates espoused in this article and define the term finite memory automaton. In Section 3, we explicate the categories of general and actual computabilities and elucidate their similarities and differences. In Section 4, we formalize FANNs in terms of recursively defined functions. In Section 5, we present primitive recursive techniques to pack finite sets and Cartesian powers thereof into Gödel numbers. In Section 6, we use the set packing techniques of Section 5 to show that functions computable by trained FANNs implemented on FMA can be archived into natural numbers. In Section 7, we show how such archives can be used to define primitive recursive functions corresponding to functions computable by FANNs. In Section 8, we discuss theoretical and practical reasons for separating computability into the general and actual categories and pursue some implications of the theorems proved in Section 7. In Section 9, we summarize our conclusions. For the reader’s convenience, Appendix A gives supplementary definitions, results, and examples that are referenced in the main text when relevant.

2. Preliminaries

2.1. Functions and Predicates

If f is a function, dom(f) and codom(f) denote the domain and the co-domain of f, respectively. Statements such as f : S → R abbreviate the logical conjunction dom(f) = S ∧ codom(f) = R. A function f is partial on a set S if dom(f) is a proper subset of S, i.e., dom(f) ⊂ S. Thus, if S = N = { 0, 1, 2, … } and f(x) = x^{1/3}, then f is partial on S, because dom(f) = { i³ | i ∈ N } ⊂ N. If S and R are sets, then S = R is logically equivalent to the logical conjunction S ⊆ R ∧ R ⊆ S, i.e., S is a subset of R, and vice versa. If f is partial on S and z ∈ S, the following statements are equivalent: (1) z ∈ dom(f); (2) f is defined on z; (3) f(z) is defined; and (4) f(z)↓. The following statements are also equivalent: (1) z ∉ dom(f); (2) f is undefined on z; (3) f(z) is undefined; and (4) f(z)↑. If f is partial on S and dom(f) = S, then f is total on S. Thus, f(x) = x + 1 is total on N. When f : S → R is a bijection, i.e., f is injective (one-to-one) and surjective (onto), f is a correspondence between S and R.
If S is a set, then |S| is the cardinality of S, i.e., the number of elements in S. S is finite if and only if (iff) |S| ∈ N. For n > 0, S^n is the n-th Cartesian power of S, i.e., S^n = { (s_0, …, s_{n−1}) | s_i ∈ S, 0 ≤ i ≤ n−1 } = { (s_1, …, s_n) | s_i ∈ S, 1 ≤ i ≤ n }. Thus, if f : R² → N, dom(f) = { (x_1, x_2) | x_1, x_2 ∈ R }. The symbol x̄ is a sequence of numbers, i.e., a vector, from a set S, i.e., x̄ = (x_0, x_1, …, x_{n−1}) = (x_1, x_2, …, x_n) ∈ S^n; () is the empty sequence. If x̄ ∈ S^n, its individual elements are x̄_0 = x_0, x̄_1 = x_1, …, x̄_{n−1} = x_{n−1} or, equivalently, x̄_1 = x_1, x̄_2 = x_2, …, x̄_n = x_n. If dom(f) ⊆ S^n and x̄ ∈ S^n, f(x̄) = f((x_0, …, x_{n−1})) = f(x_0, …, x_{n−1}) = f(x_1, …, x_n). If f : dom(f) → codom(f) is a bijection, the inverse of f is f^{-1} : codom(f) → dom(f). When the arguments of f are evident, f or f(·) abbreviates f(x̄), f(x_0, …, x_{n−1}), or f(x_1, …, x_n).
A total function P : S^n → { 0, 1 } is a predicate if, for any x̄ ∈ S^n, P(x̄) = 1 or P(x̄) = 0, where 1 arbitrarily designates the logical truth and 0 designates a logical falsehood. The symbols ¬, ∧, ∨, →, respectively, refer to logical not, logical and, logical or, and logical implication. We abbreviate P(x̄) = 1 to P(x̄) and P(x̄) = 0 to ¬P(x̄). If P and Q are predicates, then ¬P ∨ Q is logically equivalent to P → Q, i.e., ¬P ∨ Q ≡ P → Q. For clarity, sub-predicates of compound predicates may be included in matching pairs of { }. Thus, if a compound predicate P consists of predicates P_1, P_2, P_3, and P_4, it can be defined as P ≡ {{ P_1 ∧ P_2 } ∨ { P_3 ∧ P_4 }}. The symbols ∃ and ∀ refer to the logical existential (there exists) and universal (for all) quantifiers, respectively. Thus, the statement (∃ x̄ ∈ S^n) P(x̄) is logically equivalent to the statement that P(x̄) holds for at least one x̄ in dom(P), while the statement (∀ x̄ ∈ S^n) P(x̄) is logically equivalent to the statement that P(x̄) holds for all x̄ in dom(P).

2.2. Finite Memory Automata

A finite memory device D_j is a physical or abstract automaton with a finite quantity of internal and external memory and an automated capability of executing programs, i.e., finite sequences of instructions written in a formalism, e.g., a programming language for D_j, and stored in the finite memory of D_j. Since bijections exist between expressions over any finite alphabet, i.e., a finite set of symbols or signs, and subsets of N [6], we call the memory of D_j numerical memory. The numerical memory consists of registers, each of which is a sequence of numerical unit cells, e.g., digital array cells, mechanical switches, and finite state machine tape cells. The quantity of numerical memory is the product of the number of registers and the number of unit cells in each register, i.e., this quantity is a natural number.
A cell holds exactly one elementary sign from a finite alphabet, e.g., { “.”, “0”, “1”, “2”, “3”, “4”, “5”, “6”, “7”, “8”, “9” }, or is empty. The sign of the empty cell is unique and is not an elementary sign. A number sign is a sequence of elementary signs in consecutive cells of a register with no empty cells to the left of the first elementary sign and possibly some empty cells to the right of the rightmost elementary sign. Thus, if “|” is the empty sign on D_j, the alphabet is { “.”, “-”, “0”, “1”, “2”, “3”, “4”, “5”, “6”, “7”, “8”, “9” }, and each register on D_j has seven cells, then “3.1||||”, “3.14|||”, “3.141||”, “3.1415|”, and “3.14159” are number signs conventionally interpreted as the real numbers 3.1, 3.14, 3.141, 3.1415, and 3.14159, respectively. An arbitrary number sign interpretation is fixed a priori for a given alphabet and D_j and does not change from sign to sign. Thus, if the alphabet is { “,”, “0”, “f0”, “ff0”, “fff0”, “ffff0”, “fffff0”, “ffffff0”, “fffffff0”, “ffffffff0”, “fffffffff0” } and the interpretation is such that “,” is interpreted as the decimal point, “0” as 0, “f0” as 1, “ff0” as 2, “fff0” as 3, etc., “*” is the empty sign, and each sign is read left to right, then, if each register on D_j has twenty-three cells, the sign “f0,ffff0f0ffff0ff0f0***” is interpreted as 1.41421.
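The left-to-right reading of number signs over the second alphabet above can be sketched as a small decoder. This is an illustration only; the function name decode_sign, the use of "*" as the empty sign, and the string-based register representation are our assumptions, not part of the formal development.

```python
def decode_sign(sign: str, empty: str = "*") -> str:
    """Decode a number sign over the alphabet {",", "0", "f0", "ff0", ...}
    into a conventional decimal string: "," is the decimal point, and a run
    of k "f"s terminated by "0" is the digit k.  Trailing empty cells in the
    register are ignored."""
    sign = sign.rstrip(empty)
    out, run = [], 0
    for c in sign:
        if c == ",":
            out.append(".")
        elif c == "f":
            run += 1
        elif c == "0":
            out.append(str(run))
            run = 0
        else:
            raise ValueError(f"unexpected cell content: {c!r}")
    return "".join(out)
```

For the twenty-three-cell register in the text, decode_sign("f0,ffff0f0ffff0ff0f0***") yields the string "1.41421", matching the interpretation fixed above.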
A real number x is signifiable on D_j iff a register on D_j can hold its sign. Put another way, a number is signifiable on D_j if, in a programming language L for D_j, the number’s sign can be assigned to a variable, i.e., stored in a designated register. When x is signifiable on D_j, we say that x is simply signifiable. A set or a sequence of numbers is signifiable if each number in the set or sequence is signifiable.
Δ_j > 0 is the smallest positive signifiable real number on D_j iff for any signifiable x, there is no signifiable y such that x < y < x + Δ_j. The finite set of real numbers in the closed interval between 0 and 1 signifiable on D_j is
R_{0,1}^j ≡ { x ∈ R | x = iΔ_j < 1 } ∪ { 1 }, i ∈ N.     (1)
We note, in passing, a notational convention in Equation (1) to which we adhere in our article: if D_j is an FMA, then the Latin letter j in subscripts or superscripts of symbols is used to emphasize that they are defined with respect to D_j. Thus, if D_j and D_k are two FMDs with different quantities of numerical memory, Δ_j ≠ Δ_k.
Lemma 1.
If z = iΔ_j is a maximal element of { x ∈ R | x = iΔ_j < 1 } and y = (i + 1)Δ_j, then y ≥ 1.
Proof. 
If y ∈ R_{0,1}^j, then y = 1, because 1 is the only number in R_{0,1}^j greater than z. If y ∉ R_{0,1}^j, then y > 1 and z < 1 < y.    □
A corollary of Lemma 1 is that if a, b are signifiable, a < b, then
R_{a,b}^j ≡ { x ∈ R | x = a + iΔ_j < b } ∪ { b }, i ∈ N,     (2)
is the finite set of signifiable numbers in the closed interval from a to b such that there exists no signifiable number between any two consecutive members of R_{a,b}^j when the latter is sorted in non-descending order.
Lemma 2.
If a, b are signifiable and b − a ≥ Δ_j, there exists a bijection ψ_{a,b}^j : R_{a,b}^j → Z_{a,b}^j = { 0, …, z } ⊂ N, z > 0, where a + zΔ_j ≥ b. If a + zΔ_j is signifiable, it is the smallest signifiable number ≥ b.
Proof. 
Let
ψ_{a,b}^j(x) = k, if x = a + kΔ_j < b; ψ_{a,b}^j(x) = z, if {{ x = a + zΔ_j = b } ∨ { a + (z−1)Δ_j < x = b < a + zΔ_j }}.     (3)
Let r ∈ Z_{a,b}^j. If r = z, then ψ_{a,b}^j(x) = r, for x = a + zΔ_j = b or a + (z−1)Δ_j < x = b < a + zΔ_j. If r < z, then ψ_{a,b}^j(x) = r, for x = a + rΔ_j. Let ψ_{a,b}^j(x) = ψ_{a,b}^j(y) = r. If r = z, then x = a + rΔ_j = y = b or a + (r−1)Δ_j < x = b = y < a + rΔ_j. If r < z, then x = a + rΔ_j = y. Let a + zΔ_j be signifiable. If a + zΔ_j = b, it is vacuously the smallest signifiable number ≥ b. If a + zΔ_j > b, then, since a + (z−1)Δ_j < b < a + zΔ_j, the assertion that 0 < b − (a + (z−1)Δ_j) < Δ_j or 0 < a + zΔ_j − b < Δ_j leads to a contradiction.    □
A corollary of Lemma 2 is that (ψ_{a,b}^j)^{-1} : Z_{a,b}^j → R_{a,b}^j is
(ψ_{a,b}^j)^{-1}(k) = x, if x = a + kΔ_j < b; (ψ_{a,b}^j)^{-1}(k) = b, if {{ b = a + kΔ_j } ∨ { a + (k−1)Δ_j < b < a + kΔ_j }}.     (4)
Lemmas 1 and 2 draw on the empirically verifiable fact manifested by division underflow errors in modern programming languages: given an FMD D_j and two signifiable real numbers a and b, with a < b, the set of signifiable real numbers in the closed interval between a and b is a proper finite subset of the set of real numbers R. Thus, bijections are possible between R_{a,b}^j and finite subsets of N. While these bijections may differ from FMA to FMA in that they depend on the exact quantity of memory on a given FMA, they differ only in terms of the cardinalities of their domains and co-domains: the larger the quantity of memory, the greater the cardinality. A constructive interpretation of Lemmas 1 and 2 is that if we take two signifiable real numbers a and b such that b − a ≥ Δ_j, we can effectively enumerate the elements of Z_{a,b}^j by iteratively adding increasing integer multiples of Δ_j to a until we reach b, i.e., a + zΔ_j = b, or go slightly above it, i.e., a + (z−1)Δ_j < b < a + zΔ_j, for z > 0.
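The finiteness of signifiable numbers is directly observable on a physical FMD. The sketch below uses Python’s IEEE 754 doubles as the signifiable reals; note that, unlike the uniform Δ_j grid of Lemmas 1 and 2, doubles are not uniformly spaced across their whole range, so the sketch illustrates only the finiteness of R_{a,b}^j. The helper name count_signifiable is our own, and math.ulp/math.nextafter require Python 3.9 or later.

```python
import math

# On a physical FMD, IEEE 754 doubles play the role of the signifiable
# reals: between a signifiable x and the next signifiable number there is
# no signifiable y (cf. the defining property of Delta_j).
gap_at_1 = math.ulp(1.0)                    # spacing of doubles just above 1.0
next_up = math.nextafter(1.0, math.inf)     # smallest signifiable number > 1.0

def count_signifiable(a: float, b: float) -> int:
    """Count the (finite) number of doubles in the closed interval [a, b],
    a < b, by stepping through them one at a time."""
    n, x = 1, a
    while x < b:
        x = math.nextafter(x, math.inf)
        n += 1
    return n
```

For instance, count_signifiable over a span of four unit steps above 1.0 finds exactly five signifiable numbers, mirroring the enumeration of Z_{a,b}^j by integer multiples of the local spacing.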
To map the elements of R_{a,b}^j to N+ = { 1, 2, 3, … }, we define the bijection μ_{a,b}^j : R_{a,b}^j → I_{a,b}^j = { z + 1 | z ∈ Z_{a,b}^j } and its inverse (μ_{a,b}^j)^{-1} : I_{a,b}^j → R_{a,b}^j as
μ_{a,b}^j(x) = ψ_{a,b}^j(x) + 1;  (μ_{a,b}^j)^{-1}(k) = (ψ_{a,b}^j)^{-1}(k − 1), k > 0.     (5)
If we abbreviate μ_{0,1}^j, (μ_{0,1}^j)^{-1}, ψ_{0,1}^j, (ψ_{0,1}^j)^{-1}, R_{0,1}^j, Z_{0,1}^j, and I_{0,1}^j to μ, μ^{-1}, ψ, ψ^{-1}, R, Z, and I, respectively, and let Δ_j = 0.2, we have the following example.
Example 1.
R = { 0, 0.2, 0.4, 0.6, 0.8, 1 }; Z = { 0, 1, 2, 3, 4, 5 }; I = { 1, 2, 3, 4, 5, 6 }; ψ(0) = 0, ψ(0.2) = 1, ψ(0.4) = 2, ψ(0.6) = 3, ψ(0.8) = 4, ψ(1) = 5; ψ^{-1}(0) = 0, ψ^{-1}(1) = 0.2, ψ^{-1}(2) = 0.4, ψ^{-1}(3) = 0.6, ψ^{-1}(4) = 0.8, ψ^{-1}(5) = 1; μ(0) = 1, μ(0.2) = 2, μ(0.4) = 3, μ(0.6) = 4, μ(0.8) = 5, μ(1) = 6; μ^{-1}(1) = 0, μ^{-1}(2) = 0.2, μ^{-1}(3) = 0.4, μ^{-1}(4) = 0.6, μ^{-1}(5) = 0.8, μ^{-1}(6) = 1.
For Δ_j = 0.3, we have another example.
Example 2.
R = { 0, 0.3, 0.6, 0.9, 1 }; Z = { 0, 1, 2, 3, 4 }; I = { 1, 2, 3, 4, 5 }; ψ(0) = 0, ψ(0.3) = 1, ψ(0.6) = 2, ψ(0.9) = 3, ψ(1) = 4; ψ^{-1}(0) = 0, ψ^{-1}(1) = 0.3, ψ^{-1}(2) = 0.6, ψ^{-1}(3) = 0.9, ψ^{-1}(4) = 1; μ(0) = 1, μ(0.3) = 2, μ(0.6) = 3, μ(0.9) = 4, μ(1) = 5; μ^{-1}(1) = 0, μ^{-1}(2) = 0.3, μ^{-1}(3) = 0.6, μ^{-1}(4) = 0.9, μ^{-1}(5) = 1.
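Examples 1 and 2 can be reproduced mechanically. The sketch below tabulates ψ_{a,b}^j and μ_{a,b}^j as dictionaries over R_{a,b}^j enumerated per Equation (2); exact Fraction arithmetic stands in for signifiable device arithmetic, and the function name make_bijections is hypothetical.

```python
from fractions import Fraction

def make_bijections(a, b, delta):
    """Tabulate psi_{a,b}^j (Lemma 2) and mu_{a,b}^j (Equation (5)) as
    dictionaries over the finite set R_{a,b}^j of Equation (2)."""
    a, b, delta = Fraction(a), Fraction(b), Fraction(delta)
    xs, i = [], 0
    while a + i * delta < b:            # members a + i*delta < b, in order
        xs.append(a + i * delta)
        i += 1
    xs.append(b)                        # the closed interval always keeps b
    psi = {x: k for k, x in enumerate(xs)}      # psi: R -> Z = {0, ..., z}
    mu = {x: k + 1 for k, x in enumerate(xs)}   # mu = psi + 1: R -> I
    return psi, mu
```

With a = 0, b = 1, and Δ_j = 0.2 this reproduces Example 1 (e.g., ψ(0.6) = 3, μ(1) = 6); with Δ_j = 0.3 it reproduces Example 2 (e.g., μ(1) = 5).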

3. Computability: General vs. Actual

Computability theory lacks a uniform, commonly accepted formalism for computable, partially computable, and primitive recursive functions. The treatment of such functions in our article is based, in part, on the formalism by Davis, Sigal, and Weyuker (Chapters 2 and 3 in [7]), which has, in turn, much in common with Kleene’s formalism (Chapter 9 in  [8]). Alternative treatments include [9], where primitive recursive functions are formalized as loop programs consisting of assignment and iteration statements similar to DO statements in FORTRAN, and [10], where λ -calculus is used. These symbolically different treatments have one feature in common: computable, partially computable, and primitive recursive functions operate on natural numbers and the underlying automata, explicit or implicit, on which these functions can, in principle, be executed if implemented as programs in some formalism, have access to infinite numerical memory. To distinguish computability in principle from computability on finite memory automata, we introduce the categories of general and actual computabilities.

3.1. General Computability

As our formalism in this section, we use the programming language L developed in Chapter 2 of [7] and subsequently used in that book to define partially computable, computable, and primitive recursive functions and to prove various properties thereof. An L program P is a finite sequence of L instructions. The unique variable Y is designated as the output variable, where the output of P on a given input is stored. X_1, X_2, … designate input variables, and Z_1, Z_2, … refer to internal variables, i.e., variables in P that are not input variables. No bounds are imposed on the magnitude of natural numbers assigned to variables. L has conditional dispatch instructions; line labels; elementary arithmetic operations on and comparisons of natural numbers; and macros, i.e., statements expandable into primitive L instructions.
A computation of P on some input x̄ ∈ N^m, m > 0, is a finite sequence of snapshots (s_1, …, s_k), where each snapshot s_i, 1 ≤ i ≤ k, k > 0, specifies the number of the instruction in P to be executed and the value of each variable in P. The snapshot s_1 is the initial snapshot, where the values of all input variables are set to their initial values, the program instruction counter is set to 1, i.e., the number of the first instruction in P, and the values of all the other variables in P are set to 0. The snapshot s_k in (s_1, …, s_k) is a terminal snapshot, where the instruction counter is set to the number of the instructions in P plus 1. Not all snapshot sequences are computations. If (s_1, s_2, …, s_k) is a computation of P on x̄ ∈ N^m, i.e., X_1 = x_1, X_2 = x_2, …, X_m = x_m, then there is a function that, given the text of P and a snapshot s_i, 1 ≤ i < k, in the computation, generates the next snapshot s_{i+1} of the computation. This function can verify if (s_1, …, s_k) constitutes the computation of P on x̄. The existence of such functions implies that each instruction in L is interpreted unambiguously. If some program P in L takes m inputs and the values of the input variables are X_1 = x_1, X_2 = x_2, …, X_m = x_m, then
Ψ_P^(m)(x_1, x_2, …, x_m) = Y in s_k, if ∃ a computation (s_1, …, s_k), k ≥ 1, and ↑ otherwise     (6)
denotes the value of Y in the terminal snapshot s_k if there exists a computation (s_1, …, s_k) of P on (x_1, x_2, …, x_m) and is undefined otherwise.
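The snapshot semantics above can be made concrete with a toy interpreter. The instruction set below (increment, monus decrement, jump-if-nonzero, unconditional jump) is a simplified stand-in for the language L of [7], not its actual syntax; the names run and copy_prog are our own.

```python
def run(program, inputs):
    """Execute a toy register-machine program and return its snapshot
    sequence (s_1, ..., s_k).  A snapshot is a pair (instruction counter,
    variable values); the terminal snapshot's counter is len(program) + 1.
    Instructions: ('inc', V), ('dec', V), ('jnz', V, n), ('goto', n)."""
    vars = {"Y": 0, **inputs}
    pc = 1
    snaps = [(pc, dict(vars))]          # s_1: the initial snapshot
    while 1 <= pc <= len(program):
        op = program[pc - 1]
        if op[0] == "inc":
            vars[op[1]] = vars.get(op[1], 0) + 1
            pc += 1
        elif op[0] == "dec":            # monus: natural numbers never go below 0
            vars[op[1]] = max(0, vars.get(op[1], 0) - 1)
            pc += 1
        elif op[0] == "jnz":            # jump if the variable is nonzero
            pc = op[2] if vars.get(op[1], 0) != 0 else pc + 1
        elif op[0] == "goto":
            pc = op[1]
        snaps.append((pc, dict(vars)))  # each snapshot follows from the previous
    return snaps

# A program computing Y = X1 by repeated decrement/increment:
copy_prog = [
    ("jnz", "X1", 3),   # 1: if X1 != 0, enter the loop body
    ("goto", 6),        # 2: otherwise jump past the last instruction (halt)
    ("dec", "X1"),      # 3: X1 <- X1 - 1
    ("inc", "Y"),       # 4: Y  <- Y + 1
    ("goto", 1),        # 5: repeat
]
```

Running copy_prog on X1 = 3 produces a terminal snapshot with counter 6 (one past the last instruction) and Y = 3, so the analogue of Ψ here is Y’s value in the final snapshot.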
Definition 1.
A function f : N^m → N, m ∈ N+, is partially computable if f is partial and there is an L program P such that Equation (7) holds.
(∀ x̄ ∈ N^m) f(x̄) = Ψ_P^(m)(x̄)     (7)
Equation (7) is interpreted so that f(x̄)↓ iff Ψ_P^(m)(x̄)↓ and f(x̄)↑ iff Ψ_P^(m)(x̄)↑.
Definition 2.
A function f : N^m → N, 0 < m ∈ N, is computable if it is total, i.e., (∀ x̄ ∈ N^m) f(x̄)↓, and partially computable.
Let f : N^k → N and g_i : N^n → N, 1 ≤ i ≤ k, n ∈ N+. Then, h : N^n → N is obtained by composition from f, g_1, …, g_k if
h(x_1, …, x_n) = f(g_1(x_1, …, x_n), …, g_k(x_1, …, x_n)).     (8)
Let k ∈ N, n ∈ N+, and let ϕ : N² → N, f : N^n → N, g : N^{n+2} → N be total. If h is obtained from ϕ by the recurrences in (9) or from f and g by the recurrences in (10), then h is obtained from ϕ or from f and g by primitive recursion or simply by recursion. The recurrences in (10) are isomorphic to Gödel’s recurrences (Section 2, Equation (2) in [6]), where he introduces the concept of a recursively defined number-theoretic function. The three functions in (11) are the initial functions.
h(0) = k, h(t + 1) = ϕ(t, h(t))     (9)
h(x_1, …, x_n, 0) = f(x_1, …, x_n), h(x_1, …, x_n, t + 1) = g(t, h(x_1, …, x_n, t), x_1, …, x_n)     (10)
s(x) = x + 1; n(x) = 0; u_i^n(x_1, …, x_n) = x_i, 1 ≤ i ≤ n.     (11)
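Composition (8) and the recurrences in (10) can be sketched as higher-order combinators over the initial functions in (11). The names compose, prim_rec, and u are our own; add and mult below are the standard derivations of addition and multiplication as primitive recursive functions.

```python
def s(x): return x + 1          # successor, the first initial function
def n(x): return 0              # the zero function
def u(i, n_args):               # projection u_i^n (1-indexed), third initial function
    return lambda *xs: xs[i - 1]

def compose(f, *gs):
    """h(x1..xn) = f(g1(x1..xn), ..., gk(x1..xn)) -- composition, Equation (8)."""
    return lambda *xs: f(*(g(*xs) for g in gs))

def prim_rec(f, g):
    """h(x.., 0) = f(x..); h(x.., t+1) = g(t, h(x.., t), x..) -- the
    recurrences in (10), unrolled as a bounded loop."""
    def h(*args):
        *xs, t = args
        acc = f(*xs)
        for i in range(t):
            acc = g(i, acc, *xs)
        return acc
    return h

# add(x, t) = x + t from u_1^1 and g(t, acc, x) = s(acc)
add = prim_rec(u(1, 1), lambda t, acc, x: s(acc))
# mult(x, t) = x * t from n and g(t, acc, x) = add(acc, x)
mult = prim_rec(n, lambda t, acc, x: add(acc, x))
```

Since add and mult are built only from the initial functions by composition and recursion, they witness Definition 3 directly.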
Definition 3.
A function is primitive recursive if it can be obtained from the initial functions by a finite number of applications of composition and recursion in (8)–(10).
An implication of Definition 3 is that if f is a primitive recursive function, then there is a sequence of functions (f_1, …, f_n = f), n > 0, where every function in the sequence is an initial function or is obtained from the previous functions in the sequence by composition or recursion.
A class C of total functions is primitive recursively closed (PRC) if the initial functions are in it and any function obtained from the functions in C by composition or recursion is also in C . It has been shown (Chapter 3 in [7]) that (1) the class of computable functions is PRC; (2) the class of primitive recursive functions is PRC; and (3) a function is primitive recursive iff it belongs to every PRC class. A corollary of (3) is that every primitive recursive function is computable.
If C includes all functions of a certain type, we refer to it as the class of those functions, e.g., the class of partially computable functions, the class of computable functions, the class of primitive recursive functions, etc. When we say that C is a class of functions of a certain type, we mean that C ⊆ C′, where C′ is the class of all functions of that type.

3.2. Actual Computability

In general, the FMA defined in Section 2.2 is different from the finite state automata of classical computability theory, because the latter, e.g., a Turing machine (TM), do not impose any limitations on memory. A TM becomes an FMA iff the number of cells on its tape where it reads and writes symbols is finite. Analogously, a finite state automaton (FSA) of classical computability is an FMA iff there is a limit, expressed as a natural number, on the length of the input tape from which the FSA reads sign sequences over a given alphabet.
As is the case with general computability, we let P_{L_j} be an L_j program, i.e., a finite sequence of unambiguous instructions in a programming language L_j for an FMD D_j. Thus, if D_j is a physical computer with an operating system, e.g., Linux, a programming language for D_j can be Lisp, C, Perl, Python, etc. If D_j is an abstract FMA, e.g., a TM with a finite number of cells on its tape, then D_j is programmed with the standard quadruple formalism (Chapter 6 in [7]). If D_j is a mechanical device, then we assume that there is a formalism that consists of instructions such as “set switch i to position p”, “turn handle full circle clockwise t times”, etc. A state of D_j while executing P_{L_j} on some input x̄ includes the number of the instruction in P_{L_j} to execute next and, depending on D_j, may include the contents of each register, the signs on the finite input tape, or the state of each mechanical switch. As we did with general computability, we call such a state a snapshot of D_j for P_{L_j}(x̄) and define a computation of P_{L_j}(x̄) on D_j to be a finite sequence of snapshots (s_1, …, s_k), k ≥ 1, where each subsequent snapshot is computed from the previous snapshot, the initial snapshot s_1 has the values of all the variables in P_{L_j} appropriately specified and the instruction counter of P_{L_j} set to 1, and the terminal snapshot s_k has the instruction counter set to the number of the instructions in P_{L_j} plus 1. We let
Ψ_{P_{L_j}}^(n)(x̄)
denote the number sign corresponding to the output of P_{L_j}(x̄) executed on D_j. It is irrelevant to our discussion where this number sign is stored (e.g., in a register, a section of a finite tape, or the sequence of the positions of the mechanical switches examined left to right or right to left, etc.) so long as it is understood that the output, whenever there is a computation, is unambiguously interpreted as a real number according to an interpretation fixed a priori.
Definition 4.
A partial function f : R^m → R, m ∈ N+, is actually partially computable on D_j if Equation (12) holds.
(∀ x̄ ∈ R^m) f(x̄) = Ψ_{P_{L_j}}^(m)(x̄).     (12)
Equation (12) of actual computability is interpreted so that f(x̄)↓ iff Ψ_{P_{L_j}}^(m)(x̄)↓, i.e., f(x̄) = z iff Ψ_{P_{L_j}}^(m)(x̄) = z, for any x̄ ∈ R^m and z ∈ R signifiable on D_j, and f(x̄)↑ iff Ψ_{P_{L_j}}^(m)(x̄)↑. However, unlike Equation (7) of general computability, which is defined only on natural numbers, every one of which is signifiable by implication, in actual computability we have to make provisions for non-signifiable real numbers. Toward that end, we introduce the following inequality, which holds when a non-signifiable number is encountered during a computation of P_{L_j}(x̄).
(∃ x̄ ∈ R^m) f(x̄) ≠ Ψ_{P_{L_j}}^(m)(x̄).     (13)
Inequality (13) can be illustrated with two examples. Let D_j have two cells per register, let f : N² → N be f(x_1, x_2) = x_1 + x_2, and let P_{L_j}(x_1, x_2) be a program that implements f, i.e., adds the two number signs of x_1 and x_2 and puts the number sign of x_1 + x_2 in a designated output register. Let number signs be interpreted in standard decimal notation. Furthermore, if some number x is not signifiable on D_j, only the first two elementary signs of the number sign of x are placed into a register, i.e., number signs are truncated to fit into registers, as is common in many programming languages. Then, after “100” is truncated to “10”,
f(99, 1) = 100 ≠ Ψ_{P_{L_j}}^(2)(99, 1) = 10,
and
f(213, 13) = 226 ≠ Ψ_{P_{L_j}}^(2)(213, 13) = 34,
because 213 is not signifiable on D_j and is truncated to “21”. In both cases, f(x_1, x_2), as a mathematical object, is total, and there is a computation of P_{L_j}(x_1, x_2) on x_1 = 99, x_2 = 1 and on x_1 = 213, x_2 = 13, but during both computations, non-signifiable numbers, i.e., 100 and 213, are encountered.
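The two truncation examples can be simulated directly. The register model below, decimal signs truncated on the right to a fixed number of cells, follows the description above; the function names truncate_sign and device_add are our own.

```python
def truncate_sign(x: int, cells: int = 2) -> int:
    """Store a natural number's decimal number sign in a register with a
    fixed number of cells, truncating on the right when it does not fit."""
    return int(str(x)[:cells])

def device_add(x1: int, x2: int, cells: int = 2) -> int:
    """Psi for a toy addition program on D_j with 'cells'-cell registers:
    both operands and the result are forced through the registers."""
    return truncate_sign(truncate_sign(x1, cells) + truncate_sign(x2, cells), cells)
```

This reproduces both inequalities: device_add(99, 1) yields 10 (since "100" is truncated to "10"), and device_add(213, 13) yields 34 (since "213" is truncated to "21"), while f(99, 1) = 100 and f(213, 13) = 226.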
Definition 5.
A function f : R^m → R, m ∈ N+, is actually computable on D_j if it is total, i.e., (∀ x̄ ∈ R^m) f(x̄)↓, and actually partially computable.
A program P_{L_j} that implements an actually computable f(x̄) is guaranteed to have a computation for any signifiable x̄. However, Inequality (13) may still hold if a non-signifiable number is produced during a computation. Functions can be defined for a specific D_j so that they deal only with signifiable numbers, e.g., functions whose domains and codomains are, respectively, finite signifiable proper subsets of R^m and R. The next definition characterizes these functions.
Definition 6.
A function f : R^m → R, m ∈ N+, is absolutely actually computable on D_j if it is actually computable and Inequality (13) holds for no computation of P_{L_j}(x̄), where x̄ is signifiable on D_j.
An implication of Definitions 4–6 is that if f : N^m → N satisfies Definition 4, it is partially computable according to Definition 1, and if it satisfies Definition 5 or 6, it is computable according to Definition 2, because, if no memory limitations are placed on registers, every natural number is signifiable.
We call an FMD D_j sufficiently significant if three conditions are satisfied. First, a programming language L_j for D_j exists with the same control structures as the programming language L described in Section 3.1 such that L_j (1) is capable of signifying a finite subset of R and (2) is capable of specifying the following operations on numbers: addition, subtraction, multiplication, division, assignment, i.e., setting the value of a register to a number sign, comparison, i.e., a = b, a < b, a > b, a ≤ b, a ≥ b, on any signifiable a and b, and the truncation of the signs of non-signifiable numbers to fit them into registers. Second, the finite memory of D_j suffices to hold L_j programs of length N ∈ N+, where the length of a program is the number of instructions in it. Third, the finite memory of D_j suffices, in addition to holding a program of at most N instructions, to hold number signs in K ∈ N+ registers.
Lemma 3.
Let an FMA D_j be sufficiently significant with K ≥ 7, let a, b be signifiable with b − a ≥ Δ_j, and let a + zΔ_j, z > 0, be the smallest signifiable number greater than or equal to b. Let μ_{a,b}^j : R_{a,b}^j → I_{a,b}^j be the bijection in (5). Let P_{L_j}(x), x ∈ R_{a,b}^j, be a program for D_j that iterates from a to a + zΔ_j ≥ b in positive unit integer increments of Δ_j until a k or z that satisfies the conditions in (3) is encountered, and let the length of P_{L_j} be ≤ N. Then, μ_{a,b}^j is absolutely actually computable.
Proof. 
Since a, b, and a + zΔ_j are signifiable, so are dom(μ_{a,b}^j) and codom(μ_{a,b}^j). The finite memory of D_j suffices to hold P_{L_j}, and P_{L_j} needs access to five signifiable numbers to iterate over dom(μ_{a,b}^j): a, b, i, Δ_j, and a + iΔ_j. Since K ≥ 7, the signs of these numbers are placed in registers ρ_1, ρ_2, ρ_3, ρ_4, and ρ_5. After x ∈ dom(μ_{a,b}^j) is placed in register ρ_6, P_{L_j} sets ρ_3 to 0. If x < b, P_{L_j} goes into a while loop with the condition ρ_5 < ρ_2, i.e., a + iΔ_j < b. Inside the loop, when ρ_5 = ρ_6, ρ_3 is incremented by 1 and placed into the output register ρ_7, and P_{L_j} exits. Otherwise, the loop continues with ρ_3 incremented by 1. If x = b, P_{L_j} goes into a while loop with the condition ρ_5 ≤ ρ_2, i.e., a + iΔ_j ≤ b, and keeps incrementing ρ_3 by 1 inside the loop. After the loop terminates, ρ_3 is incremented by 1 and placed into the output register ρ_7, and P_{L_j} exits.    □
A corollary of Lemma 3 is that (μ_{a,b}^j)^{-1} is absolutely actually computable.
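Lemma 3’s claim can be checked with a direct sketch of μ_{a,b}^j computed from (3) and (5). This is not the register-level program P_{L_j} of the proof; exact Fraction arithmetic again stands in for signifiable device arithmetic, and the function name mu is our own.

```python
from fractions import Fraction

def mu(a, b, delta, x):
    """mu_{a,b}^j(x) = psi_{a,b}^j(x) + 1, computed directly from (3) and (5)
    for x in R_{a,b}^j."""
    a, b, delta, x = (Fraction(v) for v in (a, b, delta, x))
    z = 0
    while a + z * delta < b:         # z: smallest count with a + z*delta >= b
        z += 1
    if x == b:
        return z + 1                 # second case of (3), shifted by 1 per (5)
    return int((x - a) / delta) + 1  # first case: x = a + k*delta < b
```

With a = 0, b = 1, this agrees with Examples 1 and 2: for Δ_j = 0.2, μ(0.8) = 5 and μ(1) = 6; for Δ_j = 0.3, μ(0.6) = 3 and μ(1) = 5.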

4. A Recursive Formalization of Feedforward Artificial Neural Networks

A trained feedforward artificial neural network (FANN) N_z^j implemented in a programming language L_j on a sufficiently significant FMA D_j is a finite set of artificial neurons, each of which is connected to a finite number of the neurons in the same set through synapses, i.e., directed weighted edges (see Figure 1). The neurons are organized into k + 1 layers E_0, E_1, …, E_k, with E_0 being the input layer; E_k being the output layer; and E_e, 0 < e < k, being the hidden layers. We let E_z^j denote the number of layers in N_z^j and n_{z,i}^{j,e} refer to the i-th neuron in layer E_e of N_z^j. We abbreviate n_{z,i}^{j,e} to n_i^e, because n_i^e always refers to a unique neuron in N_z^j. The function nn_z^j(e) : N → N+ specifies the number of neurons in layer E_e of N_z^j and is abbreviated nn(e).
We assume that N_z^j is trained, i.e., the synapse weights are fixed automatically or manually, and fully connected, i.e., there is a synapse from every neuron in layer E_{e−1} to every neuron in layer E_e. Each synapse has a weight, i.e., a signifiable real number, associated with it. We let w_{i,j}^e, 0 < e < E_z^j, denote the weight of the synapse from n_i^{e−1} to n_j^e (see Figure 1) and w̄^e refer to a vector of all synaptic weights between E_{e−1} and E_e. We define w̄^0 = (). Thus, for the FANN N_z^j in Figure 1, w̄^1 = (w_{0,0}^1, w_{0,1}^1, w_{0,2}^1, w_{1,0}^1, w_{1,1}^1, w_{1,2}^1) and w̄^2 = (w_{0,0}^2, w_{0,1}^2, w_{1,0}^2, w_{1,1}^2, w_{2,0}^2, w_{2,1}^2). We assume, without loss of generality, that all numbers in w̄^e are in R_{0,1}^j defined in (1), because, if that is not the case, they can be so scaled; nor is there any loss of generality associated with the assumption of full connectivity, because partial connectivity can be defined by setting the weights of the appropriate synapses to 0.
If R_{0,1}^j is abbreviated to R_{0,1}, each n_i^e in N_z^j, e > 0, computes an activation function
α_i^e(ā^{e−1}, w̄^e) : R_{0,1}^{nn(e−1)} → R_{0,1},     (14)
where ā^{e−1} is the vector of the activations, i.e., real signifiable numbers, of the neurons in layer E_{e−1}. For e = 0,
α_i^0(x̄, ()) = x̄_i = x_i,     (15)
where x̄ ∈ R_{0,1}^{nn(0)} and x_i ∈ R_{0,1}, 0 ≤ i < nn(0). Thus, if nn(0) = 3, as in Figure 1, then, given the input x̄ = (x_0, x_1, x_2) = (0.0, 0.3, 0.6), α_0^0(x̄, ()) = x̄_0 = x_0 = 0.0, α_1^0(x̄, ()) = x̄_1 = x_1 = 0.3, and α_2^0(x̄, ()) = x̄_2 = x_2 = 0.6. Since N_z^j is implemented on a sufficiently significant D_j, all activation functions α_i^e(·) are absolutely actually computable. It is irrelevant to our discussion whether the activation functions are the same, e.g., sigmoid, for all or some neurons, or each neuron has its own activation function.
The term feedforward means that the activations of the neurons are computed layer by layer from the input layer to the output layer, because the activation functions of the neurons in the next layer require only the weights of the synapses connecting the next layer with the previous one and the activation values, i.e., the outputs of the activation functions of the neurons in the previous layer. To define the activation vectors of individual layers, let
a 0 = α 0 0 x , ( ) , α n n ( 0 ) 1 0 x , ( ) , a e = α 0 e a e 1 , w e , , α n n ( e ) 1 e a e 1 , w e ,
where 0 < e < E z j and x is an input vector. For each N z j , we define the absolutely actually computable function that N z j computes as
f z j ( x , 0 ) = x , f z j ( x , e + 1 ) = α 0 e + 1 f z j x , e , w e + 1 , , α n n ( e + 1 ) 1 e + 1 f z j x , e , w e + 1 .
If e > E z j − 1 , let f z j ( x , e ) = ( ) . The function f z j in (17) computes the feedforward activation of N z j layer by layer, i.e., f z j ( x , 0 ) = a 0 , f z j ( x , 1 ) = a 1 , … , f z j ( x , E z j − 1 ) = a E z j − 1 . For example, if x = ( x 0 , x 1 ) ∈ R 0 , 1 2 is the input to N z j in Figure 1,
f z j ( x , 0 ) = a 0 = x ; f z j ( x , 1 ) = α 0 1 f z j x , 0 , w 1 , α 1 1 f z j x , 0 , w 1 , α 2 1 f z j x , 0 , w 1 = α 0 1 a 0 , w 1 , α 1 1 a 0 , w 1 , α 2 1 a 0 , w 1 = a 1 R 0 , 1 j 3 ; f z j ( x , 2 ) = α 0 2 f z j x , 1 , w 2 , α 1 2 f z j x , 1 , w 2 ) = α 0 2 a 1 , w 2 , α 1 2 a 1 , w 2 = a 2 R 0 , 1 j 2 .
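The layer-by-layer recursion in (17) can be sketched in Python for a hypothetical 2-3-2 network shaped like the one in Figure 1. The sigmoid activation and the particular weight values below are illustrative assumptions, not values taken from the article.

```python
import math

# A minimal sketch of the feedforward recursion f(x, e) in (17) for a
# hypothetical 2-3-2 network. Weights and activation are assumptions.

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

# weights[e][i][j] = weight of the synapse from neuron i in layer e-1
# to neuron j in layer e; all values are arbitrary numbers in [0, 1].
weights = [
    None,                                  # w^0 = (): no synapses into layer 0
    [[0.1, 0.2, 0.3], [0.4, 0.5, 0.6]],    # w^1: layer 0 -> layer 1 (2 x 3)
    [[0.7, 0.8], [0.9, 0.1], [0.2, 0.3]],  # w^2: layer 1 -> layer 2 (3 x 2)
]

def f(x, e):
    """Activation vector a^e of layer e, computed layer by layer as in (17)."""
    if e == 0:
        return list(x)                     # f(x, 0) = a^0 = x
    prev = f(x, e - 1)                     # a^{e-1}
    w = weights[e]
    return [sigmoid(sum(prev[i] * w[i][j] for i in range(len(prev))))
            for j in range(len(w[0]))]

print(f((0.9, 0.6), 2))                    # a^2: the two output activations
```

Computing f(x, 2) forces f(x, 1) and f(x, 0) first, mirroring the fact that the activations of each layer depend only on the previous layer's activations and the connecting weights.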

5. Finite Sets as Gödel Numbers

Our primitive recursive techniques to pack finite sets and Cartesian powers thereof into Gödel numbers in this section rely, in part, on our previous work on primitive recursive characteristics of chess [11], which, in turn, was based on several functions shown to be primitive recursive in [7]. For the reader's convenience, Appendix A.1 in Appendix A gives the functions shown to be primitive recursive in [7], along with the necessary auxiliary definitions and theorems, and Appendix A.2 gives the functions, or variants thereof, shown to be primitive recursive in [11]. When we use the functions from [7,11] in this section, we refer to their definitions in these two sections of Appendix A as necessary.
Let G be a Gödel number (G-number) as defined in (A8). The primitive recursive predicate G P in (18) uses the bounded existential quantification of a primitive recursive predicate defined in (A2) and the primitive recursive functions ( x ) i and L t ( x ) , respectively, defined in (A9) and (A10).
G P ( G ) { L t ( G ) > 0 } { { L t ( G ) = 1 L t ( ( G ) 1 ) > 0 } { ( t ) L t ( G ) { { t > 1 } { { L t ( ( G ) t ) = L t ( ( G ) 1 ) } { L t ( ( G ) t ) > 0 } } } } }
The logical structure of G P is G P 1 { G P 2 G P 3 } , where G P 1 , G P 2 , and G P 3 are
G P 1 { L t ( G ) > 0 } ; G P 2 { L t ( G ) = 1 L t ( ( G ) 1 ) > 0 } ; G P 3 ( t ) L t ( G ) { { t > 1 } { { L t ( ( G ) t ) = L t ( ( G ) 1 ) } { L t ( ( G ) t ) > 0 } } } .
The predicate G P holds for G-numbers that have at least one element and whose elements all have the same length, i.e., the same number of elements, greater than 0. Thus, G P ( [ [ 1 ] ] ) , G P ( [ [ 1 ] , [ 2 ] , [ 3 ] ] ) , and G P ( [ [ 1 , 2 ] , [ 3 , 4 ] , [ 5 , 6 ] ] ) , but ¬ G P ( [ [ 0 ] ] ) and ¬ G P ( [ [ 1 ] , [ 3 , 4 , 5 ] , [ 11 , 10 ] ] ) .
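On the nested-list reading of G-numbers, the predicate G P in (18) can be sketched as follows. The helper lt_list mimics L t by counting up to the last non-zero entry, which is why the sketch rejects [ [ 0 ] ] ; the sketch checks the set semantics only and does not touch the prime-power encoding.

```python
# A sketch of the predicate GP in (18) on the nested-list reading of
# G-numbers: GP holds when G has at least one element and every element
# has the same positive length in the Lt sense.

def lt_list(e):
    """Lt on the list reading: position of the last non-zero entry."""
    n = 0
    for i, v in enumerate(e, start=1):
        if v != 0:
            n = i
    return n

def gp(g):
    if len(g) == 0:                        # Lt(G) > 0 must hold
        return False
    lens = [lt_list(e) for e in g]
    return all(n == lens[0] and n > 0 for n in lens)

print(gp([[1]]), gp([[1, 2], [3, 4], [5, 6]]))     # both hold
print(gp([[0]]), gp([[1], [3, 4, 5], [11, 10]]))   # neither holds
```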
Let G be a G-number, the predicate g be as defined in  (A13), the function s ( t ) be as defined in (11), and the function x l y be as defined in (A15), and let
τ χ 0 ( G , 0 ) = 1 , τ χ 0 ( G , t + 1 ) = [ [ ( G ) s ( t ) ] ] l τ χ 0 ( G , t ) .
Then, the primitive recursive function
τ 0 ( G ) = τ χ 0 ( G , L t ( G ) ) if L t ( G ) > 0 0 g G , 0 otherwise
turns a G-number G into another G-number whose elements are the elements of G, each wrapped in a G-number of length 1. Thus, τ 0 ( [ 11 , 13 ] ) = [ [ 11 ] , [ 13 ] ] . In general, if G = [ g 1 , … , g n ] , L t ( G ) > 0 , 0 ∉ g G , i.e., g i ≠ 0 , for 1 ≤ i ≤ n , then τ 0 ( G ) = [ [ g 1 ] , … , [ g n ] ] .
Let g N , G be a G-number, the function x r y be defined in (A16), and
τ χ 1 ( g , G , 0 ) = 1 , τ χ 1 ( g , G , t + 1 ) = [ [ g ] r [ ( G ) s ( t ) ] ] l τ χ 1 ( g , G , t ) .
Then, the primitive recursive function
τ 1 ( g , G ) = τ χ 1 ( g , G , L t ( G ) ) if g > 0 G P ( G ) , 0 otherwise
prepends g to each element of G. Thus, τ 1 ( 1 , [ [ 2 ] , [ 3 ] ] ) = [ [ 1 , 2 ] , [ 1 , 3 ] ] and τ 1 ( 3 , [ [ 1 , 2 ] , [ 4 , 5 ] ] ) = [ [ 3 , 1 , 2 ] , [ 3 , 4 , 5 ] ] .
Let G 1 and G 2 be two G-numbers, and let
τ χ 2 ( G 1 , G 2 , 0 ) = 1 , τ χ 2 ( G 1 , G 2 , t + 1 ) = τ 1 ( ( G 1 ) s ( t ) , G 2 ) l τ χ 2 ( G 1 , G 2 , t ) .
Then, the primitive recursive function
τ 2 ( G 1 , G 2 ) = τ χ 2 ( G 1 , G 2 , L t ( G 1 ) ) if 0 g G 1 G P ( G 2 ) L t ( G 1 ) > 0 , 0 otherwise
prepends each element of G 1 to each element of G 2 . Thus,
τ 2 ( [ 1 ] , [ [ 2 ] , [ 3 ] ] ) = [ [ 1 , 2 ] , [ 1 , 3 ] ] ; τ 2 ( [ 1 , 2 ] , [ [ 4 , 5 ] , [ 6 , 7 ] ] ) = [ [ 1 , 4 , 5 ] , [ 1 , 6 , 7 ] , [ 2 , 4 , 5 ] , [ 2 , 6 , 7 ] ] .
Let G be a G-number, and let
τ χ 3 ( G , 0 ) = τ 0 ( G ) , τ χ 3 ( G , t + 1 ) = τ 2 ( G , τ 3 ( G , t ) ) .
Then, the primitive recursive function
τ 3 ( G , t ) = τ χ 3 ( G , t ) if 0 g G L t ( G ) > 0 , 0 otherwise
computes, for t N + , a Gödel number whose components are Gödel numbers representing all sequences of t + 1 elements of G. Thus,
τ 3 ( [ 1 , 2 ] , 1 ) = [ [ 1 , 1 ] , [ 1 , 2 ] , [ 2 , 1 ] , [ 2 , 2 ] ] .
Let S = { a 1 , a 2 , , a n } N + , S , and G = [ a 1 , , a n ] . An induction on t shows that, for t > 0 , τ 3 ( G , t 1 ) is a G-number representation of S t in the sense that ( a i 1 , , a i t ) S t iff [ a i 1 , , a i t ] g τ 3 ( G , t 1 ) .
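The set semantics of τ 0 – τ 3 can be sketched on the nested-list reading of G-numbers. The actual functions operate on prime-power encodings by primitive recursion; the sketch below uses plain Python lists and checks the Cartesian-power claim against itertools.product.

```python
from itertools import product

# List-reading sketches of τ0–τ3 from (19)–(22). τ3(G, t) enumerates all
# (t+1)-tuples over the elements of G, i.e., a representation of S^{t+1}.

def tau0(g):
    return [[x] for x in g]                    # [g1,...,gn] -> [[g1],...,[gn]]

def tau1(x, g):
    return [[x] + e for e in g]                # prepend x to each element of g

def tau2(g1, g2):
    return [[x] + e for x in g1 for e in g2]   # each of g1 onto each of g2

def tau3(g, t):
    out = tau0(g)
    for _ in range(t):
        out = tau2(g, out)
    return out

print(tau3([1, 2], 1))                         # [[1, 1], [1, 2], [2, 1], [2, 2]]
print(tau3([1, 2, 3], 2) == [list(p) for p in product([1, 2, 3], repeat=3)])
```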
If D j is an FMA, we let
G a , b j = g g n 1 , | R a , b j | , 1 ,
where R a , b j is defined in (2) and g g n ( · ) is defined in (A17). If we recall from Lemma 2 and (5) that μ a , b j : R a , b j I a , b j = { 1 , , z + 1 } , where a + z Δ j is the smallest signifiable real number b on D j , we observe that G a , b j is a G-number representation of I a , b j . Thus, if we return to Example 2 and use the accessor function ( x ) i in (A9), then for G 0 , 1 j = [ 1 , 2 , 3 , 4 , 5 ] , we have
μ ( 0 ) = 1 = G 0 , 1 j 1 ; μ ( 0.3 ) = 2 = G 0 , 1 j 2 ; μ ( 0.6 ) = 3 = G 0 , 1 j 3 ; μ ( 0.9 ) = 4 = G 0 , 1 j 4 ; μ ( 1 ) = 5 = G 0 , 1 j 5 .
In general, for x R a , b j ,
μ a , b j ( x ) = t = ( G a , b j ) t ∈ I a , b j iff ( μ a , b j ) − 1 ( ( G a , b j ) t ) = x .
Let, for t > 1 , τ 3 in (22), and x ∸ y in (A4),
G a , b t , j = τ 3 ( G a , b j , t ∸ 1 ) ,
and, in particular, for a = 0 and b = 1 , let
G 0 , 1 t , j = τ 3 ( G 0 , 1 j , t ∸ 1 ) .
Then, G 0 , 1 t , j is a G-number representation of I 0 , 1 j t , i.e., the t-th Cartesian power of I 0 , 1 j . Since both τ 3 and ∸ are primitive recursive functions, G a , b t , j N and G 0 , 1 t , j N are primitive recursively computable.
Example 3.
Let R = { 0 , 0.3 , 0.6 , 0.9 , 1 } , I = { 1 , 2 , 3 , 4 , 5 } and t = 2 . Then,
G 0 , 1 2 , j = τ 3 ( G 0 , 1 j , 2 ∸ 1 ) = τ 3 ( g g n ( 1 , | R | , 1 ) , 1 ) = τ 3 ( [ 1 , 2 , 3 , 4 , 5 ] , 1 ) = [ [ 1 , 1 ] , [ 1 , 2 ] , [ 1 , 3 ] , [ 1 , 4 ] , [ 1 , 5 ] , [ 2 , 1 ] , [ 2 , 2 ] , [ 2 , 3 ] , [ 2 , 4 ] , [ 2 , 5 ] , [ 3 , 1 ] , [ 3 , 2 ] , [ 3 , 3 ] , [ 3 , 4 ] , [ 3 , 5 ] , [ 4 , 1 ] , [ 4 , 2 ] , [ 4 , 3 ] , [ 4 , 4 ] , [ 4 , 5 ] , [ 5 , 1 ] , [ 5 , 2 ] , [ 5 , 3 ] , [ 5 , 4 ] , [ 5 , 5 ] ] .
We note that ( x , y ) ∈ I 2 iff [ x , y ] ∈ g G 0 , 1 2 , j .
Let x R a , b j t , t > 0 , x ˜ g G a , b t , j , and let η a , b t , j : R a , b j t N and ζ a , b t , j : N R a , b j t be defined as
η a , b t , j ( x ) = [ μ a , b j ( x 0 ) , … , μ a , b j ( x t − 1 ) ] = x ˜ ; ζ a , b t , j ( x ˜ ) = ( ( μ a , b j ) − 1 ( ( x ˜ ) 1 ) , … , ( μ a , b j ) − 1 ( ( x ˜ ) t ) ) .
If R a , b j is signifiable, η a , b t , j ( x ) = x ˜ iff ζ a , b t , j ( x ˜ ) = x , for any x R a , b j t . If x ˜ is not signifiable, η a , b t , j and ζ a , b t , j are actually computable; if x ˜ is signifiable, the functions are absolutely actually computable.
Example 4.
To continue with Example 3, if x = ( 0.9 , 0.6 ) R 0 , 1 2 and x ˜ = [ 4 , 3 ] G 0 , 1 2 , j , then, if we abbreviate η 0 , 1 2 , ζ 0 , 1 2 to η 2 , ζ 2 , we have
η 2 ( x ) = ( μ ( 0.9 ) , μ ( 0.6 ) ) = [ 4 , 3 ] ; ζ 2 ( x ˜ ) = ( μ − 1 ( ( x ˜ ) 1 ) , μ − 1 ( ( x ˜ ) 2 ) ) = ( μ − 1 ( 4 ) , μ − 1 ( 3 ) ) = ( 0.9 , 0.6 ) .
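For the grid of Example 3, the maps μ , η t , and ζ t in (26) can be sketched directly. The grid R = { 0 , 0.3 , 0.6 , 0.9 , 1 } is taken from the example; the dictionaries are an implementation convenience.

```python
# A sketch of μ, η^t, and ζ^t from (26): μ sends each grid point of R to its
# 1-based index in I = {1,...,5}, η applies μ componentwise, and ζ inverts it.

R = [0.0, 0.3, 0.6, 0.9, 1.0]
mu = {x: i + 1 for i, x in enumerate(R)}       # μ : R -> I
mu_inv = {i + 1: x for i, x in enumerate(R)}   # μ^{-1} : I -> R

def eta(x):
    """η^t(x) = [μ(x_0), ..., μ(x_{t-1})] on the list reading of G-numbers."""
    return [mu[c] for c in x]

def zeta(x_tilde):
    """ζ^t(x̃) = (μ^{-1}((x̃)_1), ..., μ^{-1}((x̃)_t))."""
    return tuple(mu_inv[i] for i in x_tilde)

print(eta((0.9, 0.6)))                         # [4, 3], as in Example 4
print(zeta([4, 3]))                            # (0.9, 0.6)
```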

6. Numbers Ω z , i j , e and Ω z j : Packing FANNs into Natural Numbers

Let us assume that μ 0 , 1 j is absolutely actually computable on a sufficiently significant FMA D j and abbreviate μ 0 , 1 j to μ , ζ 0 , 1 t , j to ζ t , and G 0 , 1 t , j in (25) to G t . Let x , y be as defined in (A5) and L t ( x ) be as defined in (A10). Then, for each input neuron n i 0 in an FANN N z j , let
Ω z , i j , 0 = Ω i 0 = [ G n n ( 0 ) 1 , μ α i 0 ζ n n ( 0 ) G n n ( 0 ) 1 , ( ) , , G n n ( 0 ) L t G n n ( 0 ) , μ α i 0 ζ n n ( 0 ) G n n ( 0 ) L t G n n ( 0 ) , ( ) ] .
We recall that E z j > 0 is the number of layers in N z j . Then, for a hidden or output neuron n i e , 0 < e < E z j , let
Ω z , i j , e = Ω i e = [ G n n ( e 1 ) 1 , μ α i e ζ n n ( e 1 ) G n n ( e 1 ) 1 , w e , , G n n ( e 1 ) L t G n n ( e 1 ) , μ α i e ζ n n ( e 1 ) G n n ( e 1 ) L t G n n ( e 1 ) , w e ] .
For an FANN N z j on D j and E = E z j 1 , let
Ω z j = 0 , Ω 0 0 , , Ω n n ( 0 ) 1 0 , , E , Ω 0 E , , Ω n n ( E ) 1 E .
An implication of the definitions of x , y in (A5) and the G-number in (A8) is that Ω z j is unique for N z j , because the only way for another FANN N k j on D j to have Ω k j = Ω z j is for N k j to have the same number of layers, the same number of neurons in each layer, the same activation function in each neuron, and the same synapse weights between the same neurons, i.e., N k j = N z j . Appendix A.3 in Appendix A gives several examples of how the Ω numbers are computed for N z j in Figure 1.
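The idea behind the Ω numbers can be sketched as a finite lookup table: for each neuron, tabulate μ ( α ( ζ ( x ˜ ) , w e ) ) over every quantized input x ˜ , so that evaluating the neuron on the grid reduces to a lookup. The grid, the weights, the sigmoid activation, and the nearest-grid-point μ below are illustrative assumptions; the article packs the same table into a single natural number via the pairing function and Gödel numbering.

```python
import math
from itertools import product

# A lookup-table sketch of one Ω_i^e from (27): one entry per quantized
# input pair [i, j], mapping it to μ of the neuron's activation.

R = [0.0, 0.3, 0.6, 0.9, 1.0]              # the grid of Example 3
I = range(1, len(R) + 1)                   # I = {1, ..., 5}

def mu(x):
    """Index of the grid point closest to x (μ restricted to the grid)."""
    return min(I, key=lambda i: abs(R[i - 1] - x))

def alpha(a, w):
    """A hypothetical sigmoid activation of one hidden neuron."""
    return 1.0 / (1.0 + math.exp(-sum(ai * wi for ai, wi in zip(a, w))))

w = (0.4, 0.7)                             # illustrative synapse weights

# The table: Ω-style pairs (x̃, μ(α(ζ(x̃), w))) for every x̃ over I^2.
omega = {x_tilde: mu(alpha(tuple(R[i - 1] for i in x_tilde), w))
         for x_tilde in product(I, repeat=2)}

print(len(omega), omega[(4, 3)])           # 25 entries; lookup for x̃ = [4, 3]
```

Once the table is built, the neuron's behavior on every signifiable input is fixed, which is what makes the Ω number of a trained FANN unique to that network.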
Lemma 4.
Let μ 0 , 1 j be absolutely actually computable on a sufficiently significant FMA D j and let N z j be an FANN implemented on D j . Let 0 i < n n ( 0 ) , 0 k < n n ( e ) , 0 < e < E z j , and G 0 , 1 t , j in (25) be signifiable on D j . Then, Ω z , i j , 0 = Ω i 0 N and Ω z , i j , e = Ω i e N .
Proof. 
We abbreviate μ 0 , 1 j to μ , ζ 0 , 1 t to ζ t , and G 0 , 1 t , j to G t , and let
z 0 = μ α i 0 ζ n n ( 0 ) G n n ( 0 ) t 0 , ( ) ; z e = μ α k e ζ n n ( e 1 ) G n n ( e 1 ) t e 1 , w e ,
where 0 < t 0 n n ( 0 ) and 0 < t e 1 n n ( e 1 ) . Since μ is absolutely actually computable and G t signifiable, ζ n n ( 0 ) , ζ n n ( 1 ) , , ζ n n ( e 1 ) are absolutely actually computable. Thus, z 0 , z e N . The statement of the lemma then follows from the definitions of x , y in (A5) and the G-number in (A8).    □

7. FANNs and Primitive Recursive Functions

For 0 e < E z j , 0 i < n n ( e ) , x N , let
α ˜ i e ( x ) = r a s c x , r a s c e , Ω z j i + 1 ,
where r ( · ) and a s c ( · ) are defined in (A6) and (A19), respectively. An example of computing α ˜ i e is given at the end of Appendix A.3 in the Appendix A.
Lemma 5.
Let μ 0 , 1 j , abbreviated as μ, be absolutely actually computable on a sufficiently significant FMA D j and let N z j be an FANN implemented on D j . Let G 0 , 1 t , j in (25), abbreviated as G t , be signifiable. Let 0 ≤ e < E z j , η 0 , 1 t , j ( x ) = η t ( x ) = x ˜ = [ μ ( a 0 e ) , … , μ ( a n n ( e ) − 1 e ) ] ∈ N , where a e is defined in (16). Then,
α ˜ i e ( x ˜ ) = μ α i 0 ζ n n ( 0 ) G n n ( 0 ) t , w 0 if e = 0 , μ α i e ζ n n ( e 1 ) G n n ( e 1 ) t , w e if e > 0 ,
where t = a s x x ˜ , G n n ( 0 ) , for 1 t L t G n n ( 0 ) and e = 0 ; t = a s x x ˜ , G n n ( e 1 ) , for 1 t L t G n n ( e 1 ) and e > 0 ; and a s x is as defined in (A18).
Proof. 
By (28)–(30) and (A18), we have
α ˜ i e ( x ˜ ) = r a s c x ˜ , r a s c e , Ω z j i + 1 = r a s c x ˜ , r e , Ω 0 e , , Ω n n ( e ) 1 e i + 1 = r a s c x ˜ , Ω 0 e , , Ω n n ( e ) 1 e i + 1 = r a s c x ˜ , Ω i e
If e = 0 , then t = a s x x ˜ , G n n ( 0 ) , for 1 t L t G n n ( 0 ) . Thus,
α ˜ i e ( x ˜ ) = r a s c x ˜ , Ω i e = μ α i 0 ζ n n ( 0 ) x ˜ , ( ) .
If e > 0 , then t = a s x x ˜ , G n n ( e 1 ) , for 1 t L t G n n ( e 1 ) . Thus,
α ˜ i e ( x ˜ ) = r a s c x ˜ , Ω i e = μ α i e ζ n n ( e 1 ) x ˜ , w e .
   □
If e = 0 and G n n ( 0 ) t = x ˜ , for 0 < t L t G n n ( 0 ) , let
a ˜ 0 = α ˜ 0 0 x ˜ , , α ˜ n n ( 0 ) 1 0 x ˜ .
If 0 < e < E z j and G n n ( e 1 ) t = x ˜ , for 0 < t L t G n n ( e 1 ) , let
a ˜ e = α ˜ 0 e x ˜ , , α ˜ n n ( e ) 1 e x ˜ .
Theorem 1.
Let N z j be an FANN with E z j > 0 layers on a sufficiently significant FMA D j , and let f z j ( x , e ) in (17) be absolutely actually computable. Let μ 0 , 1 j ( · ) be absolutely actually computable and G 0 , 1 t , j , for t { n n ( e ) | 0 e < E z j } , be signifiable. Then, if x ˜ = μ 0 , 1 j x 0 , , μ 0 , 1 j x n n ( 0 ) 1 = a ˜ 0 = η 0 , 1 n n ( 0 ) , j ( x ) , where η 0 , 1 t , j is defined in (26), there exists a primitive recursive function f ˜ z j ( x ˜ , e ) such that
f z j ( x , e ) = a e iff f ˜ z j ( x ˜ , e ) = a ˜ e .
Proof. 
Let us abbreviate f z j to f, μ 0 , 1 j to μ , μ 0 , 1 j 1 to μ 1 , η 0 , 1 t , j to η t , ζ 0 , 1 t , j to ζ t , and G 0 , 1 t , j to G t . Since G t is signifiable, ζ t and η t are absolutely actually computable. Let
f ˜ z j ( x ˜ , 0 ) = x ˜ , f ˜ z j ( x ˜ , e + 1 ) = α ˜ 0 e + 1 f ˜ z j x ˜ , e , , α ˜ n n ( e + 1 ) 1 e + 1 f ˜ z j x ˜ , e .
Let us abbreviate f ˜ z j to f ˜ , and let e = 0 . Then f ( x , 0 ) = a 0 = x and f ˜ ( x ˜ , 0 ) = x ˜ . We observe that
x ˜ = μ x 0 , , μ x n n ( 0 ) 1 = μ α 0 0 x , ( ) , , μ α n n ( 0 ) 1 0 x , ( ) = η n n ( 0 ) ( x ) .
Since μ is an absolutely actually computable bijection,
x = ( μ − 1 ( μ ( x 0 ) ) , … , μ − 1 ( μ ( x n n ( 0 ) − 1 ) ) ) = ( μ − 1 ( ( x ˜ ) 1 ) , … , μ − 1 ( ( x ˜ ) n n ( 0 ) ) ) = ζ n n ( 0 ) ( x ˜ ) .
By (26), η n n ( 0 ) ( x ) = x ˜ iff ζ n n ( 0 ) ( x ˜ ) = x . Thus, f ( x , 0 ) = a 0 iff f ˜ ( x ˜ , 0 ) = x ˜ .
Let e = 1 . Then,
f ( x , 1 ) = a 1 = α 0 1 a 0 , w 1 , , α n n ( 1 ) 1 1 a 0 , w 1 = α 0 1 x , w 1 , , α n n ( 1 ) 1 1 x , w 1 .
By Lemma 5,
f ˜ ( x ˜ , 1 ) = α ˜ 0 1 f ˜ x ˜ , 0 , , α ˜ n n ( 1 ) 1 1 f ˜ x ˜ , 0 = α ˜ 0 1 x ˜ , , α ˜ n n ( 1 ) 1 1 x ˜ = μ α 0 1 x , w 1 , , μ α n n ( 1 ) 1 1 x , w 1 = μ α 0 1 a 0 , w 1 , , μ α n n ( 1 ) 1 1 a 0 , w 1 = μ a 0 1 , , μ a n n ( 1 ) 1 1 = a ˜ 1 = η n n ( 1 ) ( a 1 ) .
Since μ is an absolutely actually computable bijection,
a 1 = μ 1 μ a ˜ 1 1 , , μ 1 μ a ˜ 1 n n ( 1 ) ,
whence, since ζ n n ( 1 ) ( a ˜ 1 ) = a 1 iff η n n ( 1 ) ( a 1 ) = a ˜ 1 , f ( x , 1 ) = a 1 iff f ˜ ( x ˜ , 1 ) = a ˜ 1 .
Let us assume f ( x , e ) = a e iff f ˜ ( x ˜ , e ) = a ˜ e for e 1 . Then,
f ( x , e + 1 ) = a e + 1 = ( α 0 e + 1 f x , e , w e + 1 , , α n n ( e + 1 ) 1 e + 1 f x , e , w e + 1 = α 0 e + 1 a e , w e + 1 , , α n n ( e + 1 ) 1 e + 1 a e , w e + 1 ,
and
f ˜ ( x ˜ , e + 1 ) = α ˜ 0 e + 1 f ˜ x ˜ , e , , α ˜ n n ( e + 1 ) 1 e + 1 f ˜ x ˜ , e = α ˜ 0 e + 1 a ˜ e , , α ˜ n n ( e + 1 ) 1 e + 1 a ˜ e = μ α 0 e + 1 a e , w e + 1 , , μ α n n ( e + 1 ) 1 e + 1 a e , w e + 1 = η n n ( e + 1 ) ( a e + 1 ) .
Then,
a e + 1 = μ 1 μ a ˜ e + 1 1 , , μ 1 μ a ˜ e + 1 n n ( e + 1 ) ,
whence, by induction, since ζ n n ( e + 1 ) ( a ˜ e + 1 ) = a e + 1 iff η n n ( e + 1 ) ( a e + 1 ) = a ˜ e + 1 , f ( x , e + 1 ) = a e + 1 iff f ˜ ( x ˜ , e + 1 ) = a ˜ e + 1 .    □
Let, for x R 0 , 1 j n n ( 0 ) and E z j > 0 ,
A z j ( x ) = f z j ( x , E z j 1 ) ,
and, for x ˜ = η n n ( 0 ) ( x ) , let
A ˜ z j ( x ) = f ˜ z j ( x ˜ , E z j 1 ) .
Then, A z j ( x ) is the absolutely actually computable function computed by N z j and, by Theorem 1, A ˜ z j is primitive recursive. We are now in a position to prove the final theorem of this article.
Theorem 2.
Let
N j = N 1 j , N 2 j , , N k j , k N + ,
be the set of FANNs implemented on a sufficiently significant FMA D j , and let
A j = A 1 j , A 2 j , , A k j , k N + ,
be the set of corresponding absolutely actually computable functions of the FANNs in N j , as defined in (33). There exists a bijection between N j and a class of primitive recursive functions.
Proof. 
Let
O j = Ω 1 j , Ω 2 j , , Ω k j , k N + ,
be the set of the numbers Ω z j defined in (28), each of which uniquely corresponds to N z j N j . Let
F j = A ˜ 1 j , A ˜ 2 j , , A ˜ k j , k N + ,
be a class of primitive recursive functions, one function for each Ω z j ∈ O j , as defined in (34). We observe that
| N j | = | A j | = | O j | = | F j | = k .
Let λ 1 j : N j → A j , λ 2 j : A j → O j , and λ 3 j : O j → F j be defined as
λ 1 j ( N z j ) = A z j ; λ 2 j ( A z j ) = Ω z j ; λ 3 j ( Ω z j ) = A ˜ z j .
Then, λ j : N j F j , defined as
λ j ( N z j ) = λ 3 j λ 2 j λ 1 j N z j ,
is a bijection.    □

8. Discussion

The definition of the finite memory device or automaton (FMD or FMA) in Section 2.2 has four main implications. First, a physical or abstract automaton is an FMD when its memory amount is quantifiable as a natural number. Second, characters and strings are not necessary, because bijections exist between any finite alphabet of symbols and natural numbers and, through Gödel numbering, between any strings over a finite alphabet and natural numbers, hence the term numerical memory used in the article. Third, an FSA of classical computability becomes an FMA when the quantity of its internal and external memory is finite, i.e., there is an upper bound in the form of a natural number on the quantity of the machine's memory. It is irrelevant for the scope of this investigation whether the input tape of an FSA, the input and output tapes of such FSA modifications as the Mealy and Moore machines (Chapter 2 in [12]) or the finite state transducers (Chapter 3 in [13]), and the input tape and the stack of a pushdown automaton (PDA) (Chapter 5 in [12]) are considered internal or external memory. Fourth, a universal Turing machine (UTM) (Chapter 6 in [7]) is an FMA when the number of its tape cells is bounded by a natural number, which a fortiori makes any physical computer an FMA. Thus, only one type of universal computer is needed to define all FMA it can simulate.
Consider a universal computer U C capable of executing the universal L program U 1 constructed to prove the Universality Theorem (Theorem 3.1, Chapter 3 in [7]). The computer U C , equivalent to a UTM, takes an arbitrary L program P, an input to that program in the form of a natural number stored in its input register X 1 , which can be a Gödel number encoding an array of numbers, executes P on X 1 by encoding the memory of P as another Gödel number and returns the output of P as a natural number, which can also be a Gödel number encoding a sequence of natural numbers, saved in its output register Y. Since characters and character sequences can be bijectively mapped to natural numbers, U C can simulate any FSA or a modification thereof, e.g., a Mealy machine, a Moore machine, a finite state transducer, or a PDA. Technically speaking, there is no need to distinguish between the Mealy and Moore machines, because they are equivalent (Theorems 2.6, 2.7, Chapter 2 in [12]). When a limit is placed on the numerical memory of U C by way of the number of registers it can use and the size of the numbers signifiable in them, the input and output registers included, U C immediately becomes an FMD and so a fortiori any device that U C is capable of simulating.
The separation of computability into the two overlapping categories, general and actual, is necessary for theoretical and practical reasons. A theoretical reason, generally accepted in classical computability theory, is that it is of no advantage to put any memory limitations on automata or on the a priori counts of unit time steps that automata may take to execute programs that implement functions in order to show that those functions are computable. Were it not the case, we would not be able to investigate what is computable in principle. Rogers [10] succinctly expresses this point of view:
"[w]e thus require that a computation terminate after some finite number of steps; we do not insist on an a priori ability to estimate this number."
An implication of the above assumption is that an automaton, explicit or implicit, on which the said computation is executed has access to, literally, astronomical quantities of numerical memory. For a thought experiment, consider an automaton programmable in L of Chapter 2 of [7] that we used in Section 3.1, and let a program P L j ( n ) , n N + , compute the G-number of the sequence ( 1 , , n ) , i.e., the function computed by P L j is f ( n ) = [ 1 , , n ] , as defined in (A8). Then, f ( n ) is a primitive recursive function and, hence, computable in the general sense of Definition 2. Thus, f ( n ) is signifiable for any n N + on the automaton. In particular, if n is the Eddington number, i.e., n = 10 80 N + , estimating the number of hydrogen atoms in the observable universe [14], there is a computation and, by implication, a variable in P L j to which the G-number of ( 1 , 2 , , 10 80 ) can be assigned.
The foregoing paragraph brings us to a practical reason for separating computability into the general and actual categories: it is of little use for an applied scientist who wants to implement a number-theoretic function f in a programming language L for an FMA D j to know that f is generally computable and the L program can, therefore, compute, in principle, some characteristic of arbitrarily large natural numbers, e.g., the Eddington number. If no natural number greater than some n N is signifiable on D j , the scientist must make provisions in the program for the non-signifiable numbers in order to achieve feasible results with absolutely actually computable functions.
Theorem 1 shows that the computation of a trained FANN on a finite memory device can be packed into a unique natural number. Once packed, the natural number can be used as an archive, after a fashion, to look up natural numbers that correspond, in the bijective sense of the term, to the real vectors computed by the function A z j of an FANN N z j implemented on the device. The correspondence is such that for any signifiable x , the output of N z j , i.e., A z j ( x ) = a , corresponds to the natural number a ˜ computed by the primitive recursive function A ˜ z j , i.e., A ˜ z j ( x ˜ ) = a ˜ , and the input x corresponds to the natural number x ˜ . Thus, A z j ( x ) = a iff A ˜ z j ( x ˜ ) = a ˜ . Furthermore, the function A ˜ z j is computable in the general sense and is absolutely actually computable on any FMA where the natural number Ω z j is signifiable.
A correspondence established in Theorem 2 should be construed so that the uniqueness of Ω z j does not imply the uniqueness of A z j because the same function can be computed by different FANNs. What it implies is that, for any two different FANNs N n j and N m j , n m (e.g., different numbers of layers or different numbers of nodes in a layer or different activation functions or different weights), implemented on the same FMA D j , Ω n j Ω m j . However, it may be the case that A m j ( x ) = A n j ( x ) for any signifiable x , and consequently, A ˜ m j ( x ˜ ) = A ˜ n j ( x ˜ ) .

9. Conclusions

To differentiate between feedforward artificial neural networks and their functions as abstract mathematical objects and the realizations of these networks and functions on finite memory devices, we introduced the categories of general and actual computability. We showed that correspondences are possible between trained feedforward artificial neural networks on finite memory devices and classes of primitive recursive functions. We argued that there are theoretical and practical reasons why computability should be separated into these categories. The categories are overlapping in the sense that some functions belong in both categories.

Funding

This research received no external funding.

Data Availability Statement

No additional data are provided for this article.

Conflicts of Interest

The author declares no conflict of interest.

Abbreviations

The following abbreviations are used in this article:
ANNArtificial Neural Network
FANNFeedforward Artificial Neural Network
FMAFinite Memory Automaton or Automata
FMDFinite Memory Device
G-numberGödel Number
TMTuring Machine
UTMUniversal Turing Machine
FSAFinite State Automaton or Automata
PDAPushdown Automaton or Automata

Appendix A

Appendix A.1. Primitive Recursive Functions and Predicates

In this section, we define several functions shown to be primitive recursive in [7]. All lowercase variables in this section, e.g., x, y, z, t, n, and m, with and without subscripts, refer to natural numbers, and the term number is synonymous with the term natural number.
The expression
( t ) z P ( t , x 1 , , x n )
is called the bounded existential quantification of the predicate P and holds iff P ( t , x 1 , , x n ) = 1 for at least one t such that 0 t z . The expression
( t ) z P ( t , x 1 , , x n )
is called a bounded universal quantification of P and holds iff P ( t , x 1 , , x n ) = 1 for every t such that 0 t z . If P ( t , x 1 , , x n ) is a predicate and z is a number, then
x = min t z { P ( t , x 1 , , x n ) }
is called the bounded minimalization of P and defines the smallest number t for which P holds or 0 if there is no such number. It is shown in [7] that (1) the predicates x = y , x y , x < y , x > y , x y , x y , and x | y , i.e., x divides y, are primitive recursive; (2) a finite logical combination of primitive recursive predicates is primitive recursive; and (3) if a predicate P ( · ) is primitive recursive, then so are its negation, its bounded minimalization, and its bounded universal and existential quantifications.
Let
x ∸ y = x − y if x ≥ y , and 0 if x < y .
The pairing function of natural numbers x and y, x , y : N N , is
x , y = z ,
where
z = 2 x ( 2 y + 1 ) 1 ; γ ( d ) { 2 d | ( z + 1 ) ( c ) z + 1 { 2 c ( z + 1 ) c d } } ; x = min d z + 1 γ ( d ) ; y = 1 2 z + 1 2 x 1 .
For any number z, there are unique x and y such that x , y = z . For example, if z = 27 , then
x = min d 28 γ ( d ) = 2 ; y = 1 2 28 2 2 1 = 3 ; 2 , 3 = 2 2 ( 2 · 3 + 1 ) 1 = 27 .
The functions l ( z ) and r ( z )
l ( z ) = min x z { ( y ) z { z = x , y } } r ( z ) = min y z { ( x ) z { z = x , y } }
return the left and right components of any number z so that l ( z ) , r ( z ) = z . Thus, if z = 27 = 2 , 3 , then l ( z ) = 2 , r ( z ) = 3 .
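The pairing function in (A5) and its projections in (A6) can be computed directly, without bounded minimalization; the closed forms below follow from z + 1 = 2 x ( 2 y + 1 ) .

```python
# A sketch of <x, y> = 2^x (2y + 1) - 1 and of l(z), r(z): l(z) is the
# exponent of 2 in z + 1, and r(z) recovers y from the odd part of z + 1.

def pair(x, y):
    return 2 ** x * (2 * y + 1) - 1

def left(z):
    x, z = 0, z + 1
    while z % 2 == 0:
        z //= 2
        x += 1
    return x

def right(z):
    return ((z + 1) // 2 ** left(z) - 1) // 2

print(pair(2, 3))                          # 27, as in the worked example
print(left(27), right(27))                 # 2 3
```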
The symbol p n refers to the n-th prime, i.e., p 1 = 2 , p 2 = 3 , p 3 = 5 , etc., and p 0 = 0 , by definition. The primes are computed by the following primitive recursive function.
π ( i ) = p i .
Thus, π ( 0 ) = 0 , π ( 1 ) = 2 , π ( 2 ) = 3 , π ( 3 ) = 5 , π ( 4 ) = 7 , π ( 5 ) = 11 , etc. If ( a 1 , , a n ) is a sequence of numbers, the function
[ a 1 , , a n ] = i = 1 n π ( i ) a i
computes the Gödel number (G-number) of this sequence. The G-number of the empty number sequence ( ) is 1. Thus, the G-number of ( 3 , 101 , 7891 , 1 , 43 ) is [ 3 , 101 , 7891 , 1 , 43 ] = 2 3 · 3 101 · 5 7891 · 7 1 · 11 43 .
If x = [ a 1 , , a n ] , the accessor function
( x ) i = min t x { ¬ { π ( i ) t + 1 | x } }
returns the i-th element of x. Thus, if x = [ 1 , 7 , 13 ] , then ( x ) 1 = 1 , ( x ) 2 = 7 , ( x ) 3 = 13 , and ( x ) j = 0 for j = 0 or j > 3 .
The length of a Gödel number x is the position of the last non-zero prime power in x. Specifically, if x = [ a 1 , a 2 , , a n ] , its length is computed by the function L t ( · ) defined as
L t ( x ) = min i x { ( x ) i 0 ( j ) x { { j > i } { ( x ) j = 0 } } } .
Thus, L t ( 540 ) = L t ( [ 2 , 3 , 1 ] ) = 3 . L t ( [ a 1 , , a n ] ) = n iff a n 0 , [ ( x ) 1 , , ( x ) n ] = x when L t ( x ) = n , and L t ( 0 ) = L t ( 1 ) = 0 . L t ( [ x 1 , x 2 , , x n ] ) = L t ( [ x 1 , x 2 , , x n , 0 , , 0 ] ) , where x n 0 .
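The Gödel numbering in (A8)–(A10) can be sketched with trial-division primes: gnum encodes a sequence as a product of prime powers, elem recovers ( x ) i as the exponent of the i-th prime, and lt scans for the last non-zero exponent. Trial division suffices for small examples only.

```python
# A sketch of [a1,...,an] = Π π(i)^{a_i} in (A8), the accessor (x)_i in (A9),
# and Lt(x) in (A10), using a simple trial-division prime generator.

def primes():
    ps, n = [], 2
    while True:
        if all(n % p for p in ps):
            ps.append(n)
            yield n
        n += 1

def gnum(seq):
    """G-number of a sequence; gnum(()) = 1 for the empty sequence."""
    g, it = 1, primes()
    for a in seq:
        g *= next(it) ** a
    return g

def elem(x, i):
    """(x)_i: the exponent of the i-th prime in x (x > 0, i >= 1)."""
    it = primes()
    for _ in range(i):
        p = next(it)
    c = 0
    while x % p == 0:
        x //= p
        c += 1
    return c

def lt(x):
    """Lt(x): the position of the last non-zero prime exponent in x."""
    n, i = 0, 0
    for p in primes():
        i += 1
        if p > x:
            return n
        if elem(x, i):
            n = i

print(gnum((2, 3, 1)))                     # 540, since 2^2 * 3^3 * 5 = 540
print([elem(540, i) for i in (1, 2, 3)], lt(540))
```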
The function ⌊ x / y ⌋ returns the integer part of the quotient x / y . Thus, ⌊ 7 / 2 ⌋ = 3 , ⌊ 2 / 5 ⌋ = 0 , ⌊ 8 / 5 ⌋ = 1 , and ⌊ x / 0 ⌋ = 0 for any number x.

Appendix A.2. Gödel Number Operators

The functions in this section or variants thereof were shown to be primitive recursive in [11]. The function
s e t ( b , i , v ) = b π ( i ) ( b ) i · π ( i ) v if 1 i L t ( b ) b > 1 v > 0 , 0 otherwise
sets the i-th element of the G-number b to v. Thus, if b = [ 1 , 2 ] = 2 1 3 2 = 18 , i = 1 , and v = 3 , then
s e t ( [ 1 , 2 ] , 1 , 3 ) = b π ( 1 ) ( b ) 1 · π ( 1 ) 3 = [ 1 , 2 ] 2 ( [ 1 , 2 ] ) 1 · 2 3 = 2 1 · 3 2 2 1 · 2 3 = [ 3 , 2 ] = 72 .
The function c n t ( · ) in (A12), where s ( t ) = t + 1 is one of the three initial functions defined in (11) and ( x ) i is defined in (A9), returns the count of occurrences of x in y. Thus, if y = [ 1 , 2 , 1 , 3 ] , then c n t ( 1 , y ) = 2 . A convention in (A12) and other equations in this section is that the names of auxiliary functions end in “x”.
c n t ( x , y ) = c n t x ( x , y , L t ( y ) ) if y > 1 , 0 otherwise .
c n t x ( x , y , 0 ) = 0 , c n t x ( x , y , t + 1 ) = c n t x x ( x , y , t , c n t x ( x , y , t ) ) .
c n t x x ( x , y , t , c ) = 1 + c if ( y ) s ( t ) = x , c otherwise .
If y is a G-number, then the predicate
x g y c n t ( x , y ) 0
holds if x is an element of y. Thus, 1 g [ 3 , 4 , 1 , 5 ] , but 1 g [ 3 , 4 , 2 , 5 ] . The function
r a p ( x , y ) = y · { π ( L t ( y ) + 1 ) } x if x > 0 y > 1 0 g y , 0 otherwise
appends x to the right of the rightmost element of y. Thus,
r a p ( 1 , [ 1 ] ) = [ 1 ] · { π ( L t ( [ 1 ] ) + 1 ) } 1 = [ 1 ] · { π ( 2 ) } 1 = 2 1 · 3 1 = [ 1 , 1 ] ; r a p ( 8 , [ 2 , 3 , 5 ] ) = [ 2 , 3 , 5 ] · { π ( 4 ) } 8 = [ 2 , 3 , 5 , 8 ] ; r a p ( 5 , s e t ( [ 10 , 3 ] , 1 , 2 ) ) = r a p ( 5 , [ 2 , 3 ] ) = [ 2 , 3 , 5 ] .
Let
l c ( x 1 , x 2 , 0 ) = x 2 , l c ( x 1 , x 2 , t + 1 ) = r a p ( ( x 1 ) s ( t ) , l c ( x 1 , x 2 , t ) ) .
Then, the function
x l y = l c ( x , y , L t ( x ) ) if x > 1 y > 1 0 g x 0 g y , x if x > 1 y = 1 0 g x , 0 otherwise
places all numbers in y, in order, to the left of the first number in x, while the function
x r y = y l x if x > 1 y > 1 0 g x 0 g y , x if x > 1 y = 1 0 g x , 0 otherwise
places all numbers of y, in order, to the right of the rightmost number in x. We refer to the function in (A15) as left concatenation and to the function in (A16) as right concatenation. Thus, [ 3 , 5 ] l [ 7 , 11 ] = [ 7 , 11 , 3 , 5 ] ; [ 3 , 5 ] r [ 7 , 11 ] = [ 3 , 5 , 7 , 11 ] ; [ 2 , 3 ] l [ 1 ] = [ 1 , 2 , 3 ] ; [ 2 , 3 ] r [ 1 ] = [ 2 , 3 , 1 ] .
Let
g n x ( l , u , k , 0 ) = [ l ] , g n x ( l , u , k , t + 1 ) = g n x x ( l , u , k , g n x ( l , u , k , t ) , t ) ;
g n x x ( l , u , k , z , t ) = z r [ l + s ( t ) k ] if l + s ( t ) k u , z otherwise .
Then, for l > 0 and u > 0 , the function
g g n ( l , u , k ) = g n x ( l , u , k , s ( u l ) ) if k > 0 ( t ) u { l + t k = u t > 0 } , 0 otherwise .
generates a G-number whose numbers start at l and go to u in positive integer increments of k. Thus, g g n ( 1 , 2 , 1 ) = [ 1 , 2 ] ; g g n ( 1 , 2 , 2 ) = 0 ; g g n ( 1 , 3 , 1 ) = [ 1 , 2 , 3 ] ; g g n ( 1 , 3 , 2 ) = [ 1 , 3 ] ; g g n ( 1 , 3 , 3 ) = 0 . The abbreviation g g n stands for generator of Gödel numbers.
The function
a s x ( x , y ) = min t L t ( y ) { t > 0 x = l ( ( y ) t ) }
returns the smallest index t of i , j y such that x = i . Thus, if
y = [ 10 , 100 , 20 , 200 , 30 , 300 ] ,
then a s x ( 10 , y ) = 1 , a s x ( 20 , y ) = 2 , a s x ( 30 , y ) = 3 . The function
a s c ( x , y ) = ( y ) a s x ( x , y )
returns the pair from y at the index t returned by a s x ( · ) . Thus, if
y = [ 10 , 100 , 20 , 200 , 30 , 300 ] ,
then
a s c ( 10 , y ) = ( y ) a s x ( 10 , y ) = ( y ) 1 = 10 , 100 ; a s c ( 20 , y ) = ( y ) a s x ( 20 , y ) = ( y ) 2 = 20 , 200 ; a s c ( 30 , y ) = ( y ) a s x ( 30 , y ) = ( y ) 3 = 30 , 300 ; a s c ( 13 , y ) = ( y ) a s x ( 13 , y ) = ( y ) 0 = 0 .

Appendix A.3. Examples of Ω Numbers

Let us abbreviate G 0 , 1 t , j in (25) to G t and consider the FANN in Figure 1. Let us assume that, as in Example 3, R = { 0 , 0.3 , 0.6 , 0.9 , 1 } , I = { 1 , 2 , 3 , 4 , 5 } and t = 2 , and
G 0 , 1 2 , j = G 2 = [ [ 1 , 1 ] , [ 1 , 2 ] , [ 1 , 3 ] , [ 1 , 4 ] , [ 1 , 5 ] , [ 2 , 1 ] , [ 2 , 2 ] , [ 2 , 3 ] , [ 2 , 4 ] , [ 2 , 5 ] , [ 3 , 1 ] , [ 3 , 2 ] , [ 3 , 3 ] , [ 3 , 4 ] , [ 3 , 5 ] , [ 4 , 1 ] , [ 4 , 2 ] , [ 4 , 3 ] , [ 4 , 4 ] , [ 4 , 5 ] , [ 5 , 1 ] , [ 5 , 2 ] , [ 5 , 3 ] , [ 5 , 4 ] , [ 5 , 5 ] ] .
In other words, G 2 is a G-number such that [ x 1 , x 2 ] g G 2 iff ( x 1 , x 2 ) I 2 . G 3 , whose definition we omit for space reasons, is a G-number whose length is 125 such that [ x 1 , x 2 , x 3 ] g G 3 iff ( x 1 , x 2 , x 3 ) I 3 , e.g., [ 1 , 2 , 3 ] g G 3 iff ( 1 , 2 , 3 ) I 3 . We can compute Ω i e for the FANN N z j in Figure 1 as follows.
$$\begin{aligned}
\Omega_0^0 &= [\langle (G_2)_1, \mu(\alpha_0^0(\zeta_2((G_2)_1), ())) \rangle, \ldots, \langle (G_2)_{25}, \mu(\alpha_0^0(\zeta_2((G_2)_{25}), ())) \rangle];\\
\Omega_1^0 &= [\langle (G_2)_1, \mu(\alpha_1^0(\zeta_2((G_2)_1), ())) \rangle, \ldots, \langle (G_2)_{25}, \mu(\alpha_1^0(\zeta_2((G_2)_{25}), ())) \rangle];\\
\Omega_0^1 &= [\langle (G_2)_1, \mu(\alpha_0^1(\zeta_2((G_2)_1), w^1)) \rangle, \ldots, \langle (G_2)_{25}, \mu(\alpha_0^1(\zeta_2((G_2)_{25}), w^1)) \rangle];\\
\Omega_1^1 &= [\langle (G_2)_1, \mu(\alpha_1^1(\zeta_2((G_2)_1), w^1)) \rangle, \ldots, \langle (G_2)_{25}, \mu(\alpha_1^1(\zeta_2((G_2)_{25}), w^1)) \rangle];\\
\Omega_2^1 &= [\langle (G_2)_1, \mu(\alpha_2^1(\zeta_2((G_2)_1), w^1)) \rangle, \ldots, \langle (G_2)_{25}, \mu(\alpha_2^1(\zeta_2((G_2)_{25}), w^1)) \rangle];\\
\Omega_0^2 &= [\langle (G_3)_1, \mu(\alpha_0^2(\zeta_3((G_3)_1), w^2)) \rangle, \ldots, \langle (G_3)_{125}, \mu(\alpha_0^2(\zeta_3((G_3)_{125}), w^2)) \rangle];\\
\Omega_1^2 &= [\langle (G_3)_1, \mu(\alpha_1^2(\zeta_3((G_3)_1), w^2)) \rangle, \ldots, \langle (G_3)_{125}, \mu(\alpha_1^2(\zeta_3((G_3)_{125}), w^2)) \rangle].
\end{aligned}$$
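The index G-numbers $G_2$ and $G_3$ over which the $\Omega$ numbers above range are simply the lexicographic enumerations of $I^2$ and $I^3$. As an illustrative sketch (modeling G-numbers as lists of lists rather than as Gödel encodings):

```python
from itertools import product

I = [1, 2, 3, 4, 5]

def G(t):
    """All t-tuples over I in lexicographic order, one list per tuple."""
    return [list(x) for x in product(I, repeat=t)]

G2, G3 = G(2), G(3)  # lengths 25 and 125, respectively
```

With 1-based indexing as in the text, the 17th element of `G2` is `[4, 2]` and the 12th is `[3, 2]`, matching the worked examples below.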
We can compute individual elements of $\Omega_i^e$. For example, since $(G_2)_{17} = [4,2]$,
$$(\Omega_0^0)_{17} = \langle (G_2)_{17}, \mu(\alpha_0^0(\zeta_2((G_2)_{17}), ())) \rangle = \langle [4,2], \mu(\alpha_0^0(\zeta_2([4,2]), ())) \rangle = \langle [4,2], \mu(\alpha_0^0((0.9, 0.3), ())) \rangle = \langle [4,2], \mu(0.9) \rangle = \langle [4,2], 4 \rangle \in \mathbb{N}.$$
Since $(G_2)_{12} = [3,2]$,
$$(\Omega_0^1)_{12} = \langle (G_2)_{12}, \mu(\alpha_0^1(\zeta_2((G_2)_{12}), w^1)) \rangle = \langle [3,2], \mu(\alpha_0^1(\zeta_2([3,2]), w^1)) \rangle = \langle [3,2], \mu(\alpha_0^1((0.6, 0.3), w^1)) \rangle = \langle [3,2], z \rangle \in \mathbb{N},$$
where $z = \mu(\alpha_0^1((0.6, 0.3), w^1)) \in I$. We know that $[2,3,4] \in_g G_3$ because $(2,3,4) \in I^3$. Thus, $(G_3)_t = [2,3,4]$ for some $t$, $1 \le t \le 125$. Let us therefore assume, for the sake of this example, that $(G_3)_{35} = [2,3,4]$. Then,
$$(\Omega_1^2)_{35} = \langle (G_3)_{35}, \mu(\alpha_1^2(\zeta_3((G_3)_{35}), w^2)) \rangle = \langle [2,3,4], \mu(\alpha_1^2(\zeta_3([2,3,4]), w^2)) \rangle = \langle [2,3,4], \mu(\alpha_1^2((0.3, 0.6, 0.9), w^2)) \rangle = \langle [2,3,4], z \rangle \in \mathbb{N},$$
where $z = \mu(\alpha_1^2((0.3, 0.6, 0.9), w^2)) \in I$.
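The arithmetic in the two computations above rests on the positional correspondence between the index set $I$ and the value set $R$: $\zeta_t$ maps a tuple of indexes in $I^t$ to the corresponding tuple of values in $R^t$, and $\mu$ maps a value in $R$ back to its index in $I$. A minimal sketch, assuming (as in Example 3) that the $i$-th element of $I$ names the $i$-th element of $R$:

```python
R = [0, 0.3, 0.6, 0.9, 1]
I = [1, 2, 3, 4, 5]

def zeta(xs):
    """zeta_t: an index tuple over I mapped to its tuple of values in R."""
    return tuple(R[x - 1] for x in xs)

def mu(r):
    """mu: the index in I of the value r in R."""
    return I[R.index(r)]
```

For example, `zeta([4, 2])` yields `(0.9, 0.3)` and `mu(0.9)` yields `4`, matching the evaluation of $(\Omega_0^0)_{17}$ above.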
Using (29), we can compute $\Omega_z^j$ for the FANN $N_z^j$ in Figure 1 with the $\Omega$ numbers above as
$$\Omega_z^j = [\langle 0, [\Omega_0^0, \Omega_1^0] \rangle, \langle 1, [\Omega_0^1, \Omega_1^1, \Omega_2^1] \rangle, \langle 2, [\Omega_0^2, \Omega_1^2] \rangle].$$
From $\Omega_z^j$ above, we can compute all $\tilde{\alpha}_i^e$ defined in (30) for $N_z^j$ in Figure 1. For example, since $(G_2)_{12} = [3,2]$,
$$\tilde{\alpha}_1^1([3,2]) = r(asc([3,2], (r(asc(1, \Omega_z^j)))_2)) = r(asc([3,2], \Omega_1^1)) = r(\langle (G_2)_{12}, \mu(\alpha_1^1(\zeta_2((G_2)_{12}), w^1)) \rangle) = \mu(\alpha_1^1(\zeta_2((G_2)_{12}), w^1)) = \mu(\alpha_1^1(\zeta_2([3,2]), w^1)) = \mu(\alpha_1^1((0.6, 0.3), w^1)) \in I.$$
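The chain of lookups in this last computation can be mirrored directly. The sketch below is purely illustrative: it models encoded pairs as tuples, each $\Omega_i^e$ table as a list of input/output pairs, and $\Omega_z^j$ as a list of layer/table-list pairs; the table contents (the output values 4 and 2, and the `None` placeholders) are made up for the example, not taken from the paper.

```python
def r(pair):
    """Right projection of an encoded pair."""
    return pair[1]

def asc(x, y):
    """The pair of y whose left component equals x, or 0 on a miss."""
    return next((p for p in y if p[0] == x), 0)

# A toy Omega_1^1 table for two inputs from G_2; the outputs are hypothetical.
omega_1_1 = [([3, 2], 4), ([4, 2], 2)]

# Omega_z^j pairs each layer number with the list of its Omega tables
# (None stands in for tables omitted from this sketch).
omega_z_j = [(0, []), (1, [None, omega_1_1, None]), (2, [])]

def alpha_tilde(e, i, x):
    """tilde-alpha_i^e(x) as r(asc(x, (r(asc(e, omega_z_j)))_i)), 0-based i."""
    return r(asc(x, r(asc(e, omega_z_j))[i]))
```

Here `alpha_tilde(1, 1, [3, 2])` retrieves the (made-up) output `4` by first selecting layer 1's table list, then the table of neuron 1, and finally the pair keyed by `[3, 2]`, mirroring the symbolic chain above.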

References

  1. McCulloch, W.S.; Pitts, W. A logical calculus of the ideas immanent in nervous activity. Bull. Math. Biophys. 1943, 5, 115–133. [Google Scholar] [CrossRef]
  2. Rumelhart, D.E.; Hinton, G.E.; Williams, R.J. Learning representations by back-propagating errors. Nature 1986, 323, 533–536. [Google Scholar] [CrossRef]
  3. Hornik, K. Approximation capabilities of multilayer feedforward networks. Neural Netw. 1991, 4, 251–257. [Google Scholar] [CrossRef]
  4. Gripenberg, G. Approximation by neural networks with a bounded number of nodes at each level. J. Approx. Theory 2003, 122, 260–266. [Google Scholar] [CrossRef] [Green Version]
  5. Guliyev, N.; Ismailov, V. On the approximation by single hidden layer feedforward neural networks with fixed weights. Neural Netw. 2019, 98, 296–304. [Google Scholar] [CrossRef] [PubMed] [Green Version]
  6. Gödel, K. On formally undecidable propositions of Principia Mathematica and related systems I. In Kurt Gödel Collected Works Volume I Publications 1929–1936; Feferman, S., Dawson, J.W., Kleene, S.C., Moore, G.H., Solovay, R.M., van Heijenoort, J., Eds.; Oxford University Press: Oxford, UK, 1986. [Google Scholar]
  7. Davis, M.; Sigal, R.; Weyuker, E. Computability, Complexity, and Languages: Fundamentals of Theoretical Computer Science, 2nd ed.; Harcourt, Brace & Company: Boston, MA, USA, 1994. [Google Scholar]
  8. Kleene, S.C. Introduction to Metamathematics; D. Van Nostrand: New York, NY, USA, 1952. [Google Scholar]
  9. Meyer, A.R.; Ritchie, D.M. The complexity of loop programs. In Proceedings of the ACM National Meeting, Washington, DC, USA, 14–16 November 1967; pp. 465–469. [Google Scholar]
  10. Rogers, H., Jr. Theory of Recursive Functions and Effective Computability; The MIT Press: Cambridge, MA, USA, 1988. [Google Scholar]
  11. Kulyukin, V. On primitive recursive characteristics of chess. Mathematics 2022, 10, 1016. [Google Scholar] [CrossRef]
  12. Hopcroft, J.E.; Ullman, J.D. Introduction to Automata Theory, Languages, and Computation; Narosa Publishing House: New Delhi, India, 2002. [Google Scholar]
  13. Jurafsky, D.; Martin, J.H. Speech and Language Processing: An Introduction to Natural Language Processing, Computational Linguistics, and Speech Recognition; Prentice-Hall, Inc.: Upper Saddle River, NJ, USA, 2000. [Google Scholar]
  14. Eddington, A.S. The constants of nature. In The World of Mathematics; Newman, J.R., Ed.; Simon and Schuster: New York, NY, USA, 1956; Volume 2, pp. 1074–1093. [Google Scholar]
Figure 1. A 3-layer fully connected feedforward artificial neural network (FANN); layer 0 includes the neurons $n_0^0$ and $n_1^0$; layer 1 includes the neurons $n_0^1$, $n_1^1$, and $n_2^1$; layer 2 includes the neurons $n_0^2$ and $n_1^2$; the two arrows coming into $n_0^0$ and $n_1^0$ signify that layer 0 is the input layer; the two arrows going out of $n_0^2$ and $n_1^2$ signify that layer 2 is the output layer; $w_{i,j}^e$, $0 < e < 3$, is the weight of the synapse from $n_i^{e-1}$ to $n_j^e$, e.g., $w_{0,0}^1$ is the weight of the synapse from $n_0^0$ to $n_0^1$ and $w_{2,1}^2$ is the weight of the synapse from $n_2^1$ to $n_1^2$.
