Article

On Correspondences between Feedforward Artificial Neural Networks on Finite Memory Automata and Classes of Primitive Recursive Functions

by
Vladimir A. Kulyukin
Department of Computer Science, Utah State University, Logan, UT 84322, USA
Mathematics 2023, 11(12), 2620; https://doi.org/10.3390/math11122620
Submission received: 6 May 2023 / Revised: 26 May 2023 / Accepted: 5 June 2023 / Published: 8 June 2023
(This article belongs to the Special Issue Theory of Algorithms and Recursion Theory)

Abstract:
When realized on computational devices with finite quantities of memory, feedforward artificial neural networks and the functions they compute cease being abstract mathematical objects and turn into executable programs generating concrete computations. To differentiate between feedforward artificial neural networks and their functions as abstract mathematical objects and the realizations of these networks and functions on finite memory devices, we introduce the categories of general and actual computabilities and show that there exist correspondences, i.e., bijections, between functions computable by trained feedforward artificial neural networks on finite memory automata and classes of primitive recursive functions.

1. Introduction

An offspring of McCulloch and Pitts’ research on the foundations of cybernetics [1], artificial neural networks (ANNs) entered mainstream machine learning after the discovery of backpropagation by Rumelhart, Hinton, and Williams [2]. ANNs proved to be universal approximators of different classes of functions when no limits are imposed on the number of artificial neurons in any layer (arbitrary width) or on the number of hidden layers (arbitrary depth), and even with bounded widths and depths (e.g., [3,4,5]). ANNs cease being abstract mathematical objects when implemented in specific programming languages on computational devices with finite quantities of internal and external memory, to which we interchangeably refer in our article as finite memory devices (FMDs) and finite memory automata (FMA). To differentiate between functions computable by ANNs in principle and functions computable by ANNs realized on FMA, we introduce the categories of general and actual computabilities and show that there exist correspondences, i.e., bijections, between functions computable by trained feedforward ANNs (FANNs) on FMA and classes of primitive recursive functions.
Our article is organized as follows. In Section 2, we expound the terms, definitions, and notational conventions for functions and predicates espoused in this article and define the term finite memory automaton. In Section 3, we explicate the categories of general and actual computabilities and elucidate their similarities and differences. In Section 4, we formalize FANNs in terms of recursively defined functions. In Section 5, we present primitive recursive techniques to pack finite sets and Cartesian powers thereof into Gödel numbers. In Section 6, we use the set packing techniques of Section 5 to show that functions computable by trained FANNs implemented on FMA can be archived into natural numbers. In Section 7, we show how such archives can be used to define primitive recursive functions corresponding to functions computable by FANNs. In Section 8, we discuss theoretical and practical reasons for separating computability into the general and actual categories and pursue some implications of the theorems proved in Section 7. In Section 9, we summarize our conclusions. For the reader’s convenience, Appendix A gives supplementary definitions, results, and examples that are referenced in the main text when relevant.

2. Preliminaries

2.1. Functions and Predicates

If f is a function, dom(f) and codom(f) denote the domain and the co-domain of f, respectively. Statements such as f : S → R abbreviate the logical conjunction dom(f) = S ∧ codom(f) = R. A function f is partial on a set S if dom(f) is a proper subset of S, i.e., dom(f) ⊂ S. Thus, if S = N = { 0, 1, 2, … } and f(x) = x^{1/3}, then f is partial on S, because dom(f) = { i³ | i ∈ N } ⊂ N. If S and R are sets, then S = R is logically equivalent to the logical conjunction S ⊆ R ∧ R ⊆ S, i.e., S is a subset of R, and vice versa. If f is partial on S and z ∈ S, the following statements are equivalent: (1) z ∈ dom(f); (2) f is defined on z; (3) f(z) is defined; and (4) f(z)↓. The following statements are also equivalent: (1) z ∉ dom(f); (2) f is undefined on z; (3) f(z) is undefined; and (4) f(z)↑. If f is partial on S and dom(f) = S, then f is total on S. Thus, f(x) = x + 1 is total on N. When f : S → R is a bijection, i.e., f is injective (one-to-one) and surjective (onto), f is a correspondence between S and R.
If S is a set, then |S| is the cardinality of S, i.e., the number of elements in S. S is finite if and only if (iff) |S| ∈ N. For n > 0, S^n is the n-th Cartesian power of S, i.e., S^n = { (s_0, …, s_{n−1}) | s_i ∈ S, 0 ≤ i ≤ n−1 } = { (s_1, …, s_n) | s_i ∈ S, 1 ≤ i ≤ n }. Thus, if f : R² → N, dom(f) = { (x_1, x_2) | x_1, x_2 ∈ R }. The symbol x̄ is a sequence of numbers, i.e., a vector, from a set S, i.e., x̄ = (x_0, x_1, …, x_{n−1}) = (x_1, x_2, …, x_n) ∈ S^n; () is the empty sequence. If x̄ ∈ S^n, its individual elements are x̄_0 = x_0, x̄_1 = x_1, …, x̄_{n−1} = x_{n−1} or, equivalently, x̄_1 = x_1, x̄_2 = x_2, …, x̄_n = x_n. If dom(f) ⊆ S^n and x̄ ∈ S^n, f(x̄) = f((x_0, …, x_{n−1})) = f(x_0, …, x_{n−1}) = f(x_1, …, x_n). If f : dom(f) → codom(f) is a bijection, the inverse of f is f^{-1} : codom(f) → dom(f). When the arguments of f are evident, f or f(·) abbreviates f(x̄), f(x_0, …, x_{n−1}), or f(x_1, …, x_n).
A total function P : S^n → { 0, 1 } is a predicate if, for any x̄ ∈ S^n, P(x̄) = 1 or P(x̄) = 0, where 1 arbitrarily designates the logical truth and 0 designates a logical falsehood. The symbols ¬, ∧, ∨, →, respectively, refer to logical not, logical and, logical or, and logical implication. We abbreviate P(x̄) = 1 to P(x̄) and P(x̄) = 0 to ¬P(x̄). If P and Q are predicates, then ¬P ∨ Q is logically equivalent to P → Q, i.e., ¬P ∨ Q ≡ P → Q. For clarity, sub-predicates of compound predicates may be included in matching pairs of { }. Thus, if a compound predicate P consists of predicates P_1, P_2, P_3, and P_4, it can be defined as P ≡ {{ P_1 ∧ P_2 } ∨ { P_3 ∧ P_4 }}. The symbols ∃ and ∀ refer to the logical existential (there exists) and universal (for all) quantifiers, respectively. Thus, the statement (∃ x̄ ∈ S^n) P(x̄) is logically equivalent to the statement that P(x̄) holds for at least one x̄ in dom(P), while the statement (∀ x̄ ∈ S^n) P(x̄) is logically equivalent to the statement that P(x̄) holds for all x̄ in dom(P).

2.2. Finite Memory Automata

A finite memory device D_j is a physical or abstract automaton with a finite quantity of internal and external memory and an automated capability of executing programs, i.e., finite sequences of instructions written in a formalism, e.g., a programming language for D_j, and stored in the finite memory of D_j. Since bijections exist between expressions over any finite alphabet, i.e., a finite set of symbols or signs, and subsets of N [6], we call the memory of D_j numerical memory. The numerical memory consists of registers, each of which is a sequence of numerical unit cells, e.g., digital array cells, mechanical switches, and finite state machine tape cells. The quantity of numerical memory is the product of the number of registers and the number of unit cells in each register, i.e., this quantity is a natural number.
A cell holds exactly one elementary sign from a finite alphabet, e.g., { “.”, “0”, “1”, “2”, “3”, “4”, “5”, “6”, “7”, “8”, “9” }, or is empty. The sign of the empty cell is unique and is not an elementary sign. A number sign is a sequence of elementary signs in consecutive cells of a register with no empty cells to the left of the first elementary sign and possibly some empty cells to the right of the rightmost elementary sign. Thus, if “|” is the empty sign on D_j, the alphabet is { “.”, “-”, “0”, “1”, “2”, “3”, “4”, “5”, “6”, “7”, “8”, “9” }, and each register on D_j has seven cells, then “3.1||||”, “3.14|||”, “3.141||”, “3.1415|”, and “3.14159” are number signs conventionally interpreted as the real numbers 3.1, 3.14, 3.141, 3.1415, and 3.14159, respectively. An arbitrary number sign interpretation is fixed a priori for a given alphabet and D_j and does not change from sign to sign. Thus, if the alphabet is { “,”, “0”, “f0”, “ff0”, “fff0”, “ffff0”, “fffff0”, “ffffff0”, “fffffff0”, “ffffffff0”, “fffffffff0” } and the interpretation is such that “,” is interpreted as the decimal point, “0” as 0, “f0” as 1, “ff0” as 2, “fff0” as 3, etc., “*” is the empty sign, and each sign is read left to right, then, if each register on D_j has twenty-three cells, the sign “f0,ffff0f0ffff0ff0f0***” is interpreted as 1.41421.
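The left-to-right reading of number signs over the second alphabet above can be sketched as a small decoder. This is an illustration only; the function name decode_sign, the use of "*" as the empty sign, and the string-based register representation are our assumptions, not part of the formal development.

```python
def decode_sign(sign: str, empty: str = "*") -> str:
    """Decode a number sign over the alphabet {",", "0", "f0", "ff0", ...}
    into a conventional decimal string: "," is the decimal point, and a run
    of k "f"s terminated by "0" is the digit k.  Trailing empty cells in the
    register are ignored."""
    sign = sign.rstrip(empty)
    out, run = [], 0
    for c in sign:
        if c == ",":
            out.append(".")
        elif c == "f":
            run += 1
        elif c == "0":
            out.append(str(run))
            run = 0
        else:
            raise ValueError(f"unexpected cell content: {c!r}")
    return "".join(out)
```

For the twenty-three-cell register in the text, decode_sign("f0,ffff0f0ffff0ff0f0***") yields the string "1.41421", matching the interpretation fixed above.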
A real number x is signifiable on D_j iff a register on D_j can hold its sign. Put another way, a number is signifiable on D_j if, in a programming language L for D_j, the number’s sign can be assigned to a variable, i.e., stored in a designated register. When x is signifiable on D_j, we say that x is simply signifiable. A set or a sequence of numbers is signifiable if each number in the set or sequence is signifiable.
Δ_j > 0 is the smallest positive signifiable real number on D_j iff for any signifiable x, there is no signifiable y such that x < y < x + Δ_j. The finite set of real numbers in the closed interval between 0 and 1 signifiable on D_j is
R_{0,1}^j ≡ { x ∈ R | x = iΔ_j < 1 } ∪ { 1 }, i ∈ N.     (1)
We note, in passing, a notational convention in Equation (1) to which we adhere in our article: if D_j is an FMA, then the Latin letter j in subscripts or superscripts of symbols is used to emphasize that they are defined with respect to D_j. Thus, if D_j and D_k are two FMDs with different quantities of numerical memory, Δ_j ≠ Δ_k.
Lemma 1.
If z = iΔ_j is a maximal element of { x ∈ R | x = iΔ_j < 1 } and y = (i + 1)Δ_j, then y ≥ 1.
Proof. 
If y ∈ R_{0,1}^j, then y = 1, because 1 is the only number in R_{0,1}^j greater than z. If y ∉ R_{0,1}^j, then y > 1 and z < 1 < y.    □
A corollary of Lemma 1 is that if a, b are signifiable, a < b, then
R_{a,b}^j ≡ { x ∈ R | x = a + iΔ_j < b } ∪ { b }, i ∈ N,     (2)
is the finite set of signifiable numbers in the closed interval from a to b such that there exists no signifiable number between any two consecutive members of R_{a,b}^j when the latter is sorted in non-descending order.
Lemma 2.
If a, b are signifiable and b − a ≥ Δ_j, there exists a bijection ψ_{a,b}^j : R_{a,b}^j → Z_{a,b}^j = { 0, …, z } ⊂ N, z > 0, where a + zΔ_j ≥ b. If a + zΔ_j is signifiable, it is the smallest signifiable number ≥ b.
Proof. 
Let
ψ_{a,b}^j(x) = k, if x = a + kΔ_j < b; ψ_{a,b}^j(x) = z, if {{ x = a + zΔ_j = b } ∨ { a + (z−1)Δ_j < x = b < a + zΔ_j }}.     (3)
Let r ∈ Z_{a,b}^j. If r = z, then ψ_{a,b}^j(x) = r, for x = a + zΔ_j = b or a + (z−1)Δ_j < x = b < a + zΔ_j. If r < z, then ψ_{a,b}^j(x) = r, for x = a + rΔ_j. Let ψ_{a,b}^j(x) = ψ_{a,b}^j(y) = r. If r = z, then x = a + rΔ_j = y = b or a + (r−1)Δ_j < x = b = y < a + rΔ_j. If r < z, then x = a + rΔ_j = y. Let a + zΔ_j be signifiable. If a + zΔ_j = b, it is vacuously the smallest signifiable number ≥ b. If a + zΔ_j > b, then, since a + (z−1)Δ_j < b < a + zΔ_j, the assertion that 0 < b − (a + (z−1)Δ_j) < Δ_j or 0 < a + zΔ_j − b < Δ_j leads to a contradiction.    □
A corollary of Lemma 2 is that (ψ_{a,b}^j)^{-1} : Z_{a,b}^j → R_{a,b}^j is
(ψ_{a,b}^j)^{-1}(k) = x, if x = a + kΔ_j < b; (ψ_{a,b}^j)^{-1}(k) = b, if {{ b = a + kΔ_j } ∨ { a + (k−1)Δ_j < b < a + kΔ_j }}.     (4)
Lemmas 1 and 2 draw on the empirically verifiable fact manifested by division underflow errors in modern programming languages: given an FMD D_j and two signifiable real numbers a and b, with a < b, the set of signifiable real numbers in the closed interval between a and b is a proper finite subset of the set of real numbers R. Thus, bijections are possible between R_{a,b}^j and finite subsets of N. While these bijections may differ from FMA to FMA in that they depend on the exact quantity of memory on a given FMA, they differ only in terms of the cardinalities of their domains and co-domains: the larger the quantity of memory, the greater the cardinality. A constructive interpretation of Lemmas 1 and 2 is that if we take two signifiable real numbers a and b such that b − a ≥ Δ_j, we can effectively enumerate the elements of Z_{a,b}^j by iteratively adding increasing integer multiples of Δ_j to a until we reach b, i.e., a + zΔ_j = b, or go slightly above it, i.e., a + (z−1)Δ_j < b < a + zΔ_j, for z > 0.
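The finiteness of signifiable numbers is directly observable on a physical FMD. The sketch below uses Python’s IEEE 754 doubles as the signifiable reals; note that, unlike the uniform Δ_j grid of Lemmas 1 and 2, doubles are not uniformly spaced across their whole range, so the sketch illustrates only the finiteness of R_{a,b}^j. The helper name count_signifiable is our own, and math.ulp/math.nextafter require Python 3.9 or later.

```python
import math

# On a physical FMD, IEEE 754 doubles play the role of the signifiable
# reals: between a signifiable x and the next signifiable number there is
# no signifiable y (cf. the defining property of Delta_j).
gap_at_1 = math.ulp(1.0)                    # spacing of doubles just above 1.0
next_up = math.nextafter(1.0, math.inf)     # smallest signifiable number > 1.0

def count_signifiable(a: float, b: float) -> int:
    """Count the (finite) number of doubles in the closed interval [a, b],
    a < b, by stepping through them one at a time."""
    n, x = 1, a
    while x < b:
        x = math.nextafter(x, math.inf)
        n += 1
    return n
```

For instance, count_signifiable over a span of four unit steps above 1.0 finds exactly five signifiable numbers, mirroring the enumeration of Z_{a,b}^j by integer multiples of the local spacing.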
To map the elements of R_{a,b}^j to N+ = { 1, 2, 3, … }, we define the bijection μ_{a,b}^j : R_{a,b}^j → I_{a,b}^j = { z + 1 | z ∈ Z_{a,b}^j } and its inverse (μ_{a,b}^j)^{-1} : I_{a,b}^j → R_{a,b}^j as
μ_{a,b}^j(x) = ψ_{a,b}^j(x) + 1;  (μ_{a,b}^j)^{-1}(k) = (ψ_{a,b}^j)^{-1}(k − 1), k > 0.     (5)
If we abbreviate μ_{0,1}^j, (μ_{0,1}^j)^{-1}, ψ_{0,1}^j, (ψ_{0,1}^j)^{-1}, R_{0,1}^j, Z_{0,1}^j, and I_{0,1}^j to μ, μ^{-1}, ψ, ψ^{-1}, R, Z, and I, respectively, and let Δ_j = 0.2, we have the following example.
Example 1.
R = { 0, 0.2, 0.4, 0.6, 0.8, 1 }; Z = { 0, 1, 2, 3, 4, 5 }; I = { 1, 2, 3, 4, 5, 6 }; ψ(0) = 0, ψ(0.2) = 1, ψ(0.4) = 2, ψ(0.6) = 3, ψ(0.8) = 4, ψ(1) = 5; ψ^{-1}(0) = 0, ψ^{-1}(1) = 0.2, ψ^{-1}(2) = 0.4, ψ^{-1}(3) = 0.6, ψ^{-1}(4) = 0.8, ψ^{-1}(5) = 1; μ(0) = 1, μ(0.2) = 2, μ(0.4) = 3, μ(0.6) = 4, μ(0.8) = 5, μ(1) = 6; μ^{-1}(1) = 0, μ^{-1}(2) = 0.2, μ^{-1}(3) = 0.4, μ^{-1}(4) = 0.6, μ^{-1}(5) = 0.8, μ^{-1}(6) = 1.
For Δ_j = 0.3, we have another example.
Example 2.
R = { 0, 0.3, 0.6, 0.9, 1 }; Z = { 0, 1, 2, 3, 4 }; I = { 1, 2, 3, 4, 5 }; ψ(0) = 0, ψ(0.3) = 1, ψ(0.6) = 2, ψ(0.9) = 3, ψ(1) = 4; ψ^{-1}(0) = 0, ψ^{-1}(1) = 0.3, ψ^{-1}(2) = 0.6, ψ^{-1}(3) = 0.9, ψ^{-1}(4) = 1; μ(0) = 1, μ(0.3) = 2, μ(0.6) = 3, μ(0.9) = 4, μ(1) = 5; μ^{-1}(1) = 0, μ^{-1}(2) = 0.3, μ^{-1}(3) = 0.6, μ^{-1}(4) = 0.9, μ^{-1}(5) = 1.
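Examples 1 and 2 can be reproduced mechanically. The sketch below tabulates ψ_{a,b}^j and μ_{a,b}^j as dictionaries over R_{a,b}^j enumerated per Equation (2); exact Fraction arithmetic stands in for signifiable device arithmetic, and the function name make_bijections is hypothetical.

```python
from fractions import Fraction

def make_bijections(a, b, delta):
    """Tabulate psi_{a,b}^j (Lemma 2) and mu_{a,b}^j (Equation (5)) as
    dictionaries over the finite set R_{a,b}^j of Equation (2)."""
    a, b, delta = Fraction(a), Fraction(b), Fraction(delta)
    xs, i = [], 0
    while a + i * delta < b:            # members a + i*delta < b, in order
        xs.append(a + i * delta)
        i += 1
    xs.append(b)                        # the closed interval always keeps b
    psi = {x: k for k, x in enumerate(xs)}      # psi: R -> Z = {0, ..., z}
    mu = {x: k + 1 for k, x in enumerate(xs)}   # mu = psi + 1: R -> I
    return psi, mu
```

With a = 0, b = 1, and Δ_j = 0.2 this reproduces Example 1 (e.g., ψ(0.6) = 3, μ(1) = 6); with Δ_j = 0.3 it reproduces Example 2 (e.g., μ(1) = 5).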

3. Computability: General vs. Actual

Computability theory lacks a uniform, commonly accepted formalism for computable, partially computable, and primitive recursive functions. The treatment of such functions in our article is based, in part, on the formalism by Davis, Sigal, and Weyuker (Chapters 2 and 3 in [7]), which has, in turn, much in common with Kleene’s formalism (Chapter 9 in  [8]). Alternative treatments include [9], where primitive recursive functions are formalized as loop programs consisting of assignment and iteration statements similar to DO statements in FORTRAN, and [10], where λ -calculus is used. These symbolically different treatments have one feature in common: computable, partially computable, and primitive recursive functions operate on natural numbers and the underlying automata, explicit or implicit, on which these functions can, in principle, be executed if implemented as programs in some formalism, have access to infinite numerical memory. To distinguish computability in principle from computability on finite memory automata, we introduce the categories of general and actual computabilities.

3.1. General Computability

As our formalism in this section, we use the programming language L developed in Chapter 2 of [7] and subsequently used in that book to define partially computable, computable, and primitive recursive functions and to prove various properties thereof. An L program P is a finite sequence of L instructions. The unique variable Y is designated as the output variable, where the output of P on a given input is stored. X_1, X_2, … designate input variables, and Z_1, Z_2, … refer to internal variables, i.e., variables in P that are not input variables. No bounds are imposed on the magnitude of natural numbers assigned to variables. L has conditional dispatch instructions; line labels; elementary arithmetic operations on and comparisons of natural numbers; and macros, i.e., statements expandable into primitive L instructions.
A computation of P on some input x̄ ∈ N^m, m > 0, is a finite sequence of snapshots (s_1, …, s_k), where each snapshot s_i, 1 ≤ i ≤ k, k > 0, specifies the number of the instruction in P to be executed and the value of each variable in P. The snapshot s_1 is the initial snapshot, where the values of all input variables are set to their initial values, the program instruction counter is set to 1, i.e., the number of the first instruction in P, and the values of all the other variables in P are set to 0. The snapshot s_k in (s_1, …, s_k) is a terminal snapshot, where the instruction counter is set to the number of the instructions in P plus 1. Not all snapshot sequences are computations. If (s_1, s_2, …, s_k) is a computation of P on x̄ ∈ N^m, i.e., X_1 = x_1, X_2 = x_2, …, X_m = x_m, then there is a function that, given the text of P and a snapshot s_i, 1 ≤ i < k, in the computation, generates the next snapshot s_{i+1} of the computation. This function can verify if (s_1, …, s_k) constitutes the computation of P on x̄. The existence of such functions implies that each instruction in L is interpreted unambiguously. If some program P in L takes m inputs and the values of the input variables are X_1 = x_1, X_2 = x_2, …, X_m = x_m, then
Ψ_P^(m)(x_1, x_2, …, x_m) = Y in s_k, if ∃ a computation (s_1, …, s_k), k ≥ 1, and ↑ otherwise     (6)
denotes the value of Y in the terminal snapshot s_k if there exists a computation (s_1, …, s_k) of P on (x_1, x_2, …, x_m) and is undefined otherwise.
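The snapshot semantics above can be made concrete with a toy interpreter. The instruction set below (increment, monus decrement, jump-if-nonzero, unconditional jump) is a simplified stand-in for the language L of [7], not its actual syntax; the names run and copy_prog are our own.

```python
def run(program, inputs):
    """Execute a toy register-machine program and return its snapshot
    sequence (s_1, ..., s_k).  A snapshot is a pair (instruction counter,
    variable values); the terminal snapshot's counter is len(program) + 1.
    Instructions: ('inc', V), ('dec', V), ('jnz', V, n), ('goto', n)."""
    vars = {"Y": 0, **inputs}
    pc = 1
    snaps = [(pc, dict(vars))]          # s_1: the initial snapshot
    while 1 <= pc <= len(program):
        op = program[pc - 1]
        if op[0] == "inc":
            vars[op[1]] = vars.get(op[1], 0) + 1
            pc += 1
        elif op[0] == "dec":            # monus: natural numbers never go below 0
            vars[op[1]] = max(0, vars.get(op[1], 0) - 1)
            pc += 1
        elif op[0] == "jnz":            # jump if the variable is nonzero
            pc = op[2] if vars.get(op[1], 0) != 0 else pc + 1
        elif op[0] == "goto":
            pc = op[1]
        snaps.append((pc, dict(vars)))  # each snapshot follows from the previous
    return snaps

# A program computing Y = X1 by repeated decrement/increment:
copy_prog = [
    ("jnz", "X1", 3),   # 1: if X1 != 0, enter the loop body
    ("goto", 6),        # 2: otherwise jump past the last instruction (halt)
    ("dec", "X1"),      # 3: X1 <- X1 - 1
    ("inc", "Y"),       # 4: Y  <- Y + 1
    ("goto", 1),        # 5: repeat
]
```

Running copy_prog on X1 = 3 produces a terminal snapshot with counter 6 (one past the last instruction) and Y = 3, so the analogue of Ψ here is Y’s value in the final snapshot.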
Definition 1.
A function f : N^m → N, m ∈ N+, is partially computable if f is partial and there is an L program P such that Equation (7) holds.
(∀ x̄ ∈ N^m) f(x̄) = Ψ_P^(m)(x̄)     (7)
Equation (7) is interpreted so that f(x̄)↓ iff Ψ_P^(m)(x̄)↓ and f(x̄)↑ iff Ψ_P^(m)(x̄)↑.
Definition 2.
A function f : N^m → N, 0 < m ∈ N, is computable if it is total, i.e., (∀ x̄ ∈ N^m) f(x̄)↓, and partially computable.
Let f : N^k → N and g_i : N^n → N, 1 ≤ i ≤ k, n ∈ N+. Then, h : N^n → N is obtained by composition from f, g_1, …, g_k if
h(x_1, …, x_n) = f(g_1(x_1, …, x_n), …, g_k(x_1, …, x_n)).     (8)
Let k ∈ N, n ∈ N+, and let ϕ : N² → N, f : N^n → N, g : N^{n+2} → N be total. If h is obtained from ϕ by the recurrences in (9) or from f and g by the recurrences in (10), then h is obtained from ϕ or from f and g by primitive recursion or simply by recursion. The recurrences in (10) are isomorphic to Gödel’s recurrences (Section 2, Equation (2) in [6]), where he introduces the concept of a recursively defined number-theoretic function. The three functions in (11) are the initial functions.
h(0) = k, h(t + 1) = ϕ(t, h(t))     (9)
h(x_1, …, x_n, 0) = f(x_1, …, x_n), h(x_1, …, x_n, t + 1) = g(t, h(x_1, …, x_n, t), x_1, …, x_n)     (10)
s(x) = x + 1; n(x) = 0; u_i^n(x_1, …, x_n) = x_i, 1 ≤ i ≤ n.     (11)
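Composition (8) and the recurrences in (10) can be sketched as higher-order combinators over the initial functions in (11). The names compose, prim_rec, and u are our own; add and mult below are the standard derivations of addition and multiplication as primitive recursive functions.

```python
def s(x): return x + 1          # successor, the first initial function
def n(x): return 0              # the zero function
def u(i, n_args):               # projection u_i^n (1-indexed), third initial function
    return lambda *xs: xs[i - 1]

def compose(f, *gs):
    """h(x1..xn) = f(g1(x1..xn), ..., gk(x1..xn)) -- composition, Equation (8)."""
    return lambda *xs: f(*(g(*xs) for g in gs))

def prim_rec(f, g):
    """h(x.., 0) = f(x..); h(x.., t+1) = g(t, h(x.., t), x..) -- the
    recurrences in (10), unrolled as a bounded loop."""
    def h(*args):
        *xs, t = args
        acc = f(*xs)
        for i in range(t):
            acc = g(i, acc, *xs)
        return acc
    return h

# add(x, t) = x + t from u_1^1 and g(t, acc, x) = s(acc)
add = prim_rec(u(1, 1), lambda t, acc, x: s(acc))
# mult(x, t) = x * t from n and g(t, acc, x) = add(acc, x)
mult = prim_rec(n, lambda t, acc, x: add(acc, x))
```

Since add and mult are built only from the initial functions by composition and recursion, they witness Definition 3 directly.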
Definition 3.
A function is primitive recursive if it can be obtained from the initial functions by a finite number of applications of composition and recursion in (8)–(10).
An implication of Definition 3 is that if f is a primitive recursive function, then there is a sequence of functions (f_1, …, f_n = f), n > 0, where every function in the sequence is an initial function or is obtained from the previous functions in the sequence by composition or recursion.
A class C of total functions is primitive recursively closed (PRC) if the initial functions are in it and any function obtained from the functions in C by composition or recursion is also in C . It has been shown (Chapter 3 in [7]) that (1) the class of computable functions is PRC; (2) the class of primitive recursive functions is PRC; and (3) a function is primitive recursive iff it belongs to every PRC class. A corollary of (3) is that every primitive recursive function is computable.
If C includes all functions of a certain type, we refer to it as the class of those functions, e.g., the class of partially computable functions, the class of computable functions, the class of primitive recursive functions, etc. When we say that C is a class of functions of a certain type, we mean that C ⊆ C′, where C′ is the class of all functions of that type.

3.2. Actual Computability

In general, the FMA defined in Section 2.2 is different from the finite state automata of classical computability theory, because the latter, e.g., a Turing machine (TM), do not impose any limitations on memory. A TM becomes an FMA iff the number of cells on its tape where it reads and writes symbols is finite. Analogously, a finite state automaton (FSA) of classical computability is an FMA iff there is a limit, expressed as a natural number, on the length of the input tape from which the FSA reads sign sequences over a given alphabet.
As is the case with general computability, we let P_{L_j} be an L_j program, i.e., a finite sequence of unambiguous instructions in a programming language L_j for an FMD D_j. Thus, if D_j is a physical computer with an operating system, e.g., Linux, a programming language for D_j can be Lisp, C, Perl, Python, etc. If D_j is an abstract FMA, e.g., a TM with a finite number of cells on its tape, then D_j is programmed with the standard quadruple formalism (Chapter 6 in [7]). If D_j is a mechanical device, then we assume that there is a formalism that consists of instructions such as “set switch i to position p”, “turn handle full circle clockwise t times”, etc. A state of D_j while executing P_{L_j} on some input x̄ includes the number of the instruction in P_{L_j} to execute next and, depending on D_j, may include the contents of each register, the signs on the finite input tape, or the state of each mechanical switch. As we did with general computability, we call such a state a snapshot of D_j for P_{L_j}(x̄) and define a computation of P_{L_j}(x̄) on D_j to be a finite sequence of snapshots (s_1, …, s_k), k ≥ 1, where each subsequent snapshot is computed from the previous snapshot, the initial snapshot s_1 has the values of all the variables in P_{L_j} appropriately specified and the instruction counter of P_{L_j} set to 1, and the terminal snapshot s_k has the instruction counter set to the number of the instructions in P_{L_j} plus 1. We let
Ψ_{P_{L_j}}^(n)(x̄)
denote the number sign corresponding to the output of P_{L_j}(x̄) executed on D_j. It is irrelevant to our discussion where this number sign is stored (e.g., in a register, a section of a finite tape, or the sequence of the positions of the mechanical switches examined left to right or right to left, etc.) so long as it is understood that the output, whenever there is a computation, is unambiguously interpreted as a real number according to an interpretation fixed a priori.
Definition 4.
A partial function f : R^m → R, m ∈ N+, is actually partially computable on D_j if Equation (12) holds.
(∀ x̄ ∈ R^m) f(x̄) = Ψ_{P_{L_j}}^(m)(x̄).     (12)
Equation (12) of actual computability is interpreted so that f(x̄)↓ iff Ψ_{P_{L_j}}^(m)(x̄)↓, i.e., f(x̄) = z iff Ψ_{P_{L_j}}^(m)(x̄) = z, for any x̄ ∈ R^m and z ∈ R signifiable on D_j, and f(x̄)↑ iff Ψ_{P_{L_j}}^(m)(x̄)↑. However, unlike Equation (7) of general computability, which is defined only on natural numbers, every one of which is signifiable by implication, in actual computability we have to make provisions for non-signifiable real numbers. Toward that end, we introduce the following inequality, which holds when a non-signifiable number is encountered during a computation of P_{L_j}(x̄).
(∃ x̄ ∈ R^m) f(x̄) ≠ Ψ_{P_{L_j}}^(m)(x̄).     (13)
Inequality (13) can be illustrated with two examples. Let D_j have two cells per register, let f : N² → N be f(x_1, x_2) = x_1 + x_2, and let P_{L_j}(x_1, x_2) be a program that implements f, i.e., adds the two number signs of x_1 and x_2 and puts the number sign of x_1 + x_2 in a designated output register. Let number signs be interpreted in standard decimal notation. Furthermore, if some number x is not signifiable on D_j, only the first two elementary signs of the number sign of x are placed into a register, i.e., number signs are truncated to fit into registers, as is common in many programming languages. Then, after “100” is truncated to “10”,
f(99, 1) = 100 ≠ Ψ_{P_{L_j}}^(2)(99, 1) = 10,
and
f(213, 13) = 226 ≠ Ψ_{P_{L_j}}^(2)(213, 13) = 34,
because 213 is not signifiable on D_j and is truncated to “21”. In both cases, f(x_1, x_2), as a mathematical object, is total, and there is a computation of P_{L_j}(x_1, x_2) on x_1 = 99, x_2 = 1 and on x_1 = 213, x_2 = 13, but during both computations, non-signifiable numbers, i.e., 100 and 213, are encountered.
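The two truncation examples can be simulated directly. The register model below, decimal signs truncated on the right to a fixed number of cells, follows the description above; the function names truncate_sign and device_add are our own.

```python
def truncate_sign(x: int, cells: int = 2) -> int:
    """Store a natural number's decimal number sign in a register with a
    fixed number of cells, truncating on the right when it does not fit."""
    return int(str(x)[:cells])

def device_add(x1: int, x2: int, cells: int = 2) -> int:
    """Psi for a toy addition program on D_j with 'cells'-cell registers:
    both operands and the result are forced through the registers."""
    return truncate_sign(truncate_sign(x1, cells) + truncate_sign(x2, cells), cells)
```

This reproduces both inequalities: device_add(99, 1) yields 10 (since "100" is truncated to "10"), and device_add(213, 13) yields 34 (since "213" is truncated to "21"), while f(99, 1) = 100 and f(213, 13) = 226.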
Definition 5.
A function f : R^m → R, m ∈ N+, is actually computable on D_j if it is total, i.e., (∀ x̄ ∈ R^m) f(x̄)↓, and actually partially computable.
A program P_{L_j} that implements an actually computable f(x̄) is guaranteed to have a computation for any signifiable x̄. However, Inequality (13) may still hold if a non-signifiable number is produced during a computation. Functions can be defined for a specific D_j so that they deal only with signifiable numbers, e.g., functions whose domains and codomains are, respectively, finite signifiable proper subsets of R^m and R. The next definition characterizes these functions.
Definition 6.
A function f : R^m → R, m ∈ N+, is absolutely actually computable on D_j if it is actually computable and Inequality (13) holds for no computation of P_{L_j}(x̄), where x̄ is signifiable on D_j.
An implication of Definitions 4–6 is that if f : N^m → N satisfies Definition 4, it is partially computable according to Definition 1, and if it satisfies Definition 5 or 6, it is computable according to Definition 2, because, if no memory limitations are placed on registers, every natural number is signifiable.
We call an FMD D_j sufficiently significant if three conditions are satisfied. First, a programming language L_j for D_j exists with the same control structures as the programming language L described in Section 3.1 such that L_j (1) is capable of signifying a finite subset of R and (2) is capable of specifying the following operations on numbers: addition, subtraction, multiplication, division, assignment, i.e., setting the value of a register to a number sign, comparison, i.e., a = b, a < b, a > b, a ≤ b, a ≥ b, on any signifiable a and b, and the truncation of the signs of non-signifiable numbers to fit them into registers. Second, the finite memory of D_j suffices to hold L_j programs of length N ∈ N+, where the length of a program is the number of instructions in it. Third, the finite memory of D_j suffices, in addition to holding a program of at most N instructions, to hold number signs in K ∈ N+ registers.
Lemma 3.
Let an FMA D_j be sufficiently significant with K ≥ 7, let a, b be signifiable with b − a ≥ Δ_j, and let a + zΔ_j, z > 0, be the smallest signifiable number greater than or equal to b. Let μ_{a,b}^j : R_{a,b}^j → I_{a,b}^j be the bijection in (5). Let P_{L_j}(x), x ∈ R_{a,b}^j, be a program for D_j that iterates from a to a + zΔ_j ≥ b in positive unit integer increments of Δ_j until a k or z that satisfies the conditions in (3) is encountered, and let the length of P_{L_j} be ≤ N. Then, μ_{a,b}^j is absolutely actually computable.
Proof. 
Since a, b, and a + zΔ_j are signifiable, so are dom(μ_{a,b}^j) and codom(μ_{a,b}^j). The finite memory of D_j suffices to hold P_{L_j}, and P_{L_j} needs access to five signifiable numbers to iterate over dom(μ_{a,b}^j): a, b, i, Δ_j, and a + iΔ_j. Since K ≥ 7, the signs of these numbers are placed in registers ρ_1, ρ_2, ρ_3, ρ_4, and ρ_5. After x ∈ dom(μ_{a,b}^j) is placed in register ρ_6, P_{L_j} sets ρ_3 to 0. If x < b, P_{L_j} goes into a while loop with the condition ρ_5 < ρ_2, i.e., a + iΔ_j < b. Inside the loop, when ρ_5 = ρ_6, ρ_3 is incremented by 1 and placed into the output register ρ_7, and P_{L_j} exits. Otherwise, the loop continues with ρ_3 incremented by 1. If x = b, P_{L_j} goes into a while loop with the condition ρ_5 ≤ ρ_2, i.e., a + iΔ_j ≤ b, and keeps incrementing ρ_3 by 1 inside the loop. After the loop terminates, ρ_3 is incremented by 1 and placed into the output register ρ_7, and P_{L_j} exits.    □
A corollary of Lemma 3 is that (μ_{a,b}^j)^{-1} is absolutely actually computable.
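Lemma 3’s claim can be checked with a direct sketch of μ_{a,b}^j computed from (3) and (5). This is not the register-level program P_{L_j} of the proof; exact Fraction arithmetic again stands in for signifiable device arithmetic, and the function name mu is our own.

```python
from fractions import Fraction

def mu(a, b, delta, x):
    """mu_{a,b}^j(x) = psi_{a,b}^j(x) + 1, computed directly from (3) and (5)
    for x in R_{a,b}^j."""
    a, b, delta, x = (Fraction(v) for v in (a, b, delta, x))
    z = 0
    while a + z * delta < b:         # z: smallest count with a + z*delta >= b
        z += 1
    if x == b:
        return z + 1                 # second case of (3), shifted by 1 per (5)
    return int((x - a) / delta) + 1  # first case: x = a + k*delta < b
```

With a = 0, b = 1, this agrees with Examples 1 and 2: for Δ_j = 0.2, μ(0.8) = 5 and μ(1) = 6; for Δ_j = 0.3, μ(0.6) = 3 and μ(1) = 5.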

4. A Recursive Formalization of Feedforward Artificial Neural Networks

A trained feedforward artificial neural network (FANN) N_z^j implemented in a programming language L_j on a sufficiently significant FMA D_j is a finite set of artificial neurons, each of which is connected to a finite number of the neurons in the same set through synapses, i.e., directed weighted edges (see Figure 1). The neurons are organized into k + 1 layers E_0, E_1, …, E_k, with E_0 being the input layer; E_k being the output layer; and E_e, 0 < e < k, being the hidden layers. We let E_z^j denote the number of layers in N_z^j and n_{z,i}^{j,e} refer to the i-th neuron in layer E_e of N_z^j. We abbreviate n_{z,i}^{j,e} to n_i^e, because n_i^e always refers to a unique neuron in N_z^j. The function nn_z^j(e) : N → N+ specifies the number of neurons in layer E_e of N_z^j and is abbreviated nn(e).
We assume that N_z^j is trained, i.e., the synapse weights are fixed automatically or manually, and fully connected, i.e., there is a synapse from every neuron in layer E_{e−1} to every neuron in layer E_e. Each synapse has a weight, i.e., a signifiable real number, associated with it. We let w_{i,j}^e, 0 < e < E_z^j, denote the weight of the synapse from n_i^{e−1} to n_j^e (see Figure 1) and w̄^e refer to a vector of all synaptic weights between E_{e−1} and E_e. We define w̄^0 = (). Thus, for the FANN N_z^j in Figure 1, w̄^1 = (w_{0,0}^1, w_{0,1}^1, w_{0,2}^1, w_{1,0}^1, w_{1,1}^1, w_{1,2}^1) and w̄^2 = (w_{0,0}^2, w_{0,1}^2, w_{1,0}^2, w_{1,1}^2, w_{2,0}^2, w_{2,1}^2). We assume, without loss of generality, that all numbers in w̄^e are in R_{0,1}^j defined in (1), because, if that is not the case, they can be so scaled; nor is there any loss of generality associated with the assumption of full connectivity, because partial connectivity can be defined by setting the weights of the appropriate synapses to 0.
If R_{0,1}^j is abbreviated to R_{0,1}, each n_i^e in N_z^j, e > 0, computes an activation function
α_i^e(ā^{e−1}, w̄^e) : R_{0,1}^{nn(e−1)} → R_{0,1},     (14)
where ā^{e−1} is the vector of the activations, i.e., real signifiable numbers, of the neurons in layer E_{e−1}. For e = 0,
α_i^0(x̄, ()) = x̄_i = x_i,     (15)
where x̄ ∈ R_{0,1}^{nn(0)} and x_i ∈ R_{0,1}, 0 ≤ i < nn(0). Thus, if nn(0) = 3, as in Figure 1, then, given the input x̄ = (x_0, x_1, x_2) = (0.0, 0.3, 0.6), α_0^0(x̄, ()) = x̄_0 = x_0 = 0.0, α_1^0(x̄, ()) = x̄_1 = x_1 = 0.3, and α_2^0(x̄, ()) = x̄_2 = x_2 = 0.6. Since N_z^j is implemented on a sufficiently significant D_j, all activation functions α_i^e(·) are absolutely actually computable. It is irrelevant to our discussion whether the activation functions are the same, e.g., sigmoid, for all or some neurons, or each neuron has its own activation function.
The term feedforward means that the activations of the neurons are computed layer by layer from the input layer to the output layer, because the activation functions of the neurons in the next layer require only the weights of the synapses connecting the next layer with the previous one and the activation values, i.e., the outputs of the activation functions of the neurons in the previous layer. To define the activation vectors of individual layers, let
a 0 = α 0 0 x , ( ) , α n n ( 0 ) 1 0 x , ( ) , a e = α 0 e a e 1 , w e , , α n n ( e ) 1 e a e 1 , w e ,
where 0 < e < E z j and x is an input vector. For each N z j , we define the absolutely actually computable function that N z j computes as
f z j ( x , 0 ) = x , f z j ( x , e + 1 ) = α 0 e + 1 f z j x , e , w e + 1 , , α n n ( e + 1 ) 1 e + 1 f z j x , e , w e + 1 .
If e > E z j − 1 , let f z j ( x , e ) = ( ) . The function f z j in (17) computes the feedforward activation of N z j layer by layer, i.e., f z j ( x , 0 ) = a 0 , f z j ( x , 1 ) = a 1 , … , f z j ( x , E z j − 1 ) = a E z j − 1 . For example, if x = ( x 0 , x 1 ) ∈ R 0 , 1 2 is the input to N z j in Figure 1,
f z j ( x , 0 ) = a 0 = x ; f z j ( x , 1 ) = α 0 1 f z j x , 0 , w 1 , α 1 1 f z j x , 0 , w 1 , α 2 1 f z j x , 0 , w 1 = α 0 1 a 0 , w 1 , α 1 1 a 0 , w 1 , α 2 1 a 0 , w 1 = a 1 R 0 , 1 j 3 ; f z j ( x , 2 ) = α 0 2 f z j x , 1 , w 2 , α 1 2 f z j x , 1 , w 2 ) = α 0 2 a 1 , w 2 , α 1 2 a 1 , w 2 = a 2 R 0 , 1 j 2 .
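The layer-by-layer recursion in (17) can be sketched in Python for a hypothetical 2-3-2 network shaped like the one in Figure 1. The sigmoid activation and the particular weight values below are illustrative assumptions, not values taken from the article.

```python
import math

# A minimal sketch of the feedforward recursion f(x, e) in (17) for a
# hypothetical 2-3-2 network. Weights and activation are assumptions.

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

# weights[e][i][j] = weight of the synapse from neuron i in layer e-1
# to neuron j in layer e; all values are arbitrary numbers in [0, 1].
weights = [
    None,                                  # w^0 = (): no synapses into layer 0
    [[0.1, 0.2, 0.3], [0.4, 0.5, 0.6]],    # w^1: layer 0 -> layer 1 (2 x 3)
    [[0.7, 0.8], [0.9, 0.1], [0.2, 0.3]],  # w^2: layer 1 -> layer 2 (3 x 2)
]

def f(x, e):
    """Activation vector a^e of layer e, computed layer by layer as in (17)."""
    if e == 0:
        return list(x)                     # f(x, 0) = a^0 = x
    prev = f(x, e - 1)                     # a^{e-1}
    w = weights[e]
    return [sigmoid(sum(prev[i] * w[i][j] for i in range(len(prev))))
            for j in range(len(w[0]))]

print(f((0.9, 0.6), 2))                    # a^2: the two output activations
```

Computing f(x, 2) forces f(x, 1) and f(x, 0) first, mirroring the fact that the activations of each layer depend only on the previous layer's activations and the connecting weights.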

5. Finite Sets as Gödel Numbers

Our primitive recursive techniques to pack finite sets and Cartesian powers thereof into Gödel numbers in this section rely, in part, on our previous work on primitive recursive characteristics of chess [11], which, in turn, was based on several functions shown to be primitive recursive in [7]. For the reader's convenience, Appendix A.1 in Appendix A gives the functions shown to be primitive recursive in [7], along with the necessary auxiliary definitions and theorems, and Appendix A.2 gives the functions, or variants thereof, shown to be primitive recursive in [11]. When we use the functions from [7,11] in this section, we refer to their definitions in these two sections of Appendix A as necessary.
Let G be a Gödel number (G-number) as defined in (A8). The primitive recursive predicate G P in (18) uses the bounded existential quantification of a primitive recursive predicate defined in (A2) and the primitive recursive functions ( x ) i and L t ( x ) , respectively, defined in (A9) and (A10).
G P ( G ) { L t ( G ) > 0 } { { L t ( G ) = 1 L t ( ( G ) 1 ) > 0 } { ( t ) L t ( G ) { { t > 1 } { { L t ( ( G ) t ) = L t ( ( G ) 1 ) } { L t ( ( G ) t ) > 0 } } } } }
The logical structure of G P is G P 1 { G P 2 G P 3 } , where G P 1 , G P 2 , and G P 3 are
G P 1 { L t ( G ) > 0 } ; G P 2 { L t ( G ) = 1 L t ( ( G ) 1 ) > 0 } ; G P 3 ( t ) L t ( G ) { { t > 1 } { { L t ( ( G ) t ) = L t ( ( G ) 1 ) } { L t ( ( G ) t ) > 0 } } } .
The predicate G P holds for G-numbers that have at least one element and whose elements all have the same length, i.e., the same number of elements, greater than 0. Thus, G P ( [ [ 1 ] ] ) , G P ( [ [ 1 ] , [ 2 ] , [ 3 ] ] ) , and G P ( [ [ 1 , 2 ] , [ 3 , 4 ] , [ 5 , 6 ] ] ) , but ¬ G P ( [ [ 0 ] ] ) and ¬ G P ( [ [ 1 ] , [ 3 , 4 , 5 ] , [ 11 , 10 ] ] ) .
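On the nested-list reading of G-numbers, the predicate G P in (18) can be sketched as follows. The helper lt_list mimics L t by counting up to the last non-zero entry, which is why the sketch rejects [ [ 0 ] ] ; the sketch checks the set semantics only and does not touch the prime-power encoding.

```python
# A sketch of the predicate GP in (18) on the nested-list reading of
# G-numbers: GP holds when G has at least one element and every element
# has the same positive length in the Lt sense.

def lt_list(e):
    """Lt on the list reading: position of the last non-zero entry."""
    n = 0
    for i, v in enumerate(e, start=1):
        if v != 0:
            n = i
    return n

def gp(g):
    if len(g) == 0:                        # Lt(G) > 0 must hold
        return False
    lens = [lt_list(e) for e in g]
    return all(n == lens[0] and n > 0 for n in lens)

print(gp([[1]]), gp([[1, 2], [3, 4], [5, 6]]))     # both hold
print(gp([[0]]), gp([[1], [3, 4, 5], [11, 10]]))   # neither holds
```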
Let G be a G-number, the predicate g be as defined in  (A13), the function s ( t ) be as defined in (11), and the function x l y be as defined in (A15), and let
τ χ 0 ( G , 0 ) = 1 , τ χ 0 ( G , t + 1 ) = [ [ ( G ) s ( t ) ] ] l τ χ 0 ( G , t ) .
Then, the primitive recursive function
τ 0 ( G ) = τ χ 0 ( G , L t ( G ) ) if L t ( G ) > 0 0 g G , 0 otherwise
turns a G-number G into another G-number whose elements are the elements of G, each wrapped in a G-number of length 1. Thus, τ 0 ( [ 11 , 13 ] ) = [ [ 11 ] , [ 13 ] ] . In general, if G = [ g 1 , … , g n ] , L t ( G ) > 0 , 0 ∉ g G , i.e., g i ≠ 0 , for 1 ≤ i ≤ n , then τ 0 ( G ) = [ [ g 1 ] , … , [ g n ] ] .
Let g N , G be a G-number, the function x r y be defined in (A16), and
τ χ 1 ( g , G , 0 ) = 1 , τ χ 1 ( g , G , t + 1 ) = [ [ g ] r [ ( G ) s ( t ) ] ] l τ χ 1 ( g , G , t ) .
Then, the primitive recursive function
τ 1 ( g , G ) = τ χ 1 ( g , G , L t ( G ) ) if g > 0 G P ( G ) , 0 otherwise
prepends g to each element of G. Thus, τ 1 ( 1 , [ [ 2 ] , [ 3 ] ] ) = [ [ 1 , 2 ] , [ 1 , 3 ] ] and τ 1 ( 3 , [ [ 1 , 2 ] , [ 4 , 5 ] ] ) = [ [ 3 , 1 , 2 ] , [ 3 , 4 , 5 ] ] .
Let G 1 and G 2 be two G-numbers, and let
τ χ 2 ( G 1 , G 2 , 0 ) = 1 , τ χ 2 ( G 1 , G 2 , t + 1 ) = τ 1 ( ( G 1 ) s ( t ) , G 2 ) l τ χ 2 ( G 1 , G 2 , t ) .
Then, the primitive recursive function
τ 2 ( G 1 , G 2 ) = τ χ 2 ( G 1 , G 2 , L t ( G 1 ) ) if 0 g G 1 G P ( G 2 ) L t ( G 1 ) > 0 , 0 otherwise
prepends each element of G 1 to each element of G 2 . Thus,
τ 2 ( [ 1 ] , [ [ 2 ] , [ 3 ] ] ) = [ [ 1 , 2 ] , [ 1 , 3 ] ] ; τ 2 ( [ 1 , 2 ] , [ [ 4 , 5 ] , [ 6 , 7 ] ] ) = [ [ 1 , 4 , 5 ] , [ 1 , 6 , 7 ] , [ 2 , 4 , 5 ] , [ 2 , 6 , 7 ] ] .
Let G be a G-number, and let
τ χ 3 ( G , 0 ) = τ 0 ( G ) , τ χ 3 ( G , t + 1 ) = τ 2 ( G , τ 3 ( G , t ) ) .
Then, the primitive recursive function
τ 3 ( G , t ) = τ χ 3 ( G , t ) if 0 g G L t ( G ) > 0 , 0 otherwise
computes, for t N + , a Gödel number whose components are Gödel numbers representing all sequences of t + 1 elements of G. Thus,
τ 3 ( [ 1 , 2 ] , 1 ) = [ [ 1 , 1 ] , [ 1 , 2 ] , [ 2 , 1 ] , [ 2 , 2 ] ] .
Let S = { a 1 , a 2 , , a n } N + , S , and G = [ a 1 , , a n ] . An induction on t shows that, for t > 0 , τ 3 ( G , t 1 ) is a G-number representation of S t in the sense that ( a i 1 , , a i t ) S t iff [ a i 1 , , a i t ] g τ 3 ( G , t 1 ) .
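The set semantics of τ 0 – τ 3 can be sketched on the nested-list reading of G-numbers. The actual functions operate on prime-power encodings by primitive recursion; the sketch below uses plain Python lists and checks the Cartesian-power claim against itertools.product.

```python
from itertools import product

# List-reading sketches of τ0–τ3 from (19)–(22). τ3(G, t) enumerates all
# (t+1)-tuples over the elements of G, i.e., a representation of S^{t+1}.

def tau0(g):
    return [[x] for x in g]                    # [g1,...,gn] -> [[g1],...,[gn]]

def tau1(x, g):
    return [[x] + e for e in g]                # prepend x to each element of g

def tau2(g1, g2):
    return [[x] + e for x in g1 for e in g2]   # each of g1 onto each of g2

def tau3(g, t):
    out = tau0(g)
    for _ in range(t):
        out = tau2(g, out)
    return out

print(tau3([1, 2], 1))                         # [[1, 1], [1, 2], [2, 1], [2, 2]]
print(tau3([1, 2, 3], 2) == [list(p) for p in product([1, 2, 3], repeat=3)])
```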
If D j is an FMA, we let
G a , b j = g g n 1 , | R a , b j | , 1 ,
where R a , b j is defined in (2) and g g n ( · ) is defined in (A17). If we recall from Lemma 2 and (5) that μ a , b j : R a , b j I a , b j = { 1 , , z + 1 } , where a + z Δ j is the smallest signifiable real number b on D j , we observe that G a , b j is a G-number representation of I a , b j . Thus, if we return to Example 2 and use the accessor function ( x ) i in (A9), then for G 0 , 1 j = [ 1 , 2 , 3 , 4 , 5 ] , we have
μ ( 0 ) = 1 = G 0 , 1 j 1 ; μ ( 0.3 ) = 2 = G 0 , 1 j 2 ; μ ( 0.6 ) = 3 = G 0 , 1 j 3 ; μ ( 0.9 ) = 4 = G 0 , 1 j 4 ; μ ( 1 ) = 5 = G 0 , 1 j 5 .
In general, for x R a , b j ,
μ a , b j ( x ) = t = ( G a , b j ) t ∈ I a , b j iff ( μ a , b j ) − 1 ( ( G a , b j ) t ) = x .
Let, for t > 1 , τ 3 in (22), and x ∸ y in (A4),
G a , b t , j = τ 3 ( G a , b j , t ∸ 1 ) ,
and, in particular, for a = 0 and b = 1 , let
G 0 , 1 t , j = τ 3 ( G 0 , 1 j , t ∸ 1 ) .
Then, G 0 , 1 t , j is a G-number representation of I 0 , 1 j t , i.e., the t-th Cartesian power of I 0 , 1 j . Since both τ 3 and ∸ are primitive recursive functions, G a , b t , j N and G 0 , 1 t , j N are primitive recursively computable.
Example 3.
Let R = { 0 , 0.3 , 0.6 , 0.9 , 1 } , I = { 1 , 2 , 3 , 4 , 5 } and t = 2 . Then,
G 0 , 1 2 , j = τ 3 ( G 0 , 1 j , 2 ∸ 1 ) = τ 3 ( g g n ( 1 , | R | , 1 ) , 1 ) = τ 3 ( [ 1 , 2 , 3 , 4 , 5 ] , 1 ) = [ [ 1 , 1 ] , [ 1 , 2 ] , [ 1 , 3 ] , [ 1 , 4 ] , [ 1 , 5 ] , [ 2 , 1 ] , [ 2 , 2 ] , [ 2 , 3 ] , [ 2 , 4 ] , [ 2 , 5 ] , [ 3 , 1 ] , [ 3 , 2 ] , [ 3 , 3 ] , [ 3 , 4 ] , [ 3 , 5 ] , [ 4 , 1 ] , [ 4 , 2 ] , [ 4 , 3 ] , [ 4 , 4 ] , [ 4 , 5 ] , [ 5 , 1 ] , [ 5 , 2 ] , [ 5 , 3 ] , [ 5 , 4 ] , [ 5 , 5 ] ] .
We note that ( x , y ) ∈ I 2 iff [ x , y ] ∈ g G 0 , 1 2 , j .
Let x R a , b j t , t > 0 , x ˜ g G a , b t , j , and let η a , b t , j : R a , b j t N and ζ a , b t , j : N R a , b j t be defined as
η a , b t , j ( x ) = [ μ a , b j ( x 0 ) , … , μ a , b j ( x t − 1 ) ] = x ˜ ; ζ a , b t , j ( x ˜ ) = ( ( μ a , b j ) − 1 ( ( x ˜ ) 1 ) , … , ( μ a , b j ) − 1 ( ( x ˜ ) t ) ) .
If R a , b j is signifiable, η a , b t , j ( x ) = x ˜ iff ζ a , b t , j ( x ˜ ) = x , for any x R a , b j t . If x ˜ is not signifiable, η a , b t , j and ζ a , b t , j are actually computable; if x ˜ is signifiable, the functions are absolutely actually computable.
Example 4.
To continue with Example 3, if x = ( 0.9 , 0.6 ) R 0 , 1 2 and x ˜ = [ 4 , 3 ] G 0 , 1 2 , j , then, if we abbreviate η 0 , 1 2 , ζ 0 , 1 2 to η 2 , ζ 2 , we have
η 2 ( x ) = ( μ ( 0.9 ) , μ ( 0.6 ) ) = [ 4 , 3 ] ; ζ 2 ( x ˜ ) = ( μ − 1 ( ( x ˜ ) 1 ) , μ − 1 ( ( x ˜ ) 2 ) ) = ( μ − 1 ( 4 ) , μ − 1 ( 3 ) ) = ( 0.9 , 0.6 ) .
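For the grid of Example 3, the maps μ , η t , and ζ t in (26) can be sketched directly. The grid R = { 0 , 0.3 , 0.6 , 0.9 , 1 } is taken from the example; the dictionaries are an implementation convenience.

```python
# A sketch of μ, η^t, and ζ^t from (26): μ sends each grid point of R to its
# 1-based index in I = {1,...,5}, η applies μ componentwise, and ζ inverts it.

R = [0.0, 0.3, 0.6, 0.9, 1.0]
mu = {x: i + 1 for i, x in enumerate(R)}       # μ : R -> I
mu_inv = {i + 1: x for i, x in enumerate(R)}   # μ^{-1} : I -> R

def eta(x):
    """η^t(x) = [μ(x_0), ..., μ(x_{t-1})] on the list reading of G-numbers."""
    return [mu[c] for c in x]

def zeta(x_tilde):
    """ζ^t(x̃) = (μ^{-1}((x̃)_1), ..., μ^{-1}((x̃)_t))."""
    return tuple(mu_inv[i] for i in x_tilde)

print(eta((0.9, 0.6)))                         # [4, 3], as in Example 4
print(zeta([4, 3]))                            # (0.9, 0.6)
```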

6. Numbers Ω z , i j , e and Ω z j : Packing FANNs into Natural Numbers

Let us assume that μ 0 , 1 j is absolutely actually computable on a sufficiently significant FMA D j and abbreviate μ 0 , 1 j to μ , ζ 0 , 1 t , j to ζ t , and G 0 , 1 t , j in (25) to G t . Let x , y be as defined in (A5) and L t ( x ) be as defined in (A10). Then, for each input neuron n i 0 in an FANN N z j , let
Ω z , i j , 0 = Ω i 0 = [ G n n ( 0 ) 1 , μ α i 0 ζ n n ( 0 ) G n n ( 0 ) 1 , ( ) , , G n n ( 0 ) L t G n n ( 0 ) , μ α i 0 ζ n n ( 0 ) G n n ( 0 ) L t G n n ( 0 ) , ( ) ] .
We recall that E z j > 0 is the number of layers in N z j . Then, for a hidden or output neuron n i e , 0 < e < E z j , let
Ω z , i j , e = Ω i e = [ G n n ( e 1 ) 1 , μ α i e ζ n n ( e 1 ) G n n ( e 1 ) 1 , w e , , G n n ( e 1 ) L t G n n ( e 1 ) , μ α i e ζ n n ( e 1 ) G n n ( e 1 ) L t G n n ( e 1 ) , w e ] .
For an FANN N z j on D j and E = E z j 1 , let
Ω z j = 0 , Ω 0 0 , , Ω n n ( 0 ) 1 0 , , E , Ω 0 E , , Ω n n ( E ) 1 E .
An implication of the definitions of x , y in (A5) and the G-number in (A8) is that Ω z j is unique for N z j , because the only way for another FANN N k j on D j to have Ω k j = Ω z j is for N k j to have the same number of layers, the same number of neurons in each layer, the same activation function in each neuron, and the same synapse weights between the same neurons, i.e., N k j = N z j . Appendix A.3 in Appendix A gives several examples of how the Ω numbers are computed for N z j in Figure 1.
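The idea behind the Ω numbers can be sketched as a finite lookup table: for each neuron, tabulate μ ( α ( ζ ( x ˜ ) , w e ) ) over every quantized input x ˜ , so that evaluating the neuron on the grid reduces to a lookup. The grid, the weights, the sigmoid activation, and the nearest-grid-point μ below are illustrative assumptions; the article packs the same table into a single natural number via the pairing function and Gödel numbering.

```python
import math
from itertools import product

# A lookup-table sketch of one Ω_i^e from (27): one entry per quantized
# input pair [i, j], mapping it to μ of the neuron's activation.

R = [0.0, 0.3, 0.6, 0.9, 1.0]              # the grid of Example 3
I = range(1, len(R) + 1)                   # I = {1, ..., 5}

def mu(x):
    """Index of the grid point closest to x (μ restricted to the grid)."""
    return min(I, key=lambda i: abs(R[i - 1] - x))

def alpha(a, w):
    """A hypothetical sigmoid activation of one hidden neuron."""
    return 1.0 / (1.0 + math.exp(-sum(ai * wi for ai, wi in zip(a, w))))

w = (0.4, 0.7)                             # illustrative synapse weights

# The table: Ω-style pairs (x̃, μ(α(ζ(x̃), w))) for every x̃ over I^2.
omega = {x_tilde: mu(alpha(tuple(R[i - 1] for i in x_tilde), w))
         for x_tilde in product(I, repeat=2)}

print(len(omega), omega[(4, 3)])           # 25 entries; lookup for x̃ = [4, 3]
```

Once the table is built, the neuron's behavior on every signifiable input is fixed, which is what makes the Ω number of a trained FANN unique to that network.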
Lemma 4.
Let μ 0 , 1 j be absolutely actually computable on a sufficiently significant FMA D j and let N z j be an FANN implemented on D j . Let 0 i < n n ( 0 ) , 0 k < n n ( e ) , 0 < e < E z j , and G 0 , 1 t , j in (25) be signifiable on D j . Then, Ω z , i j , 0 = Ω i 0 N and Ω z , i j , e = Ω i e N .
Proof. 
We abbreviate μ 0 , 1 j to μ , ζ 0 , 1 t to ζ t , and G 0 , 1 t , j to G t , and let
z 0 = μ α i 0 ζ n n ( 0 ) G n n ( 0 ) t 0 , ( ) ; z e = μ α k e ζ n n ( e 1 ) G n n ( e 1 ) t e 1 , w e ,
where 0 < t 0 n n ( 0 ) and 0 < t e 1 n n ( e 1 ) . Since μ is absolutely actually computable and G t signifiable, ζ n n ( 0 ) , ζ n n ( 1 ) , , ζ n n ( e 1 ) are absolutely actually computable. Thus, z 0 , z e N . The statement of the lemma then follows from the definitions of x , y in (A5) and the G-number in (A8).    □

7. FANNs and Primitive Recursive Functions

For 0 e < E z j , 0 i < n n ( e ) , x N , let
α ˜ i e ( x ) = r a s c x , r a s c e , Ω z j i + 1 ,
where r ( · ) and a s c ( · ) are defined in (A6) and (A19), respectively. An example of computing α ˜ i e is given at the end of Appendix A.3 in the Appendix A.
Lemma 5.
Let μ 0 , 1 j , abbreviated as μ, be absolutely actually computable on a sufficiently significant FMA D j and let N z j be an FANN implemented on D j . Let G 0 , 1 t , j in (25), abbreviated as G t , be signifiable. Let 0 ≤ e < E z j , η 0 , 1 t , j ( x ) = η t ( x ) = x ˜ = [ μ ( a 0 e ) , … , μ ( a n n ( e ) − 1 e ) ] ∈ N , where a e is defined in (16). Then,
α ˜ i e ( x ˜ ) = μ α i 0 ζ n n ( 0 ) G n n ( 0 ) t , w 0 if e = 0 , μ α i e ζ n n ( e 1 ) G n n ( e 1 ) t , w e if e > 0 ,
where t = a s x x ˜ , G n n ( 0 ) , for 1 t L t G n n ( 0 ) and e = 0 ; t = a s x x ˜ , G n n ( e 1 ) , for 1 t L t G n n ( e 1 ) and e > 0 ; and a s x is as defined in (A18).
Proof. 
By (28)–(30) and (A18), we have
α ˜ i e ( x ˜ ) = r a s c x ˜ , r a s c e , Ω z j i + 1 = r a s c x ˜ , r e , Ω 0 e , , Ω n n ( e ) 1 e i + 1 = r a s c x ˜ , Ω 0 e , , Ω n n ( e ) 1 e i + 1 = r a s c x ˜ , Ω i e
If e = 0 , then t = a s x x ˜ , G n n ( 0 ) , for 1 t L t G n n ( 0 ) . Thus,
α ˜ i e ( x ˜ ) = r a s c x ˜ , Ω i e = μ α i 0 ζ n n ( 0 ) x ˜ , ( ) .
If e > 0 , then t = a s x x ˜ , G n n ( e 1 ) , for 1 t L t G n n ( e 1 ) . Thus,
α ˜ i e ( x ˜ ) = r a s c x ˜ , Ω i e = μ α i e ζ n n ( e 1 ) x ˜ , w e .
   □
If e = 0 and G n n ( 0 ) t = x ˜ , for 0 < t L t G n n ( 0 ) , let
a ˜ 0 = α ˜ 0 0 x ˜ , , α ˜ n n ( 0 ) 1 0 x ˜ .
If 0 < e < E z j and G n n ( e 1 ) t = x ˜ , for 0 < t L t G n n ( e 1 ) , let
a ˜ e = α ˜ 0 e x ˜ , , α ˜ n n ( e ) 1 e x ˜ .
Theorem 1.
Let N z j be an FANN with E z j > 0 layers on a sufficiently significant FMA D j , and let f z j ( x , e ) in (17) be absolutely actually computable. Let μ 0 , 1 j ( · ) be absolutely actually computable and G 0 , 1 t , j , for t { n n ( e ) | 0 e < E z j } , be signifiable. Then, if x ˜ = μ 0 , 1 j x 0 , , μ 0 , 1 j x n n ( 0 ) 1 = a ˜ 0 = η 0 , 1 n n ( 0 ) , j ( x ) , where η 0 , 1 t , j is defined in (26), there exists a primitive recursive function f ˜ z j ( x ˜ , e ) such that
f z j ( x , e ) = a e iff f ˜ z j ( x ˜ , e ) = a ˜ e .
Proof. 
Let us abbreviate f z j to f, μ 0 , 1 j to μ , μ 0 , 1 j 1 to μ 1 , η 0 , 1 t , j to η t , ζ 0 , 1 t , j to ζ t , and G 0 , 1 t , j to G t . Since G t is signifiable, ζ t and η t are absolutely actually computable. Let
f ˜ z j ( x ˜ , 0 ) = x ˜ , f ˜ z j ( x ˜ , e + 1 ) = α ˜ 0 e + 1 f ˜ z j x ˜ , e , , α ˜ n n ( e + 1 ) 1 e + 1 f ˜ z j x ˜ , e .
Let us abbreviate f ˜ z j to f ˜ , and let e = 0 . Then f ( x , 0 ) = a 0 = x and f ˜ ( x ˜ , 0 ) = x ˜ . We observe that
x ˜ = μ x 0 , , μ x n n ( 0 ) 1 = μ α 0 0 x , ( ) , , μ α n n ( 0 ) 1 0 x , ( ) = η n n ( 0 ) ( x ) .
Since μ is an absolutely actually computable bijection,
x = ( μ − 1 ( μ ( x 0 ) ) , … , μ − 1 ( μ ( x n n ( 0 ) − 1 ) ) ) = ( μ − 1 ( ( x ˜ ) 1 ) , … , μ − 1 ( ( x ˜ ) n n ( 0 ) ) ) = ζ n n ( 0 ) ( x ˜ ) .
By (26), η n n ( 0 ) ( x ) = x ˜ iff ζ n n ( 0 ) ( x ˜ ) = x . Thus, f ( x , 0 ) = a 0 iff f ˜ ( x ˜ , 0 ) = x ˜ .
Let e = 1 . Then,
f ( x , 1 ) = a 1 = α 0 1 a 0 , w 1 , , α n n ( 1 ) 1 1 a 0 , w 1 = α 0 1 x , w 1 , , α n n ( 1 ) 1 1 x , w 1 .
By Lemma 5,
f ˜ ( x ˜ , 1 ) = α ˜ 0 1 f ˜ x ˜ , 0 , , α ˜ n n ( 1 ) 1 1 f ˜ x ˜ , 0 = α ˜ 0 1 x ˜ , , α ˜ n n ( 1 ) 1 1 x ˜ = μ α 0 1 x , w 1 , , μ α n n ( 1 ) 1 1 x , w 1 = μ α 0 1 a 0 , w 1 , , μ α n n ( 1 ) 1 1 a 0 , w 1 = μ a 0 1 , , μ a n n ( 1 ) 1 1 = a ˜ 1 = η n n ( 1 ) ( a 1 ) .
Since μ is an absolutely actually computable bijection,
a 1 = μ 1 μ a ˜ 1 1 , , μ 1 μ a ˜ 1 n n ( 1 ) ,
whence, since ζ n n ( 1 ) ( a ˜ 1 ) = a 1 iff η n n ( 1 ) ( a 1 ) = a ˜ 1 , f ( x , 1 ) = a 1 iff f ˜ ( x ˜ , 1 ) = a ˜ 1 .
Let us assume f ( x , e ) = a e iff f ˜ ( x ˜ , e ) = a ˜ e for e 1 . Then,
f ( x , e + 1 ) = a e + 1 = ( α 0 e + 1 f x , e , w e + 1 , , α n n ( e + 1 ) 1 e + 1 f x , e , w e + 1 = α 0 e + 1 a e , w e + 1 , , α n n ( e + 1 ) 1 e + 1 a e , w e + 1 ,
and
f ˜ ( x ˜ , e + 1 ) = α ˜ 0 e + 1 f ˜ x ˜ , e , , α ˜ n n ( e + 1 ) 1 e + 1 f ˜ x ˜ , e = α ˜ 0 e + 1 a ˜ e , , α ˜ n n ( e + 1 ) 1 e + 1 a ˜ e = μ α 0 e + 1 a e , w e + 1 , , μ α n n ( e + 1 ) 1 e + 1 a e , w e + 1 = η n n ( e + 1 ) ( a e + 1 ) .
Then,
a e + 1 = μ 1 μ a ˜ e + 1 1 , , μ 1 μ a ˜ e + 1 n n ( e + 1 ) ,
whence, by induction, since ζ n n ( e + 1 ) ( a ˜ e + 1 ) = a e + 1 iff η n n ( e + 1 ) ( a e + 1 ) = a ˜ e + 1 , f ( x , e + 1 ) = a e + 1 iff f ˜ ( x ˜ , e + 1 ) = a ˜ e + 1 .    □
Let, for x R 0 , 1 j n n ( 0 ) and E z j > 0 ,
A z j ( x ) = f z j ( x , E z j 1 ) ,
and, for x ˜ = η n n ( 0 ) ( x ) , let
A ˜ z j ( x ) = f ˜ z j ( x ˜ , E z j 1 ) .
Then, A z j ( x ) is the absolutely actually computable function computed by N z j and, by Theorem 1, A ˜ z j is primitive recursive. We are now in a position to prove the final theorem of this article.
Theorem 2.
Let
N j = N 1 j , N 2 j , , N k j , k N + ,
be the set of FANNs implemented on a sufficiently significant FMA D j , and let
A j = A 1 j , A 2 j , , A k j , k N + ,
be the set of corresponding absolutely actually computable functions of the FANNs in N j , as defined in (33). There exists a bijection between N j and a class of primitive recursive functions.
Proof. 
Let
O j = Ω 1 j , Ω 2 j , , Ω k j , k N + ,
be the set of the numbers Ω z j defined in (28), each of which uniquely corresponds to N z j N j . Let
F j = A ˜ 1 j , A ˜ 2 j , , A ˜ k j , k N + ,
be a class of primitive recursive functions, one function for each Ω z j ∈ O j , as defined in (34). We observe that
| N j | = | A j | = | O j | = | F j | = k .
Let λ 1 j : N j → A j , λ 2 j : A j → O j , and λ 3 j : O j → F j be defined as
λ 1 j ( N z j ) = A z j ; λ 2 j ( A z j ) = Ω z j ; λ 3 j ( Ω z j ) = A ˜ z j .
Then, λ j : N j F j , defined as
λ j ( N z j ) = λ 3 j λ 2 j λ 1 j N z j ,
is a bijection.    □

8. Discussion

The definition of the finite memory device or automaton (FMD or FMA) in Section 2.2 has four main implications. First, a physical or abstract automaton is an FMD when its memory amount is quantifiable as a natural number. Second, characters and strings are not necessary, because bijections exist between any finite alphabet of symbols and natural numbers and, through Gödel numbering, between any strings over a finite alphabet and natural numbers, hence the term numerical memory used in the article. Third, an FSA of classical computability becomes an FMA when the quantity of its internal and external memory is finite, i.e., there is an upper bound in the form of a natural number on the quantity of the machine's memory. It is irrelevant for the scope of this investigation whether the input tape of an FSA, the input and output tapes of such FSA modifications as the Mealy and Moore machines (Chapter 2 in [12]) or the finite state transducers (Chapter 3 in [13]), and the input tape and the stack of a pushdown automaton (PDA) (Chapter 5 in [12]) are considered internal or external memory. Fourth, a universal Turing machine (UTM) (Chapter 6 in [7]) is an FMA when the number of its tape cells is bounded by a natural number, which a fortiori makes any physical computer an FMA. Thus, only one type of universal computer is needed to define all FMA it can simulate.
Consider a universal computer U C capable of executing the universal L program U 1 constructed to prove the Universality Theorem (Theorem 3.1, Chapter 3 in [7]). The computer U C , equivalent to a UTM, takes an arbitrary L program P, an input to that program in the form of a natural number stored in its input register X 1 , which can be a Gödel number encoding an array of numbers, executes P on X 1 by encoding the memory of P as another Gödel number and returns the output of P as a natural number, which can also be a Gödel number encoding a sequence of natural numbers, saved in its output register Y. Since characters and character sequences can be bijectively mapped to natural numbers, U C can simulate any FSA or a modification thereof, e.g., a Mealy machine, a Moore machine, a finite state transducer, or a PDA. Technically speaking, there is no need to distinguish between the Mealy and Moore machines, because they are equivalent (Theorems 2.6, 2.7, Chapter 2 in [12]). When a limit is placed on the numerical memory of U C by way of the number of registers it can use and the size of the numbers signifiable in them, the input and output registers included, U C immediately becomes an FMD and so a fortiori any device that U C is capable of simulating.
The separation of computability into the two overlapping categories, general and actual, is necessary for theoretical and practical reasons. A theoretical reason, generally accepted in classical computability theory, is that it is of no advantage to put any memory limitations on automata or on the a priori counts of unit time steps that automata may take to execute programs that implement functions in order to show that those functions are computable. Were it not the case, we would not be able to investigate what is computable in principle. Rogers [10] succinctly expresses this point of view:
"[w]e thus require that a computation terminate after some finite number of steps; we do not insist on an a priori ability to estimate this number."
An implication of the above assumption is that an automaton, explicit or implicit, on which the said computation is executed has access to, literally, astronomical quantities of numerical memory. For a thought experiment, consider an automaton programmable in L of Chapter 2 of [7] that we used in Section 3.1, and let a program P L j ( n ) , n N + , compute the G-number of the sequence ( 1 , , n ) , i.e., the function computed by P L j is f ( n ) = [ 1 , , n ] , as defined in (A8). Then, f ( n ) is a primitive recursive function and, hence, computable in the general sense of Definition 2. Thus, f ( n ) is signifiable for any n N + on the automaton. In particular, if n is the Eddington number, i.e., n = 10 80 N + , estimating the number of hydrogen atoms in the observable universe [14], there is a computation and, by implication, a variable in P L j to which the G-number of ( 1 , 2 , , 10 80 ) can be assigned.
The foregoing paragraph brings us to a practical reason for separating computability into the general and actual categories: it is of little use for an applied scientist who wants to implement a number-theoretic function f in a programming language L for an FMA D j to know that f is generally computable and the L program can, therefore, compute, in principle, some characteristic of arbitrarily large natural numbers, e.g., the Eddington number. If no natural number greater than some n N is signifiable on D j , the scientist must make provisions in the program for the non-signifiable numbers in order to achieve feasible results with absolutely actually computable functions.
Theorem 1 shows that the computation of a trained FANN on a finite memory device can be packed into a unique natural number. Once packed, the natural number can be used as an archive, after a fashion, to look up natural numbers that correspond, in the bijective sense of the term, to the real vectors computed by the function A z j of an FANN N z j implemented on the device. The correspondence is such that for any signifiable x , the output of N z j , i.e., A z j ( x ) = a , corresponds to the natural number a ˜ computed by the primitive recursive function A ˜ z j , i.e., A ˜ z j ( x ˜ ) = a ˜ , and the input x corresponds to the natural number x ˜ . Thus, A z j ( x ) = a iff A ˜ z j ( x ˜ ) = a ˜ . Furthermore, the function A ˜ z j is computable in the general sense and is absolutely actually computable on any FMA where the natural number Ω z j is signifiable.
A correspondence established in Theorem 2 should be construed so that the uniqueness of Ω z j does not imply the uniqueness of A z j because the same function can be computed by different FANNs. What it implies is that, for any two different FANNs N n j and N m j , n m (e.g., different numbers of layers or different numbers of nodes in a layer or different activation functions or different weights), implemented on the same FMA D j , Ω n j Ω m j . However, it may be the case that A m j ( x ) = A n j ( x ) for any signifiable x , and consequently, A ˜ m j ( x ˜ ) = A ˜ n j ( x ˜ ) .

9. Conclusions

To differentiate between feedforward artificial neural networks and their functions as abstract mathematical objects and the realizations of these networks and functions on finite memory devices, we introduced the categories of general and actual computability. We showed that correspondences are possible between trained feedforward artificial neural networks on finite memory devices and classes of primitive recursive functions. We argued that there are theoretical and practical reasons why computability should be separated into these categories. The categories are overlapping in the sense that some functions belong in both categories.

Funding

This research received no external funding.

Data Availability Statement

No additional data are provided for this article.

Conflicts of Interest

The author declares no conflict of interest.

Abbreviations

The following abbreviations are used in this article:
ANNArtificial Neural Network
FANNFeedforward Artificial Neural Network
FMAFinite Memory Automaton or Automata
FMDFinite Memory Device
G-numberGödel Number
TMTuring Machine
UTMUniversal Turing Machine
FSAFinite State Automaton or Automata
PDAPushdown Automaton or Automata

Appendix A

Appendix A.1. Primitive Recursive Functions and Predicates

In this section, we define several functions shown to be primitive recursive in [7]. All lowercase variables in this section, e.g., x, y, z, t, n, and m, with and without subscripts, refer to natural numbers, and the term number is synonymous with the term natural number.
The expression
( t ) z P ( t , x 1 , , x n )
is called the bounded existential quantification of the predicate P and holds iff P ( t , x 1 , , x n ) = 1 for at least one t such that 0 t z . The expression
( t ) z P ( t , x 1 , , x n )
is called a bounded universal quantification of P and holds iff P ( t , x 1 , , x n ) = 1 for every t such that 0 t z . If P ( t , x 1 , , x n ) is a predicate and z is a number, then
x = min t z { P ( t , x 1 , , x n ) }
is called the bounded minimalization of P and defines the smallest number t for which P holds or 0 if there is no such number. It is shown in [7] that (1) the predicates x = y , x y , x < y , x > y , x y , x y , and x | y , i.e., x divides y, are primitive recursive; (2) a finite logical combination of primitive recursive predicates is primitive recursive; and (3) if a predicate P ( · ) is primitive recursive, then so are its negation, its bounded minimalization, and its bounded universal and existential quantifications.
Let
x ∸ y = x − y if x ≥ y , and 0 if x < y .
The pairing function of natural numbers x and y, x , y : N N , is
x , y = z ,
where
z = 2 x ( 2 y + 1 ) 1 ; γ ( d ) { 2 d | ( z + 1 ) ( c ) z + 1 { 2 c ( z + 1 ) c d } } ; x = min d z + 1 γ ( d ) ; y = 1 2 z + 1 2 x 1 .
For any number z, there are unique x and y such that x , y = z . For example, if z = 27 , then
x = min d 28 γ ( d ) = 2 ; y = 1 2 28 2 2 1 = 3 ; 2 , 3 = 2 2 ( 2 · 3 + 1 ) 1 = 27 .
The functions l ( z ) and r ( z )
l ( z ) = min x z { ( y ) z { z = x , y } } r ( z ) = min y z { ( x ) z { z = x , y } }
return the left and right components of any number z so that l ( z ) , r ( z ) = z . Thus, if z = 27 = 2 , 3 , then l ( z ) = 2 , r ( z ) = 3 .
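The pairing function in (A5) and its projections in (A6) can be computed directly, without bounded minimalization; the closed forms below follow from z + 1 = 2 x ( 2 y + 1 ) .

```python
# A sketch of <x, y> = 2^x (2y + 1) - 1 and of l(z), r(z): l(z) is the
# exponent of 2 in z + 1, and r(z) recovers y from the odd part of z + 1.

def pair(x, y):
    return 2 ** x * (2 * y + 1) - 1

def left(z):
    x, z = 0, z + 1
    while z % 2 == 0:
        z //= 2
        x += 1
    return x

def right(z):
    return ((z + 1) // 2 ** left(z) - 1) // 2

print(pair(2, 3))                          # 27, as in the worked example
print(left(27), right(27))                 # 2 3
```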
The symbol p n refers to the n-th prime, i.e., p 1 = 2 , p 2 = 3 , p 3 = 5 , etc., and p 0 = 0 , by definition. The primes are computed by the following primitive recursive function.
π ( i ) = p i .
Thus, π ( 0 ) = 0 , π ( 1 ) = 2 , π ( 2 ) = 3 , π ( 3 ) = 5 , π ( 4 ) = 7 , π ( 5 ) = 11 , etc. If ( a 1 , , a n ) is a sequence of numbers, the function
[ a 1 , , a n ] = i = 1 n π ( i ) a i
computes the Gödel number (G-number) of this sequence. The G-number of the empty number sequence ( ) is 1. Thus, the G-number of ( 3 , 101 , 7891 , 1 , 43 ) is [ 3 , 101 , 7891 , 1 , 43 ] = 2 3 · 3 101 · 5 7891 · 7 1 · 11 43 .
If x = [ a 1 , , a n ] , the accessor function
( x ) i = min t x { ¬ { π ( i ) t + 1 | x } }
returns the i-th element of x. Thus, if x = [ 1 , 7 , 13 ] , then ( x ) 1 = 1 , ( x ) 2 = 7 , ( x ) 3 = 13 , and ( x ) j = 0 for j = 0 or j > 3 .
The length of a Gödel number x is the position of the last non-zero prime power in x. Specifically, if x = [ a 1 , a 2 , , a n ] , its length is computed by the function L t ( · ) defined as
L t ( x ) = min i x { ( x ) i 0 ( j ) x { { j > i } { ( x ) j = 0 } } } .
Thus, L t ( 540 ) = L t ( [ 2 , 3 , 1 ] ) = 3 . L t ( [ a 1 , , a n ] ) = n iff a n 0 , [ ( x ) 1 , , ( x ) n ] = x when L t ( x ) = n , and L t ( 0 ) = L t ( 1 ) = 0 . L t ( [ x 1 , x 2 , , x n ] ) = L t ( [ x 1 , x 2 , , x n , 0 , , 0 ] ) , where x n 0 .
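The Gödel numbering in (A8)–(A10) can be sketched with trial-division primes: gnum encodes a sequence as a product of prime powers, elem recovers ( x ) i as the exponent of the i-th prime, and lt scans for the last non-zero exponent. Trial division suffices for small examples only.

```python
# A sketch of [a1,...,an] = Π π(i)^{a_i} in (A8), the accessor (x)_i in (A9),
# and Lt(x) in (A10), using a simple trial-division prime generator.

def primes():
    ps, n = [], 2
    while True:
        if all(n % p for p in ps):
            ps.append(n)
            yield n
        n += 1

def gnum(seq):
    """G-number of a sequence; gnum(()) = 1 for the empty sequence."""
    g, it = 1, primes()
    for a in seq:
        g *= next(it) ** a
    return g

def elem(x, i):
    """(x)_i: the exponent of the i-th prime in x (x > 0, i >= 1)."""
    it = primes()
    for _ in range(i):
        p = next(it)
    c = 0
    while x % p == 0:
        x //= p
        c += 1
    return c

def lt(x):
    """Lt(x): the position of the last non-zero prime exponent in x."""
    n, i = 0, 0
    for p in primes():
        i += 1
        if p > x:
            return n
        if elem(x, i):
            n = i

print(gnum((2, 3, 1)))                     # 540, since 2^2 * 3^3 * 5 = 540
print([elem(540, i) for i in (1, 2, 3)], lt(540))
```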
The function ⌊ x / y ⌋ returns the integer part of the quotient x / y . Thus, ⌊ 7 / 2 ⌋ = 3 , ⌊ 2 / 5 ⌋ = 0 , ⌊ 8 / 5 ⌋ = 1 , and ⌊ x / 0 ⌋ = 0 for any number x.

Appendix A.2. Gödel Number Operators

The functions in this section or variants thereof were shown to be primitive recursive in [11]. The function
s e t ( b , i , v ) = b π ( i ) ( b ) i · π ( i ) v if 1 i L t ( b ) b > 1 v > 0 , 0 otherwise
sets the i-th element of the G-number b to v. Thus, if b = [ 1 , 2 ] = 2 1 3 2 = 18 , i = 1 , and v = 3 , then
s e t ( [ 1 , 2 ] , 1 , 3 ) = b π ( 1 ) ( b ) 1 · π ( 1 ) 3 = [ 1 , 2 ] 2 ( [ 1 , 2 ] ) 1 · 2 3 = 2 1 · 3 2 2 1 · 2 3 = [ 3 , 2 ] = 72 .
The function c n t ( · ) in (A12), where s ( t ) = t + 1 is one of the three initial functions defined in (11) and ( x ) i is defined in (A9), returns the count of occurrences of x in y. Thus, if y = [ 1 , 2 , 1 , 3 ] , then c n t ( 1 , y ) = 2 . A convention in (A12) and other equations in this section is that the names of auxiliary functions end in “x”.
c n t ( x , y ) = c n t x ( x , y , L t ( y ) ) if y > 1 , 0 otherwise .
c n t x ( x , y , 0 ) = 0 , c n t x ( x , y , t + 1 ) = c n t x x ( x , y , t , c n t x ( x , y , t ) ) .
c n t x x ( x , y , t , c ) = 1 + c if ( y ) s ( t ) = x , c otherwise .
If y is a G-number, then the predicate
x g y c n t ( x , y ) 0
holds if x is an element of y. Thus, 1 g [ 3 , 4 , 1 , 5 ] , but 1 g [ 3 , 4 , 2 , 5 ] . The function
r a p ( x , y ) = y · { π ( L t ( y ) + 1 ) } x if x > 0 y > 1 0 g y , 0 otherwise
appends x to the right of the rightmost element of y. Thus,
r a p ( 1 , [ 1 ] ) = [ 1 ] · { π ( L t ( [ 1 ] ) + 1 ) } 1 = [ 1 ] · { π ( 2 ) } 1 = 2 1 · 3 1 = [ 1 , 1 ] ; r a p ( 8 , [ 2 , 3 , 5 ] ) = [ 2 , 3 , 5 ] · { π ( 4 ) } 8 = [ 2 , 3 , 5 , 8 ] ; r a p ( 5 , s e t ( [ 10 , 3 ] , 1 , 2 ) ) = r a p ( 5 , [ 2 , 3 ] ) = [ 2 , 3 , 5 ] .
Let
l c ( x 1 , x 2 , 0 ) = x 2 , l c ( x 1 , x 2 , t + 1 ) = r a p ( ( x 1 ) s ( t ) , l c ( x 1 , x 2 , t ) ) .
Then, the function
x l y = l c ( x , y , L t ( x ) ) if x > 1 y > 1 0 g x 0 g y , x if x > 1 y = 1 0 g x , 0 otherwise
places all numbers in y, in order, to the left of the first number in x, while the function
x r y = y l x if x > 1 y > 1 0 g x 0 g y , x if x > 1 y = 1 0 g x , 0 otherwise
places all numbers of y, in order, to the right of the rightmost number in x. We refer to the function in (A15) as left concatenation and to the function in (A16) as right concatenation. Thus, [ 3 , 5 ] l [ 7 , 11 ] = [ 7 , 11 , 3 , 5 ] ; [ 3 , 5 ] r [ 7 , 11 ] = [ 3 , 5 , 7 , 11 ] ; [ 2 , 3 ] l [ 1 ] = [ 1 , 2 , 3 ] ; [ 2 , 3 ] r [ 1 ] = [ 2 , 3 , 1 ] .
Let
g n x ( l , u , k , 0 ) = [ l ] , g n x ( l , u , k , t + 1 ) = g n x x ( l , u , k , g n x ( l , u , k , t ) , t ) ;
g n x x ( l , u , k , z , t ) = z r [ l + s ( t ) k ] if l + s ( t ) k u , z otherwise .
Then, for l > 0 and u > 0 , the function
g g n ( l , u , k ) = g n x ( l , u , k , s ( u l ) ) if k > 0 ( t ) u { l + t k = u t > 0 } , 0 otherwise .
generates a G-number whose numbers start at l and go to u in positive integer increments of k. Thus, g g n ( 1 , 2 , 1 ) = [ 1 , 2 ] ; g g n ( 1 , 2 , 2 ) = 0 ; g g n ( 1 , 3 , 1 ) = [ 1 , 2 , 3 ] ; g g n ( 1 , 3 , 2 ) = [ 1 , 3 ] ; g g n ( 1 , 3 , 3 ) = 0 . The abbreviation g g n stands for generator of Gödel numbers.
The function
a s x ( x , y ) = min t L t ( y ) { t > 0 x = l ( ( y ) t ) }
returns the smallest index t of i , j y such that x = i . Thus, if
y = [ 10 , 100 , 20 , 200 , 30 , 300 ] ,
then a s x ( 10 , y ) = 1 , a s x ( 20 , y ) = 2 , a s x ( 30 , y ) = 3 . The function
a s c ( x , y ) = ( y ) a s x ( x , y )
returns the pair from y at the index t returned by a s x ( · ) . Thus, if
y = [ 10 , 100 , 20 , 200 , 30 , 300 ] ,
then
a s c ( 10 , y ) = ( y ) a s x ( 10 , y ) = ( y ) 1 = 10 , 100 ; a s c ( 20 , y ) = ( y ) a s x ( 20 , y ) = ( y ) 2 = 20 , 200 ; a s c ( 30 , y ) = ( y ) a s x ( 30 , y ) = ( y ) 3 = 30 , 300 ; a s c ( 13 , y ) = ( y ) a s x ( 13 , y ) = ( y ) 0 = 0 .

Appendix A.3. Examples of Ω Numbers

Let us abbreviate G 0 , 1 t , j in (25) to G t and consider the FANN in Figure 1. Let us assume that, as in Example 3, R = { 0 , 0.3 , 0.6 , 0.9 , 1 } , I = { 1 , 2 , 3 , 4 , 5 } and t = 2 , and
G 0 , 1 2 , j = G 2 = [ [ 1 , 1 ] , [ 1 , 2 ] , [ 1 , 3 ] , [ 1 , 4 ] , [ 1 , 5 ] , [ 2 , 1 ] , [ 2 , 2 ] , [ 2 , 3 ] , [ 2 , 4 ] , [ 2 , 5 ] , [ 3 , 1 ] , [ 3 , 2 ] , [ 3 , 3 ] , [ 3 , 4 ] , [ 3 , 5 ] , [ 4 , 1 ] , [ 4 , 2 ] , [ 4 , 3 ] , [ 4 , 4 ] , [ 4 , 5 ] , [ 5 , 1 ] , [ 5 , 2 ] , [ 5 , 3 ] , [ 5 , 4 ] , [ 5 , 5 ] ] .
In other words, G 2 is a G-number such that [ x 1 , x 2 ] g G 2 iff ( x 1 , x 2 ) I 2 . G 3 , whose definition we omit for space reasons, is a G-number whose length is 125 such that [ x 1 , x 2 , x 3 ] g G 3 iff ( x 1 , x 2 , x 3 ) I 3 , e.g., [ 1 , 2 , 3 ] g G 3 iff ( 1 , 2 , 3 ) I 3 . We can compute Ω i e for the FANN N z j in Figure 1 as follows.
$$\begin{aligned}
\Omega_0^0 &= [\langle (G_2)_1, \mu(\alpha_0^0(\zeta_2((G_2)_1), ())) \rangle, \ldots, \langle (G_2)_{25}, \mu(\alpha_0^0(\zeta_2((G_2)_{25}), ())) \rangle];\\
\Omega_1^0 &= [\langle (G_2)_1, \mu(\alpha_1^0(\zeta_2((G_2)_1), ())) \rangle, \ldots, \langle (G_2)_{25}, \mu(\alpha_1^0(\zeta_2((G_2)_{25}), ())) \rangle];\\
\Omega_0^1 &= [\langle (G_2)_1, \mu(\alpha_0^1(\zeta_2((G_2)_1), w^1)) \rangle, \ldots, \langle (G_2)_{25}, \mu(\alpha_0^1(\zeta_2((G_2)_{25}), w^1)) \rangle];\\
\Omega_1^1 &= [\langle (G_2)_1, \mu(\alpha_1^1(\zeta_2((G_2)_1), w^1)) \rangle, \ldots, \langle (G_2)_{25}, \mu(\alpha_1^1(\zeta_2((G_2)_{25}), w^1)) \rangle];\\
\Omega_2^1 &= [\langle (G_2)_1, \mu(\alpha_2^1(\zeta_2((G_2)_1), w^1)) \rangle, \ldots, \langle (G_2)_{25}, \mu(\alpha_2^1(\zeta_2((G_2)_{25}), w^1)) \rangle];\\
\Omega_0^2 &= [\langle (G_3)_1, \mu(\alpha_0^2(\zeta_3((G_3)_1), w^2)) \rangle, \ldots, \langle (G_3)_{125}, \mu(\alpha_0^2(\zeta_3((G_3)_{125}), w^2)) \rangle];\\
\Omega_1^2 &= [\langle (G_3)_1, \mu(\alpha_1^2(\zeta_3((G_3)_1), w^2)) \rangle, \ldots, \langle (G_3)_{125}, \mu(\alpha_1^2(\zeta_3((G_3)_{125}), w^2)) \rangle].
\end{aligned}$$
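The index G-numbers $G_2$ and $G_3$ over which the $\Omega$ numbers above range are simply the lexicographic enumerations of $I^2$ and $I^3$. As an illustrative sketch (modeling G-numbers as lists of lists rather than as Gödel encodings):

```python
from itertools import product

I = [1, 2, 3, 4, 5]

def G(t):
    """All t-tuples over I in lexicographic order, one list per tuple."""
    return [list(x) for x in product(I, repeat=t)]

G2, G3 = G(2), G(3)  # lengths 25 and 125, respectively
```

With 1-based indexing as in the text, the 17th element of `G2` is `[4, 2]` and the 12th is `[3, 2]`, matching the worked examples below.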
We can compute individual elements of $\Omega_i^e$. For example, since $(G_2)_{17} = [4,2]$,
$$(\Omega_0^0)_{17} = \langle (G_2)_{17}, \mu(\alpha_0^0(\zeta_2((G_2)_{17}), ())) \rangle = \langle [4,2], \mu(\alpha_0^0(\zeta_2([4,2]), ())) \rangle = \langle [4,2], \mu(\alpha_0^0((0.9, 0.3), ())) \rangle = \langle [4,2], \mu(0.9) \rangle = \langle [4,2], 4 \rangle \in \mathbb{N}.$$
Since $(G_2)_{12} = [3,2]$,
$$(\Omega_0^1)_{12} = \langle (G_2)_{12}, \mu(\alpha_0^1(\zeta_2((G_2)_{12}), w^1)) \rangle = \langle [3,2], \mu(\alpha_0^1(\zeta_2([3,2]), w^1)) \rangle = \langle [3,2], \mu(\alpha_0^1((0.6, 0.3), w^1)) \rangle = \langle [3,2], z \rangle \in \mathbb{N},$$
where $z = \mu(\alpha_0^1((0.6, 0.3), w^1)) \in I$. We know that $[2,3,4] \in_g G_3$ because $(2,3,4) \in I^3$. Thus, $(G_3)_t = [2,3,4]$ for some $t$, $1 \le t \le 125$. Let us therefore assume, for the sake of this example, that $(G_3)_{35} = [2,3,4]$. Then,
$$(\Omega_1^2)_{35} = \langle (G_3)_{35}, \mu(\alpha_1^2(\zeta_3((G_3)_{35}), w^2)) \rangle = \langle [2,3,4], \mu(\alpha_1^2(\zeta_3([2,3,4]), w^2)) \rangle = \langle [2,3,4], \mu(\alpha_1^2((0.3, 0.6, 0.9), w^2)) \rangle = \langle [2,3,4], z \rangle \in \mathbb{N},$$
where $z = \mu(\alpha_1^2((0.3, 0.6, 0.9), w^2)) \in I$.
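The arithmetic in the two computations above rests on the positional correspondence between the index set $I$ and the value set $R$: $\zeta_t$ maps a tuple of indexes in $I^t$ to the corresponding tuple of values in $R^t$, and $\mu$ maps a value in $R$ back to its index in $I$. A minimal sketch, assuming (as in Example 3) that the $i$-th element of $I$ names the $i$-th element of $R$:

```python
R = [0, 0.3, 0.6, 0.9, 1]
I = [1, 2, 3, 4, 5]

def zeta(xs):
    """zeta_t: an index tuple over I mapped to its tuple of values in R."""
    return tuple(R[x - 1] for x in xs)

def mu(r):
    """mu: the index in I of the value r in R."""
    return I[R.index(r)]
```

For example, `zeta([4, 2])` yields `(0.9, 0.3)` and `mu(0.9)` yields `4`, matching the evaluation of $(\Omega_0^0)_{17}$ above.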
Using (29), we can compute $\Omega_z^j$ for the FANN $N_z^j$ in Figure 1 with the $\Omega$ numbers above as
$$\Omega_z^j = [\langle 0, [\Omega_0^0, \Omega_1^0] \rangle, \langle 1, [\Omega_0^1, \Omega_1^1, \Omega_2^1] \rangle, \langle 2, [\Omega_0^2, \Omega_1^2] \rangle].$$
From $\Omega_z^j$ above, we can compute all $\tilde{\alpha}_i^e$ defined in (30) for $N_z^j$ in Figure 1. For example, since $(G_2)_{12} = [3,2]$,
$$\tilde{\alpha}_1^1([3,2]) = r(asc([3,2], (r(asc(1, \Omega_z^j)))_2)) = r(asc([3,2], \Omega_1^1)) = r(\langle (G_2)_{12}, \mu(\alpha_1^1(\zeta_2((G_2)_{12}), w^1)) \rangle) = \mu(\alpha_1^1(\zeta_2((G_2)_{12}), w^1)) = \mu(\alpha_1^1(\zeta_2([3,2]), w^1)) = \mu(\alpha_1^1((0.6, 0.3), w^1)) \in I.$$
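The chain of lookups in this last computation can be mirrored directly. The sketch below is purely illustrative: it models encoded pairs as tuples, each $\Omega_i^e$ table as a list of input/output pairs, and $\Omega_z^j$ as a list of layer/table-list pairs; the table contents (the output values 4 and 2, and the `None` placeholders) are made up for the example, not taken from the paper.

```python
def r(pair):
    """Right projection of an encoded pair."""
    return pair[1]

def asc(x, y):
    """The pair of y whose left component equals x, or 0 on a miss."""
    return next((p for p in y if p[0] == x), 0)

# A toy Omega_1^1 table for two inputs from G_2; the outputs are hypothetical.
omega_1_1 = [([3, 2], 4), ([4, 2], 2)]

# Omega_z^j pairs each layer number with the list of its Omega tables
# (None stands in for tables omitted from this sketch).
omega_z_j = [(0, []), (1, [None, omega_1_1, None]), (2, [])]

def alpha_tilde(e, i, x):
    """tilde-alpha_i^e(x) as r(asc(x, (r(asc(e, omega_z_j)))_i)), 0-based i."""
    return r(asc(x, r(asc(e, omega_z_j))[i]))
```

Here `alpha_tilde(1, 1, [3, 2])` retrieves the (made-up) output `4` by first selecting layer 1's table list, then the table of neuron 1, and finally the pair keyed by `[3, 2]`, mirroring the symbolic chain above.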

References

  1. McCulloch, W.S.; Pitts, W. A logical calculus of the ideas immanent in nervous activity. Bull. Math. Biophys. 1943, 5, 115–133. [Google Scholar] [CrossRef]
  2. Rumelhart, D.E.; Hinton, G.E.; Williams, R.J. Learning representations by back-propagating errors. Nature 1986, 323, 533–536. [Google Scholar] [CrossRef]
  3. Hornik, K. Approximation capabilities of multilayer feedforward networks. Neural Netw. 1991, 4, 251–257. [Google Scholar] [CrossRef]
  4. Gripenberg, G. Approximation by neural networks with a bounded number of nodes at each level. J. Approx. Theory 2003, 122, 260–266. [Google Scholar] [CrossRef] [Green Version]
  5. Guliyev, N.; Ismailov, V. On the approximation by single hidden layer feedforward neural networks with fixed weights. Neural Netw. 2019, 98, 296–304. [Google Scholar] [CrossRef] [PubMed] [Green Version]
  6. Gödel, K. On formally undecidable propositions of Principia Mathematica and related systems I. In Kurt Gödel Collected Works Volume I Publications 1929–1936; Feferman, S., Dawson, J.W., Kleene, S.C., Moore, G.H., Solovay, R.M., van Heijenoort, J., Eds.; Oxford University Press: Oxford, UK, 1986. [Google Scholar]
  7. Davis, M.; Sigal, R.; Weyuker, E. Computability, Complexity, and Languages: Fundamentals of Theoretical Computer Science, 2nd ed.; Harcourt, Brace & Company: Boston, MA, USA, 1994. [Google Scholar]
  8. Kleene, S.C. Introduction to Metamathematics; D. Van Nostrand: New York, NY, USA, 1952. [Google Scholar]
  9. Meyer, A.R.; Ritchie, D.M. The complexity of loop programs. In Proceedings of the ACM National Meeting, Washington, DC, USA, 14–16 November 1967; pp. 465–469. [Google Scholar]
  10. Rogers, H., Jr. Theory of Recursive Functions and Effective Computability; The MIT Press: Cambridge, MA, USA, 1988. [Google Scholar]
  11. Kulyukin, V. On primitive recursive characteristics of chess. Mathematics 2022, 10, 1016. [Google Scholar] [CrossRef]
  12. Hopcroft, J.E.; Ullman, J.D. Introduction to Automata Theory, Languages, and Computation; Narosa Publishing House: New Delhi, India, 2002. [Google Scholar]
  13. Jurafsky, D.; Martin, J.H. Speech and Language Processing: An Introduction to Natural Language Processing, Computational Linguistics, and Speech Recognition; Prentice-Hall, Inc.: Upper Saddle River, NJ, USA, 2000. [Google Scholar]
  14. Eddington, A.S. The constants of nature. In The World of Mathematics; Newman, J.R., Ed.; Simon and Schuster: New York, NY, USA, 1956; Volume 2, pp. 1074–1093. [Google Scholar]
Figure 1. A 3-layer fully connected feedforward artificial neural network (FANN); layer 0 includes the neurons $n_0^0$ and $n_1^0$; layer 1 includes the neurons $n_0^1$, $n_1^1$, and $n_2^1$; layer 2 includes the neurons $n_0^2$ and $n_1^2$; the two arrows coming into $n_0^0$ and $n_1^0$ signify that layer 0 is the input layer; the two arrows going out of $n_0^2$ and $n_1^2$ signify that layer 2 is the output layer; $w_{i,j}^e$, $0 < e < 3$, is the weight of the synapse from $n_i^{e-1}$ to $n_j^e$, e.g., $w_{0,0}^1$ is the weight of the synapse from $n_0^0$ to $n_0^1$ and $w_{2,1}^2$ is the weight of the synapse from $n_2^1$ to $n_1^2$.
