2. Some Basic Notations
Let ℕ denote the set of all the natural numbers and ℝ the set of all the real numbers; we also use the extended real numbers, i.e., the real numbers together with the infinities. For a real value x, ⌊x⌋ and ⌈x⌉ denote the floor and ceiling functions evaluated at x, respectively. For convenience, we also use the set of all the non-negative real numbers. For any arbitrary natural number n, the set {1, 2, …, n} serves as an index set. Cartesian products are written in the usual way, and the n-ary Cartesian product denotes the corresponding set of ordered n-tuples. If S is a set, we use |S| to denote the size (cardinality) of the set S. A Cartesian product of index sets is also regarded as a column vector of indices; for a vector, we refer to its mth element and to its length (the number of its elements) in the usual way. Suppose A is an arbitrary k-dimensional tensor obtained after all the preliminary processing of the original object (for example, the padding of an input text or image); equivalently, A could be identified with a function on its index set. A stride vector records the number of strides taken in the ith dimension at the jth convolutional layer; a sectional tensor represents the ith feature/filter in the jth layer; and we also keep track of the number of filters/features in each layer j.
3. Basic Definitions and Properties
Definition 1. We use to denote a tensor with sectional/directional vector , i.e., , where is the number of sections (or directions) of . If , we use to denote its partial function, i.e., . Furthermore, .
Remark 1. In most cases, kernels are representable by a product set (or tensor product). In some cases, however, we might need to consider irregular kernels, i.e., those that cannot be represented directly by product sets. If that is the case, in our setting, we could simply pad the undefined cells with 0; for example, an irregular kernel can be extended to its smallest enclosing product set with the undefined cells set to 0. Such a device spares us from redesigning the setting/modeling from scratch. This leads to another question: why, or under which circumstances, would we tend to adopt irregular kernels? An obvious circumstance is when even the input data are not representable by a product set or a tensor. If this is the case, we could apply the same technique and pad the undefined cells with 0. Such an accommodation suits our setting and is consistent with the settings of standard CNNs.
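As a concrete illustration of this zero-padding, the following minimal R sketch (the kernel's support and values are a toy example of our own) embeds an irregularly supported kernel into its smallest enclosing rectangular matrix and fills the undefined cells with 0.

# Toy irregular kernel: values defined only on some (row, col) positions.
# (These positions and values are illustrative assumptions.)
irregular <- data.frame(
  row = c(1, 1, 2, 3),
  col = c(1, 2, 2, 3),
  val = c(0.5, -1.0, 2.0, 0.25)
)

# Pad into the smallest enclosing product set (a full matrix), with 0 elsewhere.
padded <- matrix(0, nrow = max(irregular$row), ncol = max(irregular$col))
padded[cbind(irregular$row, irregular$col)] <- irregular$val

print(padded)
# The padded kernel can now be used exactly like a regular (rectangular) kernel.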
Definition 2. Let denote a stride vector whose elements indicate the strides with respect to each section (or direction).
Definition 3. Let denote a filter (tensor), or kernel, where is its sectional vector, i.e., . If we need to denote the ith filter in the jth layer, we use the notation .
Remark 2. For each kernel (the ith kernel in the jth layer), we could associate a stride vector with it. This is based on the assumption that a uniform stride is applicable and appropriate for the sliding and inner product operation. In some cases, we might encounter non-uniform strides [23]. A much more generalized setting is one where the strides (or the positions at which the inner product takes place) are determined by a set of positional vectors (for example, such positions are randomly chosen). Such a case is beyond our setting, since our setting is based on the standard CNN, in which the strides are normally assumed to be uniform. In order not to reformulate our setting, we could add another masking layer after the feature maps [24], so that all the stride vectors remain uniform. Each feature map then acts on a mask whose entries are 1 where the feature map is activated and 0 where it is not. After this masking layer, in the pooling layer, one gets rid of all the values of 0 and their associated positions. The remaining values are linearized and fed into the ANN part.
Remark 3. Another interesting aspect of feature extraction is the dilated convolutional neural network [25,26]. Since the dilated parts are normally padded with 0, this setting is still comparable with ours as far as kernels are concerned, because the parameters to be learned lie in the kernels. To accommodate a dilated kernel in our setting, one fixes the padded parameters and learns the other, non-padded parts of the kernel.
In order to present the cropped sub-tensors of the input, given the filter and the stride vector, we use a power tensor to collect such sub-tensors as follows:
Definition 4 (power tensor one). Let , where , where ⊙ indicates the Hadamard product and .
Remark 4. In the definition, one observes that the argument of the floor function might contain a denominator of 0; this is normally handled by the extended real numbers. As long as the numerator is non-negative, we could regard the evaluated value of the floor function as 0. As for the numerator being non-negative, this normally holds once padding is added to the CNN system. The same argument applies to the following definitions containing a denominator of 0.
Definition 5 (power tensor two). Let , where .
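To make Definitions 4 and 5 concrete, the following R sketch crops all the sub-tensors of a matrix (the 2-dimensional case) for a given window size and stride vector. The specific sizes and strides below are illustrative assumptions of ours, and the count of sub-tensors follows the usual floor-based formula (our reading of Definition 4).

# Crop all sub-matrices of T of size 'ksize' taken with strides 'stride'
# (2-dimensional instance of the "power tensor" construction).
power_tensor_2d <- function(T, ksize, stride) {
  n_row <- floor((nrow(T) - ksize[1]) / stride[1]) + 1
  n_col <- floor((ncol(T) - ksize[2]) / stride[2]) + 1
  subs <- vector("list", n_row * n_col)
  k <- 1
  for (i in seq_len(n_row)) {
    for (j in seq_len(n_col)) {
      r0 <- (i - 1) * stride[1] + 1
      c0 <- (j - 1) * stride[2] + 1
      subs[[k]] <- T[r0:(r0 + ksize[1] - 1), c0:(c0 + ksize[2] - 1)]
      k <- k + 1
    }
  }
  subs
}

# Illustrative example: a 6-by-7 matrix, a 2-by-3 window, strides (2, 2).
T <- matrix(seq_len(42), nrow = 6, ncol = 7)
subs <- power_tensor_2d(T, ksize = c(2, 3), stride = c(2, 2))
length(subs)  # floor((6-2)/2)+1 times floor((7-3)/2)+1 = 3 * 3 = 9 sub-matrices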
Definition 6 (maximum pooling function). Define the maximum pooling function p so that it returns the largest entry of each (sub-)tensor it acts on.
For a standard CNN, one could consult Figure 1.
Figure 1. The figure shows the training of a complete data set. Suppose there are B training data points. The process consists mainly of three operators: the convolution, the activation function a, and the pooling function p. The convolution operator (or inner product) acts on the input tensor and the filters/features; each filter is indexed by its own dimensional vector and by n (the nth filter at the mth recursive step for the jth input). The activation function is denoted by a, while the pooling function is denoted by p. In the whole process, we insert an auxiliary operator (named a power tensor operator), which is specified by the strides assigned to all the dimensions and by the dimensional vector of a truncated tensor (the tensor used to truncate its preceding tensor). The floor function is used to calculate the dimensional vectors of the truncated tensors. Finally, the resulting (column) vector comes from flattening and stacking all the pooled tensors at the mth recursive step for the jth input. In this diagram, we do not directly take padding into consideration; it could be accommodated by a slight tweak of the setting. In addition, N is the total number of recursive steps for the backward induction, which we suppose is fixed.
A high-dimensional tensor is fed into the CNN system. Its cropped sub-tensors interact with the sequence of filters/kernels/tensors to yield a sequence of feature maps/inner products. After applying the activation function a to the feature maps, one forms their sub-tensors via a sequence of power tensors. Next, one applies the pooling function p to these sub-tensors. Lastly, one flattens and stacks the pooled tensors, with respect to the recursive step and the filter index n, into one column vector via the flattening map f and the linearization. This could also be represented by the diagram presented in Figure 2.
In order to perform the backward propagation/pass/induction, we take the max pooling function p and keep track of all the indices at which the maxima are attained, for each fixed filter index n.
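A minimal R sketch of this bookkeeping is given below (the pooling window and its values are toy choices of our own): the max pooling step returns both the pooled value and the index at which the maximum is attained, which is exactly the information needed for the backward pass.

# Max pooling over one cropped sub-tensor, keeping track of the winning index.
max_pool_with_index <- function(sub_tensor) {
  idx <- which.max(sub_tensor)                   # linear index of the maximum
  list(value = sub_tensor[idx],
       index = arrayInd(idx, dim(sub_tensor)))   # multi-dimensional position
}

# Illustrative 2-by-3 sub-tensor (values are arbitrary).
sub <- matrix(c(0.1, 0.7, -0.3, 0.2, 0.9, 0.4), nrow = 2)
pooled <- max_pool_with_index(sub)
pooled$value   # 0.9
pooled$index   # the (row, column) position at which the maximum is attained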
Example 1. Suppose T is a 6-by-7 matrix (tensor). Suppose the stride vector and the filter are given. Then, the corresponding window counts follow from the floor-based formula. The convolution between T and some feature matrix is shown in Figure 3. As for the procedure of linearizing the power tensor, one starts by linearizing each sub-tensor. Hence, the total number of neurons in this layer follows accordingly; i.e., the power tensor is stacked by a sequence of column vectors of the corresponding lengths.
4. Theoretical Settings
Definition 7 (). For any column vector and , define and .
Definition 8. Define the positional window/vector in the ith dimension by .
Claim 1 (dimensional multipliers). If , and the stride vector is , then , i.e., .
Proof. Based on the concept of an arithmetic sequence and the given conditions, for each dimension there exists a maximum index such that the corresponding window still lies inside the tensor. Since the admissible positions form an arithmetic sequence, the result follows immediately via the representation of the floor function. □
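Reading Claim 1 as stating that the number of admissible window positions along dimension i equals ⌊(τ_i − φ_i)/σ_i⌋ + 1 (with τ the input sectional vector, φ the filter sectional vector, and σ the stride vector; these symbol names are our own choice), the following short R check compares the formula against a brute-force enumeration.

# Dimensional multiplier along one dimension:
# number of admissible window positions for input size tau,
# window size phi and stride sigma.
dim_multiplier <- function(tau, phi, sigma) floor((tau - phi) / sigma) + 1

# Brute-force count of positions p with p + phi - 1 <= tau, p = 1, 1 + sigma, ...
brute_force <- function(tau, phi, sigma) sum(seq(1, tau, by = sigma) + phi - 1 <= tau)

# Check agreement on a few illustrative cases.
cases <- expand.grid(tau = c(6, 7, 200), phi = c(2, 3, 5), sigma = c(1, 2, 3))
all(mapply(dim_multiplier, cases$tau, cases$phi, cases$sigma) ==
    mapply(brute_force,   cases$tau, cases$phi, cases$sigma))  # TRUE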
Definition 9. Let denote the column vector .
Definition 10. The set of all the positional indices of (or ) is defined by
Lemma 1 (-index tensor). The -index tensor of T is (a partial function), where the partial domain , where .
Proof. Based on Definitions 7–10 and Claim 1, the results follow immediately. □
Claim 2 (linearity for matrix). The function defined by is a bijective function for any given .
Proof. Since , it suffices to show that l is injective. Suppose , or , i.e., . Since , i.e., , i.e., , one has and thus . □
Lemma 2 (linearity of spatial indices of a tensor). The function defined by is a bijective function.
Proof. We show this by mathematical induction. For the base case, it is shown to be true in Claim 2. Suppose the claim holds at a given stage, i.e., for any arbitrary pair of indices with equal linearized values, the indices coincide. Next, we show the claim also holds at the next stage. Suppose two indices have the same linearized value. Comparing the two expressions modulo the size of the last section shows that the last coordinates agree; dividing out this common part, the remaining linearized values agree as well, and, based on the induction hypothesis, so do all the remaining coordinates. □
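The following R sketch gives one concrete instance of such a linearization and its inverse (we use a row-major-style mixed-radix ordering purely for illustration; the paper's own ordering is the one fixed by its definition of l), and verifies bijectivity on a small index grid.

# Linearize a multi-index 'idx' over a grid with sectional vector 'dims'
# (row-major-style mixed-radix encoding), together with its inverse.
linearize <- function(idx, dims) {
  l <- 0
  for (d in seq_along(dims)) l <- l * dims[d] + (idx[d] - 1)
  l + 1
}
delinearize <- function(l, dims) {
  l <- l - 1
  idx <- integer(length(dims))
  for (d in rev(seq_along(dims))) {
    idx[d] <- (l %% dims[d]) + 1
    l <- l %/% dims[d]
  }
  idx
}

# Verify bijectivity on a small 3-by-4-by-2 grid.
dims <- c(3, 4, 2)
grid <- as.matrix(expand.grid(lapply(dims, seq_len)))
lin  <- apply(grid, 1, linearize, dims = dims)
stopifnot(!anyDuplicated(lin), sort(lin) == seq_len(prod(dims)))
stopifnot(all(apply(grid, 1, function(ix) all(delinearize(linearize(ix, dims), dims) == ix))))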
Lemma 2 is vital in converting a tensor-style representation of a CNN into a neuron-style representation of a CNN, as demonstrated in Corollaries 1–6.
Corollary 1 (vectorization/linearization of the indices of an input tensor ). is a bijective function if is defined in the same manner as l in Lemma 2.
Corollary 2. (vectorization/linearization of the indices of sub-tensors or power tensors of ) is a bijective function if is defined in the same manner as l in Lemma 2.
Proof. If we take and , then the result follows immediately. □
Corollary 3 (vectorization/linearization of the number of convolutional operations, or sums of products sop, between a power tensor and all the kernels in a given layer). The map is a bijective function if it is defined in the manner of l in Lemma 2, where the corresponding parameter is the number of filters (see Figure 4).
Corollary 4 (vectorization/linearization of the indices of a kernel/filter). The map is a bijective function if it is defined in the same manner as l in Lemma 2.
Corollary 5. By applying the floor and ceiling functions stage by stage, the dimensional vectors of the successive (truncated) tensors are obtained from those of the preceding stage; the whole process is repeated until the final stage is reached.
Remark 5. Since the above maps are defined over the sizes of the sections (or sectional vectors), any further expansion, such as zero-padding of T (the input object with sectional vector τ and a zero-padding sectional vector), will not alter their bijectivity, i.e., the maps are all still bijective (this could easily be shown via Lemma 2).
Corollary 6. For the ith filter in the layer (stage) j, or , the filter could be linearized by for all , i.e., in the jth layer (stage) related to the feature maps, there are linearized nodes whose values are decided by the presented results.
Proof. For each , , by Lemma 2, the result follows immediately. □
Example 2. Suppose T is a 6-by-7 matrix. Suppose the stride vector . . Then, , . Based on Claim 1, , . Suppose . Based on Lemma 1, and thus . Hence, could be represented by .
Example 3. Suppose T is a 26-by-37-by-19-by-28 tensor. Suppose the stride vector . . Then, , . Based on Claim 1, , . Suppose . Based on Lemma 1, and thus . Hence, could be represented by , where .
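For a tensor of the size given in Example 3, a quick R check confirms that a spatial index and its linearized counterpart determine each other uniquely. Only the dimensions 26 × 37 × 19 × 28 come from the example; the probed position is our own choice, and we simply reuse R's native column-major linearization, which differs from the ordering of Lemma 2 only by a relabelling of the dimensions.

# Dimensions taken from Example 3; the probed position is an arbitrary choice.
dims <- c(26, 37, 19, 28)
pos  <- c(5, 12, 7, 20)

A <- array(seq_len(prod(dims)), dim = dims)   # entries equal their own linear index

lin <- A[pos[1], pos[2], pos[3], pos[4]]      # linearized index of 'pos'
lin
arrayInd(lin, dims)                           # recovers (5, 12, 7, 20): the map is invertible
prod(dims)                                    # 511,784 admissible positions in total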
Definition 11. , or a tensor consisting of the standard inner products between input tensors and the feature tensor at stage (layer) j.
Now, each of these inner products is captured by the function sum of products, as we did for the fully connected network. With Definition 11 and Corollary 6, we could reach the conclusion that the total number of neurons in this layer is . Since we are going to look into the forward and backward propagations, we partition the indices of the tensors of the previous layer.
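The bookkeeping in this paragraph can be sketched in R as follows: assuming, as in the discussion above, that each filter contributes one sop neuron per admissible window position, the size of this layer is the sum, over the filters, of the products of the per-dimension window counts. The input size is the 200-by-200 matrix used later in Section 6; the kernel shapes and stride below are our own assumptions, chosen only so that the per-filter counts reproduce the 2535 and 2640 values reported there.

# Number of sop neurons contributed by one filter:
# the product over dimensions of floor((tau_i - phi_i) / sigma_i) + 1.
sop_neurons_per_filter <- function(tau, phi, sigma) {
  prod(floor((tau - phi) / sigma) + 1)
}

# Illustrative layer: a 200-by-200 input, four assumed kernel shapes, a fixed stride.
tau     <- c(200, 200)
kernels <- list(c(8, 10), c(5, 5), c(5, 5), c(5, 5))
sigma   <- c(3, 5)

per_filter <- sapply(kernels, sop_neurons_per_filter, tau = tau, sigma = sigma)
per_filter        # 2535 2640 2640 2640
sum(per_filter)   # 10455 sop neurons in total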
5. Theories and Methods
Claim 3. The number of neurons in the first layer equals the size of the input tensor under the linearization in Corollary 1, and the value associated with each neuron k is the kth entry of the linearized input tensor, for all k, where the linearization is the one defined in Corollary 1.
Claim 4. The number of neurons in the second layer is the total number of sums of products taken over all the filters/kernels; i.e., the label set for the neurons in the second layer consists of the sop indices, where sop stands for the 'sum of products' of the jth node in the referred layer (or, in the standard CNN, it is the jth convoluted value).
Lemma 3. The indices from the first layer described in Claim 3 that link to the jth neuron in the second layer described in Claim 4 form the set of linearized positions covered by the sub-tensor (window) associated with j.
Proof. Given an index j in the second layer, based on Corollary 3, one recovers the filter index and the position of the corresponding sub-tensor. Then, based on Corollary 2, one recovers the spatial indices of that sub-tensor, which, via Definition 4 and Corollary 1, links to the result. □
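Under a 2-dimensional toy setting (the input size, window size, stride, and probed window position are our own illustrative choices), the following R sketch lists, for a given sop neuron, the linearized first-layer indices that link to it, which is the content of Lemma 3; the linearization here is R's column-major one.

# For the sop neuron at window position (i_out, j_out), list the linearized
# (column-major) indices of the first-layer neurons that link to it.
linked_indices <- function(input_dim, ksize, stride, i_out, j_out) {
  r0 <- (i_out - 1) * stride[1]          # zero-based offset of the window
  c0 <- (j_out - 1) * stride[2]
  rows <- r0 + seq_len(ksize[1])
  cols <- c0 + seq_len(ksize[2])
  as.vector(outer(rows, (cols - 1) * input_dim[1], `+`))  # (c-1)*nrow + r
}

# Illustrative case: 6-by-7 input, 2-by-3 window, strides (2, 2), window position (2, 3).
linked_indices(c(6, 7), c(2, 3), c(2, 2), i_out = 2, j_out = 3)
# 27 28 33 34 39 40: the six input neurons feeding this sop neuron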
Theorem 1. , where and .
Proof. Based on Corollary 3, the result follows immediately. □
Theorem 2. The weight between the ith neuron in the first layer and the jth neuron in the second layer is given by the weight function constructed in the proof below, for all admissible i and j; the filters involved are those defined in Definition 3.
Proof. Let us define a (weight) function between the two layers. Before doing so, let us fix some settings. There are finitely many sub-tensors and, based on the proof of Lemma 3, the indices belonging to each sub-tensor (whose size is that of the filter) are known. Now, observe that an index j of the second layer is decided by a pair of values: the filter index and the position of the corresponding sub-tensor. Since a neuron i in the first layer could link to any node in the next layer (the intermediate output layer, or sop layer), the value of the weight needs to involve j as well. Then, we have to check whether i is activated (convoluted) with respect to j, or, more precisely, whether i is located in the sub-tensor associated with j. In summary, neuron i is activated (convoluted) if it lies in that sub-tensor, and its activated value is defined to be the corresponding entry of the filter/kernel, where the entry is decided by the linearized position of i within the filter. Hence, the weight function is defined as in the statement. The whole argument could also be followed in Figures 4 and 5. □
Remark 6. W is the set of weights between an input layer, whose input object/tensor is the (linearized) input with its corresponding number of neurons, and an intermediate output layer, whose intermediate output values are the sums of products (sop) with their corresponding number of neurons. The entry of W indexed by (i, j) is the weight between the ith neuron in the input layer and the jth neuron in the intermediate output layer.
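To make the role of W concrete, here is a small R sketch (a 2-dimensional toy case with sizes of our own choosing) that builds the weight matrix between the linearized input layer and the sop layer for a single kernel: column j of W carries the kernel entries at the input positions covered by the jth window and zeros elsewhere, with R's column-major linearization playing the role of l.

# Build the input-to-sop weight matrix W for one kernel (2D case).
# Column j of W holds the kernel entries at the input cells of window j, 0 elsewhere.
build_W <- function(input_dim, kernel, stride) {
  n_row <- floor((input_dim[1] - nrow(kernel)) / stride[1]) + 1
  n_col <- floor((input_dim[2] - ncol(kernel)) / stride[2]) + 1
  W <- matrix(0, nrow = prod(input_dim), ncol = n_row * n_col)
  j <- 1
  for (cc in seq_len(n_col)) {        # windows enumerated column by column
    for (rr in seq_len(n_row)) {
      r0 <- (rr - 1) * stride[1]
      c0 <- (cc - 1) * stride[2]
      for (kc in seq_len(ncol(kernel))) {
        for (kr in seq_len(nrow(kernel))) {
          # column-major linearized index of input cell (r0 + kr, c0 + kc)
          i <- (c0 + kc - 1) * input_dim[1] + (r0 + kr)
          W[i, j] <- kernel[kr, kc]
        }
      }
      j <- j + 1
    }
  }
  W
}

# Toy example: 5-by-6 input, 2-by-2 kernel, stride (1, 2).
W <- build_W(c(5, 6), matrix(c(1, 2, 3, 4), 2, 2), c(1, 2))
dim(W)            # 30 x 12: 30 input neurons, 12 sop neurons
colSums(W != 0)   # each sop neuron is linked to exactly 4 input neurons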
Theorem 3. The CNN is represented by two networks: a partial network (or parameterized fully connected network) for feature extraction and a fully connected network.
Proof. The result follows immediately from Theorems 1 and 2 (see Figure 5). □
6. Illustrative Example
Since the main difference between the standard CNN (sCNN) and this linearized CNN (lCNN) lies in the encoding part, the calculation of the set of weights W is vital. In this section, we exploit R programming (version 4.5.1) to demonstrate the equivalence between the sCNN and the lCNN, i.e., an illustrative example for Theorems 2 and 3 (see https://github.com/raymingchen/linearized-high-dimension-CNN.git, accessed on 1 September 2025). Suppose the input image is a cute puppy (https://unsplash.com/photos/brown-short-coated-dog-on-green-grass-field-dgr0ZDbeOqw, a 3D RGB tensor, accessed on 1 September 2025). To simplify the computation, we extract the R part of the input tensor, or puppyR (a 200-by-200 matrix, shown in the (1,2) cell of Figure 6), as our input tensor. Suppose we consider four kernels (the first two are randomly generated, while the last two are some deterministic features).
Then, we perform the standard CNN (sCNN) and the linearized CNN (lCNN) for the convolution (this operation is sufficient to show whether the two representations are equivalent). The convoluted results based on the sCNN and the lCNN are shown in the 4th and 5th rows of Figure 6. The stride vector is fixed throughout. Based on these, one finds that the numbers of power tensors (or sub-tensors) with respect to the four kernels are 2535, 2640, 2640, and 2640, respectively. For the linearized CNN, there are 200 × 200 = 40,000 nodes in the first/input layer, and there are 2535 + 2640 + 2640 + 2640 = 10,455 nodes in the second layer (the intermediate output, or sop, layer). Then, we construct (based on Theorem 2) the weights linking layer one and layer two as a 40,000-by-10,455 weight matrix W. The values associated with the nodes of the 2nd layer are then computed as a 10,455-by-1 column vector obtained by multiplying the transpose of W (t denotes the transpose operator) by the linearized 40,000-by-1 column vector of puppyR. In order to show the equivalence of the sCNN and the lCNN, we reshape this vector into four matrices: a 65-by-39 matrix and three 66-by-40 matrices. The feature maps of the sCNN and these reshaped matrices of the lCNN are identical under the convolution operation. Since the remaining operations (forward/backward propagation) are also identical regardless of whether one uses the sCNN or the lCNN representation, we have demonstrated the equivalence between the two representations.
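The full demonstration with the puppyR image and the four kernels is in the repository linked above; the following self-contained R sketch replays the same equivalence check on a small random input and a single kernel of our own choosing: the sliding-window (sCNN) feature map and the reshaped product of the transposed weight matrix with the linearized input (lCNN) agree exactly.

set.seed(1)

# --- sCNN side: direct sliding-window convolution (no kernel flipping, as in CNNs) ---
sconv <- function(X, K, stride) {
  n_row <- floor((nrow(X) - nrow(K)) / stride[1]) + 1
  n_col <- floor((ncol(X) - ncol(K)) / stride[2]) + 1
  out <- matrix(0, n_row, n_col)
  for (i in seq_len(n_row)) {
    for (j in seq_len(n_col)) {
      r0 <- (i - 1) * stride[1]
      c0 <- (j - 1) * stride[2]
      out[i, j] <- sum(X[r0 + seq_len(nrow(K)), c0 + seq_len(ncol(K))] * K)
    }
  }
  out
}

# --- lCNN side: weight matrix between the linearized input and the sop layer ---
lconv_W <- function(input_dim, K, stride) {
  n_row <- floor((input_dim[1] - nrow(K)) / stride[1]) + 1
  n_col <- floor((input_dim[2] - ncol(K)) / stride[2]) + 1
  W <- matrix(0, prod(input_dim), n_row * n_col)
  for (j in seq_len(n_row * n_col)) {
    i_out <- (j - 1) %% n_row + 1           # output row (column-major over the map)
    j_out <- (j - 1) %/% n_row + 1          # output column
    r0 <- (i_out - 1) * stride[1]
    c0 <- (j_out - 1) * stride[2]
    for (kc in seq_len(ncol(K))) for (kr in seq_len(nrow(K))) {
      W[(c0 + kc - 1) * input_dim[1] + (r0 + kr), j] <- K[kr, kc]
    }
  }
  W
}

# Small illustrative instance.
X      <- matrix(rnorm(20 * 18), 20, 18)
K      <- matrix(rnorm(3 * 4), 3, 4)
stride <- c(2, 3)

map_s <- sconv(X, K, stride)                              # sCNN feature map
W     <- lconv_W(dim(X), K, stride)
map_l <- matrix(t(W) %*% as.vector(X), nrow(map_s))       # lCNN result, reshaped

all.equal(map_s, map_l)   # TRUE: the two representations coincide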