Article

A Complete Neural Network-Based Representation of High-Dimension Convolutional Neural Networks

1 Department of Mathematical Sciences, College of Science, Mathematics and Technology, Wenzhou-Kean University, Wenzhou 325060, China
2 Department of Mathematical Sciences, College of Science, Mathematics and Technology, Kean University, 1000 Morris Avenue, Union, NJ 07083, USA
Mathematics 2025, 13(17), 2903; https://doi.org/10.3390/math13172903
Submission received: 2 August 2025 / Revised: 2 September 2025 / Accepted: 5 September 2025 / Published: 8 September 2025

Abstract

Convolutional Neural Networks (CNNs) are a widely used machine learning architecture across many fields. Typical descriptions of CNNs rely on low-dimensional, tensor-based representations for the feature extraction part. In this article, we extend the setting of CNNs to arbitrary dimensions and linearize the whole setting via typical layers of neurons. In essence, a partial network and a full network together constitute the entire process of a standard CNN, with the partial network linearizing the feature extraction. By doing so, we link the tensor-style representation of CNNs with the pure network representation. The outcomes serve two main purposes: to relate CNNs to other machine learning frameworks and to facilitate intuitive representations.

1. Introduction

CNNs, based on the concepts of inner products and sliding windows, serve as a vital analytical tool in data science and artificial intelligence [1,2,3]. They are not only theoretically justified but also practically accepted in radiology [4], image processing [5,6,7], protein analysis [8], computer vision [9], signal processing [10], movement recognition [11], cracked-surface detection [12], plant disease detection [13], handwritten character classification [14], and so on. Most presentations of CNNs are described by graphs [15,16]. This causes two main issues: (1) how to unify CNNs with ANNs under the same framework: in the literature, they are not unified under the same setting, i.e., neurons and weights, but rather presented as an informal description of feature extraction and learning alongside ANNs [17]; and (2) how to extend CNNs, typically applied to 2D or 3D objects, to arbitrary n-dimensional objects. The typical settings in the literature on CNNs focus on 2D or 3D object learning, such as computer vision [18] and object/image status classification [19,20]. Beyond the usual textbook applications (2D/3D image recognition/classification, etc.), standard CNNs can also be applied to high-dimensional data; indeed, this can be done via high-dimensional tensors for both the input tensor/object and the kernel/filter tensors [21,22]. However, although these works mention the need to consider more than three dimensions, they do not provide a theoretical framework to accommodate such a need. The main idea of this article is not to show how to apply CNNs to high-dimensional data but to reveal how to convert a typical/standard high-dimensional CNN into a linearized high-dimensional CNN, with the main purpose of yielding a coherent network under a single framework. To overcome these two limitations, we unify CNNs and ANNs by linearizing the CNN and then combining the systems into one via partial and full networks, and we extend the CNN to arbitrary dimensions via tensor powers and tensor products. This unified high-dimensional CNN, together with the ANN system, provides both theoretical and practical advantages, since each object is treated as one whole rather than split into various channels. We divide this article into several sections: from a basic introduction of background knowledge, notations, and related definitions, to the derivation of the theoretical settings and theories, and finally an illustrative example to validate the theory and the practice.

2. Some Basic Notations

Let $\mathbb{N}$ denote the set of all natural numbers and $\mathbb{R}$ the set of all real numbers. Let $\overline{\mathbb{R}} \equiv \mathbb{R}\cup\{+\infty,-\infty\}$ denote the set of all real numbers including the infinities. $\lfloor\alpha\rfloor$ and $\lceil\alpha\rceil$ are the floor and ceiling functions evaluated at the real value $\alpha\in\overline{\mathbb{R}}$, respectively; in particular, for convenience, we define $\lfloor\frac{\alpha}{0}\rfloor = 0$ for all $\alpha\in\mathbb{R}_0^{+}$, the set of all non-negative real numbers. Let $\overline{n}$ denote the set $\{1,2,\ldots,n\}$ for any arbitrary $n\in\mathbb{N}$, and let $\vec{n}$ denote the vector $[1,2,\ldots,n]$. Let the Cartesian product $\overline{n}_1\times\overline{n}_2$ denote the set $\{(p,q): p\in\overline{n}_1, q\in\overline{n}_2\}$ and $\overline{(m_1{:}m_2)}\times\overline{(m_3{:}m_4)}$ denote the set $\{(p,q): m_1\le p\le m_2,\ m_3\le q\le m_4\}$ for any $n_1,n_2,m_1,m_2,m_3,m_4\in\mathbb{N}$. Let the $n$-ary Cartesian product $\prod_{j=1}^{N}\overline{n}_j$ denote the set $\{(i_1,i_2,\ldots,i_N): i_j\in\overline{n}_j \text{ for all } 1\le j\le N\}$. If $S$ is a set, we use $|S|$ to denote its size (cardinality). Let $\vec{i}\in\mathbb{N}^{k}$, the $k$-ary Cartesian product over $\mathbb{N}$, be a $k$-ary column vector. Let $\vec{i}[m]$ (or $\vec{i}_m$) denote the $m$th element of the vector $\vec{i}$, and let $|\vec{i}|$ denote the length of $\vec{i}$. Let $N\in\mathbb{N}$ and $n_1,n_2,\ldots,n_N\in\mathbb{N}$ be arbitrary. Suppose $A$ is an arbitrary $k$-dimensional tensor after all the preliminary processing of the original object (for example, the padding of an input text or image); $A$ could be identified with a function $f_A: \prod_{j=1}^{k}\overline{n}_j\to\mathbb{R}$ (or $A\equiv f_A: \prod_{j=1}^{k}\overline{n}_j\to\mathbb{R}$), so that $A$ contains $D = \prod_{j=1}^{k}n_j$ entries in total. Let $\vec{\sigma}^{j} = [\sigma_1^{j},\sigma_2^{j},\ldots,\sigma_k^{j}]\in\mathbb{N}^{k}$ denote the stride vector, where $\sigma_i^{j}$ denotes the number of strides taken in the $i$th dimension at the $j$th convolutional layer. Let the $\vec{f}^{j}\equiv[f_1^{j},f_2^{j},\ldots,f_k^{j}]$-sectional tensor $F^{i,j}$ denote the $i$th feature/filter in the $j$th layer. Let $\mathcal{F}^{j} = \{F^{i,j}\}_{i=1}^{m_j}$, where $m_j$ denotes the number of filters/features in layer $j$.

3. Basic Definitions and Properties

Definition 1. 
We use $T_{\vec{\tau}}$ to denote a tensor $T$ with sectional/directional vector $\vec{\tau}$, i.e., $T_{\vec{\tau}}: \prod_{k=1}^{s}\overline{\vec{\tau}[k]}\to\mathbb{R}$, where $s = |\vec{\tau}|$ is the number of sections (or directions) of $T$. If $S\subseteq\prod_{k=1}^{s}\overline{\vec{\tau}[k]}$, we use $T_{\vec{\tau}}\restriction S$ to denote its partial function, i.e., $(T_{\vec{\tau}}\restriction S)(\gamma) := T_{\vec{\tau}}(\gamma)$ for all $\gamma\in S$. Furthermore, $T_{\vec{\tau}}(S) = \{T_{\vec{\tau}}(\gamma): \gamma\in S\}$.
Remark 1. 
In most cases, kernels are representable by a product set (or tensor product). In some cases, however, we might need to consider irregular kernels, i.e., those that cannot be represented directly by product sets. If that is the case, in our setting we can simply pad the undefined cells with 0. For example, a kernel $\kappa: \{(1,2),(2,1)\}\to\mathbb{R}$ can be extended to one over a product set, $\kappa^{+}: \{(1,1),(1,2),(2,1),(2,2)\}\to\mathbb{R}$, in which $\kappa^{+}(1,1) = \kappa^{+}(2,2) = 0$, $\kappa^{+}(1,2) := \kappa(1,2)$, and $\kappa^{+}(2,1) := \kappa(2,1)$. Such a setting spares us from redesigning the modelling from scratch. This leads to another question: why, or under which circumstances, would we tend to adopt irregular kernels? An obvious one is when even the input data are not representable by a product set or a tensor. In that case, we can apply the same technique and pad the undefined cells with 0. Such an accommodation suits our setting and is consistent with the settings of standard CNNs.
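As a minimal R sketch of this zero-padding convention (the two kernel values 0.7 and -0.3 are arbitrary placeholders, not values from the paper):

```r
# An irregular kernel kappa defined only on the cells (1,2) and (2,1).
kappa <- list("1,2" = 0.7, "2,1" = -0.3)

# Extend kappa to the full product set {1,2} x {1,2}, padding undefined cells with 0.
kappa_plus <- matrix(0, nrow = 2, ncol = 2)
for (cell in names(kappa)) {
  idx <- as.integer(strsplit(cell, ",")[[1]])
  kappa_plus[idx[1], idx[2]] <- kappa[[cell]]
}
kappa_plus
#      [,1] [,2]
# [1,]  0.0  0.7
# [2,] -0.3  0.0
```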
Definition 2. 
Let σ denote a stride vector whose elements indicate the strides with respect to each section (or direction).
Definition 3. 
Let $F_{\vec{f}}$ denote a filter (tensor), or kernel, where $\vec{f}$ is its sectional vector, i.e., $F_{\vec{f}}: \prod_{k=1}^{s}\overline{\vec{f}[k]}\to\mathbb{R}$. If we need to denote the $i$th filter in the $j$th layer, we use the notation $F_{\vec{f}}^{i,j}$.
Remark 2. 
For each kernel $F^{i,j}$ (the $i$th kernel in the $j$th layer), we can associate a stride vector $\vec{\sigma}^{i,j}$. This is based on the assumption that a uniform stride is applicable and appropriate for the sliding and inner product operations. In some cases, one might encounter non-uniform strides [23]. A much more generalized setting is one in which the strides (or the positions where the inner products take place) are determined by a set of positional vectors (for example, positions chosen at random). Such a case is beyond our setting, since our setting is based on the standard CNN, in which the strides are normally assumed to be uniform. In order not to reformulate our setting, we could add a masking layer after the feature maps [24]. In that case, all the stride vectors consist of $\vec{1}\equiv(1,1,\ldots,1)$, and each feature map acts on a mask whose value is 1 where the feature map is activated and 0 where it is not. After this masking layer, in the pooling layer, one discards all the 0 values and their associated positions. The remaining values are linearized and fed into the ANN part.
Remark 3. 
Another interesting aspect of feature extraction is the dilated convolutional neural network [25,26]. Since the dilated parts are normally padded with 0, this setting is still compatible with our treatment of kernels, because the parameters to be learned lie in the kernels. To accommodate a dilated kernel in our setting, one fixes the padded parameters and learns only the non-padded parts of the kernel.
In order to present the cropped sub-tensors of $T_{\vec{\tau}}$, given the filter $F_{\vec{f}}$ and stride vector $\vec{\sigma}$, we use a power tensor to collect such sub-tensors as follows:
Definition 4 
(power tensor one). Let $P_{\vec{f}}^{\vec{\sigma}}(T_{\vec{\tau}}) = \{T_{\vec{f}}^{\gamma}\}_{\gamma\in\prod_{k=1}^{s}\overline{\lfloor\frac{\vec{\tau}[k]-\vec{f}[k]}{\vec{\sigma}[k]}\rfloor+1}}$, where $T_{\vec{f}}^{\gamma} = T_{\vec{\tau}}\big(\prod_{k=1}^{s}\overline{(\vec{1}+(\gamma-\vec{1})\odot\vec{\sigma})[k] : ((\gamma-\vec{1})\odot\vec{\sigma}+\vec{f})[k]}\big)$, where $\odot$ indicates the Hadamard product and $\vec{1} = [1,1,\ldots,1]$.
Remark 4. 
In the definitions, one observes that the floor function might contain a denominator of 0; this is normally handled via the extended real numbers. As long as the numerator is non-negative, we regard the evaluated value of the floor function as 0 (the convention set in Section 2). The non-negativity of the numerator is normally guaranteed by adding padding to the CNN system. The same argument applies to the following definitions containing a denominator of 0.
Definition 5 
(power tensor two). Let $P_{\vec{p}}^{\vec{\sigma}}(A^{\gamma}) = \{A_{\vec{p}}^{\eta}\}_{\eta\in\prod_{n=1}^{s}\overline{\lfloor\frac{\gamma[n]-\vec{p}[n]}{\vec{\sigma}[n]}\rfloor+1}}$, where $A_{\vec{p}}^{\eta} = A^{\gamma}\big(\prod_{n=1}^{s}\overline{(\vec{1}+(\eta-\vec{1})\odot\vec{\sigma})[n] : ((\eta-\vec{1})\odot\vec{\sigma}+\vec{p})[n]}\big)$.
Definition 6 
(maximum pooling function). Define
$$\mu^{\gamma} : \prod_{n=1}^{s}\overline{\Big\lfloor\frac{\gamma[n]-\vec{p}[n]}{\vec{\sigma}[n]}\Big\rfloor+1} \to \prod_{n=1}^{s}\overline{\gamma[n]}$$
by
$$\mu^{\gamma}(\theta) = \operatorname{argmax}\Big\{A^{\gamma}(\theta') : \theta'\in\prod_{n=1}^{s}\overline{(\vec{1}+(\theta-\vec{1})\odot\vec{\sigma})[n] : ((\theta-\vec{1})\odot\vec{\sigma}+\vec{p})[n]}\Big\}$$
for all $\gamma\in\prod_{n=1}^{s}\overline{\lfloor\frac{\vec{\tau}[n]-\vec{f}[n]}{\vec{\sigma}[n]}\rfloor+1}$.
For a standard CNN, one could consult Figure 1. The figure shows the training over a complete data set. Suppose there are $B$ training data points, i.e., $1\le j\le B$. The process consists mainly of three operators: convolution, an activation function $a$, and a pooling function $p$. The convolution operator (or inner product) acts on the input tensor and the filters/features $F_{\vec{f}}^{n}[m;j]$ ($1\le n\le N$ denotes the $n$th filter at the $m$th recursive step for the $j$th input, and $\vec{f}$ is the dimensional vector of the filter). In the whole process, we insert an auxiliary operator $P_{\vec{v}}^{\vec{u}}$ (named a power tensor operator), where $\vec{u}$ describes the strides assigned to all the dimensions and $\vec{v}$ is the dimensional vector of a truncating tensor (the tensor used to truncate its preceding tensor). The floor function is used to calculate the dimensional vectors of the truncated tensors. Finally, the resulting (column) vector $X_m^{j}$ comes from flattening and stacking all the pooled tensors at the $m$th recursive step for the $j$th input. In this diagram, we do not directly take padding into consideration; it could be handled by a slight tweak of the setting. In addition, $F_{\vec{f}}^{n}[1,j+1]$ is defined to be $F_{\vec{f}}^{n}[N,j]$, where $N$ (supposed fixed) is the total number of recursive steps for the backward induction. Moreover, $\prod_{k=1}^{s}\overline{\lfloor\frac{\vec{\tau}[k]-\vec{f}[k]}{\vec{\sigma}[k]}\rfloor+1}\equiv\prod_{k=1}^{s}\overline{\vec{q}[k]}$, where $\vec{q}[k] = \lfloor\frac{\vec{\tau}[k]-\vec{f}[k]}{\vec{\sigma}[k]}\rfloor+1$; likewise, $\vec{q}_j[k] = \lfloor\frac{\vec{q}[k]-\vec{p}_j[k]}{\vec{\sigma}_j[k]}\rfloor+1$ and $\prod_{k=1}^{s}\overline{\lfloor\frac{\vec{q}[k]-\vec{p}_j[k]}{\vec{\sigma}_j[k]}\rfloor+1}\equiv\prod_{k=1}^{s}\overline{\vec{q}_j[k]}$.
A high-dimension tensor $T_{\vec{\tau}}$ is fed into the CNN system. The cropped sub-tensors
$$P_{\vec{f}}^{\vec{\sigma}}(T_{\vec{\tau}}) = \{T_{\vec{f}}^{\gamma}\}_{\gamma\in\prod_{k=1}^{s}\overline{\lfloor\frac{\vec{\tau}[k]-\vec{f}[k]}{\vec{\sigma}[k]}\rfloor+1}}$$
interact with the sequence of filters/kernels/tensors $\{F_{\vec{f}}^{n}\}_{n=1}^{N}$ to yield a sequence of feature maps/inner products
$$\Big\{H_{\gamma}^{n}\equiv\langle T_{\vec{\tau}}^{\gamma}, F_{\vec{f}}^{n}\rangle : \gamma\in\prod_{k=1}^{s}\overline{\Big\lfloor\tfrac{\vec{\tau}[k]-\vec{f}[k]}{\vec{\sigma}[k]}\Big\rfloor+1}\Big\}_{n=1}^{N}.$$
After applying the activation function $a$ on the feature maps, one has
$$\Big\{A_{\gamma}^{n}\equiv a(\langle T_{\vec{\tau}}^{\gamma}, F_{\vec{f}}^{n}\rangle) : \gamma\in\prod_{k=1}^{s}\overline{\Big\lfloor\tfrac{\vec{\tau}[k]-\vec{f}[k]}{\vec{\sigma}[k]}\Big\rfloor+1}\Big\}_{n=1}^{N}.$$
Then, one forms their sub-tensors via a sequence of power tensors $\{P_{\vec{p}_n}^{\vec{\sigma}_n}\}_{n=1}^{N}$ and yields
$$\Big\{P_{\vec{p}_n}^{\vec{\sigma}_n}(A_{\gamma}^{n})\equiv P_{\vec{p}_n}^{\vec{\sigma}_n}\big(a(\langle T_{\vec{\tau}}^{\gamma}, F_{\vec{f}}^{n}\rangle)\big) : \gamma\in\prod_{k=1}^{s}\overline{\Big\lfloor\tfrac{\vec{\tau}[k]-\vec{p}_n[k]}{\vec{\sigma}_n[k]}\Big\rfloor+1}\Big\}_{n=1}^{N} \equiv \Big\{\Big\{\big\{A_{\vec{p}_n}^{\gamma,\eta} : \eta\in\prod_{k=1}^{s}\overline{\Big\lfloor\tfrac{\gamma[k]-\vec{p}_n[k]}{\vec{\sigma}_n[k]}\Big\rfloor+1}\big\} : \gamma\in\prod_{k=1}^{s}\overline{\Big\lfloor\tfrac{\vec{\tau}[k]-\vec{p}[k]}{\vec{\sigma}[k]}\Big\rfloor+1}\Big\}\Big\}_{n=1}^{N}.$$
Next, one applies the pooling function $p$ on this set to form
$$\Big\{\Big\{p\big(\big\{A_{\vec{p}_n}^{\gamma,\eta} : \eta\in\prod_{k=1}^{s}\overline{\Big\lfloor\tfrac{\gamma[k]-\vec{p}_n[k]}{\vec{\sigma}_n[k]}\Big\rfloor+1}\big\}\big) : \gamma\in\prod_{k=1}^{s}\overline{\Big\lfloor\tfrac{\vec{\tau}[k]-\vec{p}[k]}{\vec{\sigma}[k]}\Big\rfloor+1}\Big\}\Big\}_{n=1}^{N}.$$
Lastly, one flattens them into one column vector $X$ with respect to $\gamma$ and $n$ via $\mathfrak{f}$ and $\tilde{\mathfrak{f}}$:
$$\tilde{\mathfrak{f}}\Big(\Big\{\mathfrak{f}\Big(\Big\{p\big(\big\{A_{\vec{p}_n}^{\gamma,\eta} : \eta\in\prod_{k=1}^{s}\overline{\Big\lfloor\tfrac{\gamma[k]-\vec{p}_n[k]}{\vec{\sigma}_n[k]}\Big\rfloor+1}\big\}\big) : \gamma\in\prod_{k=1}^{s}\overline{\Big\lfloor\tfrac{\vec{\tau}[k]-\vec{p}[k]}{\vec{\sigma}[k]}\Big\rfloor+1}\Big\}\Big)\Big\}_{n=1}^{N}\Big) = \tilde{\mathfrak{f}}\Big(\Big\{\big\{C_{\gamma}^{n} : \gamma\in\prod_{k=1}^{s}\overline{\Big\lfloor\tfrac{\vec{\tau}[k]-\vec{p}[k]}{\vec{\sigma}[k]}\Big\rfloor+1}\big\}\Big\}_{n=1}^{N}\Big) = X.$$
This could also be represented by the diagram presented in Figure 2.
In order to perform the backward propagation/pass/induction, we take the max pooling function $p$ and keep track of all the indices as follows (fix $n$):
$$\mu_{n}(\gamma) := \operatorname{argmax}\big\{A_{\vec{p}_n}^{\gamma,\eta} : \eta\in\prod_{k=1}^{s}\overline{\Big\lfloor\tfrac{\gamma[k]-\vec{p}_n[k]}{\vec{\sigma}_n[k]}\Big\rfloor+1}\big\},$$
for all $\gamma\in\prod_{k=1}^{s}\overline{\lfloor\frac{\vec{\tau}[k]-\vec{p}[k]}{\vec{\sigma}[k]}\rfloor+1}$ and for all $1\le n\le N$.
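A minimal R sketch of max pooling with recorded argmax positions, in the spirit of Definition 6 and the index tracking above; the 4-by-4 activation map, window size, and stride are arbitrary choices for illustration:

```r
# A 4-by-4 activation map A (arbitrary values), pooled with a 2-by-2 window and stride 2.
A <- matrix(c(1, 5, 2, 0,
              3, 4, 7, 1,
              0, 2, 9, 6,
              8, 1, 4, 3), nrow = 4, byrow = TRUE)
pool <- c(2, 2); stride <- c(2, 2)
q <- floor((dim(A) - pool) / stride) + 1        # pooled-map size: [2, 2]

pooled <- matrix(NA, q[1], q[2])
argmax <- vector("list", prod(q))               # mu(gamma): positions kept for backprop
for (g1 in 1:q[1]) for (g2 in 1:q[2]) {
  rows <- (1 + (g1 - 1) * stride[1]):((g1 - 1) * stride[1] + pool[1])
  cols <- (1 + (g2 - 1) * stride[2]):((g2 - 1) * stride[2] + pool[2])
  win  <- A[rows, cols]
  top  <- which(win == max(win), arr.ind = TRUE)[1, ]
  pooled[g1, g2] <- max(win)
  argmax[[(g2 - 1) * q[1] + g1]] <- c(rows[top[1]], cols[top[2]])  # global index in A
}
pooled          # the 2-by-2 max-pooled map
argmax          # the recorded argmax positions, one per pooled cell
```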
Example 1. 
Suppose $T_{\vec{\tau}} = [t_{i,j}]_{1\le i\le 6,\,1\le j\le 7}$ is a 6-by-7 matrix (tensor), where $\vec{\tau} = [6,7]$. Suppose the stride vector $\vec{\sigma} = [1,2]$ and $\vec{f} = [2,3]$. Then $s = 2$ and
$$P_{\vec{f}}^{\vec{\sigma}}(T_{\vec{\tau}}) = \{T_{\vec{f}}^{\gamma}\}_{\gamma\in\overline{5}\times\overline{3}} = \begin{bmatrix} T_{\vec{\tau}}(\overline{1{:}2}\times\overline{1{:}3}) & T_{\vec{\tau}}(\overline{1{:}2}\times\overline{3{:}5}) & T_{\vec{\tau}}(\overline{1{:}2}\times\overline{5{:}7}) \\ T_{\vec{\tau}}(\overline{2{:}3}\times\overline{1{:}3}) & T_{\vec{\tau}}(\overline{2{:}3}\times\overline{3{:}5}) & T_{\vec{\tau}}(\overline{2{:}3}\times\overline{5{:}7}) \\ T_{\vec{\tau}}(\overline{3{:}4}\times\overline{1{:}3}) & T_{\vec{\tau}}(\overline{3{:}4}\times\overline{3{:}5}) & T_{\vec{\tau}}(\overline{3{:}4}\times\overline{5{:}7}) \\ T_{\vec{\tau}}(\overline{4{:}5}\times\overline{1{:}3}) & T_{\vec{\tau}}(\overline{4{:}5}\times\overline{3{:}5}) & T_{\vec{\tau}}(\overline{4{:}5}\times\overline{5{:}7}) \\ T_{\vec{\tau}}(\overline{5{:}6}\times\overline{1{:}3}) & T_{\vec{\tau}}(\overline{5{:}6}\times\overline{3{:}5}) & T_{\vec{\tau}}(\overline{5{:}6}\times\overline{5{:}7}) \end{bmatrix}.$$
The convolution between $T$ and a feature matrix is shown in Figure 3.
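A minimal R sketch of Definition 4 under the parameters of Example 1 ($\vec{\tau} = [6,7]$, $\vec{\sigma} = [1,2]$, $\vec{f} = [2,3]$); the entries of $T$ are arbitrary random numbers:

```r
set.seed(1)
tau <- c(6, 7); sigma <- c(1, 2); f <- c(2, 3)
T <- matrix(rnorm(prod(tau)), nrow = tau[1], ncol = tau[2])   # arbitrary 6-by-7 input

q <- floor((tau - f) / sigma) + 1           # [5, 3]: number of sub-tensors per direction
P <- vector("list", prod(q)); dim(P) <- q   # P_f^sigma(T): a 5-by-3 grid of sub-tensors
for (g1 in 1:q[1]) for (g2 in 1:q[2]) {
  rows <- (1 + (g1 - 1) * sigma[1]):((g1 - 1) * sigma[1] + f[1])
  cols <- (1 + (g2 - 1) * sigma[2]):((g2 - 1) * sigma[2] + f[2])
  P[[g1, g2]] <- T[rows, cols]              # the 2-by-3 cropped block T_f^gamma
}
dim(P)       # 5 3
P[[4, 2]]    # the sub-tensor at gamma = (4, 2): rows 4:5, columns 3:5
```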

4. Theoretical Settings

As for the procedure of linearizing the power tensor, one starts by linearizing each $T\restriction D_I$, whose size is $\prod_{p=1}^{h}f_p^{j}$. Hence, the total number of neurons in this layer should be $\big(\prod_{p=1}^{h}f_p^{j}\big)\cdot\big(\prod_{p=1}^{h}q_p^{j}\big)$, i.e., the power tensor is stacked by the sequence $\{T\restriction D_{L^{-1}(n)}\}_{n=1}^{M}$, where $M = \prod_{p=1}^{h}q_p^{j}$ and each $T\restriction D_{L^{-1}(n)}$ is a column vector of length $\prod_{p=1}^{h}f_p^{j}$.
Definition 7 
($\vec{v}\,\hat{\times}\,\vec{w}$). For any column vectors $\vec{v}$ and $\vec{w}$, define
$$\begin{bmatrix}v_1\\v_2\\\vdots\\v_p\end{bmatrix}\hat{\times}\begin{bmatrix}w_1\\w_2\\\vdots\\w_q\end{bmatrix} := \begin{bmatrix}(v_1,w_1)&(v_1,w_2)&\cdots&(v_1,w_q)\\(v_2,w_1)&(v_2,w_2)&\cdots&(v_2,w_q)\\\vdots&\vdots&\ddots&\vdots\\(v_p,w_1)&(v_p,w_2)&\cdots&(v_p,w_q)\end{bmatrix}$$
and $\vec{v}_1\,\hat{\times}\,\vec{v}_2\,\hat{\times}\cdots\hat{\times}\,\vec{v}_n := \{\vec{v}_1[t_1]\}_{t_1=1}^{|\vec{v}_1|}\times\{\vec{v}_2[t_2]\}_{t_2=1}^{|\vec{v}_2|}\times\cdots\times\{\vec{v}_n[t_n]\}_{t_n=1}^{|\vec{v}_n|}$.
Definition 8. 
Define the positional window/vector in the ith dimension by I i .
Claim 1 
(dimensional multipliers). If $\dim(T) = [\tau_1,\tau_2,\ldots,\tau_h]$, $\dim(F^{j}) = [f_1^{j},f_2^{j},\ldots,f_h^{j}]$, and the stride vector is $\vec{\sigma}^{j} = [\sigma_1^{j},\sigma_2^{j},\ldots,\sigma_h^{j}]$, then $q_i^{j} = \lfloor\frac{\tau_i-f_i^{j}}{\sigma_i^{j}}\rfloor+1$, i.e., $Q(T/F^{j}) = \big[\lfloor\frac{\tau_1-f_1^{j}}{\sigma_1^{j}}\rfloor+1,\ \lfloor\frac{\tau_2-f_2^{j}}{\sigma_2^{j}}\rfloor+1,\ \ldots,\ \lfloor\frac{\tau_h-f_h^{j}}{\sigma_h^{j}}\rfloor+1\big]$.
Proof. 
Based on the concept of an arithmetic sequence and the given conditions, for each $1\le i\le h$ there exists a maximum $e_i\in\mathbb{N}$ such that $1 + e_i\cdot\sigma_i^{j} + (f_i^{j}-1)\le\tau_i$, i.e., $e_i\le\frac{\tau_i-f_i^{j}}{\sigma_i^{j}}$. Since $q_i^{j} = e_i+1$, the result follows immediately via the representation of the floor function. □
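A one-line R sketch of Claim 1; the two calls reuse the parameters of Examples 1 and 3 and reproduce the multipliers $[5,3]$ and $[4,12,3,6]$:

```r
# Claim 1: q_i = floor((tau_i - f_i) / sigma_i) + 1 in every direction i.
dim_multipliers <- function(tau, f, sigma) floor((tau - f) / sigma) + 1

dim_multipliers(tau = c(6, 7),           f = c(2, 3),       sigma = c(1, 2))        # 5  3
dim_multipliers(tau = c(26, 37, 19, 28), f = c(3, 4, 5, 2), sigma = c(6, 3, 7, 5))  # 4 12  3  6
```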
Definition 9. 
Let $\overrightarrow{q_i^{j}}^{\,f_i^{j}}$ denote the column vector $[q_i^{j},\ q_i^{j}+1,\ q_i^{j}+2,\ \ldots,\ q_i^{j}+f_i^{j}-1]$.
Definition 10. 
The set of all the positional indices of $T/F^{j}$ (or $Q(T/F^{j})$) is defined by $\prod_{i=1}^{h}\overline{q_i^{j}}$.
Lemma 1 
($I$-index tensor). The $I$-index tensor of $T$ is $T[I] = T\restriction D_I$ (a partial function), where the partial domain $D_I = \overrightarrow{1+(i_1-1)\cdot\sigma_1^{j}}^{\,f_1^{j}}\,\hat{\times}\,\overrightarrow{1+(i_2-1)\cdot\sigma_2^{j}}^{\,f_2^{j}}\,\hat{\times}\cdots\hat{\times}\,\overrightarrow{1+(i_h-1)\cdot\sigma_h^{j}}^{\,f_h^{j}}$ and $I = [i_1,i_2,\ldots,i_h]\in\overline{q_1^{j}}\times\overline{q_2^{j}}\times\cdots\times\overline{q_h^{j}}$.
Proof. 
Based on Definitions 7–10 and Claim 1, the results follow immediately. □
Claim 2 
(linearity for matrices). The function $l: \overline{n}_1\times\overline{n}_2\to\overline{n_1\cdot n_2}$ defined by $l(i,j) := i+(j-1)\cdot n_1$ is a bijective function for any given $n_1,n_2\in\mathbb{N}$.
Proof. 
Since $|\overline{n}_1\times\overline{n}_2| = |\overline{n_1\cdot n_2}|$, it suffices to show that $l$ is injective. Suppose $l(i,j) = l(i',j')$, or $i+(j-1)\cdot n_1 = i'+(j'-1)\cdot n_1$, i.e., $i-i' = (j'-j)\cdot n_1$. Since $-n_1 < i-i' < n_1$, i.e., $-n_1 < (j'-j)\cdot n_1 < n_1$, i.e., $-1 < j'-j < 1$, one has $j = j'$ and thus $i = i'$. □
Lemma 2. 
(linearity of spatial indices of a tensor). The function $l: \prod_{j=1}^{M}\overline{n}_j\to\overline{\prod_{j=1}^{M}n_j}$ defined by $l(i_1,i_2,\ldots,i_M) := i_1+\sum_{k=2}^{M}\big[(i_k-1)\cdot\prod_{p=1}^{k-1}n_p\big]$ is a bijective function.
Proof. 
We show this by mathematical induction. For $M = 2$, it is shown to be true in Claim 2. Suppose the statement holds for $M = n$, i.e., for any arbitrary $l(i_1,i_2,\ldots,i_n) = l(i_1',i_2',\ldots,i_n')$, one has $(i_1,i_2,\ldots,i_n) = (i_1',i_2',\ldots,i_n')$. Next, we show it also holds for $M = n+1$. Suppose $l(i_1,\ldots,i_n,i_{n+1}) = l(i_1',\ldots,i_n',i_{n+1}')$. Then $i_1+\sum_{k=2}^{n+1}\big[(i_k-1)\cdot\prod_{p=1}^{k-1}n_p\big] = i_1'+\sum_{k=2}^{n+1}\big[(i_k'-1)\cdot\prod_{p=1}^{k-1}n_p\big]$, i.e., $i_1-i_1' = n_1\cdot\sum_{k=2}^{n+1}\big[(i_k'-i_k)\cdot\prod_{p=2}^{k-1}n_p\big]$. Since $1\le i_1,i_1'\le n_1$, the left-hand side lies strictly between $-n_1$ and $n_1$, while the right-hand side is an integer multiple of $n_1$; hence $i_1 = i_1'$, i.e., $\sum_{k=2}^{n+1}\big[(i_k-1)\cdot\prod_{p=1}^{k-1}n_p\big] = \sum_{k=2}^{n+1}\big[(i_k'-1)\cdot\prod_{p=1}^{k-1}n_p\big]$. Now, set $h = k-1$. Then $\sum_{h=1}^{n}\big[(i_{h+1}-1)\cdot\prod_{p=1}^{h}n_p\big] = \sum_{h=1}^{n}\big[(i_{h+1}'-1)\cdot\prod_{p=1}^{h}n_p\big]$; dividing both sides by $n_1$ yields $l(i_2,\ldots,i_n,i_{n+1}) = l(i_2',\ldots,i_n',i_{n+1}')$ with respect to the sizes $n_2,\ldots,n_{n+1}$. Based on the induction hypothesis, one has $i_j = i_j'$ for all $2\le j\le n+1$. □
Lemma 2 is vital in converting a tensor-style representation of a CNN into a neuron-style representation of a CNN, as demonstrated in Corollaries 1–6.
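A minimal R sketch of the linearization in Lemma 2, with a brute-force bijectivity check on a small index set (the sectional sizes $[3,4,2]$ are arbitrary):

```r
# Lemma 2: l(i_1, ..., i_M) = i_1 + sum_{k >= 2} (i_k - 1) * prod_{p < k} n_p.
linearize <- function(i, n) {
  mult <- c(1, cumprod(n[-length(n)]))      # 1, n_1, n_1*n_2, ...
  sum((i - 1) * mult) + 1
}

# Bijectivity check on the index set 3-bar x 4-bar x 2-bar.
n   <- c(3, 4, 2)
idx <- as.matrix(expand.grid(1:n[1], 1:n[2], 1:n[3]))
lin <- apply(idx, 1, linearize, n = n)
all(sort(lin) == 1:prod(n))                 # TRUE: l maps bijectively onto {1, ..., 24}
```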
Corollary 1 
(vectorization/linearization of the indices of an input tensor $T_{\vec{\tau}}$). $l_T: \prod_{k=1}^{s}\overline{\vec{\tau}[k]}\to\overline{\prod_{k=1}^{s}\vec{\tau}[k]}$ is a bijective function if $l_T$ is defined in the same manner as $l$ in Lemma 2.
Corollary 2. 
(vectorization/linearization of the indices of sub-tensors or power tensors of $T_{\vec{\tau}}$). $l_{\vec{\tau}}: \prod_{j=1}^{s}\overline{\lfloor\frac{\vec{\tau}[j]-\vec{f}[j]}{\vec{\sigma}[j]}\rfloor+1}\to\overline{|P_{\vec{f}}^{\vec{\sigma}}(T_{\vec{\tau}})|}$ is a bijective function if $l_{\vec{\tau}}$ is defined in the same manner as $l$ in Lemma 2.
Proof. 
If we take $M = s$ and $n_j = \lfloor\frac{\vec{\tau}[j]-\vec{f}[j]}{\vec{\sigma}[j]}\rfloor+1$, then the result follows immediately. □
Corollary 3 
(vectorization/linearization of the number of convolutional operations, or sums of products (sop), between $T_{\vec{\tau}}$ and all the kernels in a given layer). $l_{sop}: \overline{|P_{\vec{f}}^{\vec{\sigma}}(T_{\vec{\tau}})|}\times\overline{N_1}\to\overline{|P_{\vec{f}}^{\vec{\sigma}}(T_{\vec{\tau}})|\times N_1}$ is a bijective function if $l_{sop}$ is defined in the manner of $l$ in Lemma 2, where $N_1$ is the number of filters (see Figure 4).
Corollary 4 
(vectorization/linearization of the indices of a kernel/filter). $l_F: \prod_{k=1}^{s}\overline{\vec{f}[k]}\to\overline{\prod_{k=1}^{s}\vec{f}[k]}$ is a bijective function if $l_F$ is defined in the same manner as $l$ in Lemma 2.
Corollary 5. 
$l^{-1}(n) = (t_1\bmod n_1,\ t_2\bmod n_2,\ \ldots,\ t_{N-1}\bmod n_{N-1},\ t_N\bmod n_N)$, where, by applying the floor and ceiling functions,
  • $t_N = \big\lceil\frac{n}{\prod_{p=1}^{N-1}n_p}\big\rceil$;
  • $t_{N-1} = \big\lceil\frac{r_N}{\prod_{p=1}^{N-2}n_p}\big\rceil$, where $r_N = n-\tilde{q}_N\cdot\prod_{p=1}^{N-1}n_p$ and $\tilde{q}_N = \big\lfloor\frac{n}{\prod_{p=1}^{N-1}n_p}\big\rfloor$;
  • $t_{N-2} = \big\lceil\frac{r_{N-1}}{\prod_{p=1}^{N-3}n_p}\big\rceil$, where $r_{N-1} = r_N-\tilde{q}_{N-1}\cdot\prod_{p=1}^{N-2}n_p$ and $\tilde{q}_{N-1} = \big\lfloor\frac{r_N}{\prod_{p=1}^{N-2}n_p}\big\rfloor$. Repeat the whole process until
  • $t_2 = \big\lceil\frac{r_3}{\prod_{p=1}^{1}n_p}\big\rceil$, where $r_3 = r_4-\tilde{q}_3\cdot\prod_{p=1}^{2}n_p$ and $\tilde{q}_3 = \big\lfloor\frac{r_4}{\prod_{p=1}^{2}n_p}\big\rfloor$;
  • $t_1 = r_2$, where $r_2 = r_3-\tilde{q}_2\cdot n_1$ and $\tilde{q}_2 = \big\lfloor\frac{r_3}{\prod_{p=1}^{1}n_p}\big\rfloor$ (a computational sketch of this inversion is given below).
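The following R sketch inverts the linearization; it peels off the low dimensions first via modular arithmetic rather than the high dimensions as in Corollary 5, but under the linearization of Lemma 2 it recovers the same multi-index, as the round-trip check confirms:

```r
# Linearization of Lemma 2 and its inverse.
linearize <- function(i, n) sum((i - 1) * c(1, cumprod(n[-length(n)]))) + 1

delinearize <- function(k, n) {
  i <- integer(length(n)); k <- k - 1
  for (p in seq_along(n)) {                 # strip off one direction at a time
    i[p] <- k %% n[p] + 1
    k    <- k %/% n[p]
  }
  i
}

n <- c(3, 4, 2)
all(sapply(1:prod(n), function(k) linearize(delinearize(k, n), n)) == 1:prod(n))  # TRUE
```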
Remark 5. 
Since $l_T$, $l_{\vec{\tau}}$, and $l_{sop}$ are defined over the sizes of the sections (or sectional vectors), any further expansion, such as zero-padding of $T$ (the input object with sectional vector $\vec{\tau}$ and zero-padded sectional vector $\vec{\tau}'$), will not alter their bijectivity, i.e., $l_T$, $l_{\vec{\tau}}$, and $l_{sop}$ are all bijective (this could easily be shown via Lemma 2).
Corollary 6. 
For the $i$th filter in layer (stage) $j$, or $F^{i,j}$, the filter could be linearized by $F^{i,j}[n]$ for all $1\le n\le\prod_{p=1}^{h}f_p^{j}$, i.e., in the $j$th layer (stage) related to the feature maps, there are $m_j\cdot\prod_{p=1}^{h}f_p^{j}$ linearized nodes whose values are decided by the presented results.
Proof. 
For each $1\le i\le m_j$, $F^{i,j}\big(\prod_{p=1}^{h}\overline{f_p^{j}}\big) = F^{i,j}\big(l^{-1}\big(\overline{\prod_{p=1}^{h}f_p^{j}}\big)\big)$; by Lemma 2, the result follows immediately. □
Example 2. 
Suppose $T$ is a 6-by-7 matrix. Suppose the stride vector $\vec{\sigma}^{j} = [\sigma_1^{j},\sigma_2^{j}] = [1,2]$ and $Sec(F^{j})\equiv\vec{f}^{j} = [f_1^{j},f_2^{j}] = [2,3]$. Then $h = 2$ and $Sec(T)\equiv\vec{\tau} = [\tau_1,\tau_2] = [6,7]$. Based on Claim 1, $Q(T/F^{j}) = [q_1^{j},q_2^{j}] = \big[\lfloor\frac{\tau_1-f_1^{j}}{\sigma_1^{j}}\rfloor+1,\ \lfloor\frac{\tau_2-f_2^{j}}{\sigma_2^{j}}\rfloor+1\big] = [5,3]$ and $I(T/F^{j}) = \overline{5}\times\overline{3} = \{(1,1),(1,2),(1,3),(2,1),(2,2),(2,3),(3,1),(3,2),(3,3),(4,1),(4,2),(4,3),(5,1),(5,2),(5,3)\}$. Suppose $I = [4,2]$. Based on Lemma 1, $D_I = \overrightarrow{1+(4-1)\cdot\sigma_1^{j}}^{\,f_1^{j}}\,\hat{\times}\,\overrightarrow{1+(2-1)\cdot\sigma_2^{j}}^{\,f_2^{j}} = \vec{4}^{\,2}\,\hat{\times}\,\vec{3}^{\,3} = \begin{bmatrix}4\\5\end{bmatrix}\hat{\times}\begin{bmatrix}3\\4\\5\end{bmatrix} = \begin{bmatrix}(4,3)&(4,4)&(4,5)\\(5,3)&(5,4)&(5,5)\end{bmatrix}$ and thus $T\restriction D_I = \begin{bmatrix}T(4,3)&T(4,4)&T(4,5)\\T(5,3)&T(5,4)&T(5,5)\end{bmatrix}$. Hence, $P_{\vec{f}}^{\vec{\sigma}}$ could be represented by $P_{Sec(F^{j})}^{\vec{\sigma}^{j}}(T)\equiv P_{\vec{f}}^{\vec{\sigma}}(T) = \big[T\big(\overrightarrow{1+(i_1-1)\cdot\sigma_1^{j}}^{\,2}\,\hat{\times}\,\overrightarrow{1+(i_2-1)\cdot\sigma_2^{j}}^{\,3}\big)\big]_{(i_1,i_2)\in\overline{5}\times\overline{3}} = \big[T\big(\vec{i_1}^{\,2}\,\hat{\times}\,\overrightarrow{2i_2-1}^{\,3}\big)\big]_{(i_1,i_2)\in\overline{5}\times\overline{3}} = \begin{bmatrix} T(\vec{1}^{\,2}\hat{\times}\vec{1}^{\,3}) & T(\vec{1}^{\,2}\hat{\times}\vec{3}^{\,3}) & T(\vec{1}^{\,2}\hat{\times}\vec{5}^{\,3}) \\ T(\vec{2}^{\,2}\hat{\times}\vec{1}^{\,3}) & T(\vec{2}^{\,2}\hat{\times}\vec{3}^{\,3}) & T(\vec{2}^{\,2}\hat{\times}\vec{5}^{\,3}) \\ T(\vec{3}^{\,2}\hat{\times}\vec{1}^{\,3}) & T(\vec{3}^{\,2}\hat{\times}\vec{3}^{\,3}) & T(\vec{3}^{\,2}\hat{\times}\vec{5}^{\,3}) \\ T(\vec{4}^{\,2}\hat{\times}\vec{1}^{\,3}) & T(\vec{4}^{\,2}\hat{\times}\vec{3}^{\,3}) & T(\vec{4}^{\,2}\hat{\times}\vec{5}^{\,3}) \\ T(\vec{5}^{\,2}\hat{\times}\vec{1}^{\,3}) & T(\vec{5}^{\,2}\hat{\times}\vec{3}^{\,3}) & T(\vec{5}^{\,2}\hat{\times}\vec{5}^{\,3}) \end{bmatrix}.$
Example 3. 
Suppose $T$ is a 26-by-37-by-19-by-28 tensor. Suppose the stride vector $\vec{\sigma}^{j} = [\sigma_1^{j},\sigma_2^{j},\sigma_3^{j},\sigma_4^{j}] = [6,3,7,5]$ and $Sec(F^{j}) = [f_1^{j},f_2^{j},f_3^{j},f_4^{j}] = [3,4,5,2]$. Then $h = 4$ and $Sec(T) = [\tau_1,\tau_2,\tau_3,\tau_4] = [26,37,19,28]$. Based on Claim 1, $Q(T/F^{j}) = [q_1^{j},q_2^{j},q_3^{j},q_4^{j}] = \big[\lfloor\frac{\tau_1-f_1^{j}}{\sigma_1^{j}}\rfloor+1,\ \lfloor\frac{\tau_2-f_2^{j}}{\sigma_2^{j}}\rfloor+1,\ \lfloor\frac{\tau_3-f_3^{j}}{\sigma_3^{j}}\rfloor+1,\ \lfloor\frac{\tau_4-f_4^{j}}{\sigma_4^{j}}\rfloor+1\big] = [4,12,3,6]$ and $I(T/F^{j}) = \overline{4}\times\overline{12}\times\overline{3}\times\overline{6}$. Suppose $I = [4,7,2,5]$. Based on Lemma 1, $D_I = \overrightarrow{1+(i_1-1)\cdot\sigma_1^{j}}^{\,f_1^{j}}\,\hat{\times}\,\overrightarrow{1+(i_2-1)\cdot\sigma_2^{j}}^{\,f_2^{j}}\,\hat{\times}\,\overrightarrow{1+(i_3-1)\cdot\sigma_3^{j}}^{\,f_3^{j}}\,\hat{\times}\,\overrightarrow{1+(i_4-1)\cdot\sigma_4^{j}}^{\,f_4^{j}} = \vec{19}^{\,3}\,\hat{\times}\,\vec{19}^{\,4}\,\hat{\times}\,\vec{8}^{\,5}\,\hat{\times}\,\vec{21}^{\,2} = [19,20,21]^{t}\,\hat{\times}\,[19,20,21,22]^{t}\,\hat{\times}\,[8,9,10,11,12]^{t}\,\hat{\times}\,[21,22]^{t}$ and thus $T\restriction D_I = T(\vec{19}^{\,3}\,\hat{\times}\,\vec{19}^{\,4}\,\hat{\times}\,\vec{8}^{\,5}\,\hat{\times}\,\vec{21}^{\,2})$. Hence, $P_{\vec{f}}^{\vec{\sigma}}$ could be represented by $P_{Sec(F^{j})}^{\vec{\sigma}^{j}}(T)\equiv P_{\vec{f}}^{\vec{\sigma}}(T) = \big[T\big(\overrightarrow{6i_1-5}^{\,3}\,\hat{\times}\,\overrightarrow{3i_2-2}^{\,4}\,\hat{\times}\,\overrightarrow{7i_3-6}^{\,5}\,\hat{\times}\,\overrightarrow{5i_4-4}^{\,2}\big)\big]_{I\in\overline{4}\times\overline{12}\times\overline{3}\times\overline{6}}$, where $I = [i_1,i_2,i_3,i_4]$.
Definition 11. 
$T/F^{i,j} = \big[\langle T\restriction D_I,\ F^{i,j}\rangle\big]_{I\in\overline{q_1^{j}}\times\overline{q_2^{j}}\times\cdots\times\overline{q_h^{j}}}$, or a tensor consisting of the standard inner products between the input sub-tensors $T\restriction D_I$ and the feature tensor $F^{i,j}$ at stage (layer) $j$.
Now, each of these inner products is captured by the function sum of products (sop), as we did for the fully connected network. With Definition 11 and Corollary 6, we conclude that the total number of neurons in this layer is $\big(\prod_{p=1}^{h}q_p^{j}\big)\cdot m_j$. Since we are going to examine the forward and backward propagations, we partition the indices of the tensors of the previous layer.

5. Theories and Methods

Claim 3. 
The number of neurons in the first layer is $\prod_{n=1}^{s}\vec{\tau}[n]$ under the linearization in Corollary 1, and the value associated with each neuron $k$ is $T(l_T^{-1}(k))$ for all $1\le k\le\prod_{n=1}^{s}\vec{\tau}[n]$, where $l_T$ is defined in Corollary 1.
Claim 4. 
The number of neurons in the second layer is $N_2 = |P_{\vec{f}}^{\vec{\sigma}}(T_{\vec{\tau}})|\cdot N_1$, where $N_1$ is the number of filters/kernels, i.e., the label set for the neurons in the second layer is $\{sop_j\}_{j=1}^{N_2} = \{sop_j\}_{j=1}^{K\cdot N_1}$, where $K = |P_{\vec{f}}^{\vec{\sigma}}(T_{\vec{\tau}})|$ and $sop_j$ stands for the 'sum of product' of the $j$th node in the referred layer (or, in the standard CNN, the $j$th convolved value).
Lemma 3. 
The set of indices from the first layer described in Claim 3 that link to the $j$th neuron in the second layer described in Claim 4 is $l_T\big(\prod_{k=1}^{s}\overline{(\vec{1}+(\gamma-\vec{1})\odot\vec{\sigma})[k] : ((\gamma-\vec{1})\odot\vec{\sigma}+\vec{f})[k]}\big)$, where $\gamma = l_{\vec{\tau}}^{-1}\big((l_{sop}^{-1}(j))_1\big)$, $(l_{sop}^{-1}(j))_1 = k$ if $l_{sop}^{-1}(j) = (k,n)$, and $l_T\big(\prod_{k=1}^{s}\overline{(\vec{1}+(\gamma-\vec{1})\odot\vec{\sigma})[k] : ((\gamma-\vec{1})\odot\vec{\sigma}+\vec{f})[k]}\big)\equiv\big\{l_T(v): v\in\prod_{k=1}^{s}\overline{(\vec{1}+(\gamma-\vec{1})\odot\vec{\sigma})[k] : ((\gamma-\vec{1})\odot\vec{\sigma}+\vec{f})[k]}\big\}$.
Proof. 
Given an index $j$ in the second layer, based on Corollary 3 ($l_{sop}: \overline{|P_{\vec{f}}^{\vec{\sigma}}(T_{\vec{\tau}})|}\times\overline{N_1}\to\overline{|P_{\vec{f}}^{\vec{\sigma}}(T_{\vec{\tau}})|\times N_1}$), one has $(k,n) = l_{sop}^{-1}(j)$, where $k\le|P_{\vec{f}}^{\vec{\sigma}}(T_{\vec{\tau}})|$. Then, based on Corollary 2 ($l_{\vec{\tau}}: \prod_{j=1}^{s}\overline{\lfloor\frac{\vec{\tau}[j]-\vec{f}[j]}{\vec{\sigma}[j]}\rfloor+1}\to\overline{|P_{\vec{f}}^{\vec{\sigma}}(T_{\vec{\tau}})|}$), one has $l_{\vec{\tau}}^{-1}(k) = \gamma$, which, via Definition 4 and Corollary 1, links to the result. □
Theorem 1. 
$sop_j = \big\langle T_{\vec{\tau}}^{\,(l_{sop}^{-1}(j))_1},\ F^{\,(l_{sop}^{-1}(j))_2,1}\big\rangle$, where $l_{sop}^{-1}(j)_1\equiv(l_{sop}^{-1}(j))_1$ and $l_{sop}^{-1}(j)_2\equiv(l_{sop}^{-1}(j))_2$.
Proof. 
Based on Corollary 3, the result follows immediately. □
Theorem 2. 
$$\omega_{ij} = \begin{cases} F_{\vec{f}}^{\,j_2,1}\big[l_{\vec{\tau}}^{-1}(i)\big], & \text{if } i\in l_T\Big(\prod_{k=1}^{s}\overline{(\vec{1}+(\gamma-\vec{1})\odot\vec{\sigma})[k] : ((\gamma-\vec{1})\odot\vec{\sigma}+\vec{f})[k]}\Big), \\ 0, & \text{otherwise}, \end{cases}$$
where $\gamma = l_{\vec{\tau}}^{-1}\big((l_{sop}^{-1}(j))_1\big)$, for all $1\le i\le\prod_{k=1}^{s}\vec{\tau}[k]$ and $1\le j\le N_2\equiv|P_{\vec{f}}^{\vec{\sigma}}(T_{\vec{\tau}})|\times N_1$; $j_2 = (l_{sop}^{-1}(j))_2$; $F_{\vec{f}}^{\,i,j}$ is defined in Definition 3.
Proof. 
Let us define a (weight) function $\omega: \prod_{k=1}^{s}\overline{\vec{\tau}[k]}\times\overline{|P_{\vec{f}}^{\vec{\sigma}}(T_{\vec{\tau}})|\times N_1}\to\mathbb{R}$. Before that, let us set up some notation. There are $|\vec{q}|\equiv\prod_{k=1}^{s}\vec{q}[k]$ sub-tensors, or $|P_{\vec{f}}^{\vec{\sigma}}| = |\vec{q}|$, and the indices for the $\gamma$th sub-tensor (the size of each sub-tensor is $|\vec{f}|$), based on the proof of Lemma 3, are $\prod_{k=1}^{s}\overline{(\vec{1}+(\gamma-\vec{1})\odot\vec{\sigma})[k] : ((\gamma-\vec{1})\odot\vec{\sigma}+\vec{f})[k]}\subseteq\prod_{k=1}^{s}\overline{\vec{\tau}[k]}$. Now, we define $\omega_{ij} := \omega(i,j)$. Observe that $j$ is decided by a pair of values, say $j = (j_1,j_2)$. Since $i$ could link to all the nodes in the next layer (the intermediate output layer, or sop layer), the value of $\omega(i,j)$ needs to involve $j$ as well. We then check whether $i$ is activated (convolved) with respect to $j$, or, more precisely, whether $i$ is located in the $\gamma$th sub-tensor, where $\gamma = l_{\vec{\tau}}^{-1}\big((l_{sop}^{-1}(j))_1\big)$. In summary, neuron $i$ is activated (convolved) if $i\in l_T\big(\prod_{k=1}^{s}\overline{(\vec{1}+(\gamma-\vec{1})\odot\vec{\sigma})[k] : ((\gamma-\vec{1})\odot\vec{\sigma}+\vec{f})[k]}\big)$, and its activated value is defined to be $\omega_{ij}$, where $i$ indicates the $i$th linearized element in the $j_2$th filter/kernel and is decided by $l_{\vec{\tau}}^{-1}(i)$. Hence, $\omega_{ij}$ is defined as in the statement. The whole argument can also be seen in Figure 4 and Figure 5. □
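A minimal R sketch of Lemma 3 and Theorem 2 for a single filter ($N_1 = 1$, so a sop index $j$ coincides with the linearized sub-tensor index), using the parameters of Example 1; the helper names are ours, not the paper's:

```r
tau <- c(6, 7); sigma <- c(1, 2); f <- c(2, 3)
q   <- floor((tau - f) / sigma) + 1                      # [5, 3]

lin <- function(i, n) sum((i - 1) * c(1, cumprod(n[-length(n)]))) + 1
inv <- function(k, n) { i <- integer(length(n)); k <- k - 1
  for (p in seq_along(n)) { i[p] <- k %% n[p] + 1; k <- k %/% n[p] }
  i }

# Lemma 3: recover gamma from j and list the linearized input-layer indices (via l_T)
# that are convolved into the j-th sop neuron; every other weight into sop_j is 0.
inputs_of_sop <- function(j) {
  gamma <- inv(j, q)
  rows  <- (1 + (gamma[1] - 1) * sigma[1]):((gamma[1] - 1) * sigma[1] + f[1])
  cols  <- (1 + (gamma[2] - 1) * sigma[2]):((gamma[2] - 1) * sigma[2] + f[2])
  grid  <- as.matrix(expand.grid(rows, cols))
  sort(apply(grid, 1, lin, n = tau))
}
inputs_of_sop(7)   # the six input neurons feeding sop_7 (gamma = (2, 2): rows 2:3, cols 3:5)
```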
Remark 6. 
$\{\omega_{ij}\}_{1\le i\le\prod_{k=1}^{s}\vec{\tau}[k],\,1\le j\le N_2}$ is the set of weights between an input layer, whose input object/tensor is $T_{\vec{\tau}}$, with $\prod_{k=1}^{s}\vec{\tau}[k]$ neurons, and an intermediate output layer, whose intermediate output values are the sop (sum of product) values, with $N_2$ neurons. $\omega_{ij}$ is the weight between the $i$th neuron in the input layer and the $j$th neuron in the intermediate output layer.
Theorem 3. 
The CNN is represented by two networks: a partial network (or parameterized fully connected network) for feature extraction and a fully connected network.
Proof. 
The result follows immediately from Theorems 1 and 2 (see Figure 5). □

6. Illustrative Example

Since the main difference between the standard CNN (sCNN) and this linearized CNN (lCNN) lies in the encoding part, the calculation of the set of weights $\omega_{i,j}$ is vital. In this section, we use R (version 4.5.1) to demonstrate the equivalence between the sCNN and lCNN, i.e., an illustrative example for Theorems 2 and 3 (see https://github.com/raymingchen/linearized-high-dimension-CNN.git, accessed on 1 September 2025). Suppose the input image is a cute puppy (https://unsplash.com/photos/brown-short-coated-dog-on-green-grass-field-dgr0ZDbeOqw, a 3D RGB tensor, accessed on 1 September 2025). To simplify the computation, we extract the R channel of the input tensor, or puppyR (a 200-by-200 matrix; the (1,2) cell in Figure 6), as our input tensor. Suppose we consider four kernels (the first two are randomly generated, while the last two are deterministic features):
$$F^{1,1} = \begin{bmatrix} 40.01257 & 71.03048 & 90.265515 & 64.15175 & 0.6114631 & 28.811208 \\ 91.03395 & 22.87645 & 13.136873 & 15.21936 & 79.4672556 & 9.257303 \\ 92.50072 & 90.29318 & 9.735757 & 86.47676 & 41.9774183 & 7.608247 \\ 70.20737 & 35.65345 & 51.91615 & 95.19638 & 52.5290581 & 84.474351 \\ 32.68287 & 40.95929 & 28.365139 & 44.18469 & 83.1533521 & 13.688504 \\ 38.91981 & 91.67589 & 14.902744 & 97.88255 & 74.1104222 & 95.176985 \end{bmatrix};$$
$$F^{2,1} = \begin{bmatrix} 1.125705 & 7.66730341 & 2.626473 & 2.0697611 \\ 3.793908 & 5.55425965 & 3.695838 & 4.7750866 \\ 5.391265 & 1.76043044 & 4.390247 & 0.1502723 \\ 2.170834 & 0.03233433 & 3.177388 & 6.2366755 \end{bmatrix}; \quad F^{3,1} = \begin{bmatrix} 1&0&0&0&0\\0&1&0&0&0\\0&0&1&0&0\\0&0&0&1&0\\0&0&0&0&1 \end{bmatrix}; \quad F^{4,1} = \begin{bmatrix} 1&1&1&1\\0&0&0&0\\0&0&0&0\\1&1&1&1 \end{bmatrix}.$$
Then, we perform the standard CNN (sCNN) and the linearized CNN (lCNN) convolutions (this operation is sufficient to show whether the two representations are equivalent). The convolved results based on the sCNN and lCNN are shown in the 4th and 5th rows of Figure 6. The stride vector is fixed at $\vec{\sigma} = (3,5)$. Based on these, one finds that the numbers of power tensors (or sub-tensors) with respect to $F^{1,1}, F^{2,1}, F^{3,1}, F^{4,1}$ are $65\cdot 39 = 2535$, $66\cdot 40 = 2640$, $66\cdot 40 = 2640$, and $66\cdot 40 = 2640$, respectively. For the linearized CNN, there are $200\cdot 200 = 40{,}000$ nodes in the first/input layer and $2535+2640+2640+2640 = 10{,}455$ nodes in the second layer (the intermediate output, or sop, layer). Then, we construct (based on Theorem 2) the weights linking layer one and layer two as a 40,000-by-10,455 weight matrix $W$. The values associated with the nodes of the second layer are computed as a 10,455-by-1 column vector $sop = W^{t}P_l$, where $t$ denotes the transpose operator and $P_l$ is the linearized 40,000-by-1 column vector of puppyR. In order to show the equivalence of the sCNN and lCNN, we reshape $sop$ into four matrices: a 65-by-39 matrix $MA_1$ and three 66-by-40 matrices $MA_2, MA_3, MA_4$. The feature maps of the sCNN and the reshaped $sop$ of the lCNN are identical under the convolution operation. Since the subsequent operations (forward/backward propagation) are also identical regardless of whether an sCNN or lCNN is used, we have demonstrated the equivalence between the two representations.
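The repository linked above contains the author's implementation; the following self-contained R sketch mimics it on a smaller scale (a random 10-by-12 input and a single 2-by-3 kernel, with the same stride vector $(3,5)$) and verifies that the weight matrix of Theorem 2 reproduces the standard convolution:

```r
set.seed(7)
tau <- c(10, 12); f <- c(2, 3); sigma <- c(3, 5)     # small stand-ins for puppyR and a kernel
T   <- matrix(rnorm(prod(tau)), tau[1], tau[2])
K   <- matrix(rnorm(prod(f)), f[1], f[2])            # one kernel (N1 = 1)
q   <- floor((tau - f) / sigma) + 1                  # feature-map size

# Standard CNN: direct sliding-window inner products.
sCNN <- matrix(0, q[1], q[2])
for (g1 in 1:q[1]) for (g2 in 1:q[2]) {
  rows <- (1 + (g1 - 1) * sigma[1]):((g1 - 1) * sigma[1] + f[1])
  cols <- (1 + (g2 - 1) * sigma[2]):((g2 - 1) * sigma[2] + f[2])
  sCNN[g1, g2] <- sum(T[rows, cols] * K)
}

# Linearized CNN: a (prod(tau) x prod(q)) weight matrix W as in Theorem 2.
W <- matrix(0, prod(tau), prod(q))
for (g1 in 1:q[1]) for (g2 in 1:q[2]) {
  j    <- (g2 - 1) * q[1] + g1                       # column-major linearization of gamma
  rows <- (1 + (g1 - 1) * sigma[1]):((g1 - 1) * sigma[1] + f[1])
  cols <- (1 + (g2 - 1) * sigma[2]):((g2 - 1) * sigma[2] + f[2])
  for (a in seq_along(rows)) for (b in seq_along(cols)) {
    i <- (cols[b] - 1) * tau[1] + rows[a]            # linearized input index l_T
    W[i, j] <- K[a, b]                               # weight = kernel entry, else 0
  }
}
sop <- t(W) %*% as.vector(T)                         # lCNN: sum-of-product layer
all.equal(matrix(sop, q[1], q[2]), sCNN)             # TRUE: the two representations agree
```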

7. Conclusions

We have reformulated the typical CNN architecture via two networks: (1) a parameterized fully connected network and (2) an (unparameterized) fully connected network (see Figure A1 and Figure A2). This representation directly links the CNN with the ANN (see Figure 2); indeed, the CNN can be regarded as a special case of a (parameterized) ANN. In addition, it offers a very intuitive representation of the transformation process. The main contribution of this work is to incorporate the two typical stages of CNNs, feature extraction and the ANN, into one framework. The technical core is the linearization and the high-dimensional extension.

Limitations and Future Work

There are some limitations to this setting; for example, we have not studied some newly developed CNNs [16,17] or discussed stacked CNNs [18,19], dense networks [19], or other variants, even though their settings are similar. In the future, we will incorporate high-dimensional CNNs into high-dimensional ANNs and test their efficacy and efficiency under the unified architecture. We could also study other variants, such as stacked neural networks [27,28,29]. Another drawback of this manuscript is that we do not provide an experiment implementing the full framework devised here, for the following reasons:
  • This is purely a theoretical framework for incorporating and expanding existing theories and practices, and its efficacy is guaranteed by the error minimized via forward/backward propagation.
  • Our models can easily be downgraded to 1D, 2D, and 3D data or CNN architectures, and those architectures have repeatedly been shown to be efficient and faithful.
  • For higher dimensions, for example, when each input data point is a pair of pictures, the data can be represented by a set of two 3D products, or six dimensions. Theoretically, this is no different from the usual practice, but it demands significant computational resources, which is beyond our capacity. Future research teams capable of implementing the whole algorithm could realize such mechanisms in full.

Funding

This research was funded by the Internal (Faculty/Staff) Start-Up Research Grant of Wenzhou-Kean University (Project No. ISRG2023029).

Data Availability Statement

The original contributions presented in this study are included in the article. Further inquiries can be directed to the corresponding author.

Conflicts of Interest

The author declares that he has no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Appendix A

Appendix A.1. Calculation of the Quotient Tensor $\vec{q}$ and $\{\vec{q}_n\}_{n=1}^{N}$

This part shows the rationale of Definitions 4 and 5. Based on the formula for an arithmetic progression, for each $k$ one has the relation $1+(n_k-1)\cdot\vec{\sigma}[k]+(\vec{f}[k]-1)\le\vec{\tau}[k]$, where $n_k$ is regarded as a quotient in direction $k$, i.e., $1+(\vec{q}[k]-1)\vec{\sigma}[k]+(\vec{f}[k]-1)\le\vec{\tau}[k]$, i.e., $(\vec{q}[k]-1)\vec{\sigma}[k]\le\vec{\tau}[k]-\vec{f}[k]$, i.e., $\vec{q}[k]\le\frac{\vec{\tau}[k]-\vec{f}[k]}{\vec{\sigma}[k]}+1$, i.e., $\vec{q}[k] = \lfloor\frac{\vec{\tau}[k]-\vec{f}[k]}{\vec{\sigma}[k]}\rfloor+1$. For each particular $\gamma\in\prod_{k=1}^{s}\overline{\vec{q}[k]}$, the sub-tensor $T_{\vec{f}}^{\gamma}$'s location along direction/section $k$ (or $Loc(T_{\vec{f}}^{\gamma})[k]$) lies between $1+(\gamma[k]-1)\cdot\vec{\sigma}[k]$ and $1+(\gamma[k]-1)\cdot\vec{\sigma}[k]+(\vec{f}[k]-1) = (\gamma[k]-1)\cdot\vec{\sigma}[k]+\vec{f}[k]$, where $1\le k\le s = |\vec{\tau}|$. Now, taking all the $k$ values into consideration, one has the locations/indices of $T_{\vec{f}}^{\gamma}$ (or $Loc(T_{\vec{f}}^{\gamma})$): $\vec{1}+(\gamma-\vec{1})\odot\vec{\sigma}\le Loc(T_{\vec{f}}^{\gamma})\le(\gamma-\vec{1})\odot\vec{\sigma}+\vec{f}$. By the same analogy, one computes the value of each $\vec{q}_n$.

Appendix A.2. Some Knowledge About Neural Networks

Definition A1. 
Suppose $A\equiv[a_{ij}]$ is an $m$-by-$n$ matrix and $B$ is an $n$-by-1 matrix (vector). Define the $m$-by-$n$ matrix $A\odot B := [c_{ij}]$, where $c_{ij} = a_{ij}\cdot b_{j1}$.
Definition A2. 
Suppose $A$ is an $m$-by-1 matrix (vector) and $B\equiv[b_{ij}]$ is an $m$-by-$n$ matrix. Define the $m$-by-$n$ matrix $A\odot B := [c_{ij}]$, where $c_{ij} = a_{i1}\cdot b_{ij}$.
For example, $\begin{bmatrix}a_{11}&a_{12}\\a_{21}&a_{22}\\a_{31}&a_{32}\end{bmatrix}\odot\begin{bmatrix}b_{11}\\b_{21}\end{bmatrix} = \begin{bmatrix}a_{11}\cdot b_{11}&a_{12}\cdot b_{21}\\a_{21}\cdot b_{11}&a_{22}\cdot b_{21}\\a_{31}\cdot b_{11}&a_{32}\cdot b_{21}\end{bmatrix}$ and $\begin{bmatrix}a_{11}\\a_{21}\\a_{31}\end{bmatrix}\odot\begin{bmatrix}b_{11}&b_{12}\\b_{21}&b_{22}\\b_{31}&b_{32}\end{bmatrix} = \begin{bmatrix}a_{11}\cdot b_{11}&a_{11}\cdot b_{12}\\a_{21}\cdot b_{21}&a_{21}\cdot b_{22}\\a_{31}\cdot b_{31}&a_{31}\cdot b_{32}\end{bmatrix}$.
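A minimal R sketch of the two scaling operations in Definitions A1 and A2 (function names are ours):

```r
# Definition A1: scale the j-th column of A by b_j  (A is m-by-n, b is n-by-1).
col_scale <- function(A, b) sweep(A, 2, as.vector(b), `*`)
# Definition A2: scale the i-th row of B by a_i     (a is m-by-1, B is m-by-n).
row_scale <- function(a, B) sweep(B, 1, as.vector(a), `*`)

A <- matrix(1:6, nrow = 3)            # 3-by-2
b <- c(10, 100)                       # 2-by-1
col_scale(A, b)                       # columns of A scaled by 10 and 100

a <- c(1, 2, 3)                       # 3-by-1
B <- matrix(1, nrow = 3, ncol = 2)    # 3-by-2
row_scale(a, B)                       # rows of B scaled by 1, 2, 3
```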
Regarding the partial derivatives and the updated values ($ANN[m]$ denotes the updated forward neural network at step $m$):
  • (step 1) $\frac{\partial L}{\partial\omega_{ij}^{l}} = \frac{\partial\,sop_j^{l}}{\partial\omega_{ij}^{l}}\times\frac{\partial\hat{y}_j}{\partial\,sop_j^{l}}\times\frac{\partial L}{\partial\hat{y}_j}$ and $\omega_{ij}^{l}[m+1] = \omega_{ij}^{l}[m]-\eta\times\frac{\partial L}{\partial\omega_{ij}^{l}}\big|_{ANN[m]}$;
  • (step 2) $\frac{\partial L}{\partial\omega_{ij}^{l-1}} = \frac{\partial\,sop_j^{l-1}}{\partial\omega_{ij}^{l-1}}\times\frac{\partial\sigma_j^{l-1}}{\partial\,sop_j^{l-1}}\times\big[\frac{\partial\,sop_k^{l}}{\partial\sigma_j^{l-1}}\big]_{1\times n_l}^{1\le k\le n_l}\Big(\big[\frac{\partial\hat{y}_k}{\partial\,sop_k^{l}}\big]_{n_l\times 1}^{1\le k\le n_l}\odot\big[\frac{\partial L}{\partial\hat{y}_k}\big]_{n_l\times 1}^{1\le k\le n_l}\Big)$ and $\omega_{ij}^{l-1}[m+1] = \omega_{ij}^{l-1}[m]-\eta\times\frac{\partial L}{\partial\omega_{ij}^{l-1}}\big|_{ANN[m]}$;
  • (step 3) $\frac{\partial L}{\partial\omega_{ij}^{l-2}} = \frac{\partial\,sop_j^{l-2}}{\partial\omega_{ij}^{l-2}}\times\frac{\partial\sigma_j^{l-2}}{\partial\,sop_j^{l-2}}\times\big[\frac{\partial\,sop_k^{l-1}}{\partial\sigma_j^{l-2}}\big]_{1\times n_{l-1}}^{1\le k\le n_{l-1}}\Big(\big[\frac{\partial\sigma_k^{l-1}}{\partial\,sop_k^{l-1}}\big]_{n_{l-1}\times 1}^{1\le k\le n_{l-1}}\odot\big[\frac{\partial\,sop_p^{l}}{\partial\sigma_k^{l-1}}\big]_{n_{l-1}\times n_l}\Big)\Big(\big[\frac{\partial\hat{y}_p}{\partial\,sop_p^{l}}\big]_{n_l\times 1}^{1\le p\le n_l}\odot\big[\frac{\partial L}{\partial\hat{y}_p}\big]_{n_l\times 1}^{1\le p\le n_l}\Big)$ and $\omega_{ij}^{l-2}[m+1] = \omega_{ij}^{l-2}[m]-\eta\times\frac{\partial L}{\partial\omega_{ij}^{l-2}}\big|_{ANN[m]}$;
  • (step 4) $\frac{\partial L}{\partial\omega_{ij}^{l-3}} = \frac{\partial\,sop_j^{l-3}}{\partial\omega_{ij}^{l-3}}\times\frac{\partial\sigma_j^{l-3}}{\partial\,sop_j^{l-3}}\times\big[\frac{\partial\,sop_k^{l-2}}{\partial\sigma_j^{l-3}}\big]_{1\times n_{l-2}}^{1\le k\le n_{l-2}}\Big(\big[\frac{\partial\sigma_k^{l-2}}{\partial\,sop_k^{l-2}}\big]_{n_{l-2}\times 1}^{1\le k\le n_{l-2}}\odot\big[\frac{\partial\,sop_s^{l-1}}{\partial\sigma_k^{l-2}}\big]_{n_{l-2}\times n_{l-1}}\Big)\Big(\big[\frac{\partial\sigma_s^{l-1}}{\partial\,sop_s^{l-1}}\big]_{n_{l-1}\times 1}^{1\le s\le n_{l-1}}\odot\big[\frac{\partial\,sop_q^{l}}{\partial\sigma_s^{l-1}}\big]_{n_{l-1}\times n_l}\Big)\Big(\big[\frac{\partial\hat{y}_q}{\partial\,sop_q^{l}}\big]_{n_l\times 1}^{1\le q\le n_l}\odot\big[\frac{\partial L}{\partial\hat{y}_q}\big]_{n_l\times 1}^{1\le q\le n_l}\Big)$ and $\omega_{ij}^{l-3}[m+1] = \omega_{ij}^{l-3}[m]-\eta\times\frac{\partial L}{\partial\omega_{ij}^{l-3}}\big|_{ANN[m]}$;
  • (step $l-1$) $\frac{\partial L}{\partial\omega_{ij}^{2}} = \frac{\partial\,sop_j^{2}}{\partial\omega_{ij}^{2}}\times\frac{\partial\sigma_j^{2}}{\partial\,sop_j^{2}}\times\big[\frac{\partial\,sop_{k_3}^{3}}{\partial\sigma_j^{2}}\big]_{1\times n_3}^{1\le k_3\le n_3}\Big(\big[\frac{\partial\sigma_{k_3}^{3}}{\partial\,sop_{k_3}^{3}}\big]_{n_3\times 1}^{1\le k_3\le n_3}\odot\circledast_{h=4}^{l}\Big(\big[\frac{\partial\,sop_{k_h}^{h}}{\partial\sigma_{k_{h-1}}^{h-1}}\big]_{1\le k_{h-1}\le n_{h-1}}^{1\le k_h\le n_h}\odot\big[\frac{\partial\sigma_{k_h}^{h}}{\partial\,sop_{k_h}^{h}}\big]_{1\le k_h\le n_h}\Big)\big[\frac{\partial L}{\partial\hat{y}_{k_l}}\big]_{n_l\times 1}^{1\le k_l\le n_l}\Big)$ and $\omega_{ij}^{2}[m+1] = \omega_{ij}^{2}[m]-\eta\times\frac{\partial L}{\partial\omega_{ij}^{2}}\big|_{ANN[m]}$, where $\circledast$ represents the recursive matrix multiplications;
  • (step $l$) $\frac{\partial L}{\partial\omega_{ij}^{1}} = \frac{\partial\,sop_j^{1}}{\partial\omega_{ij}^{1}}\times\frac{\partial\sigma_j^{1}}{\partial\,sop_j^{1}}\times\big[\frac{\partial\,sop_{k_2}^{2}}{\partial\sigma_j^{1}}\big]_{1\times n_2}^{1\le k_2\le n_2}\Big(\big[\frac{\partial\sigma_{k_2}^{2}}{\partial\,sop_{k_2}^{2}}\big]_{n_2\times 1}^{1\le k_2\le n_2}\odot\circledast_{h=3}^{l}\Big(\big[\frac{\partial\,sop_{k_h}^{h}}{\partial\sigma_{k_{h-1}}^{h-1}}\big]_{1\le k_{h-1}\le n_{h-1}}^{1\le k_h\le n_h}\odot\big[\frac{\partial\sigma_{k_h}^{h}}{\partial\,sop_{k_h}^{h}}\big]_{1\le k_h\le n_h}\Big)\big[\frac{\partial L}{\partial\hat{y}_{k_l}}\big]_{n_l\times 1}^{1\le k_l\le n_l}\Big)$ and $\omega_{ij}^{1}[m+1] = \omega_{ij}^{1}[m]-\eta\times\frac{\partial L}{\partial\omega_{ij}^{1}}\big|_{ANN[m]}$.
This could be further represented in matrix form. The fundamental structure (pattern) is $\nabla L(W^{t}) = \big[\frac{\partial\,sop_j^{t}}{\partial\omega_{ij}^{t}}\big]_{1\le i\le n_{t-1}}^{1\le j\le n_t}\odot\big[\frac{\partial L}{\partial\,sop_j^{t}}\big]_{n_t\times 1}^{1\le j\le n_t}$.
We list some of the size indices but omit the majority of them:
  • (step 1) $\nabla L(W^{l})\equiv\big[\frac{\partial L}{\partial\omega_{ij}^{l}}\big]_{1\le i\le n_{l-1}}^{1\le j\le n_l} = \big[\frac{\partial\,sop_j^{l}}{\partial\omega_{ij}^{l}}\big]_{1\le i\le n_{l-1}}^{1\le j\le n_l}\odot\Big(\big[\frac{\partial\hat{y}_j}{\partial\,sop_j^{l}}\big]_{n_l\times 1}^{1\le j\le n_l}\odot\big[\frac{\partial L}{\partial\hat{y}_j}\big]_{n_l\times 1}^{1\le j\le n_l}\Big)_{n_l\times 1}$ and thus $W^{l}[m+1] = W^{l}[m]-\eta\times\nabla L(W^{l})\big|_{ANN[m]}$;
  • (step 2) $\nabla L(W^{l-1})\equiv\big[\frac{\partial L}{\partial\omega_{ij}^{l-1}}\big]_{1\le i\le n_{l-2}}^{1\le j\le n_{l-1}} = \big[\frac{\partial\,sop_j^{l-1}}{\partial\omega_{ij}^{l-1}}\big]\odot\Big\{\Big(\big[\frac{\partial\sigma_j^{l-1}}{\partial\,sop_j^{l-1}}\big]_{n_{l-1}\times 1}\odot\big[\frac{\partial\,sop_k^{l}}{\partial\sigma_j^{l-1}}\big]_{1\le j\le n_{l-1}}^{1\le k\le n_l}\Big)\Big(\big[\frac{\partial\hat{y}_k}{\partial\,sop_k^{l}}\big]_{n_l\times 1}^{1\le k\le n_l}\odot\big[\frac{\partial L}{\partial\hat{y}_k}\big]_{n_l\times 1}^{1\le k\le n_l}\Big)\Big\}$ and $W^{l-1}[m+1] = W^{l-1}[m]-\eta\times\nabla L(W^{l-1})\big|_{ANN[m]}$;
  • (step 3) $\nabla L(W^{l-2})\equiv\big[\frac{\partial L}{\partial\omega_{ij}^{l-2}}\big]_{1\le i\le n_{l-3}}^{1\le j\le n_{l-2}} = \big[\frac{\partial\,sop_j^{l-2}}{\partial\omega_{ij}^{l-2}}\big]\odot\Big\{\Big(\big[\frac{\partial\sigma_k^{l-2}}{\partial\,sop_k^{l-2}}\big]_{n_{l-2}\times 1}\odot\big[\frac{\partial\,sop_p^{l-1}}{\partial\sigma_k^{l-2}}\big]_{1\le k\le n_{l-2}}^{1\le p\le n_{l-1}}\Big)\Big(\big[\frac{\partial\sigma_p^{l-1}}{\partial\,sop_p^{l-1}}\big]_{n_{l-1}\times 1}^{1\le p\le n_{l-1}}\odot\big[\frac{\partial\,sop_q^{l}}{\partial\sigma_p^{l-1}}\big]_{n_{l-1}\times n_l}\Big)\Big(\big[\frac{\partial\hat{y}_q}{\partial\,sop_q^{l}}\big]_{n_l\times 1}^{1\le q\le n_l}\odot\big[\frac{\partial L}{\partial\hat{y}_q}\big]_{n_l\times 1}^{1\le q\le n_l}\Big)\Big\}$ and $W^{l-2}[m+1] = W^{l-2}[m]-\eta\times\nabla L(W^{l-2})\big|_{ANN[m]}$;
  • (step 4) $\nabla L(W^{l-3})\equiv\big[\frac{\partial L}{\partial\omega_{ij}^{l-3}}\big]_{1\le i\le n_{l-4}}^{1\le j\le n_{l-3}} = \big[\frac{\partial\,sop_j^{l-3}}{\partial\omega_{ij}^{l-3}}\big]\odot\Big\{\Big(\big[\frac{\partial\sigma_j^{l-3}}{\partial\,sop_j^{l-3}}\big]_{n_{l-3}\times 1}\odot\big[\frac{\partial\,sop_k^{l-2}}{\partial\sigma_j^{l-3}}\big]_{1\le j\le n_{l-3}}^{1\le k\le n_{l-2}}\Big)\Big(\big[\frac{\partial\sigma_k^{l-2}}{\partial\,sop_k^{l-2}}\big]_{n_{l-2}\times 1}^{1\le k\le n_{l-2}}\odot\big[\frac{\partial\,sop_s^{l-1}}{\partial\sigma_k^{l-2}}\big]_{1\le k\le n_{l-2}}^{1\le s\le n_{l-1}}\Big)\Big(\big[\frac{\partial\sigma_s^{l-1}}{\partial\,sop_s^{l-1}}\big]_{n_{l-1}\times 1}^{1\le s\le n_{l-1}}\odot\big[\frac{\partial\,sop_q^{l}}{\partial\sigma_s^{l-1}}\big]_{n_{l-1}\times n_l}\Big)\Big(\big[\frac{\partial\hat{y}_q}{\partial\,sop_q^{l}}\big]_{n_l\times 1}^{1\le q\le n_l}\odot\big[\frac{\partial L}{\partial\hat{y}_q}\big]_{n_l\times 1}^{1\le q\le n_l}\Big)\Big\}$ and $W^{l-3}[m+1] = W^{l-3}[m]-\eta\times\nabla L(W^{l-3})\big|_{ANN[m]}$;
  • (step $l-1$) $\nabla L(W^{2})\equiv\big[\frac{\partial L}{\partial\omega_{k_1 k_2}^{2}}\big]_{1\le k_1\le n_1}^{1\le k_2\le n_2} = \big[\frac{\partial\,sop_{k_2}^{2}}{\partial\omega_{k_1 k_2}^{2}}\big]_{1\le k_1\le n_1}^{1\le k_2\le n_2}\odot\Big\{\circledast_{h=2}^{l-1}\Big(\big[\frac{\partial\sigma_{k_h}^{h}}{\partial\,sop_{k_h}^{h}}\big]_{n_h\times 1}^{1\le k_h\le n_h}\odot\big[\frac{\partial\,sop_{k_{h+1}}^{h+1}}{\partial\sigma_{k_h}^{h}}\big]_{1\le k_h\le n_h}^{1\le k_{h+1}\le n_{h+1}}\Big)\Big(\big[\frac{\partial\sigma_{k_l}^{l}}{\partial\,sop_{k_l}^{l}}\big]_{n_l\times 1}^{1\le k_l\le n_l}\odot\big[\frac{\partial L}{\partial\hat{y}_{k_l}}\big]_{n_l\times 1}^{1\le k_l\le n_l}\Big)\Big\}$ and $W^{2}[m+1] = W^{2}[m]-\eta\times\nabla L(W^{2})\big|_{ANN[m]}$, where $\circledast$ represents the recursive matrix multiplications;
  • (step $l$) $\nabla L(W^{1})\equiv\big[\frac{\partial L}{\partial\omega_{k_0 k_1}^{1}}\big]_{1\le k_0\le n_0}^{1\le k_1\le n_1} = \big[\frac{\partial\,sop_{k_1}^{1}}{\partial\omega_{k_0 k_1}^{1}}\big]_{1\le k_0\le n_0}^{1\le k_1\le n_1}\odot\Big\{\circledast_{h=1}^{l-1}\Big(\big[\frac{\partial\sigma_{k_h}^{h}}{\partial\,sop_{k_h}^{h}}\big]_{n_h\times 1}^{1\le k_h\le n_h}\odot\big[\frac{\partial\,sop_{k_{h+1}}^{h+1}}{\partial\sigma_{k_h}^{h}}\big]_{1\le k_h\le n_h}^{1\le k_{h+1}\le n_{h+1}}\Big)\Big(\big[\frac{\partial\hat{y}_{k_l}^{l}}{\partial\,sop_{k_l}^{l}}\big]_{n_l\times 1}^{1\le k_l\le n_l}\odot\big[\frac{\partial L}{\partial\hat{y}_{k_l}^{l}}\big]_{n_l\times 1}^{1\le k_l\le n_l}\Big)\Big\}$ and $W^{1}[m+1] = W^{1}[m]-\eta\times\nabla L(W^{1})\big|_{ANN[m]}$.
Definition A3. 
$M(t) := \circledast_{h=t}^{l-1}\Big(\big[\frac{\partial\sigma_{k_h}^{h}}{\partial\,sop_{k_h}^{h}}\big]_{n_h\times 1}^{1\le k_h\le n_h}\odot\big[\frac{\partial\,sop_{k_{h+1}}^{h+1}}{\partial\sigma_{k_h}^{h}}\big]_{1\le k_h\le n_h}^{1\le k_{h+1}\le n_{h+1}}\Big)$, where $\sigma_{k_l}^{l}\equiv\hat{y}_{k_l}^{l}$. Observe that each $M(t)$ is an $n_t$-by-$n_l$ matrix.
Claim A1. 
$$\nabla L(W^{t}) = \Big[\frac{\partial\,sop_j^{t}}{\partial\omega_{i,j}^{t}}\Big]_{1\le i\le n_{t-1}}^{1\le j\le n_t}\odot\Big\{M(t)\Big(\Big[\frac{\partial\hat{y}_{k_l}^{l}}{\partial\,sop_{k_l}^{l}}\Big]_{n_l\times 1}^{1\le k_l\le n_l}\odot\Big[\frac{\partial L}{\partial\hat{y}_{k_l}^{l}}\Big]_{n_l\times 1}^{1\le k_l\le n_l}\Big)\Big\}_{n_t\times 1}.$$
Alternatively, this could be represented in another matrix form:
Definition A4. 
$N(t) := \circledast_{h=t}^{l-1}\Big(\big[\frac{\partial\,sop_{k_{h+1}}^{h+1}}{\partial\sigma_{k_h}^{h}}\big]_{1\le k_h\le n_h}^{1\le k_{h+1}\le n_{h+1}}\odot\big[\frac{\partial\sigma_{k_{h+1}}^{h+1}}{\partial\,sop_{k_{h+1}}^{h+1}}\big]_{n_{h+1}\times 1}^{1\le k_{h+1}\le n_{h+1}}\Big)$, where $\sigma_{k_l}^{l}\equiv\hat{y}_{k_l}^{l}$.
Observe that each $N(t)$ is an $n_t$-by-$n_l$ matrix.
Claim A2. 
$$\nabla L(W^{t}) = \Big[\frac{\partial\,sop_{k_t}^{t}}{\partial\omega_{k_{t-1},k_t}^{t}}\Big]_{1\le k_{t-1}\le n_{t-1}}^{1\le k_t\le n_t}\odot\Big\{\Big[\frac{\partial\sigma_{k_t}^{t}}{\partial\,sop_{k_t}^{t}}\Big]_{n_t\times 1}^{1\le k_t\le n_t}\odot\Big(N(t)\Big[\frac{\partial L}{\partial\sigma_{k_l}^{l}}\Big]_{n_l\times 1}^{1\le k_l\le n_l}\Big)\Big\}_{n_t\times 1},$$
where $\sigma_{k_l}^{l} = \hat{y}_{k_l}^{l}$.
Since either forward or backward propagation of the CNN engages only part of the input neurons, we use an $n_0$-length characteristic column vector $v$ to capture the connected neurons, i.e., its element is 1 if the corresponding neuron is connected after performing the feature extraction and 0 if it is not. We could represent this partially connected gradient as $\tilde{\nabla}L(W^{1}) = (\vec{1}_{n_0\times 1}\odot v)\odot\nabla L(W^{1})$ and thus $W^{1}[m+1] = W^{1}[m]-\eta\times\tilde{\nabla}L(W^{1})$, where $m$ denotes the $m$th recursive backward propagation via the gradient descent method.

Appendix A.3. Forward Propagation/Induction for ANN

Figure A1. A fully connected ANN: the upper part is a matrix-form representation, while the lower part is a (neuron) element-form representation. sop stands for sum of product; m stands for the mth step of forward and backward propagation. In the graph, stage means layer; a stands for an activation function; $W^{t}$ is the weight matrix at stage (layer) t; Y is the target output vector, while $\hat{Y}$ is the predictive output vector.

Appendix A.4. Backward Propagation/Induction for ANN

Figure A2. Backward induction for the GDM: suppose $ANN[m]$ is known. This diagram shows how to update $W^{t}[m+1]$ for all $1\le t\le l$.

References

  1. Li, Z.; Liu, F.; Yang, W.; Peng, S.; Zhou, J. A Survey of Convolutional Neural Networks: Analysis, Applications, and Prospects. IEEE Trans. Neural Netw. Learn. Syst. 2022, 33, 6999–7019. [Google Scholar] [CrossRef]
  2. Albawi, S.; Mohammed, T.A.; Al-Zawi, S. Understanding of a convolutional neural network. In Proceedings of the 2017 International Conference on Engineering and Technology (ICET), Antalya, Turkey, 21–23 August 2017; pp. 1–6. [Google Scholar] [CrossRef]
  3. Aloysius, N.; Geetha, M. A review on deep convolutional neural networks. In Proceedings of the 2017 International Conference on Communication and Signal Processing (ICCSP), Chennai, India, 6–8 April 2017; pp. 0588–0592. [Google Scholar] [CrossRef]
  4. Yamashita, R.; Nishio, M.; Do, R.K.; Togashi, K. Convolutional neural networks: An overview and application in radiology. Insights Imaging 2018, 9, 611–629. [Google Scholar] [CrossRef]
  5. Jin, K.H.; McCann, M.T.; Froustey, E.; Unser, M. Deep convolutional neural network for inverse problems in imaging. IEEE Trans. Image Process. 2017, 26, 4509–4522. [Google Scholar] [CrossRef]
  6. Kumar, A.; Singh, C.; Sachan, M. A moment-based pooling approach in convolutional neural networks for breast cancer histopathology image classification. Neural Comput. Appl. 2025, 37, 1127–1156. [Google Scholar] [CrossRef]
  7. Sharma, A.; Vans, E.; Shigemizu, D.; Boroevich, K.A.; Tsunoda, T. DeepInsight: A methodology to transform a non-image data to an image for convolution neural network architecture. Sci. Rep. 2019, 9, 11399. [Google Scholar] [CrossRef]
  8. Wardah, W.; Khan, M.G.M.; Sharma, A.; Rashid, M.A. Protein secondary structure prediction using neural networks and deep learning: A review. Comput. Biol. Chem. 2019, 81, 1–8. [Google Scholar] [CrossRef] [PubMed]
  9. Zhao, X.; Wang, L.; Zhang, Y.; Han, X.; Deveci, M.; Parmar, M. A review of convolutional neural networks in computer vision. Artif. Intell. Rev. 2024, 57, 99. [Google Scholar] [CrossRef]
  10. Kiranyaz, S.; Avci, O.; Abdeljaber, O.; Ince, T.; Gabbouj, M.; Inman, D.J. 1D convolutional neural networks and applications: A survey. Mech. Syst. Signal Process. 2021, 151, 107398. [Google Scholar] [CrossRef]
  11. Wang, H.; Jin, X.; Zhang, T.; Wang, J. Convolutional neural network-based recognition method for volleyball movements. Heliyon 2023, 9, e18124. [Google Scholar] [CrossRef] [PubMed]
  12. Bhowmick, S.; Nagarajaiah, S.; Veeraraghavan, A. Vision and deep learning-based algorithms to detect and quantify cracks on concrete surfaces from uav videos. Sensors 2020, 20, 6299. [Google Scholar] [CrossRef]
  13. Hasan, R.I.; Yusuf, S.M.; Alzubaidi, L. Review of the state of the art of deep learning for plant diseases: A broad analysis and discussion. Plants 2020, 9, 1302. [Google Scholar] [CrossRef]
  14. Ciresan, D.C.; Meier, U.; Gambardella, L.M.; Schmidhuber, J. Convolutional neural network committees for handwritten character classification. In Proceedings of the 2011 International Conference on Document Analysis and Recognition (ICDAR), Beijing, China, 18–21 September 2011; pp. 1135–1139. [Google Scholar]
  15. Zeiler, M.D.; Fergus, R. Visualizing and Understanding Convolutional Networks. In Computer Vision—ECCV 2014, Proceedings of the ECCV 2014, Zurich, Switzerland, 6–12 September 2014; Fleet, D., Pajdla, T., Schiele, B., Tuytelaars, T., Eds.; Lecture Notes in Computer Science; Springer: Cham, Switzerland, 2014; Volume 8689. [Google Scholar] [CrossRef]
  16. Taye, M.M. Theoretical Understanding of Convolutional Neural Network: Concepts, Architectures, Applications, Future Directions. Computation 2023, 11, 52. [Google Scholar] [CrossRef]
  17. Alzubaidi, L.; Zhang, J.; Humaidi, A.J.; Al-Dujaili, A.; Duan, Y.; Al-Shamma, O.; Santamaría, J.; Fadhel, M.A.; Al-Amidie, M.; Farhan, L. Review of deep learning: Concepts, CNN architectures, challenges, applications, future directions. J. Big Data 2021, 8, 53. [Google Scholar] [CrossRef]
  18. Raj, R.; Kos, A. An Extensive Study of Convolutional Neural Networks: Applications in Computer Vision for Improved Robotics Perceptions. Sensors 2025, 25, 1033. [Google Scholar] [CrossRef]
  19. Muñoz-Silva, E.M.; Vasquez-Gomez, J.I.; Merlo-Zapata, C.A.; Antonio-Cruz, M. Binary damage classification of built heritage with a 3D neural network. npj Herit. Sci. 2025, 13, 124. [Google Scholar] [CrossRef]
  20. Nie, Y.; Chen, Y.; Guo, J.; Li, S.; Xiao, Y.; Gong, W.; Lan, R. An improved CNN model in image classification application on water turbidity. Sci. Rep. 2025, 15, 11264. [Google Scholar] [CrossRef]
  21. Choy, C.; Lee, J.; Ranftl, R.; Park, J.; Koltun, V. High-Dimensional Convolutional Networks for Geometric Pattern Recognition. In Proceedings of the 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Seattle, WA, USA, 13–19 June 2020; pp. 11224–11233. [Google Scholar] [CrossRef]
  22. Du, G.; Su, J.; Zhang, L.; Su, K.; Wang, X.; Teng, S.; Liu, P.X. A Multi-Dimensional Graph Convolution Network for EEG Emotion Recognition. IEEE Trans. Instrum. Meas. 2022, 71, 2518311. [Google Scholar] [CrossRef]
  23. Zaniolo, L.; Marques, O. On the use of variable stride in convolutional neural networks. Multimed. Tools Appl. 2020, 79, 13581–13598. [Google Scholar] [CrossRef]
  24. Liu, L.Y.; Liu, Y.; Zhu, H. Masked convolutional neural network for supervised learning problems. Stat 2020, 9, e290. [Google Scholar] [CrossRef] [PubMed] [PubMed Central]
  25. Kudo, Y.; Aoki, Y. Dilated convolutions for image classification and object localization. In Proceedings of the 2017 Fifteenth IAPR International Conference on Machine Vision Applications (MVA), Nagoya, Japan, 8–12 May 2017; pp. 452–455. [Google Scholar] [CrossRef]
  26. Hu, H.; Yu, C.; Zhou, Q.; Guan, Q.; Feng, H. HDConv: Heterogeneous kernel-based dilated convolutions. Neural Netw. 2024, 179, 106568. [Google Scholar] [CrossRef] [PubMed]
  27. Madichetty, S.; M, S. A stacked convolutional neural network for detecting the resource tweets during a disaster. Multimed. Tools Appl. 2021, 80, 3927–3949. [Google Scholar] [CrossRef] [PubMed]
  28. Kim, S.B.; Kang, J.H.; Cheon, M.; Kim, D.J.; Lee, B.C. Stacked neural network for predicting polygenic risk score. Sci. Rep. 2024, 14, 11632. [Google Scholar] [CrossRef] [PubMed]
  29. Rumala, D.J.; van Ooijen, P.; Rachmadi, R.F.; Sensusiati, A.D.; Purnama, I.K.E. Deep-Stacked Convolutional Neural Networks for Brain Abnormality Classification Based on MRI Images. J. Digit. Imaging 2023, 36, 1460–1479. [Google Scholar] [CrossRef] [PubMed]
Figure 1. Standard high-dimension $CNN[m]$: this graph shows how to process the $j$th training input tensor $T_j(\vec{\tau})\equiv{}_{j}T_{\vec{\tau}}$, whose dimensional vector (a vector containing the number of dimensions in each section) is $\vec{\tau}$, via the CNN; $p$ is a pooling function.
Figure 2. Forward network representation for the feature extraction in the CNN: this is a simplified matrix form of the CNN. $p$ denotes a pooling function; sop denotes sum of product; $a$ denotes an activation function; $F$ denotes a feature matrix, and its subscript $\vec{f}$ denotes its tensor size. In addition, $sop_i$ and $A_{\vec{q}}^{i}$ are both $\vec{q}$-indexed tensors (or $\prod_{k=1}^{s}\overline{\vec{q}[k]}$-indexed tensors).
Figure 3. The connected paths from the power tensor of $T_{\vec{\tau}}$ to the output $X$: the upper part refers to the forward propagation, while the lower part shows the related paths for the backward propagation. We use $n$ on the edges to represent $F_{\vec{f}}^{n}$ and $F_{ij}^{n}$ to represent the $(i,j)$ cell in the matrix $F_{\vec{f}}^{n}$; $t_{ij}^{pq}$ is the $(i,j)$ element in $T_{\vec{f}}^{pq}$. $\langle A,B\rangle$ represents the inner product of two matrices.
Figure 4. Weights between layers: this graph shows how to linearly process the feature extraction in the first layer. There are $N_1$ features/filters whose dimensional vector is $\vec{f} = [\vec{f}[1],\vec{f}[2],\ldots,\vec{f}[h]]$ and $|\vec{f}| = \prod_{n=1}^{h}\vec{f}[n]$. In the graph, $\vec{q} = [\vec{q}[1],\vec{q}[2],\ldots,\vec{q}[h]]$ and $|\vec{q}|\equiv\prod_{n=1}^{h}\vec{q}[n] = \prod_{n=1}^{h}\big(\lfloor\frac{\vec{\tau}[n]-\vec{f}[n]}{\vec{\sigma}[n]}\rfloor+1\big)$. The edges associated with $w_{ij}$ are $[[w_{ij}]] = \big\{\big(l_{\vec{\tau}}(\epsilon+l_{\vec{f}}^{-1}(i)-\vec{1}),\ sop_{(j-1)|\vec{q}|+l_{\vec{q}}(\epsilon)}\big)\big\}_{\epsilon\in\prod_{n=1}^{h}\overline{\vec{q}[n]}}$.
Figure 5. Linearized CNN: this graph shows how to linearly process an $h$-sectioned input tensor (or a to-be-cropped tensor) $T_{\vec{\tau}}$ ($\equiv\{T_{\vec{\tau}}^{\epsilon}\}_{\epsilon\in\prod_{n=1}^{h}\overline{\vec{\tau}[n]}}$). The original setting is shown in Figure 1. The tensor $T_{\vec{\tau}}$ is linearized (or flattened/vectorized) via its positional indices (Lemma 2) and assigned values. There are then $\prod_{n=1}^{h}\vec{f}[n]\cdot N_1$ weights to be optimized; we use $\{\omega_{ij}\}_{1\le i\le\prod_{n=1}^{h}\vec{f}[n],\,1\le j\le N_1}$ to denote the set of weights. The neurons connecting to $w_{ij}$ form the set $\big\{\epsilon+l_{\vec{f}}^{-1}(i)-\vec{1} : i\in\{1,2,\ldots,\prod_{n=1}^{h}\vec{f}[n]\}\big\}_{\epsilon\in\prod_{n=1}^{h}\overline{\vec{q}[n]}}$, and the neurons connecting from $w_{ij}$ form the set $\{sop_{(j-1)\cdot|\vec{q}|+1}, sop_{(j-1)\cdot|\vec{q}|+2},\ldots, sop_{(j-1)\cdot|\vec{q}|+(|\vec{q}|-1)}, sop_{j\cdot|\vec{q}|}\}$, as shown in Figure 4. Hence, each $w_{ij}$ is associated with $|\vec{q}|$ pairs of edges.
Figure 6. Convolution with respect to the sCNN and lCNN: regard the layout of this picture as a 5-by-4 grid. The image at position (1,1) is the original input tensor (a 3D tensor corresponding to RGB). The images at positions (1,2), (1,3), and (1,4) are the extracted R-image, G-image, and B-image (each a 2D tensor), respectively. The second and third rows are the images of the four kernels. The fourth row shows the sCNN convolved results with respect to the four kernels, and the fifth row shows the lCNN convolved results with respect to the four kernels.