Filter Independence-Aware Pruning: Efficient Neural Networks for On-Device AI

Wang, Jiali; Bie, Hongxia; Jing, Zhao; Zhi, Yichen; Fan, Yongkai; Ma, Wentao

doi:10.3390/electronics15040794


by Jiali Wang ¹, Hongxia Bie ¹,*, Zhao Jing ¹, Yichen Zhi ¹, Yongkai Fan ² and Wentao Ma ¹

¹ Intelligent Media Computing Center, School of Artificial Intelligence, Beijing University of Posts and Telecommunications, Beijing 100876, China

² School of Computer and Cyber Sciences, Communication University of China, Beijing 100024, China

* Author to whom correspondence should be addressed.

Electronics 2026, 15(4), 794; https://doi.org/10.3390/electronics15040794

Submission received: 6 January 2026 / Revised: 4 February 2026 / Accepted: 10 February 2026 / Published: 12 February 2026

(This article belongs to the Section Artificial Intelligence)


Abstract

Filter pruning is an effective approach for improving the inference efficiency of neural networks and is particularly attractive for on-device artificial intelligence (AI) applications. However, many existing methods fail to accurately identify redundant filters due to limited modeling of inter-filter dependencies. A filter pruning method based on nuclear norm analysis is proposed to quantify filter independence and guide structured pruning. By analyzing the layer-wise distribution of independence scores, a principled trade-off between pruning rate and accuracy preservation is achieved. In most evaluation scenarios, the proposed method achieves 75–95% parameter reduction and 70–80% FLOPs reduction, while substantially higher compression ratios (up to 99%) can be obtained for more redundant network architectures, with consistent performance trends observed across multiple accuracy-related metrics. Furthermore, deployment on an RK3588 neural processing unit (NPU) demonstrates substantial reductions in memory consumption and inference latency, confirming the practical effectiveness of the method for mobile and edge AI applications.

Keywords: filter pruning; structured pruning; nuclear norm; filter independence; efficient neural networks; neural processing unit

1. Introduction

Neural networks have demonstrated remarkable performance across a wide range of applications. However, their substantial computational and memory demands pose significant challenges for deployment on resource-constrained edge devices. In many practical scenarios, such as automatic modulation classification [1], smoke segmentation [2], and traffic sign recognition [3], models with efficient inference and compact architectures are critically required. Consequently, reducing computational cost and model parameters has become a key research direction for developing practical and deployable neural networks in real-world applications.

Network pruning reduces computational and storage costs by removing redundant parameters while preserving model performance. It can be applied independently or in combination with quantization to achieve effective deep model compression [4,5]. Pruning methods are typically categorized as unstructured or structured. Unstructured pruning removes individual low-importance weights, creating sparse matrices that often require specialized hardware or software. Structured pruning removes entire parameter groups, such as filters or channels, producing dense architectures compatible with standard frameworks and hardware. It is therefore better suited for deployment on resource-constrained devices.

Regardless of whether pruning is unstructured, targeting individual weights, or structured, operating on parameter groups, a fundamental step in all pruning strategies is the evaluation of parameter importance. Accurate identification of parameters that contribute minimally to model performance enables pruning with limited accuracy degradation. Existing approaches for assessing parameter importance can be broadly categorized into two main classes.

The first category evaluates filter importance by designing networks that learn importance scores either concurrently with training or in an alternating manner, where filters correspond to groups of convolutional kernels associated with output channels. Representative approaches include predicting filter importance [6,7], selecting channel saliency [8,9], evaluating subnetwork performance [10,11], and searching for optimal substructures [12]. In addition, reinforcement learning-based methods employ agents to autonomously discover effective compression strategies [13]. Despite their effectiveness, these approaches generally increase training complexity and often lack clear interpretability regarding how the learned importance scores relate to the network’s underlying representations.

The second category evaluates filter importance based on network parameters or corresponding feature maps, thereby providing an explicit measure of each filter’s contribution. Parameter-based methods are conceptually simple and include criteria such as weight magnitude [14,15], weight norms [16], Batch Normalization scaling factors [17], and filter contributions to the loss [18]. However, because correlations among filters are typically ignored, these methods may produce inaccurate importance rankings and consequently misidentify redundant filters.

In contrast, feature-map-based approaches explicitly account for inter-filter correlations [19,20], which improves redundancy detection. Nevertheless, these methods require storing feature maps during inference or training, leading to increased computational overhead and a strong dependence on the dataset, hyperparameter settings, and experimental repeatability.

A novel filter pruning method that explicitly accounts for inter-filter correlations is presented. Filter importance is quantified using an independence score, defined as the difference in the nuclear norm of the weight matrix before and after masking a given filter. Furthermore, the distribution of independence scores within each layer is analyzed to guide the selection of layer-wise pruning rates. Based on this criterion, an efficient nuclear-norm-based pruning algorithm is developed, and performance is preserved through retraining. Deployment on neural processing units (NPUs) demonstrates substantial reductions in memory consumption and inference latency, highlighting the suitability of the proposed approach for on-device AI applications.

The primary contributions of this work are summarized as follows:

1. A structured pruning framework is introduced that leverages the nuclear norm as an independence score for filter evaluation. Unlike existing nuclear-norm-based pruning methods that focus on feature map low-rankness or intra-layer redundancy, this framework explicitly quantifies inter-filter dependencies directly from network weights without requiring feature map computation.
2. Theoretical analysis indicates that the differential nuclear norm of a filter is positively correlated with its independence. On this basis, a filter independence evaluation criterion is established to identify filters that are critical for preserving the representational capacity of the network.
3. A unified redundancy analysis applicable to both convolutional and fully connected layers is developed. By modeling correlations among filters or neurons at the weight level, the proposed independence assessment enables effective identification of redundant structures that extract highly similar representations.
4. A layer-wise pruning indicator is designed by characterizing the distribution of filter independence scores. Specifically, the ratio between mid-range values and the median is utilized to estimate the proportion of low-independence filters, thereby enabling adaptive pruning rates across different layers.

2. Related Work

Filter pruning typically involves three stages: evaluating filter importance, removing less important filters, and retraining the pruned network to reduce accuracy loss. In the first stage, selecting appropriate criteria for filter evaluation is critical. In the later stages, determining layer-wise pruning rates is essential, as variations across layers can significantly affect the pruning outcome. The following discussion reviews these stages.

Feature map correlation. Feature-map-based methods assess filter importance by analyzing similarities or correlations among the corresponding feature maps. A variety of metrics have been explored to quantify these relationships, including rank estimation [21], nuclear norm analysis [19], cosine similarity [20,22], Hamming distance [23], histogram intersection [24], and correlation coefficients derived from inner products [25]. More recently, cross-sample strategies have been investigated to further improve importance estimation, among which a global channel attention-based static pruning framework learns a unified channel ranking across samples [26].

Filter similarity. Filter importance is evaluated using geometric or statistical relationships among filters. Clustering methods retain filters based on k-means or similar strategies [27], while distance-based approaches identify less important filters via pairwise distances [28]. Low-rank constraints have been used to enforce linear independence among filter groups [29]. More recent studies adopt information-theoretic or embedding-based criteria, such as entropy-guided pruning on CIFAR-10 [30].

Layer pruning rate. Early one-shot methods typically adopt fixed or manually tuned rates without explicit layer-wise analysis [19,21]. Adaptive strategies learn layer-wise rates during training using reinforcement learning or differentiable search techniques [13]. Iterative frameworks, in contrast, adjust rates progressively through repeated pruning and retraining [16]. More recently, adaptive frameworks enhanced with self-distillation have been applied to achieve substantial parameter reduction while maintaining accuracy [31].

In addition, metaheuristic-based pruning approaches have been studied, highlighting both their effectiveness and computational overhead [32]. Symbolic regression-based refinement methods have also been explored for neural network compression [33]. Structured pruning has further been combined with deployment-oriented strategies—such as knowledge distillation [34], FPGA acceleration [35], and lightweight attention mechanisms [36]—to enable efficient inference on mobile and resource-constrained platforms.

As summarized in Table 1, feature-map-based methods incur substantial computational overhead, whereas filter-based approaches often lack a principled explanation of redundancy. In addition, layer-wise pruning rate design is frequently treated as a heuristic or auxiliary optimization problem. These limitations motivate the development of a unified framework that explicitly models filter dependencies at the weight level and utilizes this information to guide both filter selection and layer-wise pruning decisions.

3. Methodology: Filter Independence Modeling

This section analyzes filter group redundancy arising from correlations induced by linear transformations and introduces a method for assessing filter independence using the nuclear norm. Filters with higher independence are considered more important and are therefore retained during the pruning process. In addition, a procedure is proposed to identify filter submatrices that exhibit high independence.

3.1. Filter Redundancy Analysis

Let $\omega_i \in \mathbb{R}^{n_i \times n_{i-1} \times k \times k}$ denote the weights of the i-th convolutional layer, where $n_i$ is the number of output channels and k is the size of the convolutional kernel. The weights are reshaped into a filter matrix $F = (f_1, f_2, \dots, f_{n_i})^{\top} \in \mathbb{R}^{n_i \times (n_{i-1} k^2)}$, which allows the convolution operation to be represented as a series of matrix multiplications. The product $F x$ performs the transformation from the input space to the feature space, with F serving as the transformation matrix, as illustrated in Figure 1. The first three steps of the figure demonstrate how the convolution kernels are represented as a transformation matrix, thereby enabling a linear mapping from the input space to the feature space.
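The reshaping described above can be illustrated with a minimal NumPy sketch. The helper names `filter_matrix` and `conv_at_patch`, the layer sizes, and the random weights are assumptions made for illustration only; they are not taken from the paper's implementation. The sketch checks that the product $F x$ on a flattened receptive field reproduces the per-filter convolution responses.

```python
import numpy as np

def filter_matrix(weights):
    """Reshape conv weights (n_i, n_{i-1}, k, k) into F of shape (n_i, n_{i-1}*k*k)."""
    return weights.reshape(weights.shape[0], -1)

def conv_at_patch(weights, patch):
    """Direct response of every filter on one input patch of shape (n_{i-1}, k, k)."""
    return np.array([(w * patch).sum() for w in weights])

rng = np.random.default_rng(0)
w = rng.standard_normal((8, 3, 3, 3))    # hypothetical layer: 8 filters, 3 channels, 3x3 kernels
patch = rng.standard_normal((3, 3, 3))   # one receptive field of the input

F = filter_matrix(w)                     # shape (8, 27)
x = patch.reshape(-1)                    # shape (27,)

# The matrix-vector product equals the per-filter convolution response.
assert np.allclose(F @ x, conv_at_patch(w, patch))
```

Each row of `F` is one flattened filter, so sliding the patch over the input amounts to repeating the same matrix-vector product at every spatial position.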

A similar representation applies to fully connected layers, which also perform matrix multiplication. In a fully connected layer with n outputs, each neuron corresponds to a row of the filter matrix F, containing the weights connecting all input units to that output. The filter matrix F is used to represent fully connected layer parameters, analogous to its role in convolutional layers. Consequently, both convolutional and fully connected layers perform feature extraction through linear transformations from the input space to the feature space. The independence assessment and pruning strategy developed for convolutional filters can thus be directly applied to neurons in fully connected layers, ensuring that the proposed framework systematically covers the entire network.

As the transformation matrix for the feature space, the filter matrix is closely related to the features it generates. Specifically, the feature space corresponds to the column space of the filter matrix F, with its column vectors spanning the space. The column space and the left null space of a matrix (the null space of its transpose) are orthogonal, and the sum of their dimensions equals the number of rows of the matrix. When the row vectors of F are correlated, the dimension of the left null space increases, thereby reducing the dimension of the column space, i.e., the feature space. Consequently, features extracted by correlated filters are redundant, as illustrated in step 4 of Figure 1.

When considering a single filter $f_j \in \mathbb{R}^{n_{i-1} k^2}$, $j \in \{1, 2, \dots, n_i\}$, the convolution operation reduces to a vector inner product $f_j \cdot x$, compressing the input space into a one-dimensional feature space. In contrast, the filter matrix F transforms the input space into a feature space of dimension $\mathrm{Rank}(F)$, which is generally more informative than $n_i$ separate one-dimensional projections considered in isolation. Therefore, the filter matrix encodes more comprehensive and interrelated information, making its analysis more informative than assessing the importance of individual filters alone. Redundancy within the filter matrix can thus be interpreted as structural redundancy, providing valuable guidance for pruning.

Building on the preceding analysis, filter importance is assessed by examining correlations among filters. The central idea is to rank filters according to their degree of correlation with the rest of the layer. Specifically, a filter exhibiting high correlation is likely to be linearly dependent on other filters. In this case, the filter can be represented as a linear combination of the remaining filters, and the corresponding feature map can be approximated as a weighted combination of the feature maps generated by the other filters. Consequently, pruning a highly correlated filter has minimal impact, as its extracted feature map can be largely preserved or reconstructed from the outputs of the remaining filters.
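The reconstruction argument can be checked on a toy case. In the sketch below, the third filter is constructed as an exact linear combination of the first two; the coefficients, dimensions, and random inputs are illustrative assumptions. Its responses are then recovered from the responses of the remaining filters.

```python
import numpy as np

rng = np.random.default_rng(1)
f1 = rng.standard_normal(27)
f2 = rng.standard_normal(27)
f3 = 0.5 * f1 - 2.0 * f2               # f3 depends linearly on f1 and f2

X = rng.standard_normal((27, 100))     # 100 flattened input patches as columns
o1, o2, o3 = f1 @ X, f2 @ X, f3 @ X    # responses of each filter

# The response of the dependent filter is the same combination of the others.
reconstructed = 0.5 * o1 - 2.0 * o2
max_err = np.max(np.abs(o3 - reconstructed))

# The stacked filter matrix has rank 2, not 3.
rank = np.linalg.matrix_rank(np.stack([f1, f2, f3]))
```

In trained networks the dependence is only approximate, so the reconstruction is approximate as well, which is why the pruned network is retrained afterwards.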

To illustrate this intuition, Figure 2 presents a simplified example. Filters $f_1$ and $f_2$ are highly similar, producing nearly identical feature maps $O_1$ and $O_2$, whereas filter $f_3$ extracts distinct features, resulting in a feature map $O_3$ that differs substantially. This example demonstrates that highly similar filters generate redundant features, which can be exploited for pruning.

3.2. Filter Independence Evaluation Based on Nuclear Norm

Before presenting the formal definition, it should be emphasized that the proposed method imposes no additional assumptions or constraints beyond the standard convolutional and fully connected layer settings. The formulation relies solely on layer weights, rendering it architecture-agnostic and generally applicable across different CNN models. With these general settings established, a metric for quantifying filter correlations is introduced.

Compared with the dimension of the column space, the basis vectors encode more comprehensive and richer spatial information. While the rank of F indicates only the dimensionality of the feature space, the nuclear norm of F, defined as the sum of its singular values, reflects the scale of the basis vectors within the column space. The nuclear norm of the filter matrix F is thus employed to represent the effective size of the feature space. This metric captures additional spatial information, providing a more informative measure of correlations among filters.

Since the filter $f_j$, taken as a column vector, forms the j-th row $f_j^{T}$ of F, the following decomposition holds:

$$
\begin{aligned}
F^{T} F &= (f_1, \dots, f_{n_i}) \begin{pmatrix} f_1^{T} \\ \vdots \\ f_{n_i}^{T} \end{pmatrix} \\
&= f_1 f_1^{T} + \dots + f_j f_j^{T} + \dots + f_{n_i} f_{n_i}^{T} \\
&= \left( f_1 f_1^{T} + \dots + f_{j-1} f_{j-1}^{T} + f_{j+1} f_{j+1}^{T} + \dots + f_{n_i} f_{n_i}^{T} \right) + f_j f_j^{T} \\
&= (F')^{T} F' + f_j f_j^{T}
\end{aligned}
\tag{1}
$$

Here, $F' = (f_1, \dots, f_{j-1}, f_{j+1}, \dots, f_{n_i})^{T} \in \mathbb{R}^{(n_i - 1) \times (n_{i-1} k^2)}$ denotes the filter matrix obtained by excluding $f_j$. This decomposition separates the contribution of $f_j$ from that of the remaining filters, which serves as the foundation for defining the filter independence metric.
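The decomposition in Equation (1) is easy to verify numerically. The following sketch uses a small random matrix; the sizes and the index `j` are arbitrary illustrative choices.

```python
import numpy as np

rng = np.random.default_rng(2)
F = rng.standard_normal((6, 20))       # 6 filters, each flattened to length 20
j = 3
f_j = F[j]                             # the filter to separate out
F_prime = np.delete(F, j, axis=0)      # F' = F without the j-th row

lhs = F.T @ F
rhs = F_prime.T @ F_prime + np.outer(f_j, f_j)

# Gram matrix of F = Gram matrix of F' plus the rank-one term f_j f_j^T.
assert np.allclose(lhs, rhs)
```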

Suppose that $F'$ admits a singular value decomposition $F' = U \Sigma V^{T}$, where $\Sigma = \mathrm{diag}(\sigma_1, \dots, \sigma_{n_i - 1}) \in \mathbb{R}^{(n_i - 1) \times (n_{i-1} k^2)}$. Then, $(F')^{T} F' = V \Sigma^{2} V^{T}$. The columns of $V^{T} \in \mathbb{R}^{(n_{i-1} k^2) \times (n_{i-1} k^2)}$ form a basis for $\mathbb{R}^{n_{i-1} k^2}$. Since $f_j \in \mathbb{R}^{n_{i-1} k^2}$ lies in this space, it can be expressed as

$$
f_j = V \begin{pmatrix} a_{j,1} \\ \vdots \\ a_{j, n_{i-1} k^2} \end{pmatrix}. \tag{2}
$$

Consequently,

$$
f_j f_j^{T} = V \begin{pmatrix} a_{j,1} \\ \vdots \\ a_{j, n_{i-1} k^2} \end{pmatrix} (a_{j,1}, \dots, a_{j, n_{i-1} k^2}) V^{T} = V \Sigma_a^{2} V^{T}, \tag{3}
$$

where $\Sigma_a^{2} = \mathrm{diag}(a_{j,1}^{2}, \dots, a_{j, n_{i-1} k^2}^{2})$. This decomposition separates the contribution of $f_j$ along the basis defined by V, which underpins the subsequent definition of the filter independence score.

Let the singular values of F be $\mu_1, \dots, \mu_{n_{i-1} k^2}$. Then, the eigenvalues of $F^{T} F$ are given by $\mu_1^{2}, \dots, \mu_{n_{i-1} k^2}^{2}$. According to the perturbation theorem (Matrix Theory [37]), for any eigenvalue $\mu_p^{2}$ of $F^{T} F$, there exists an eigenvalue $\sigma_q^{2}$ of $(F')^{T} F'$ such that

$$
| \mu_p^{2} - \sigma_q^{2} | \leq \max_m a_{j,m}^{2}, \quad m \in \{1, 2, \dots, n_{i-1} k^2\}. \tag{4}
$$

Consequently, taking square roots yields

$$
| \mu_p - \sigma_q | \leq \sqrt{| \mu_p^{2} - \sigma_q^{2} |} \leq \max_m \sqrt{a_{j,m}^{2}} = \max_m | a_{j,m} |. \tag{5}
$$

This analysis indicates that the singular values of F and $F'$ differ by at most the largest absolute coefficient in the representation of $f_j$ with respect to the basis of $F'$. Consequently, the contribution of a highly correlated filter $f_j$ to the overall feature space, as quantified by the nuclear norm, is relatively small. This observation provides a principled justification for assigning lower importance to such filters during pruning.

When $f_j$ is uniformly projected onto the column space of $V^{T}$, its largest projection component is small, resulting in only a minor difference between the singular values of F and $F'$. In contrast, when $f_j$ is non-uniformly projected, the magnitude of the largest projection component increases. If $f_j$ becomes aligned with a basis vector of the column space of $V^{T}$, the projection component attains its maximum, leading to a substantial difference between the singular values of F and $F'$. This behavior provides an intuitive explanation for why highly independent filters, which are less aligned with other filters, contribute more significantly to the feature space.

Thus, the difference between the singular values of F and $F'$ provides a measure of the degree of independence of $f_j$ relative to the other filters in $F'$. A filter that induces a larger difference is more closely aligned with a basis vector of the row space of $F'$, indicating greater independence from the remaining filters. Motivated by this observation, the following definition of filter independence is introduced.

Definition 1.

Let $F^{(i)} = (f_1, f_2, \dots, f_{n_i})^{T} \in \mathbb{R}^{n_i \times (n_{i-1} k^2)}$ denote the filter matrix of the i-th convolutional layer. The independence score of a filter $f_j \in \mathbb{R}^{n_{i-1} k^2}$, $j \in \{1, 2, \dots, n_i\}$, relative to the other filters in $F^{(i)}$, is defined as

$$
S_{F^{(i)}}(f_j) = \| F^{(i)} \|_{*} - \| F^{(i)} \odot M_j \|_{*}, \tag{6}
$$

where $\| \cdot \|_{*}$ denotes the nuclear norm, $\odot$ is the Hadamard product, and $M_j \in \mathbb{R}^{n_i \times (n_{i-1} k^2)}$ is a row-wise mask matrix with the j-th row set to zeros and all other rows set to ones.
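Definition 1 can be sketched in a few lines of NumPy. The function names `nuclear_norm` and `independence_scores` and the toy matrix are assumptions for illustration, not the authors' code. Zeroing row j plays the role of the Hadamard product with $M_j$.

```python
import numpy as np

def nuclear_norm(M):
    """Sum of singular values of M."""
    return np.linalg.svd(M, compute_uv=False).sum()

def independence_scores(F):
    """S_F(f_j) = ||F||_* - ||F masked at row j||_* for every row j (Eq. 6)."""
    full = nuclear_norm(F)             # shared by all filters, computed once
    scores = np.empty(F.shape[0])
    for j in range(F.shape[0]):
        masked = F.copy()
        masked[j] = 0.0                # equivalent to F ⊙ M_j
        scores[j] = full - nuclear_norm(masked)
    return scores

# Toy layer: rows 0 and 1 are identical, row 2 is orthogonal to both.
F = np.array([
    [1.0, 0.0, 0.0, 0.0],
    [1.0, 0.0, 0.0, 0.0],
    [0.0, 1.0, 0.0, 0.0],
])
scores = independence_scores(F)
```

In this toy case the duplicated filters receive a score of $\sqrt{2} - 1$ each, while the orthogonal filter receives a score of 1, so the duplicates would be ranked as less important.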

Specifically, a filter $f_j$ is masked from the filter matrix F, and the resulting change in the nuclear norm between the modified and original matrices is computed. One advantage of employing the nuclear norm difference to characterize filter correlation is its ability to capture fine-grained variations. As illustrated in Figure 3a, using the rank difference to evaluate filter independence produces only two extreme values (0 for correlated filters and 1 for independent filters), thereby limiting its sensitivity to subtle variations. In contrast, the nuclear norm difference provides a more sensitive measure, enabling a finer distinction of filter independence, as shown in Figure 3b.
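The contrast between the two criteria can be seen on a small example with two nearly identical filters. The matrix below and the helper names `rank_difference` and `nuclear_difference` are illustrative assumptions. Because the near-duplicate rows are not exactly dependent, the rank difference is 1 for every filter, whereas the nuclear norm difference separates them from the orthogonal filter.

```python
import numpy as np

def rank_difference(F, j):
    """Change in rank when row j is masked."""
    masked = F.copy()
    masked[j] = 0.0
    return np.linalg.matrix_rank(F) - np.linalg.matrix_rank(masked)

def nuclear_difference(F, j):
    """Change in nuclear norm when row j is masked (the independence score)."""
    masked = F.copy()
    masked[j] = 0.0
    return np.linalg.norm(F, "nuc") - np.linalg.norm(masked, "nuc")

F = np.array([
    [1.0, 0.0, 0.0, 0.0],   # f_1
    [1.0, 0.0, 0.1, 0.0],   # f_2: a slightly perturbed copy of f_1
    [0.0, 1.0, 0.0, 0.0],   # f_3: orthogonal to both
])
rank_diffs = [int(rank_difference(F, j)) for j in range(3)]
nuc_diffs = [float(nuclear_difference(F, j)) for j in range(3)]
```

Here `rank_diffs` cannot distinguish the three filters, while `nuc_diffs` assigns clearly lower values to the two near-duplicates than to the orthogonal filter.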

The independence score $S_{F^{(i)}}(f_j)$ quantifies the correlation between a filter $f_j$ and the remaining filters in $F^{(i)}$, with larger values indicating lower substitutability. Consequently, independence scores can be computed for all filters in a network layer to accurately identify less important filters for pruning. As illustrated in Figure 4, feature maps extracted by filters sorted according to L2 norm and independence score are compared. The feature map corresponding to filter (7), which contains fine-grained details (highlighted by the red dotted box), is discarded when pruning based on the L2 norm but retained when pruning based on the independence score. Conversely, the feature map corresponding to filter (4), which primarily encodes contour information (highlighted by the purple dotted box), exhibits the opposite behavior.

3.3. Computational Complexity

For the i-th convolutional layer, all filters are arranged into a matrix $F^{(i)} \in \mathbb{R}^{n_i \times (n_{i-1} k^2)}$. The nuclear norm is computed via singular value decomposition (SVD), which has a computational complexity of $O(n_i (n_{i-1} k^2) \min(n_i, n_{i-1} k^2))$. Since the nuclear norm of the full filter matrix, $\| F^{(i)} \|_{*}$, is shared across all filters, it needs to be computed only once. In contrast, the masked nuclear norm, $\| F^{(i)} \odot M_j \|_{*}$, must be computed for each of the $n_i$ filters. Therefore, the total complexity for computing the independence scores of all filters in the i-th layer is

$$
O(n_i^{2} (n_{i-1} k^2) \min(n_i, n_{i-1} k^2)). \tag{7}
$$

In typical convolutional neural networks, the quantity $n_{i-1} k^2$ is often much larger than $n_i$, allowing the overall complexity to be approximated as

$$
O(n_i^{3} n_{i-1} k^2). \tag{8}
$$

It is worth noting that this computation is performed only once per layer during the offline pruning stage and is independent of the spatial resolution of the feature maps. Consequently, unlike feature-map-based pruning methods that require repeated forward passes over large activation tensors, the proposed approach introduces no additional overhead during training or inference. In practice, its computational cost is negligible compared with the cumulative cost of convolution operations.
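To give a feel for the orders of magnitude in Equations (7) and (8), the following sketch counts leading-order operations with all constant factors dropped. The function names and the example layer (64 output channels, 64 input channels, 3x3 kernels) are hypothetical; real SVD routines have library-dependent constants, so the result is only a rough scale.

```python
def svd_cost(rows, cols):
    """Leading-order operation count for one SVD of a rows x cols matrix."""
    return rows * cols * min(rows, cols)

def layer_score_cost(n_out, n_in, k):
    """One unmasked SVD plus n_out masked SVDs for a single layer."""
    cols = n_in * k * k
    return (n_out + 1) * svd_cost(n_out, cols)

# Hypothetical layer with n_i = 64, n_{i-1} = 64, k = 3.
cost = layer_score_cost(n_out=64, n_in=64, k=3)
```

Since $n_{i-1} k^2 = 576$ exceeds $n_i = 64$ here, the minimum in Equation (7) is $n_i$ and the count scales as $n_i^{3} n_{i-1} k^2$, matching Equation (8).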

3.4. Independent Submatrix Estimation

This subsection focuses on estimating the filter submatrix $\tilde{F}_{\mathrm{sub}-c}$ with high independence among the numerous candidate submatrices $F_{\mathrm{sub}-c} = F \odot M_{j_1, j_2, \dots, j_c}$, when c filters are to be pruned from the i-th convolutional layer. Here, $M_{j_1, j_2, \dots, j_c} \in \mathbb{R}^{n_i \times (n_{i-1} k^2)}$ is a multi-row mask matrix in which the rows $j_1, j_2, \dots, j_c$ are set to zeros and all other rows are set to ones.

Theoretically, the filter submatrix $F_{\mathrm{sub}-c}$ can be constructed iteratively. First, the independence scores $S_F(f_j)$ are computed for all filters. The filter with the lowest score, $f_{j_1}$, is then masked to obtain $F_{\mathrm{sub}-1}$. Following this procedure, $S_{F_{\mathrm{sub}-1}}(f_j)$ is computed for the remaining filters, and the filter with the lowest score, $f_{j_2}$, is masked to form $F_{\mathrm{sub}-2}$. This process is repeated for c iterations to obtain $F_{\mathrm{sub}-c}$.
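The iterative construction can be sketched as a greedy loop. The function names, the random test matrix, and the choice to delete rows rather than zero them (which leaves the nuclear norm unchanged) are assumptions made for this illustration.

```python
import numpy as np

def scores_on_active(F, active):
    """Independence scores of the active rows, relative to the active submatrix."""
    sub = F[active]                        # dropping masked rows keeps the nuclear norm
    full = np.linalg.norm(sub, "nuc")
    scores = {}
    for pos, j in enumerate(active):
        masked = sub.copy()
        masked[pos] = 0.0
        scores[j] = full - np.linalg.norm(masked, "nuc")
    return scores

def iterative_prune(F, c):
    """Remove c filters one at a time, always the currently least independent one."""
    active = list(range(F.shape[0]))
    removed, evaluations = [], 0
    for _ in range(c):
        scores = scores_on_active(F, active)
        evaluations += len(active)         # one score per remaining filter
        worst = min(scores, key=scores.get)
        active.remove(worst)
        removed.append(worst)
    return active, removed, evaluations

rng = np.random.default_rng(0)
F = rng.standard_normal((10, 30))
kept, removed, evaluations = iterative_prune(F, c=3)
```

For 10 filters and c = 3 the loop evaluates 10 + 9 + 8 scores, which shows how the cost grows with both the layer width and the number of pruned filters.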

However, this iterative approach requires computing a large number of independence scores, resulting in significant computational cost. To mitigate this, $S_F(f_j)$ can be used to estimate the relative magnitudes of $S_{F_{\mathrm{sub}-c}}(f_j)$, substantially reducing the computation in practice. Specifically, for filters $f_j, f_k \notin \{f_{j_1}, f_{j_2}, \dots, f_{j_a}\}$, if $S_F(f_j) > S_F(f_k)$, then the ordering is taken to be preserved, i.e., $S_{F_{\mathrm{sub}-a}}(f_j) > S_{F_{\mathrm{sub}-a}}(f_k)$, $a \in \{1, 2, \dots, c - 1\}$. For instance, in a convolutional layer containing 500 filters where 100 filters are to be pruned, this estimation method reduces the total number of independence score computations from roughly 45,000 (using the full iterative procedure) to only 500.
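The counts in this example can be reproduced with a short calculation, assuming the iterative procedure recomputes one score per remaining filter in each of its c iterations. The function names are illustrative.

```python
def iterative_evaluations(n_filters, n_pruned):
    """Scores computed by the full iterative procedure: n + (n-1) + ... + (n-c+1)."""
    return sum(n_filters - t for t in range(n_pruned))

def one_shot_evaluations(n_filters):
    """Scores computed when S_F is evaluated once and reused for ranking."""
    return n_filters

iterative = iterative_evaluations(500, 100)   # 500 + 499 + ... + 401
one_shot = one_shot_evaluations(500)
```

Under this accounting the iterative procedure needs 45,050 score computations, consistent with the figure of about 45,000 quoted above, against 500 for the one-shot estimate.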

Definition 2.

Let $F = (f_1, f_2, \dots, f_{n_i})^{T} \in \mathbb{R}^{n_i \times (n_{i-1} k^2)}$ denote the filter matrix of the i-th convolutional layer, from which c filters are to be pruned. The submatrix with maximal independence, $\tilde{F}_{\mathrm{sub}-c}$, consists of the remaining $n_i - c$ filters corresponding to the highest independence scores, i.e., $f_{\max_1}, f_{\max_2}, \dots, f_{\max_{n_i - c}}$. Formally,

$$
\tilde{F}_{\mathrm{sub}-c} = (f_{\max_1}, f_{\max_2}, \dots, f_{\max_{n_i - c}})^{T} \in \mathbb{R}^{(n_i - c) \times (n_{i-1} k^2)}. \tag{9}
$$

Here, $f_{\max_k}$, $k \in \{1, 2, \dots, n_i - c\}$, denotes the filter with the k-th highest independence score. Note that the independence of the filter matrix is invariant to the ordering of its filters. Definition 2 can be formally justified via mathematical induction, utilizing the previously discussed estimation method.

Proof.

For the base case $c = 1$, we have

$$
\| F_{\mathrm{sub}-1} \|_{*} = \| F \|_{*} - S_F(f_j). \tag{10}
$$

If $S_F(f_j)$ is minimal, then $\| F_{\mathrm{sub}-1} \|_{*}$ is maximal.

Assume that for $c = n$, the submatrix is

$$
F_{\mathrm{sub}-n} = (f_{\max_1}, f_{\max_2}, \dots, f_{\max_{n_i - n}})^{T}. \tag{11}
$$

For $c = n + 1$, the submatrix satisfies

$$
\| F_{\mathrm{sub}-(n+1)} \|_{*} = \| F_{\mathrm{sub}-n} \|_{*} - S_{F_{\mathrm{sub}-n}}(f_j). \tag{12}
$$

Similarly, if $S_{F_{\mathrm{sub}-n}}(f_j)$ is minimal, then $\| F_{\mathrm{sub}-(n+1)} \|_{*}$ is maximal. In practice, $S_F(f_{\max_{n_i - n}})$ can be used to estimate $S_{F_{\mathrm{sub}-n}}(f_{\max_{n_i - n}})$, completing the induction step. □
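Definition 2 amounts to a single ranking step, sketched below. The function names and the toy matrix are illustrative assumptions. The kept rows are returned in their original order, which is harmless because the independence of the filter matrix does not depend on row order.

```python
import numpy as np

def independence_scores(F):
    """Nuclear-norm difference for each row of F (Eq. 6)."""
    full = np.linalg.norm(F, "nuc")
    scores = np.empty(F.shape[0])
    for j in range(F.shape[0]):
        masked = F.copy()
        masked[j] = 0.0
        scores[j] = full - np.linalg.norm(masked, "nuc")
    return scores

def select_independent_submatrix(F, c):
    """Keep the n_i - c filters with the highest one-shot independence scores."""
    scores = independence_scores(F)
    order = np.argsort(scores)[::-1]             # indices by descending score
    keep = np.sort(order[: F.shape[0] - c])      # restore original row order
    return F[keep], keep

# Toy layer: rows 0 and 1 are duplicates, rows 2 and 3 are orthogonal to everything.
F = np.array([
    [1.0, 0.0, 0.0, 0.0],
    [1.0, 0.0, 0.0, 0.0],
    [0.0, 1.0, 0.0, 0.0],
    [0.0, 0.0, 1.0, 0.0],
])
F_sub, keep = select_independent_submatrix(F, c=1)
```

With c = 1 one of the two duplicated filters is dropped and both orthogonal filters are retained.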

4. Independence-Aware Pruning Strategy

This section first examines layer-wise pruning rates based on the independence of filters within each layer, providing insight into how the distribution of independence scores influences pruning decisions. Building on this analysis, a pruning method is proposed that optimizes filter selection according to the computed independence scores.

4.1. Relationship Between Pruning Rate and Filter Independence

Manually assigning layer-wise pruning rates has several limitations. Balancing the pruning rate and network performance is challenging, often resulting in either overly conservative pruning or noticeable accuracy degradation. Moreover, the trial-and-error process can be computationally expensive. A more efficient approach is to adopt customized pruning rates informed by available data. Since the independence score of each filter has been computed for all layers and layers are treated independently, the relationship between the distribution of layer-wise independence scores and the corresponding pruning rates is investigated.

The analysis is performed on the sequence of independence scores $S_F$, which contains all filter scores within a given layer. A predominance of low independence scores indicates strong correlations among the filters and high redundancy, suggesting a higher pruning rate for that layer. Conversely, layers with more filters exhibiting high independence scores warrant lower pruning rates. This approach is designed to minimize performance degradation. By assigning distinct pruning rates to each layer based on the distribution of independence scores, a favorable trade-off between compression and accuracy is achieved, enabling high overall pruning rates while preserving model performance.

To quantify the distribution of independence scores within a layer and guide the corresponding pruning rate, a ratio $\eta$ is introduced to capture the relative spread of the independence scores.

Definition 3.

For a layer with independence score sequence $S_F$, the ratio $\eta$ is defined as

$$
\eta = \ln \left( \frac{\max(S_F) + \min(S_F)}{2 \, \mathrm{median}(S_F) + \varepsilon} \right), \tag{13}
$$

where $\varepsilon$ is a small constant added to avoid division by zero.
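Equation (13) can be evaluated directly from a list of scores. In the sketch below, the function name `eta`, the value `eps=1e-8`, and the two toy score lists are assumptions; the paper does not state the value of ε in this section.

```python
import math
import statistics

def eta(scores, eps=1e-8):
    """Log of the mid-range to median ratio of a layer's independence scores (Eq. 13)."""
    numerator = max(scores) + min(scores)
    denominator = 2 * statistics.median(scores) + eps
    return math.log(numerator / denominator)

mostly_low = [0.1, 0.1, 0.1, 0.1, 1.0]    # many redundant filters, one independent
mostly_high = [0.1, 1.0, 1.0, 1.0, 1.0]   # many independent filters, one redundant

eta_low = eta(mostly_low)
eta_high = eta(mostly_high)
```

The layer dominated by low scores has a median below its mid-range and therefore a positive η, while the layer dominated by high scores has a negative η.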

The logarithm is applied to compress the range of the ratio while preserving relative differences. As illustrated in Figure 5, the ratio $\eta$ effectively characterizes the distribution of the independence scores within a layer. A larger $\eta$ indicates a predominance of filters with low independence scores (orange line), corresponding to higher redundancy and suggesting a higher layer pruning rate. Conversely, a smaller $\eta$ reflects a greater number of highly independent filters (blue line), implying lower redundancy and a smaller pruning rate. Thus, $\eta$ provides a principled criterion for guiding layer-wise pruning. In the figure, the 7th and 11th layers are presented as representative examples to demonstrate variations in score distributions across layers.

A layer-specific pruning strategy can be derived from each layer’s $\eta$ value. When the $\eta$ values across layers are closely clustered, it suggests similar redundancy levels, warranting uniform pruning rates. Conversely, a dispersed distribution of $\eta$ indicates heterogeneous redundancy, motivating non-uniform pruning rates across layers. The variance of the sequence is commonly used to quantify distribution dispersion. For instance, on CIFAR-10, the variance is 37.6463 for VGG_16 and 0.1075 for ResNet_56, implying that VGG_16 benefits from non-uniform pruning, whereas ResNet_56 is more suited to uniform pruning. It should be noted, however, that variance primarily reflects general trends and does not account for outliers.
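The variance check can be sketched as follows. The text does not say whether population or sample variance is used, nor does it give a decision threshold; the use of `statistics.pvariance`, the `threshold=1.0` default, and the example η lists are all assumptions chosen only to illustrate the idea.

```python
import statistics

def eta_dispersion(etas):
    """Population variance of the per-layer eta values (assumed variance type)."""
    return statistics.pvariance(etas)

def suggest_strategy(etas, threshold=1.0):
    """Hypothetical rule: widely spread eta values suggest non-uniform pruning rates."""
    return "non-uniform" if eta_dispersion(etas) > threshold else "uniform"

clustered = [0.10, 0.20, 0.15]        # similar redundancy across layers
dispersed = [-3.0, 0.0, 9.0, 14.0]    # heterogeneous redundancy across layers

clustered_choice = suggest_strategy(clustered)
dispersed_choice = suggest_strategy(dispersed)
```

Any threshold between the two reported variances (0.1075 and 37.6463) would separate the ResNet_56 and VGG_16 cases described above; a single outlier layer can still inflate the variance, as the text notes.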

4.2. Independence-Aware Structured Pruning Algorithm

Inspired by the concept of linear transformation, redundant filters in the network are analyzed by assessing filter independence using the nuclear norm. The resulting independence scores serve a dual purpose: evaluating filter importance and guiding the pruning strategy. Following this principle, a nuclear-norm-based filter pruning algorithm is developed, as illustrated in Figure 6. In the left panel, the input consists of the convolutional layer weights, and the output is the set of computed filter independence scores along with the corresponding $η$. In the right panel, the input is the same set of weights, and the output is the pruned weights obtained by removing redundant filters according to the computed independence scores and $η$.

The proposed method computes each filter’s independence score by measuring the change in the nuclear norm. Filters with low independence scores, whose features can be largely represented by other filters, are pruned with minimal impact on network performance. The sequence of independence scores within each layer is then summarized using $η$ to guide the pruning strategy. Across the network, the $η$ values of all layers are normalized, and layers with higher $η$ are assigned higher pruning rates, enabling adaptive, layer-specific pruning.
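To make the scoring step concrete, the following NumPy sketch computes a leave-one-out nuclear-norm drop for each filter of a single layer. It assumes that "change in the nuclear norm" means the difference between the nuclear norm of the full filter matrix (one flattened filter per row) and that of the matrix with one row removed; the paper's exact formulation is given earlier in the text and may differ. The helper names `independence_scores` and `filters_to_keep` and the toy weights are hypothetical.

```python
import numpy as np

def independence_scores(weight):
    """Leave-one-out nuclear-norm drop for each filter of one conv layer.

    weight: array shaped (out_channels, in_channels, kH, kW), at least 2 filters.
    """
    n_filters = weight.shape[0]
    W = weight.reshape(n_filters, -1)            # each filter becomes one row
    full = np.linalg.norm(W, ord="nuc")          # sum of singular values
    scores = np.empty(n_filters)
    for i in range(n_filters):
        reduced = np.delete(W, i, axis=0)        # layer without filter i
        scores[i] = full - np.linalg.norm(reduced, ord="nuc")
    return scores

def filters_to_keep(scores, rate):
    """Indices kept after dropping the lowest-scoring fraction `rate`."""
    n_prune = int(rate * len(scores))
    order = np.argsort(scores, kind="stable")    # ascending independence
    return np.sort(order[n_prune:])

# Toy layer: filters 0 and 1 are identical, filter 2 is orthogonal to both.
toy = np.array([[1, 0, 0], [1, 0, 0], [0, 1, 0]], dtype=float).reshape(3, 1, 1, 3)
toy_scores = independence_scores(toy)
toy_kept = filters_to_keep(toy_scores, rate=1 / 3)
print(toy_scores, toy_kept)
```

In the toy layer, the two identical filters receive a lower score than the orthogonal one, so one of the duplicates is pruned first. The loop runs one SVD per filter, which is acceptable for a sketch but would be slow for wide layers.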

The network’s overall pruning rate is defined as the weighted sum of the layer-wise pruning rates, where the weights correspond to each layer’s proportion of filters. Formally, it is expressed as $P = \sum_{i = 1}^{L} a_{i} p_{i}$, where $L$ is the total number of layers, $a_{i}$ denotes the proportion of filters in the $i$-th layer relative to the entire network, and $p_{i}$ represents the pruning rate of the $i$-th layer.
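A small worked example may help connect the normalized $η$ values to the overall rate $P$. The sketch below assumes min-max normalization and a linear map onto a rate range `[p_min, p_max]`; both are illustrative choices, because the text states only that $η$ is normalized and that higher $η$ yields a higher rate. The function names, the rate bounds, and the sample values are hypothetical.

```python
def layer_rates_from_eta(etas, p_min=0.2, p_max=0.9):
    """Map per-layer eta values to pruning rates (assumed min-max scaling)."""
    lo, hi = min(etas), max(etas)
    if hi == lo:                                  # all layers equally redundant
        return [(p_min + p_max) / 2] * len(etas)
    return [p_min + (e - lo) / (hi - lo) * (p_max - p_min) for e in etas]

def overall_pruning_rate(filter_counts, rates):
    """P = sum_i a_i * p_i, with a_i = filters in layer i / total filters."""
    total = sum(filter_counts)
    return sum(n / total * p for n, p in zip(filter_counts, rates))

# Hypothetical four-layer network.
filter_counts = [64, 128, 256, 512]
etas = [-0.5, 0.0, 1.0, 2.0]

rates = layer_rates_from_eta(etas)
P = overall_pruning_rate(filter_counts, rates)
print(rates, P)
```

Since $a_{i}$ is the layer's share of all filters, $P$ is simply the fraction of filters removed across the whole network, so wide layers dominate the overall rate.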

5. Experimental Results

This section presents the experimental design, implementation details, and results. The primary objective is to evaluate the effectiveness of the proposed method and to compare its performance with existing approaches across multiple datasets.

5.1. Experimental Setup

Datasets. The proposed pruning method is evaluated on five benchmark datasets: FashionMNIST [38], SVHN [39], CIFAR-10 [40], CIFAR-100 [40], and ImageNet [41]. Key statistics of each dataset, including the number of training and testing samples, the number of classes, and image dimensions, are summarized in Table 2.

To accommodate the varying image sizes across datasets, the input layer of the network was adjusted accordingly. Specifically, the input dimensions were set to $28 \times 28$ for FashionMNIST, $32 \times 32$ for SVHN, CIFAR-10, and CIFAR-100, and $224 \times 224$ for ImageNet. This configuration allows the network to process images at their native or near-native resolution with minimal distortion, while ensuring that the proposed pruning method can be applied consistently across all datasets.

Models. A variety of models with different architectures and sizes were employed for the experiments. Specifically, AlexNet [42] was used for FashionMNIST; AlexNet and VGG_11 [43] for SVHN; VGG_16 [43] and ResNet_56/110 [44] for CIFAR-10 and CIFAR-100; and ResNet50 [44] for ImageNet.

Hyperparameter Configuration. All experiments were implemented in PyTorch (1.10.0) using stochastic gradient descent (SGD) as the optimization algorithm. Each network was trained from scratch to establish the baseline model, with hyperparameters adjusted according to the dataset. For FashionMNIST and SVHN, the batch size was 128, momentum was 0.9, weight decay was 0.0001, and the initial learning rate was 0.01; networks were trained for 20 and 50 epochs, respectively. For CIFAR-10 and CIFAR-100, the batch size was 256, momentum was 0.9, weight decay was 0.005, and the initial learning rate was 0.01; training was conducted for 400 epochs. For ImageNet, the batch size was 256, momentum was 0.99, weight decay was 0.0001, and the initial learning rate was 0.01; training lasted for 200 epochs.
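For readers reproducing the baselines, the settings above can be collected into a small configuration table. The sketch below only restates the values from the preceding paragraph; the `TrainConfig` class and the `BASELINE_CONFIGS` name are hypothetical and do not come from the authors' code.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class TrainConfig:
    batch_size: int
    momentum: float
    weight_decay: float
    lr: float        # initial learning rate
    epochs: int

# Baseline training settings as stated in the text (SGD in all cases).
BASELINE_CONFIGS = {
    "FashionMNIST": TrainConfig(128, 0.9, 1e-4, 0.01, 20),
    "SVHN": TrainConfig(128, 0.9, 1e-4, 0.01, 50),
    "CIFAR-10": TrainConfig(256, 0.9, 5e-3, 0.01, 400),
    "CIFAR-100": TrainConfig(256, 0.9, 5e-3, 0.01, 400),
    "ImageNet": TrainConfig(256, 0.99, 1e-4, 0.01, 200),
}

cfg = BASELINE_CONFIGS["CIFAR-10"]
print(cfg)
```

In PyTorch, such an entry would typically be passed on as `torch.optim.SGD(model.parameters(), lr=cfg.lr, momentum=cfg.momentum, weight_decay=cfg.weight_decay)`. The learning rate schedule is not specified in this section and is therefore left out.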

After pruning, the networks were retrained starting from the pruned weights without re-initialization. Retraining employed the same optimizer, batch size, learning rate schedule, data augmentation, and loss function as in the original training. For VGG_16, the weight decay was set to 0.0005 during retraining. The procedure followed the protocol described in [19].

All pruning experiments were conducted deterministically using fixed pretrained models and fixed independence-score rankings. Under identical settings, repeated runs yielded the same pruning configurations and nearly identical evaluation results. Therefore, only single-run results are reported, consistent with prior studies on deterministic pruning.

5.2. Performance Evaluation of Pruned Networks

The effectiveness of the proposed pruning method was evaluated from two perspectives: classification performance, measured by Top-1 accuracy, and model compression, assessed by reductions in parameters and FLOPs. Table 3 presents the performance of pruned models across nine different pruning rates on the FashionMNIST and SVHN datasets. The proposed method substantially reduces both parameters and FLOPs, achieving up to 99% reduction in both metrics while incurring minimal Top-1 accuracy loss: 1.34% for AlexNet on FashionMNIST, 1.68% for AlexNet on SVHN, and 2.63% for VGG_11 on SVHN.

5.3. Comparison with State-of-the-Art Pruning Methods

This part of the experiment compares the classification accuracy, number of parameters, and FLOPs of pruned networks under different pruning schemes. Several recent methods were selected for comparison on CIFAR-10, all of which evaluate filter similarity or correlation using various metrics to prune redundancy. Specifically, CHIP [19] employs the nuclear norm of the feature map, VNGEP [20] uses cosine distance, Li et al. [23] adopt the Hamming distance, EACP [27] relies on filter clustering distance, and CLR-RNF [28] uses the L2 norm of the filter. Among these methods, CHIP and EACP are particularly notable. In contrast, the proposed approach leverages the nuclear norm of the filters themselves.

Table 4 presents the performance metrics of each method at different pruning rates. The numbers following each method indicate distinct pruning rates; for example, CHIP1 and CHIP2 correspond to two pruning rates for the CHIP method. At comparable accuracy levels, the proposed method consistently retains fewer parameters and requires fewer FLOPs.

Specifically, for VGG_16, at 94.09% Top-1 accuracy, the proposed method reduces parameters by 92.4% and FLOPs by 64.43%, whereas CHIP1, at 93.86% accuracy, achieves 81.6% parameter reduction and 58.1% FLOPs reduction. At 93.57% accuracy, the proposed method attains 94.2% parameter reduction and 73.15% FLOPs reduction, compared with EACP1 at 93.29% accuracy, which achieves 75.4% and 71.77%, respectively.

For ResNet_56, the proposed method reduces parameters by 74.85% and FLOPs by 74.5% at 92.44% accuracy, while CHIP2 achieves 71.8% and 72.3% at 92.05% accuracy. For ResNet_110, the proposed method attains 68.23% parameter reduction and 67.92% FLOPs reduction at 93.7% accuracy, outperforming EACP1, which reduces parameters by 61.3% and FLOPs by 65.7% at 93.39% accuracy.

Owing to the limited number of studies on CIFAR-100, comparative experiments are conducted on VGG_16 and ResNet_56 between the proposed pruning method and APRS [13] and ASTER [17]. For ResNet_110, results are reported only for the proposed method, as summarized in Table 5.

For VGG_16, the proposed method outperforms both APRS and ASTER in terms of classification accuracy and compression. Specifically, at 73.86% Top-1 accuracy, the proposed method reduces parameters by 83.9% and FLOPs by 70.7%, whereas APRS3 achieves 70.79% accuracy with 67.2% parameter reduction and 70.2% FLOPs reduction. Similarly, at 74.12% Top-1 accuracy, the proposed method reduces parameters by 81.3% and FLOPs by 58.2%, while APRS2, with 73.02% accuracy, achieves 64% and 50.1% reductions, respectively. For ResNet_56, the proposed method achieves slightly lower classification accuracy than ASTER under higher compression levels.

Finally, comparative experiments were conducted on ImageNet. As shown in Table 6, at equivalent pruning rates, the proposed method exhibits the least degradation in classification performance compared with CLR-RNF [28], VNGEP [20], and CHIP [19].

5.4. Sensitivity Analysis of the Pruning-Rate Indicator $η$

Figure 7 presents the filter independence score sequences for all layers of the networks used in the experiments. In the proposed method, $η$ characterizes the distribution of each sequence, with the variance of $η$ across all sequences indicated above each plot. Networks with similar sequence distributions exhibit smaller variances, whereas networks with more heterogeneous distributions have larger variances.

For networks with large variance, non-uniform pruning is applied, while networks with small variance and no outliers adopt uniform pruning. When small-variance networks contain outlier sequences (highlighted by red dashed ellipses), the layer-specific pruning rates can be adjusted accordingly. For example, in VGG_16 on CIFAR-100, the pruning rates for layers with outliers should be slightly reduced, whereas in ResNet_56, they can be slightly increased to account for these deviations.

5.5. Layer-Wise Pruning Results for VGG_16

In this subsection, experiments are conducted to validate the effectiveness of $η$ in guiding layer-specific pruning strategies. By directly measuring post-pruning accuracy, the influence of variations in layer pruning rates on overall network performance is analyzed.

The accuracy and loss of VGG_16 on CIFAR-10 are evaluated after sequentially pruning individual filters in each layer. Filters within each layer are removed one by one according to their normalized independence scores, as illustrated in Figure 8, which highlights 13 convolutional layers and one fully connected layer. Filters are sorted from lowest to highest independence score (blue curve), with the horizontal axis representing filter order. The green line indicates the network accuracy without retraining after pruning up to the corresponding filter, while the orange line represents the loss. The red star marks an accuracy of 94%.

As shown in Figure 8, the distribution of independence scores (blue lines) varies across layers. Several layers contain a substantial proportion of low-sensitivity filters, making them suitable for higher pruning rates; pruning multiple filters in these layers has a relatively minor effect on accuracy, as observed in the final four layers. Conversely, layers with a larger number of high-sensitivity filters are better candidates for lower pruning ratios, a trend particularly evident in the middle layers.

Layer 0 contains 64 filters, and the network maintains 94% accuracy after pruning 29 filters. In contrast, only four filters can be pruned in Layer 1 while preserving the same accuracy. The pruning rate increases markedly from Layer 8 onward, where layers with 512 filters allow pruning of 289, 409, 482, 478, 498, and 343 filters, respectively. When considering single-layer pruning while maintaining 94% network accuracy, the layer-specific pruning rates vary substantially. These single-layer pruning rates are then combined to form a set of layer-wise pruning strategies, which are applied to prune the entire network.
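The single-layer procedure and the resulting rates can be illustrated as follows. The function `max_prunable` assumes that the prunable count is the number of filters removed before accuracy first drops below the threshold, which is one plausible reading of the red-star marker in Figure 8. The accuracy curve is invented for illustration, and the assignment of the six reported counts to Layers 8 through 13 in order is an interpretation of "respectively".

```python
def max_prunable(acc_after_k, threshold=94.0):
    """Count filters that can be pruned before accuracy first falls below threshold.

    acc_after_k[k] is the accuracy (%) after pruning the k+1 lowest-scoring filters.
    """
    count = 0
    for acc in acc_after_k:
        if acc < threshold:
            break
        count += 1
    return count

# (filters pruned, filters in layer) as reported in the text for VGG_16 on CIFAR-10.
reported = {
    0: (29, 64), 1: (4, 64),
    8: (289, 512), 9: (409, 512), 10: (482, 512),
    11: (478, 512), 12: (498, 512), 13: (343, 512),
}
single_layer_rates = {layer: pruned / total for layer, (pruned, total) in reported.items()}

# Hypothetical accuracy curve for one layer (not measured data).
hypothetical_curve = [94.3, 94.2, 94.1, 93.8, 94.0]
print(max_prunable(hypothetical_curve))
print({layer: round(rate, 3) for layer, rate in single_layer_rates.items()})
```

The rates span roughly 6% (Layer 1) to 97% (Layer 12), which is the spread the text refers to when it says the layer-specific pruning rates vary substantially.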

Table 7 presents the network accuracy and computational complexity resulting from different layer-wise pruning strategies. The pruning rates for the 13 convolutional layers and the single fully connected layer are denoted as $p_{0}$–$p_{13}$. Layer-specific pruning strategies have a substantial impact on network accuracy, which ranges from 10% to 94.06%. Strategies whose layer pruning rates better align with the ratio $η$ achieve higher accuracy, as illustrated in Figure 9.

5.6. Efficient Deployment Results on NPUs

This subsection evaluates the suitability of the pruned models for mobile deployment. Experiments were conducted on the RK3588 platform (Rockchip Electronics Co., Ltd., Fuzhou, China), with both the pruned and original models deployed on the same Neural Processing Unit (NPU), as illustrated in Figure 10. Under the constraint that the difference in test accuracy does not exceed 3%, memory usage and total execution time are compared for the same input. The pruned models consistently require less memory and shorter execution time.

To quantify the performance changes, the relative difference index (RDI) is defined as $RDI = \frac{A_{pruned} - A_{original}}{A_{original}}$, where $A$ represents an NPU performance metric, including weight memory and total execution time. Table 8 summarizes the results. For VGG_16, pruning reduced memory consumption by 94% and execution time by 84%, while for ResNet_56, memory usage and execution time were reduced by 85% and 50%, respectively.
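The RDI formula is straightforward to compute. The sketch below uses made-up memory figures, chosen only so that the result matches the 94% reduction quoted for VGG_16; the actual measurements are those in Table 8. The function name is hypothetical.

```python
def relative_difference_index(pruned, original):
    """RDI = (A_pruned - A_original) / A_original for one NPU metric."""
    if original == 0:
        raise ValueError("original metric must be non-zero")
    return (pruned - original) / original

# Hypothetical weight-memory readings (arbitrary units), not values from Table 8.
original_mem = 100.0
pruned_mem = 6.0

rdi = relative_difference_index(pruned_mem, original_mem)
print(f"RDI = {rdi:.2f}")
```

With this definition a negative RDI indicates an improvement, so an RDI of -0.94 corresponds to the 94% reduction reported in the text.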

6. Discussion

The experimental results demonstrate the advantages of the proposed nuclear-norm-based filter pruning strategy across multiple network architectures and datasets. Overall, the method achieves substantial reductions in model parameters and FLOPs while maintaining competitive classification accuracy, confirming its effectiveness for efficient deep network compression.

First, the proposed method efficiently reduces redundancy by quantifying filter independence using the nuclear norm. This approach accurately identifies redundant filters and removes them with minimal impact on performance, leading to higher compression ratios compared with existing pruning methods at similar accuracy levels (Table 3, Table 4, Table 5 and Table 6).

Second, the method enables layer-wise adaptive pruning. Unlike uniform pruning strategies, independence scores provide fine-grained guidance for determining pruning rates for each layer. Layers with higher redundancy are pruned more aggressively, while critical layers are preserved, yielding an improved trade-off between compression and accuracy. The heterogeneous distribution of independence scores across layers highlights that middle layers often contain high-sensitivity filters essential for maintaining representational capacity, whereas deeper layers have more low-independence filters that can be pruned with minimal accuracy loss. These observations underscore the necessity of adaptive, layer-specific pruning rates guided by the proposed $η$ indicator rather than heuristic or manually designed schemes.

Third, the proposed framework exhibits strong generality and robustness. The consistent performance gains observed across different architectures (VGG_16, ResNet_56, and ResNet_110) and datasets (CIFAR-10, CIFAR-100, and ImageNet) demonstrate that the method is architecture-agnostic and broadly applicable without requiring task- or network-specific modifications. Even under high pruning ratios, pruned networks retain most of their original accuracy after retraining, often outperforming or matching existing approaches in both compression efficiency and classification performance.

While the proposed independence-aware pruning method is primarily designed for networks containing convolutional and fully connected layers, it can also be extended to transformer-based architectures that incorporate such components. The principle of filter independence remains valid for convolutional and fully connected blocks within transformers, enabling pruning to reduce redundancy without compromising the unique contributions of distinct filters. These considerations indicate that the method is not limited to standard CNNs and can be adapted to a broader range of architectures, providing practical guidance for future extensions.

From a deployment perspective, independence-aware pruning provides significant practical benefits for resource-constrained hardware. The substantial reductions in memory usage and execution time on the RK3588 NPU demonstrate that the method effectively removes filters that disproportionately contribute to computational overhead. Its structured pruning nature aligns well with NPU execution characteristics, enabling consistent performance gains without the need for specialized hardware-aware optimization. The use of the relative difference index (RDI) further offers a hardware-agnostic metric to quantify deployment benefits, suggesting that the approach can be extended to other mobile and edge platforms.

Although the deployment experiments focus on the RK3588 NPU, the proposed pruning framework itself is inherently hardware-agnostic and can be applied to other edge computing devices with minimal adaptation. Extending evaluations to a broader range of hardware architectures, including various NPUs and mobile accelerators, constitutes an important avenue for future work.

In summary, the proposed method effectively balances model compactness and performance preservation. By combining accurate redundancy identification, adaptive layer-wise pruning, and strong generalization, it provides a principled and practical approach for efficient neural network compression and deployment across diverse scenarios.

7. Conclusions

This paper presents an independence-aware filter pruning framework for deep neural networks, designed to reduce computational complexity while preserving predictive performance. Through extensive experiments on AlexNet, VGG, and ResNet architectures across multiple datasets (CIFAR-10, CIFAR-100, SVHN, FashionMNIST, and ImageNet), the proposed method consistently achieves substantial reductions in both model parameters and FLOPs while maintaining competitive Top-1 accuracy. These results demonstrate the effectiveness of independence-aware pruning in minimizing redundancy and maximizing compression efficiency.

From a methodological perspective, this work introduces a principled approach to filter-level redundancy evaluation based on the differential nuclear norm. Unlike prior approaches that rely on feature map statistics or heuristic importance metrics, the independence score directly operates on pretrained weights, capturing both correlation structure and effective dimensionality of convolutional filters. Building upon this metric, the proposed layer-wise pruning indicator $η$ enables adaptive pruning rates across layers by reflecting the distribution characteristics of filter independence scores. This formulation allows the pruning process to explicitly account for heterogeneous redundancy patterns within a network, providing a unified, architecture-agnostic framework for structured pruning without the need for manual tuning of layer-specific rates.

Beyond algorithmic contributions, the proposed method exhibits clear practical benefits for deployment on resource-constrained hardware. The structured nature of the pruning strategy aligns with NPU execution characteristics, leading to significant reductions in memory usage and inference latency. This demonstrates that independence-aware pruning not only improves model compactness but also facilitates real-world deployment on mobile and edge devices, where computational resources and energy budgets are limited.

Despite its effectiveness, the proposed method has several limitations. First, the current framework focuses on the inference stage and does not reduce computational cost during training, as independence scores are computed from pretrained weights. Second, the pruning process is performed offline, which may not fully capture potential redundancy variations that arise during training or in dynamic deployment environments. These limitations constrain the applicability of the method in scenarios requiring training-time efficiency or adaptive network adjustment.

Future work will aim to extend independence-aware pruning to more flexible and dynamic settings. One direction is to integrate the independence evaluation into the training process, enabling training-time or online pruning with minimal overhead. Another avenue is to investigate dynamic pruning mechanisms that adaptively adjust network structures in response to evolving data distributions or changing hardware constraints. Additionally, developing computationally efficient approximations of the independence score could facilitate real-time or on-device pruning, further broadening the practical applicability of the method to a wider range of edge and embedded platforms.

In summary, the proposed independence-aware filter pruning framework provides a principled, effective, and deployment-friendly approach for deep network compression. By combining accurate redundancy identification, adaptive layer-wise pruning, and robust performance across diverse architectures and datasets, the method establishes a reliable strategy for balancing model compactness, computational efficiency, and predictive performance, offering both theoretical insight and practical value for modern deep learning applications.

Author Contributions

J.W.: Conceptualization, Methodology, Writing—Original Draft, and Writing—Review and Editing. Z.J., Y.Z. and W.M.: Validation and Visualization. H.B.: Funding Acquisition and Writing—Review and Editing. Y.F.: Writing—Review and Editing. All authors have read and agreed to the published version of the manuscript.

Funding

This work is supported in part by the Science and Technology Innovation 2030 Major Projects (Grant No. 2022ZD0211603) and the Beijing Natural Science Foundation-Joint Funds of Haidian Original Innovation Project (L232056).

Institutional Review Board Statement

Not applicable.

Data Availability Statement

No datasets were generated or analyzed during the current study.

Acknowledgments

During the preparation of this manuscript, the authors used ChatGPT (version GPT-5 mini, OpenAI, San Francisco, CA, USA) for the purposes of checking and correcting grammatical and spelling errors. The authors have reviewed and edited the output and take full responsibility for the content of this publication.

Conflicts of Interest

The authors declare no conflicts of interest.

Abbreviations

The following abbreviations are used in this manuscript:

AI	Artificial Intelligence
NPU	Neural Processing Unit

References

1. Dong, B.; Liu, Y.; Gui, G.; Fu, X.; Dong, H.; Adebisi, B.; Gacanin, H.; Sari, H. A lightweight decentralized-learning-based automatic modulation classification method for resource-constrained edge devices. IEEE Internet Things J. 2022, 9, 24708–24720.
2. Yuan, F.; Li, K.; Wang, C.; Fang, Z. A lightweight network for smoke semantic segmentation. Pattern Recognit. 2023, 137, 109289.
3. Wang, W.; Liu, X. Research on the Application of Pruning Algorithm Based on Local Linear Embedding Method in Traffic Sign Recognition. Appl. Sci. 2024, 14, 7184.
4. Chen, W.; Wang, P.; Cheng, J. Towards Automatic Model Compression via a Unified Two-Stage Framework. Pattern Recognit. 2023, 140, 109527.
5. Kirchhoffer, H.; Haase, P.; Samek, W.; Müller, K.; Rezazadegan-Tavakoli, H.; Cricri, F.; Aksu, E.B.; Hannuksela, M.M.; Jiang, W.; Wang, W.; et al. Overview of the neural network compression and representation (NNR) standard. IEEE Trans. Circuits Syst. Video Technol. 2021, 32, 3203–3216.
6. Zu, X.; Li, Y.; Yin, B. Consecutive layer collaborative filter similarity for differentiable neural network pruning. Neurocomputing 2023, 533, 35–45.
7. Ding, X.; Hao, T.; Tan, J.; Liu, J.; Han, J.; Guo, Y.; Ding, G. ResRep: Lossless CNN pruning via decoupling remembering and forgetting. In Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV), Montreal, QC, Canada, 11–17 October 2021; IEEE: New York, NY, USA, 2021; pp. 4510–4520.
8. Guo, S.; Lai, B.; Yang, S.; Zhao, J.; Shen, F. Sensitivity Pruner: Filter-Level compression algorithm for deep neural networks. Pattern Recognit. 2023, 140, 109508.
9. Guan, Y.; Liu, N.; Zhao, P.; Che, Z.; Bian, K.; Wang, Y.; Tang, J. DAIS: Automatic channel pruning via differentiable annealing indicator search. IEEE Trans. Neural Netw. Learn. Syst. 2022, 34, 9847–9858.
10. Li, J.; Rao, X.; Xiao, S.; Zhao, B.; Liu, D. Pruner to Predictor: An efficient pruning method for neural networks compression. In 2022 14th International Conference on Advanced Computational Intelligence (ICACI); IEEE: New York, NY, USA, 2022; pp. 9–14.
11. Gao, S.; Huang, F.; Cai, W.; Huang, H. Network pruning via performance maximization. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Nashville, TN, USA, 19–25 June 2021; IEEE: New York, NY, USA, 2021; pp. 9270–9280.
12. Lee, S.; Song, B.C. Fast filter pruning via coarse-to-fine neural architecture search and contrastive knowledge transfer. IEEE Trans. Neural Netw. Learn. Syst. 2023, 35, 9674–9685.
13. Xiao, H.; Wang, Y.; Liu, J.; Huo, J.; Hu, Y.; Wang, Y. APRS: Automatic pruning ratio search using Siamese network with layer-level rewards. Digit. Signal Process. 2023, 133, 103864.
14. Wang, Z.; Li, C.; Wang, X. Convolutional neural network pruning with structural redundancy reduction. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Nashville, TN, USA, 19–25 June 2021; IEEE: New York, NY, USA, 2021; pp. 14913–14922.
15. Meng, F.; Cheng, H.; Li, K.; Luo, H.; Guo, X.; Lu, G.; Sun, X. Pruning filter in filter. NeurIPS 2020, 33, 17629–17640.
16. He, Y.; Dong, X.; Kang, G.; Fu, Y.; Yan, C.; Yang, Y. Asymptotic soft filter pruning for deep convolutional neural networks. IEEE Trans. Cybern. 2019, 50, 3594–3604.
17. Zhang, Y.; Freris, N.M. Adaptive filter pruning via sensitivity feedback. IEEE Trans. Neural Netw. Learn. Syst. 2023, 35, 10996–11008.
18. Ruan, X.; Liu, Y.; Li, B.; Yuan, C.; Hu, W. DPFPS: Dynamic and progressive filter pruning for compressing convolutional neural networks from scratch. AAAI Conf. Artif. Intell. 2021, 35, 2495–2503.
19. Sui, Y.; Yin, M.; Xie, Y.; Phan, H.; Aliari Zonouz, S.; Yuan, B. CHIP: Channel independence-based pruning for compact neural networks. NeurIPS 2021, 34, 24604–24616.
20. Shi, C.; Hao, Y.; Li, G.; Xu, S. VNGEP: Filter pruning based on von Neumann graph entropy. Neurocomputing 2023, 528, 113–124.
21. Lin, M.; Ji, R.; Wang, Y.; Zhang, Y.; Zhang, B.; Tian, Y.; Shao, L. HRank: Filter pruning using high-rank feature map. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Seattle, WA, USA, 13–19 June 2020; IEEE: New York, NY, USA, 2020; pp. 1529–1538.
22. Chang, J.; Lu, Y.; Xue, P.; Xu, Y.; Wei, Z. Iterative clustering pruning for convolutional neural networks. Knowl. Based Syst. 2023, 265, 110386.
23. Li, J.; Shao, H.; Zhai, S.; Jiang, Y.; Deng, X. A graphical approach for filter pruning by exploring the similarity relation between feature maps. Pattern Recognit. Lett. 2023, 166, 69–75.
24. Yao, K.; Cao, F.; Leung, Y.; Liang, J. Deep neural network compression through interpretability-based filter pruning. Pattern Recognit. 2021, 119, 108056.
25. Geng, L.; Niu, B. Pruning convolutional neural networks via filter similarity analysis. Mach. Learn. 2022, 111, 3161–3180.
26. Wang, Y.; Guo, S.; Guo, J.; Zhang, J.; Zhang, W.; Yan, C.; Zhang, Y. Towards performance-maximizing neural network pruning via global channel attention. Neural Netw. 2024, 171, 104–113.
27. Liu, Y.; Wu, D.; Zhou, W.; Fan, K.; Zhou, Z. EACP: An effective automatic channel pruning for neural networks. Neurocomputing 2023, 526, 131–142.
28. Lin, M.; Cao, L.; Zhang, Y.; Shao, L.; Lin, C.W.; Ji, R. Pruning networks with cross-layer ranking & k-reciprocal nearest filters. IEEE Trans. Neural Netw. Learn. Syst. 2022, 34, 9139–9148.
29. Yang, L.; Gu, S.; Shen, C.; Zhao, X.; Hu, Q. Skeleton neural networks via low-rank guided filter pruning. IEEE Trans. Circuits Syst. Video Technol. 2023, 33, 7197–7211.
30. Qiu, Y.; Niu, L.; Sha, F.; Cheng, Z.; Yanai, K. Entropy-Guided Search Space Optimization for Efficient Neural Network Pruning. Algorithms 2025, 18, 736.
31. Diao, H.; Li, G.; Xu, S.; Kong, C.; Wang, W.; Liu, S.; He, Y. Self-distillation enhanced adaptive pruning of convolutional neural networks. Pattern Recognit. 2025, 157, 110942.
32. Palakonda, V.; Tursunboev, J.; Kang, J.M.; Moon, S. Metaheuristics for pruning convolutional neural networks: A comparative study. Expert Syst. Appl. 2025, 268, 126326.
33. Wei, W.; Lu, Q.; Huang, C.; Lee, D.; Luo, J. SymRefine: A symbolic regression approach for refining and compressing neural networks. Neurocomputing 2026, 672, 132719.
34. Yue, Y.; Liu, H.; Liu, X.; da Porto, F.; Saler, E.; Cui, J.; Donà, M. Efficient on-device damage segmentation for cultural heritage using pruning and knowledge distillation. J. Cult. Herit. 2026, 77, 284–293.
35. He, W.; Mei, S.; Hu, J.; Ma, L.; Hao, S.; Lv, Z. Filter-Wise Mask Pruning and FPGA Acceleration for Object Classification and Detection. Remote Sens. 2025, 17, 3582.
36. Chung, Y.L. Efficient Lightweight Image Classification via Coordinate Attention and Channel Pruning for Resource-Constrained Systems. Future Internet 2025, 17, 489.
37. Cheng, Y.; Zhang, K.; Xu, Z. Matrix Theory; Northwestern Polytechnical University Press: Xi’an, China, 2013.
38. Xiao, H.; Rasul, K.; Vollgraf, R. Fashion-MNIST: A novel image dataset for benchmarking machine learning algorithms. arXiv 2017, arXiv:1708.07747.
39. Netzer, Y.; Wang, T.; Coates, A.; Bissacco, A.; Wu, B.; Ng, A.Y. Reading Digits in Natural Images with Unsupervised Feature Learning; NeurIPS: Granada, Spain, 2011; Volume 2011, p. 4.
40. Krizhevsky, A.; Hinton, G. Learning Multiple Layers of Features from Tiny Images; University of Toronto: Toronto, Canada, 2009; Available online: https://www.cs.toronto.edu/~kriz/learning-features-2009-TR.pdf (accessed on 10 November 2025).
41. Deng, J.; Dong, W.; Socher, R.; Li, L.J.; Li, K.; Li, F.-F. ImageNet: A large-scale hierarchical image database. In 2009 IEEE Conference on Computer Vision and Pattern Recognition; IEEE: New York, NY, USA, 2009; pp. 248–255.
42. Krizhevsky, A.; Sutskever, I.; Hinton, G.E. ImageNet Classification with Deep Convolutional Neural Networks. In Advances in Neural Information Processing Systems; Curran Associates, Inc.: Red Hook, NY, USA, 2012; Volume 25, pp. 1097–1105.
43. Simonyan, K.; Zisserman, A. Very deep convolutional networks for large-scale image recognition. arXiv 2014, arXiv:1409.1556.
44. He, K.; Zhang, X.; Ren, S.; Sun, J. Deep residual learning for image recognition. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Las Vegas, NV, USA, 27–30 June 2016; IEEE: New York, NY, USA, 2016; pp. 770–778.

Figure 1. Schematic illustration of the linear transformation from the input space to the feature space, where the transformation matrix is represented by the filter matrix, highlighting redundancy induced by correlated filters. The four modules from top to bottom correspond to Step 1: a schematic illustration of the convolution operation in a single convolutional layer, Step 2: reformulating the convolution computation as matrix multiplication, Step 3: constructing a linear transformation by reshaping filters into row vectors, and Step 4: redundancy induced by correlated filters in the resulting transformation.

Figure 2. Schematic illustration of filter redundancy in a convolutional layer. The same input image is processed by three filters, $f_{1}$, $f_{2}$, and $f_{3}$, producing the corresponding output feature maps $O_{1}$, $O_{2}$, and $O_{3}$ through convolution. Filters $f_{1}$ and $f_{2}$ are highly similar and generate nearly identical feature maps, indicating redundancy, whereas filter $f_{3}$ produces a distinct feature map.

Figure 3. (a) The $Δ\mathrm{rank}$ corresponding to each filter in the 10th convolution layer of VGG_16. (b) The $Δ\mathrm{nuclear\_norm}$ corresponding to each filter in the 10th convolution layer of VGG_16, which is also the independence score in this paper.
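
A minimal sketch of how per-filter rank and nuclear-norm changes could be computed, assuming each quantity is the decrease observed when that filter's row is removed from the filter matrix. This reading, the function names, and the toy matrix are illustrative assumptions; the paper's exact definition and any normalization are given in Section 3.2.

```python
import numpy as np

def nuclear_norm_deltas(filter_matrix):
    """Decrease in nuclear norm when each row (filter) is removed in turn."""
    full = np.linalg.norm(filter_matrix, ord="nuc")
    return np.array([
        full - np.linalg.norm(np.delete(filter_matrix, i, axis=0), ord="nuc")
        for i in range(filter_matrix.shape[0])
    ])

def rank_deltas(filter_matrix):
    """Decrease in matrix rank when each row (filter) is removed in turn."""
    full = np.linalg.matrix_rank(filter_matrix)
    return np.array([
        full - np.linalg.matrix_rank(np.delete(filter_matrix, i, axis=0))
        for i in range(filter_matrix.shape[0])
    ])

# Toy filter matrix: rows 0 and 1 are nearly parallel, row 2 is orthogonal to both.
W = np.array([
    [1.0, 0.00, 0.0],
    [1.0, 0.01, 0.0],
    [0.0, 0.00, 1.0],
])

nuc = nuclear_norm_deltas(W)
rnk = rank_deltas(W)
print("nuclear-norm deltas:", np.round(nuc, 4))
print("rank deltas:        ", rnk)
```

In this toy example the rank change is 1 for every row, so it cannot separate the near-duplicate filters from the distinct one, whereas the nuclear-norm change is noticeably larger for the orthogonal row than for the two nearly parallel rows.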

Figure 4. Feature maps produced by the 16 filters in the first convolutional layer of ResNet_56 trained on CIFAR-10 for 8 arbitrary samples; each row of feature maps comes from the same sample. The numbers in parentheses are the filter indices. (a) Filters sorted by independence score. (b) Filters sorted by L2 norm. Feature maps with a smaller L2 norm that contain more detailed information are retained (red dashed box), whereas feature maps with a larger L2 norm that contain only contour information are discarded (purple dashed box).

Figure 5. Sorted independence score sequences for the 7th and 11th convolutional layers of VGG_16 trained on CIFAR-10. The blue line ($η = -0.2877$) indicates a higher number of high-independence filters, suggesting that a smaller pruning rate should be applied. In contrast, the orange line ($η = 13.8155$) represents more low-independence filters, allowing for a larger pruning rate. The red dashed line marks the independence score threshold of 0.4.

Figure 6. Diagram of the filter pruning process based on the nuclear norm, consisting of two steps: calculating the independence score and pruning filters based on this score.

Figure 7. Visualization of independence score distributions across layers for different network architectures and datasets. Each subfigure corresponds to a specific network–dataset combination: (a) AlexNet on FashionMNIST, (b) AlexNet on SVHN, (c) VGG_11 on SVHN, (d) VGG_16 on CIFAR-10, (e) VGG_16 on CIFAR-100, (f) ResNet_56 on CIFAR-10, (g) ResNet_56 on CIFAR-100, and (h) ResNet50 on ImageNet. For each configuration, the distribution of layer-wise independence scores is characterized by the pruning rate indicator $η$, with the variance of $η$ across layers reported above each plot. Layers exhibiting outlier distributions are highlighted by red dashed ellipses.

Figure 8. Layer-wise sensitivity analysis of VGG_16 on CIFAR-10 by sequentially pruning filters. Each subfigure corresponds to one layer in the network. Filters are removed one by one in ascending order of normalized independence scores. The Top-1 accuracy (green curve) and loss (orange curve) are reported without retraining after each pruning step. The red star denotes the pruning point at which the accuracy first drops below 94%. The baseline Top-1 accuracy before pruning is 94.39%.

Figure 9. Accuracy of pruned VGG_16 on CIFAR-10 with varying pruning strategies.

Figure 10. Efficient Neural Networks on NPU-based IoT Devices.

Table 1. Comparison of representative filter pruning methods for convolutional neural networks.

Method	Methodology	Datasets	Results *	Limitations
HRank (2020) [21]	Pruning filters that produce low-rank feature maps.	CIFAR-10, ImageNet	Reduces FLOPs and parameters by approximately 40–60% with minimal accuracy loss.	Requires feature map rank computation during pruning.
CHIP (2021) [19]	Inter-channel pruning based on filter independence.	CIFAR-10, ImageNet	Reduces model size and FLOPs by approximately 40–50% with slight accuracy improvement.	Requires correlation analysis and is mainly validated on ResNet architectures.
Interpretability-based (2021) [24]	Layer-wise pruning guided by activation maximization.	CIFAR-10, ImageNet	Achieves about 60% model compression with minimal accuracy degradation.	Requires per-layer visualization analysis.
FSABP (2022) [25]	Backward layer-wise pruning based on filter similarity.	MNIST, CIFAR-10, KAGGLE, ILSVRC2012	Effectively reduces parameter redundancy with negligible accuracy change.	Lacks a theoretical justification for similarity-based redundancy estimation.
CLR-RNF (2022) [28]	Non-learning pruning via cross-layer ranking and reciprocal nearest filter selection.	CIFAR-10, ImageNet	Removes 60–95% of FLOPs and parameters with minor accuracy degradation.	Lacks theoretical explanation for the pruning criterion.
VNGEP (2023) [20]	Graph-based pruning using von Neumann graph entropy.	CIFAR-10, ImageNet	Reduces model size and FLOPs by approximately 40–50% while maintaining accuracy.	Graph construction and entropy computation introduce additional overhead.
ICP (2023) [22]	Iterative clustering-based pruning with knowledge transfer.	CIFAR-10, CIFAR-100, ImageNet, PASCAL VOC	Reduces parameters and FLOPs with minor or no accuracy loss.	Iterative pruning and distillation increase training complexity.
Graph-based One-shot (2023) [23]	One-shot filter pruning using graph-based similarity modeling.	CIFAR-10	Achieves large reductions (approximately 60–90%) with minimal accuracy loss.	Graph construction introduces additional computational cost.
EACP (2023) [27]	Automatic channel pruning via hierarchical clustering and optimization.	CIFAR-10, ILSVRC2012	Reduces FLOPs by over 50% with maintained or slightly improved accuracy.	Relies on clustering and iterative optimization procedures.
SkeletonNN (2023) [29]	Iterative pruning with low-rank regularization.	MNIST, CIFAR-10, ILSVRC2012	Achieves high pruning rates with maintained or improved accuracy.	Alternating training and pruning increase computational cost.
GlobalPru (2024) [26]	Static channel pruning via global attention-based ranking.	ImageNet, SVHN, CIFAR-10/100	Achieves high compression ratios with strong performance.	May suffer from sample bias and introduces additional computation and memory overhead.
EGSSO (2025) [30]	Entropy-guided layer-wise pruning with search space optimization.	COCO, VisDrone	Improves accuracy while reducing computational cost.	Entropy computation and iterative search add overhead.
SDAP (2025) [31]	Adaptive channel pruning integrated into training using channel gates and self-distillation.	CIFAR-10/100, ImageNet	Removes at least 75% of parameters while maintaining or improving accuracy.	Introduces additional distillation loss and trainable pruning parameters.
FMP (2025) [35]	Filter-wise mask pruning with structural constraints and FPGA-oriented acceleration.	CIFAR-10, Mini-ImageNet, Aerial datasets	Achieves high pruning rates (50–85%) with negligible accuracy loss and effective hardware speedup.	Requires auxiliary mask learning and hardware-specific design.
CA + Pruning (2025) [36]	Pruning-aware coordinate attention combined with L1-regularized channel pruning.	CIFAR-10, Fashion-MNIST, GTSRB	Reduces parameters and FLOPs while maintaining or slightly improving accuracy.	Introduces attention overhead without explicitly modeling channel redundancy.

* Results are reported as described in the original papers.

Table 2. Summary of benchmark datasets used in our experiments.

Dataset	Training Samples	Testing Samples	Classes	Image Size
FashionMNIST	60,000	10,000	10	$28 \times 28$ (grayscale)
SVHN	73,257	26,032	10	$32 \times 32$ (RGB)
CIFAR-10	50,000	10,000	10	$32 \times 32$ (RGB)
CIFAR-100	50,000	10,000	100	$32 \times 32$ (RGB)
ImageNet	1,200,000	50,000	1000	$224 \times 224$ (RGB)

Table 3. Performance of pruned networks with different pruning rates: AlexNet on FashionMNIST and AlexNet and VGG_11 on SVHN.

Dataset	Model	Top1_Acc% (Δ)	#Params (↓%)	#FLOPs (↓%)
FashionMNIST	AlexNet	92.38 (0)	1.87 M (0)	238 M (0)
		92.26 (−0.12)	1.51 M (19)	198 M (17)
		92.12 (−0.26)	1.19 M (36)	152 M (36)
		92.07 (−0.31)	0.92 M (51)	117 M (51)
		92.19 (−0.19)	0.68 M (64)	85.8 M (64)
		92.05 (−0.33)	0.48 M (75)	60.4 M (75)
		92.03 (−0.35)	0.30 M (84)	38.4 M (84)
		91.73 (−0.65)	0.17 M (91)	21.7 M (91)
		91.69 (−0.69)	0.08 M (96)	9.79 M (96)
		91.01 (−1.37)	0.02 M (99)	2.52 M (99)
SVHN	AlexNet	93.52 (0)	1.87 M (0)	245 M (0)
		93.78 (0.26)	1.51 M (19)	198 M (19)
		93.83 (0.31)	1.20 M (36)	157 M (36)
		93.69 (0.17)	0.92 M (51)	121 M (51)
		93.79 (0.27)	0.68 M (64)	89.7 M (63)
		93.68 (0.16)	0.48 M (74)	63.7 M (74)
		93.55 (0.03)	0.30 M (84)	41.0 M (83)
		93.42 (−0.1)	0.17 M (91)	23.7 M (90)
		93.46 (−0.06)	0.08 M (96)	11.1 M (95)
		91.84 (−1.68)	0.02 M (99)	3.14 M (99)
	VGG_11	94.71 (0)	9.23 M (0)	153 M (0)
		94.85 (0.14)	7.45 M (19)	124 M (19)
		94.80 (0.09)	5.89 M (36)	98.0 M (36)
		94.87 (0.16)	4.51 M (51)	75.2 M (51)
		94.76 (0.05)	3.32 M (64)	55.4 M (64)
		94.61 (−0.1)	2.31 M (75)	38.9 M (75)
		94.73 (0.02)	1.47 M (84)	24.8 M (84)
		94.52 (−0.19)	0.83 M (91)	14.1 M (91)
		93.92 (−0.79)	0.37 M (96)	6.37 M (96)
		92.08 (−2.63)	0.09 M (99)	1.67 M (99)

Table 4. Performance comparison of pruned VGG_16, ResNet_56, and ResNet_110 on CIFAR-10 with different pruning methods.

Model	Method	Baseline Top-1 (%)	Pruned Top-1 (%)	Top-1 $Δ$ (%)	#Params (↓%)	#FLOPs (↓%)
VGG_16	CHIP1 [19]	93.96	93.86	−0.1	2.76 M (81.6)	131.17 M (58.1)
	VNGEP1 [20]	93.96	94.33	0.37	2.76 M (81.6)	131.17 M (58.1)
	Li et al.1 [23]	93.96	92.59	−1.37	1.38 M (90.8)	125.25 M (60)
	ours (p = 0.73)	94.39	94.09	−0.3	1.13 M (92.4)	111.89 M (64.43)
	CHIP2 [19]	93.96	93.72	−0.24	2.50 M (83.3)	104.78 M (66.6)
	VNGEP2 [20]	93.96	93.95	−0.01	2.50 M (83.3)	104.78 M (66.6)
	EACP1 [27]	93.26	93.29	0.03	3.62 M (75.4)	88.64 M (71.77)
	ours (p = 0.76)	94.39	93.57	−0.82	0.87 M (94.2)	84.47 M (73.15)
	Li et al.2 [23]	93.96	90.92	−3.04	0.7 M (95.3)	82.65 M (73.6)
	CLR-RNF [28]	-	93.32	-	0.74 M (95.0)	81.31 M (74.1)
	EACP2 [27]	93.26	93.2	−0.06	3.36 M (77.19)	71.06 M (77.37)
	CHIP3 [19]	93.96	93.18	−0.78	1.90 M (87.3)	66.95 M (78.6)
	VNGEP3 [20]	93.96	93.3	−0.66	1.90 M (87.3)	66.95 M (78.6)
	ours (p = 0.76)	94.39	93.16	−1.23	0.65 M (95.67)	65.9 M (79.05)
ResNet_56	VNGEP1 [20]	93.26	94.28	1.02	0.66 M (22.3)	90.35 M (28)
	Li et al.1 [23]	93.26	92.88	−0.38	0.57 M (33.7)	80.77 M (35.8)
	ours (p = 0.23)	94.69	94.23	−0.46	0.5 M (41.21)	80.64 M (36.81)
	CHIP1 [19]	93.26	94.16	0.9	0.48 M (42.8)	65.94 M (47.4)
	VNGEP2 [20]	93.26	93.65	0.39	0.48 M (42.8)	65.94 M (47.4)
	ours (p = 0.33)	94.69	94.14	−0.55	0.38 M (55.54)	64.34 M (49.58)
	Li et al.2 [23]	93.26	92.42	−0.84	0.48 M (44)	62.95 M (49.9)
	CLR-RNF [28]	-	93.27	-	0.38 M (55.5)	54 M (57.3)
	EACP [27]	93.2	93.11	−0.09	0.31 M (63.5)	40.44 M (68)
	CHIP2 [19]	93.26	92.05	−1.21	0.24 M (71.8)	34.79 M (72.3)
	VNGEP3 [20]	93.26	92.63	−0.63	0.24 M (71.8)	34.79 M (72.3)
	ours (p = 0.48)	94.69	92.44	−2.25	0.21 M (74.85)	32.55 M (74.5)
ResNet_110	CHIP1 [19]	93.5	94.44	0.94	0.89 M (48.3)	121.09 M (52.1)
	VNGEP1 [20]	93.5	94.57	1.07	0.89 M (48.3)	121.09 M (52.1)
	ours (p = 0.34)	95.07	94.34	−0.73	0.75 M (56.6)	115.45 M (55.09)
	Li et al.1 [23]	93.5	92.97	−0.53	0.64 M (63.1)	107.64 M (57.5)
	VNGEP2 [20]	93.5	94.22	0.72	0.72 M (58.1)	101.97 M (59.6)
	ours (p = 0.39)	95.07	94.22	−0.85	0.65 M (62.27)	99.9 M (61.14)
	EACP1 [27]	93.3	93.39	0.09	0.67 M (61.3)	87.36 M (65.7)
	CLR-RNF [28]	-	93.71	-	0.53 M (69.1)	86.8 M (66)
	ours (p = 0.42)	95.07	93.7	−1.37	0.55 M (68.23)	82.49 M (67.92)
	EACP2 [27]	93.3	93.35	0.05	0.59 M (65.9)	80.9 M (68.3)
	Li et al.2 [23]	93.5	92.72	−0.78	0.51 M (70.4)	74.81 M (70.4)
	VNGEP3 [20]	93.5	93.72	0.22	0.54 M (68.3)	71.69 M (71.6)
	CHIP2 [19]	93.5	93.63	0.13	0.54 M (68.3)	71.69 M (71.6)
	ours (p = 0.48)	95.07	93.44	−1.63	0.43 M (74.87)	65.43 M (74.55)

Table 5. Performance comparison of pruned VGG_16 and ResNet_56 on CIFAR-100 with different pruning methods, and performance of pruned ResNet_110 on CIFAR-100 with different pruning rates.

Model	Method	Baseline Top-1 (%)	Pruned Top-1 (%)	Top-1 $Δ$ (%)	#Params (↓%)	#FLOPs (↓%)
VGG_16	APRS1 [13]	73.77	73.71	−0.06	7.21 M (52)	220.26 M (29.8)
	ours (p = 0.4)	74.74	74.92	0.18	5.79 M (61.5)	191.32 M (39.2)
	SP [8]	72.44	72.78	0.34	-	173 M (45)
	ASTER [17]	73.45	73.95	0.5	-	161 M (48.4)
	APRS2 [13]	73.77	73.02	−0.75	5.42 M (64)	156.44 M (50.1)
	ours (p = 0.61)	74.74	74.12	−0.62	2.81 M (81.3)	131.42 M (58.2)
	APRS3 [13]	73.77	70.79	−2.98	4.93 M (67.2)	93.44 M (70.2)
	ours (p = 0.64)	74.74	73.86	−0.88	2.42 M (83.9)	92.32 M (70.7)
ResNet_56	ASTER1 [17]	72.32	72.23	−0.09	-	92(26.7)
	ours (p = 0.12)	72.72	71.7	−1.02	0.68 M (21.3)	91.88 M (28)
	ASTER2 [17]	72.32	71.78	−0.54	-	58.4 M (53.6)
	ICP [22]	71.36	70.92	−0.44	-	55.67 M (56.2)
	ours (p = 0.26)	72.72	69.15	−3.57	0.47 M (45.4)	55.19 M (56.8)
ResNet_110	ours (p = 0.07)	74.42	74.08	−0.34	1.49 M (13.9)	208.05 M (19.1)
	ours (p = 0.14)	74.42	73.03	−1.39	1.3 M (25)	169.59 M (34)
	ours (p = 0.21)	74.42	72.46	−1.96	1.05 M (39.3)	142.09 M (44.7)

Table 6. Performance comparison of pruned ResNet50 on ImageNet with different pruning methods.

Method	Baseline Top-1 (%)	Pruned Top-1 (%)	Top-1 $Δ$ (%)	Baseline Top-5 (%)	Pruned Top-5 (%)	Top-5 $Δ$ (%)	#Params (↓%)	#FLOPs (↓%)
CLR-RNF [28]	-	74.85	-	-	92.31	-	16.92 M (33.8)	2.45 G (40.68)
VNGEP1 [20]	76.15	76.4	0.25	92.87	92.99	0.12	15.09 M (40.8)	2.28 G (44.8)
CHIP1 [19]	76.15	76.3	0.15	92.87	93.02	0.15	15.09 M (40.8)	2.28 G (44.8)
ours (p = 0.17)	76.15	76.54	0.39	92.87	93.14	0.27	15.09 M (40.8)	2.28 G (44.8)
VNGEP2 [20]	76.15	75.32	−0.83	92.87	92.51	−0.36	11.05 M (56.7)	1.54 G (62.8)
CHIP2 [19]	76.15	75.26	−0.89	92.87	92.53	−0.34	11.05 M (56.7)	1.54 G (62.8)
ours (p = 0.32)	76.15	75.39	−0.76	92.87	92.43	−0.44	11.05 M (56.7)	1.54 G (62.8)

Table 7. The performance of VGG_16 on CIFAR-10 corresponding to different pruning strategies.

$p_{0}$ – $p_{13}$	Acc (%)	#Params (↓%)	#FLOPs (↓%)
0.3, 0, 0, 0, 0, 0, 0, 0, 0.2, 0.4, 0.5, 0.8, 0.9, 0.6	94.06	6.95 M (53.67)	251.05 M (20.19)
0.56, 0, 0, 0, 0, 0, 0, 0, 0.56, 0.56, 0.56, 0.56, 0.56, 0.56	90.99	5.84 M (61.08)	218.28 M (30.61)
0.37, 0.37, 0.37, 0.37, 0.37, 0.37, 0.37, 0.37, 0.37, 0.37, 0.37, 0.37, 0.37, 0.37	18.95	5.98 M (60.13)	126.85 M (59.68)
0.24, 0.63, 0.63, 0.63, 0.63, 0.63, 0.63, 0.63, 0.24, 0.24, 0.24, 0.24, 0.24, 0.24	10	6.72 M (55.2)	85.49 M (72.83)
0.1, 0.89, 0.89, 0.89, 0.89, 0.89, 0.89, 0.89, 0.1, 0.1, 0.1, 0.1, 0.1, 0.1	10	8.16 M (45.6)	66.1 M (78.99)

Table 8. Performance comparison of original and pruned networks on NPU.

Network	Weight Memory (B), Original	Weight Memory (B), Pruned	Weight Memory RDI	Total Time (s), Original	Total Time (s), Pruned	Total Time RDI
VGG_16	28.59 M	1.67 M	−94%	1.67	0.26	−84%
ResNet_56	1.64 M	0.24 M	−85%	0.38	0.19	−50%
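
The RDI columns of Table 8 are consistent with a simple signed relative change, (pruned − original) / original. The short check below recomputes them from the table entries; treating RDI as this relative change is an assumption, since the abbreviation is not defined in this part of the article, and the function and variable names are chosen only for illustration.

```python
def relative_change_percent(original, pruned):
    """Signed relative change from original to pruned, in percent."""
    return (pruned - original) / original * 100.0

# (original, pruned) pairs copied from Table 8.
# Weight memory is in millions of bytes, total time in seconds.
table8 = {
    ("VGG_16", "weight memory"): (28.59, 1.67),
    ("VGG_16", "total time"): (1.67, 0.26),
    ("ResNet_56", "weight memory"): (1.64, 0.24),
    ("ResNet_56", "total time"): (0.38, 0.19),
}

# Round to whole percentages, as reported in the table.
rdi = {key: round(relative_change_percent(*pair)) for key, pair in table8.items()}

for (network, metric), value in rdi.items():
    print(f"{network:10s} {metric:14s} {value:+d}%")
```

Under this assumption the recomputed values round to the −94%, −84%, −85%, and −50% entries reported in the table.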
