1. Introduction
Neural networks have demonstrated remarkable performance across a wide range of applications. However, their substantial computational and memory demands pose significant challenges for deployment on resource-constrained edge devices. In many practical scenarios, such as automatic modulation classification [1], smoke segmentation [2], and traffic sign recognition [3], models with efficient inference and compact architectures are critically required. Consequently, reducing computational cost and model parameters has become a key research direction for developing practical and deployable neural networks in real-world applications.
Network pruning reduces computational and storage costs by removing redundant parameters while preserving model performance. It can be applied independently or in combination with quantization to achieve effective deep model compression [4,5]. Pruning methods are typically categorized as unstructured or structured. Unstructured pruning removes individual low-importance weights, creating sparse matrices that often require specialized hardware or software. Structured pruning removes entire parameter groups, such as filters or channels, producing dense architectures compatible with standard frameworks and hardware. It is therefore better suited for deployment on resource-constrained devices.
Regardless of whether pruning is unstructured, targeting individual weights, or structured, operating on parameter groups, a fundamental step in all pruning strategies is the evaluation of parameter importance. Accurate identification of parameters that contribute minimally to model performance enables pruning with limited accuracy degradation. Existing approaches for assessing parameter importance can be broadly categorized into two main classes.
The first category evaluates filter importance by designing networks that learn importance scores either concurrently with training or in an alternating manner, where filters correspond to groups of convolutional kernels associated with output channels. Representative approaches include predicting filter importance [6,7], selecting channel saliency [8,9], evaluating subnetwork performance [10,11], and searching for optimal substructures [12]. In addition, reinforcement learning-based methods employ agents to autonomously discover effective compression strategies [13]. Despite their effectiveness, these approaches generally increase training complexity and often lack clear interpretability regarding how the learned importance scores relate to the network’s underlying representations.
The second category evaluates filter importance based on network parameters or corresponding feature maps, thereby providing an explicit measure of each filter’s contribution. Parameter-based methods are conceptually simple and include criteria such as weight magnitude [14,15], weight norms [16], Batch Normalization scaling factors [17], and filter contributions to the loss [18]. However, because correlations among filters are typically ignored, these methods may produce inaccurate importance rankings and consequently misidentify redundant filters.
In contrast, feature-map-based approaches explicitly account for inter-filter correlations [19,20], which improves redundancy detection. Nevertheless, these methods require storing feature maps during inference or training, leading to increased computational overhead, a strong dependence on the dataset and hyperparameter settings, and reduced experimental repeatability.
A novel filter pruning method that explicitly accounts for inter-filter correlations is presented. Filter importance is quantified using an independence score, defined as the difference in the nuclear norm of the weight matrix before and after masking a given filter. Furthermore, the distribution of independence scores within each layer is analyzed to guide the selection of layer-wise pruning rates. Based on this criterion, an efficient nuclear-norm-based pruning algorithm is developed, and performance is preserved through retraining. Deployment on neural processing units (NPUs) demonstrates substantial reductions in memory consumption and inference latency, highlighting the suitability of the proposed approach for on-device AI applications.
The primary contributions of this work are summarized as follows:
1. A structured pruning framework is introduced that leverages the nuclear norm as an independence score for filter evaluation. Unlike existing nuclear-norm-based pruning methods that focus on feature map low-rankness or intra-layer redundancy, this framework explicitly quantifies inter-filter dependencies directly from network weights without requiring feature map computation.
2. Theoretical analysis indicates that the differential nuclear norm of a filter is positively correlated with its independence. On this basis, a filter independence evaluation criterion is established to identify filters that are critical for preserving the representational capacity of the network.
3. A unified redundancy analysis applicable to both convolutional and fully connected layers is developed. By modeling correlations among filters or neurons at the weight level, the proposed independence assessment enables effective identification of redundant structures that extract highly similar representations.
4. A layer-wise pruning indicator is designed by characterizing the distribution of filter independence scores. Specifically, the ratio between mid-range values and the median is utilized to estimate the proportion of low-independence filters, thereby enabling adaptive pruning rates across different layers.
2. Related Work
Filter pruning typically involves three stages: evaluating filter importance, removing less important filters, and retraining the pruned network to reduce accuracy loss. In the first stage, selecting appropriate criteria for filter evaluation is critical. In the later stages, determining layer-wise pruning rates is essential, as variations across layers can significantly affect the pruning outcome. The following discussion reviews these stages.
Feature map correlation. Feature-map-based methods assess filter importance by analyzing similarities or correlations among the corresponding feature maps. A variety of metrics have been explored to quantify these relationships, including rank estimation [21], nuclear norm analysis [19], cosine similarity [20,22], Hamming distance [23], histogram intersection [24], and correlation coefficients derived from inner products [25]. More recently, cross-sample strategies have been investigated to further improve importance estimation, among which a global channel attention-based static pruning framework learns a unified channel ranking across samples [26].
Filter similarity. Filter importance is evaluated using geometric or statistical relationships among filters. Clustering methods retain filters based on k-means or similar strategies [27], while distance-based approaches identify less important filters via pairwise distances [28]. Low-rank constraints have been used to enforce linear independence among filter groups [29]. More recent studies adopt information-theoretic or embedding-based criteria, such as entropy-guided pruning on CIFAR-10 [30].
Layer pruning rate. Early one-shot methods typically adopt fixed or manually tuned rates without explicit layer-wise analysis [19,21]. Adaptive strategies learn layer-wise rates during training using reinforcement learning or differentiable search techniques [13]. Iterative frameworks, in contrast, adjust rates progressively through repeated pruning and retraining [16]. More recently, adaptive frameworks enhanced with self-distillation have been applied to achieve substantial parameter reduction while maintaining accuracy [31].
In addition, metaheuristic-based pruning approaches have been studied, highlighting both their effectiveness and computational overhead [32]. Symbolic regression-based refinement methods have also been explored for neural network compression [33]. Structured pruning has further been combined with deployment-oriented strategies, such as knowledge distillation [34], FPGA acceleration [35], and lightweight attention mechanisms [36], to enable efficient inference on mobile and resource-constrained platforms.
As summarized in Table 1, feature-map-based methods incur substantial computational overhead, whereas filter-based approaches often lack a principled explanation of redundancy. In addition, layer-wise pruning rate design is frequently treated as a heuristic or auxiliary optimization problem. These limitations motivate the development of a unified framework that explicitly models filter dependencies at the weight level and utilizes this information to guide both filter selection and layer-wise pruning decisions.
3. Methodology: Filter Independence Modeling
This section analyzes filter group redundancy arising from correlations induced by linear transformations and introduces a method for assessing filter independence using the nuclear norm. Filters with higher independence are considered more important and are therefore retained during the pruning process. In addition, a procedure is proposed to identify filter submatrices that exhibit high independence.
3.1. Filter Redundancy Analysis
Let $W^i \in \mathbb{R}^{n_i \times n_{i-1} \times k \times k}$ denote the weights of the i-th convolutional layer, where $n_i$ is the number of output channels and k is the size of the convolutional kernel. The weights are reshaped into a filter matrix $F \in \mathbb{R}^{n_i \times n_{i-1}k^2}$, which allows the convolution operation to be represented as a series of matrix multiplications. The product $FX$, with $X$ the unfolded input, performs the transformation from the input space to the feature space, with F serving as the transformation matrix, as illustrated in Figure 1. The first three steps of the figure demonstrate how the convolution kernels are represented as a transformation matrix, thereby enabling a linear mapping from the input space to the feature space.
A similar representation applies to fully connected layers, which also perform matrix multiplication. In a fully connected layer with n outputs, each neuron corresponds to a row of the filter matrix F, containing the weights connecting all input units to that output. The filter matrix F is used to represent fully connected layer parameters, analogous to its role in convolutional layers. Consequently, both convolutional and fully connected layers perform feature extraction through linear transformations from the input space to the feature space. The independence assessment and pruning strategy developed for convolutional filters can thus be directly applied to neurons in fully connected layers, ensuring that the proposed framework systematically covers the entire network.
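The reshaping described above can be illustrated with a minimal NumPy sketch. The helper name `filter_matrix`, the layer sizes, and the random weights are illustrative assumptions rather than part of the original method description; the sketch only checks that flattening each filter into a row reproduces the per-filter responses on one unfolded input patch, and that a fully connected weight matrix already has the required row-per-neuron form.

```python
import numpy as np

def filter_matrix(weights: np.ndarray) -> np.ndarray:
    """Flatten layer weights so that each row holds one filter (or one neuron)."""
    # Conv weights (out, in, k, k) become (out, in*k*k); FC weights (out, in) are unchanged.
    return weights.reshape(weights.shape[0], -1)

rng = np.random.default_rng(0)
conv_w = rng.standard_normal((8, 3, 3, 3))  # hypothetical layer: 8 filters, 3 input channels, 3x3 kernels
F = filter_matrix(conv_w)                   # shape (8, 27)

# A single 3x3x3 input patch, unfolded in the same order as the filters.
patch = rng.standard_normal((3, 3, 3))
responses = F @ patch.reshape(-1)           # all 8 filter responses in one matrix-vector product

# Reference: apply each filter to the patch directly (elementwise multiply and sum).
direct = np.array([(conv_w[j] * patch).sum() for j in range(conv_w.shape[0])])

fc_w = rng.standard_normal((10, 64))        # hypothetical fully connected layer: 10 neurons, 64 inputs
F_fc = filter_matrix(fc_w)                  # already a filter matrix, shape (10, 64)
```

Because both layer types reduce to the same row-per-filter matrix, any score defined on the rows of F applies to convolutional filters and fully connected neurons alike.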
As the transformation matrix for the feature space, the filter matrix is closely related to the features it generates. Specifically, the feature space corresponds to the column space of the filter matrix F, with its column vectors spanning the space. The column space and the left null space of a matrix are orthogonal complements, and the sum of their dimensions equals the number of rows of the matrix. When the row vectors of F are correlated, the dimension of the left null space increases, thereby reducing the dimension of the column space, i.e., the feature space. Consequently, features extracted by correlated filters are redundant, as illustrated in step 4 of Figure 1.
When considering a single filter , , the convolution operation reduces to a vector inner product , compressing the input space into a one-dimensional feature space. In contrast, the filter matrix F transforms the input space into a feature space of dimension , which generally exceeds one-dimensional projections. Therefore, the filter matrix encodes more comprehensive and interrelated information, making its analysis more informative than assessing the importance of individual filters alone. Redundancy within the filter matrix can thus be interpreted as structural redundancy, providing valuable guidance for pruning.
Building on the preceding analysis, filter importance is assessed by examining correlations among filters. The central idea is to rank filters according to their degree of correlation with the rest of the layer. Specifically, a filter exhibiting high correlation is likely to be linearly dependent on other filters. In this case, the filter can be represented as a linear combination of the remaining filters, and the corresponding feature map can be approximated as a weighted combination of the feature maps generated by the other filters. Consequently, pruning a highly correlated filter has minimal impact, as its extracted feature map can be largely preserved or reconstructed from the outputs of the remaining filters.
To illustrate this intuition, Figure 2 presents a simplified example. Two of the filters are highly similar and produce nearly identical feature maps, whereas a third filter extracts distinct features, resulting in a feature map that differs substantially. This example demonstrates that highly similar filters generate redundant features, which can be exploited for pruning.
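The linear-combination argument of this subsection can be checked numerically. The sketch below is illustrative only: the three filters, the mixing coefficients 0.7 and -0.2, and the random inputs are assumptions chosen so that the third filter is exactly dependent on the first two.

```python
import numpy as np

rng = np.random.default_rng(1)
f1 = rng.standard_normal(27)
f2 = rng.standard_normal(27)
f3 = 0.7 * f1 - 0.2 * f2             # linearly dependent on f1 and f2 by construction
F = np.stack([f1, f2, f3])           # filter matrix, one filter per row

X = rng.standard_normal((27, 100))   # 100 unfolded input patches stored as columns
maps = F @ X                         # one row of responses (a flattened feature map) per filter

# The response of the dependent filter is the same combination of the other two responses.
reconstructed = 0.7 * maps[0] - 0.2 * maps[1]
rank = np.linalg.matrix_rank(F)      # dependent rows do not add a dimension
```

In this constructed case the third feature map carries no information beyond the first two, which is the sense in which a highly correlated filter can be pruned with little loss.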
3.2. Filter Independence Evaluation Based on Nuclear Norm
Before presenting the formal definition, it should be emphasized that the proposed method imposes no additional assumptions or constraints beyond the standard convolutional and fully connected layer settings. The formulation relies solely on layer weights, rendering it architecture-agnostic and generally applicable across different CNN models. With these general settings established, a metric for quantifying filter correlations is introduced.
Compared with the dimension of the column space, the basis vectors encode more comprehensive and richer spatial information. While the rank of F indicates only the dimensionality of the feature space, the nuclear norm of F, defined as the sum of its singular values, reflects the scale of the basis vectors within the column space. The nuclear norm of the filter matrix F is thus employed to represent the effective size of the feature space. This metric captures additional spatial information, providing a more informative measure of correlations among filters.
Since the filter $f_j$ corresponds to the j-th row vector of F, the following decomposition holds:
$$F^{\top}F = \bar{F}_j^{\top}\bar{F}_j + f_j^{\top}f_j.$$
Here, $\bar{F}_j$ denotes the filter matrix obtained by excluding $f_j$. This decomposition separates the contribution of $f_j$ from that of the remaining filters, which serves as the foundation for defining the filter independence metric.
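A minimal numerical sketch of this step follows, assuming the decomposition takes the Gram-matrix form $F^{\top}F = \bar{F}_j^{\top}\bar{F}_j + f_j^{\top}f_j$, with $f_j$ the j-th row of F and $\bar{F}_j$ the matrix with that row removed. The matrix size, the index j, and the variable names are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(2)
F = rng.standard_normal((6, 12))     # 6 filters, each flattened to 12 weights
j = 3

f_j = F[j:j + 1, :]                  # j-th filter kept as a 1 x 12 row vector
F_rest = np.delete(F, j, axis=0)     # filter matrix with f_j excluded

gram_full = F.T @ F
gram_split = F_rest.T @ F_rest + f_j.T @ f_j   # remaining filters plus a rank-one term

# Adding a positive semidefinite rank-one term cannot decrease any eigenvalue,
# so each singular value of F is at least the matching one of F_rest.
sv_full = np.linalg.svd(F, compute_uv=False)
sv_rest = np.linalg.svd(F_rest, compute_uv=False)
```

Since the sorted singular values of F dominate those of the reduced matrix, removing a filter can only lower the nuclear norm, which is why a score built on that difference is non-negative.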
Suppose that
admits a singular value decomposition
, where
. Then,
. The columns of
form a basis for
. Since
lies in this space, it can be expressed as
Consequently,
where
. This decomposition separates the contribution of
along the basis defined by
V, which underpins the subsequent definition of the filter independence score.
Let the singular values of
F be
. Then, the eigenvalues of
are given by
. According to the perturbation theorem (Matrix Theory [
37]), for any eigenvalue
of
, there exists an eigenvalue
of
such that
Consequently, taking square roots yields
This analysis indicates that the singular values of F and of the matrix obtained by excluding the j-th filter $f_j$ differ by at most the largest absolute coefficient in the representation of $f_j$ with respect to the basis defined by V. Consequently, the contribution of a highly correlated filter to the overall feature space, as quantified by the nuclear norm, is relatively small. This observation provides a principled justification for assigning lower importance to such filters during pruning.
When $f_j$ is uniformly projected onto the column space of V, its largest projection component is small, resulting in only a minor difference between the singular values of F and those of the matrix with $f_j$ excluded. In contrast, when $f_j$ is non-uniformly projected, the magnitude of the largest projection component increases. If $f_j$ becomes aligned with a basis vector of the column space of V, the projection component attains its maximum, leading to a substantial difference between the two sets of singular values. This behavior provides an intuitive explanation for why highly independent filters, which are less aligned with other filters, contribute more significantly to the feature space.
Thus, the difference between the singular values of F and those of the matrix with $f_j$ excluded provides a measure of the degree of independence of $f_j$ relative to the other filters in F. A filter that induces a larger difference is more closely aligned with a basis vector of the row space of the matrix of remaining filters, indicating greater independence from the remaining filters. Motivated by this observation, the following definition of filter independence is introduced.
Definition 1. Let F denote the filter matrix of the i-th convolutional layer, containing $n_i$ filters. The independence score of a filter $f_j$, $j = 1, \dots, n_i$, relative to the other filters in F, is defined as
$$S(f_j) = \|F\|_{*} - \|M_j \odot F\|_{*},$$
where $\|\cdot\|_{*}$ denotes the nuclear norm, ⊙ is the Hadamard product, and $M_j$ is a row-wise mask matrix with the j-th row set to zeros and all other rows set to ones. Specifically, a filter $f_j$ is masked from the filter matrix F, and the resulting change in the nuclear norm between the modified and original matrices is computed.
One advantage of employing the nuclear norm difference to characterize filter correlation is its ability to capture fine-grained variations. As illustrated in Figure 3a, using the rank difference to evaluate filter independence produces only two extreme values (0 for correlated filters and 1 for independent filters), thereby limiting its sensitivity to subtle variations. In contrast, the nuclear norm difference provides a more sensitive measure, enabling a finer distinction of filter independence, as shown in Figure 3b.
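Definition 1 can be sketched in a few lines of NumPy. This is a minimal illustration, assuming the score is the nuclear norm of the full filter matrix minus the nuclear norm after zeroing one row; the function names and the 4 x 3 toy matrix are hypothetical. The same masking is applied to the rank, to contrast the binary rank difference with the graded nuclear norm difference.

```python
import numpy as np

def nuclear_norm(M: np.ndarray) -> float:
    """Sum of singular values."""
    return float(np.linalg.svd(M, compute_uv=False).sum())

def mask_row(F: np.ndarray, j: int) -> np.ndarray:
    """Zero the j-th row, i.e. the Hadamard product with a row-wise mask."""
    out = F.copy()
    out[j, :] = 0.0
    return out

def independence_scores(F: np.ndarray) -> np.ndarray:
    full = nuclear_norm(F)  # shared by all filters, computed once
    return np.array([full - nuclear_norm(mask_row(F, j)) for j in range(F.shape[0])])

def rank_differences(F: np.ndarray) -> np.ndarray:
    full = np.linalg.matrix_rank(F)
    return np.array([full - np.linalg.matrix_rank(mask_row(F, j)) for j in range(F.shape[0])])

# Toy layer: the third filter is the sum of the first two; the fourth is orthogonal to all others.
F = np.array([[1.0, 0.0, 0.0],
              [0.0, 1.0, 0.0],
              [1.0, 1.0, 0.0],
              [0.0, 0.0, 1.0]])

scores = independence_scores(F)
rank_diffs = rank_differences(F)
```

In this toy layer the rank difference separates only the orthogonal filter from the rest, whereas the nuclear norm difference assigns a distinct positive value to every filter and therefore yields a usable ranking.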
The independence score quantifies the correlation between a filter and the remaining filters in its layer, with larger values indicating lower substitutability. Consequently, independence scores can be computed for all filters in a network layer to identify less important filters for pruning. Figure 4 compares feature maps extracted by filters sorted according to L2 norm and according to independence score. The feature map corresponding to filter (7), which contains fine-grained details (highlighted by the red dotted box), is discarded when pruning based on the L2 norm but retained when pruning based on the independence score. Conversely, the feature map corresponding to filter (4), which primarily encodes contour information (highlighted by the purple dotted box), exhibits the opposite behavior.
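The contrast between the two criteria can be reproduced on a toy layer. The sketch below is an illustration under assumed values, not the configuration behind Figure 4: two nearly identical filters with slightly larger norms and one distinct filter with a smaller norm. The score is taken to be the nuclear norm difference after masking one row.

```python
import numpy as np

def nuclear_norm(M: np.ndarray) -> float:
    return float(np.linalg.svd(M, compute_uv=False).sum())

def independence_scores(F: np.ndarray) -> np.ndarray:
    full = nuclear_norm(F)
    scores = np.empty(F.shape[0])
    for j in range(F.shape[0]):
        masked = F.copy()
        masked[j, :] = 0.0               # mask the j-th filter
        scores[j] = full - nuclear_norm(masked)
    return scores

# Filters 0 and 1 are near-duplicates; filter 2 is distinct but has the smallest norm.
F = np.array([[1.0, 0.00, 0.0],
              [1.0, 0.01, 0.0],
              [0.0, 0.00, 0.9]])

l2_norms = np.linalg.norm(F, axis=1)
scores = independence_scores(F)

pruned_by_l2 = int(np.argmin(l2_norms))           # filter removed first by the L2 criterion
pruned_by_independence = int(np.argmin(scores))   # filter removed first by the independence score
```

Here the L2 criterion removes the only distinct filter, while the independence score removes one of the two near-duplicates. Note that the score is not scale-invariant, so the outcome depends on the relative norms chosen for this example.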
3.3. Computational Complexity
For the i-th convolutional layer, all filters are arranged into a matrix $F \in \mathbb{R}^{n_i \times d_i}$, where $n_i$ is the number of filters and $d_i = n_{i-1}k^2$ is the flattened filter length. The nuclear norm is computed via singular value decomposition (SVD), which has a computational complexity of $O(\min(n_i d_i^2,\, n_i^2 d_i))$. Since the nuclear norm of the full filter matrix, $\|F\|_{*}$, is shared across all filters, it needs to be computed only once. In contrast, the masked nuclear norm, $\|M_j \odot F\|_{*}$ (with the j-th row of F zeroed by the mask $M_j$), must be computed for each of the $n_i$ filters. Therefore, the total complexity for computing the independence scores of all filters in the i-th layer is
$$O\big((n_i + 1)\,\min(n_i d_i^2,\, n_i^2 d_i)\big).$$
In typical convolutional neural networks, the quantity $d_i$ is often much larger than $n_i$, allowing the overall complexity to be approximated as
$$O(n_i^3 d_i).$$
It is worth noting that this computation is performed only once per layer during the offline pruning stage and is independent of the spatial resolution of the feature maps. Consequently, unlike feature-map-based pruning methods that require repeated forward passes over large activation tensors, the proposed approach introduces no additional overhead during training or inference. In practice, its computational cost is negligible compared with the cumulative cost of convolution operations.
3.4. Independent Submatrix Estimation
This subsection focuses on estimating the filter submatrix with high independence among the numerous candidate submatrices that arise when c filters are to be pruned from the i-th convolutional layer. Each candidate is obtained by applying a multi-row mask matrix in which the rows of the c pruned filters are set to zeros and all other rows are set to ones.
Theoretically, the filter submatrix can be constructed iteratively. First, the independence scores are computed for all filters. The filter with the lowest score is then masked to obtain a reduced filter matrix. Following this procedure, the independence scores are recomputed for the remaining filters with respect to the reduced matrix, and the filter with the lowest score is again masked. This process is repeated for c iterations to obtain the final submatrix.
However, this iterative approach requires computing a large number of independence scores, resulting in significant computational cost. To mitigate this, the scores computed once on the full filter matrix can be used to estimate the relative magnitudes of the scores on the successively masked matrices, substantially reducing the computation in practice. Specifically, if one filter scores lower than another on the full matrix, it is assumed to score lower on the masked matrices as well. For instance, in a convolutional layer containing 500 filters where 100 filters are to be pruned, this estimation method reduces the total number of independence score computations from roughly 45,000 using the full iterative procedure (500 + 499 + ... + 401 = 45,050) to only 500.
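The saving can be verified with simple counting. The sketch below assumes that each iterative round re-scores every filter still present and that the one-shot estimate scores each filter exactly once; the function names and the five example scores are hypothetical.

```python
def iterative_score_count(n_filters: int, n_prune: int) -> int:
    """Score evaluations when all remaining filters are re-scored before each removal."""
    # Round t (0-based) scores the n_filters - t filters that have not been masked yet.
    return sum(n_filters - t for t in range(n_prune))

def one_shot_prune(scores, n_prune):
    """Indices of the n_prune lowest-scoring filters, using scores computed once."""
    order = sorted(range(len(scores)), key=lambda j: scores[j])
    return sorted(order[:n_prune])

iterative = iterative_score_count(500, 100)   # 500 + 499 + ... + 401
one_shot = 500                                # one score per filter, computed once

# Hypothetical scores for a 5-filter layer; prune the two least independent filters.
pruned = one_shot_prune([0.42, 0.05, 0.90, 0.11, 0.63], 2)
```

The count for the iterative procedure comes to 45,050, in line with the roughly 45,000 quoted above, against 500 for the one-shot estimate.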
Definition 2. Let denote the filter matrix of the i-th convolutional layer, from which c filters are to be pruned. The submatrix with maximal independence, , consists of the remaining filters corresponding to the highest independence scores, i.e., . Formally, Here, , , denotes the filter with the k-th highest independence score. Note that the independence of the filter matrix is invariant to the ordering of its filters. Definition 2 can be formally justified via mathematical induction, utilizing the previously discussed estimation method.
Proof. For the base case
, we have
If
is minimal, then
is maximal.
Assume that for
, the submatrix
holds. For
, the submatrix satisfies
Similarly, if
is minimal, then
is maximal. In practice,
can be used to estimate
, completing the induction step. □
4. Independence-Aware Pruning Strategy
This section first examines layer-wise pruning rates based on the independence of filters within each layer, providing insight into how the distribution of independence scores influences pruning decisions. Building on this analysis, a pruning method is proposed that optimizes filter selection according to the computed independence scores.
4.1. Relationship Between Pruning Rate and Filter Independence
Manually assigning layer-wise pruning rates has several limitations. Balancing the pruning rate and network performance is challenging, often resulting in either overly conservative pruning or noticeable accuracy degradation. Moreover, the trial-and-error process can be computationally expensive. A more efficient approach is to adopt customized pruning rates informed by available data. Since the independence score of each filter has been computed for all layers and layers are treated independently, the relationship between the distribution of layer-wise independence scores and the corresponding pruning rates is investigated.
The analysis is performed on the sequence of independence scores of a layer, which contains all filter scores within that layer. A predominance of low independence scores indicates strong correlations among the filters and high redundancy, suggesting a higher pruning rate for that layer. Conversely, layers with more filters exhibiting high independence scores warrant lower pruning rates. This approach is designed to minimize performance degradation. By assigning distinct pruning rates to each layer based on the distribution of independence scores, a favorable trade-off between compression and accuracy is achieved, enabling high overall pruning rates while preserving model performance.
To quantify the distribution of independence scores within a layer and guide the corresponding pruning rate, a ratio is introduced to capture the relative spread of the independence scores.
Definition 3. For a layer with independence score sequence $S = \{s_1, \dots, s_n\}$, the ratio η is defined as
$$\eta = \log\frac{\tfrac{1}{2}\big(\max(S) + \min(S)\big)}{\operatorname{median}(S) + \varepsilon},$$
where ε is a small constant added to avoid division by zero. The logarithm is applied to compress the range of the ratio while preserving relative differences.
As illustrated in Figure 5, the ratio η effectively characterizes the distribution of the independence scores within a layer. A larger η indicates a predominance of filters with low independence scores (orange line), corresponding to higher redundancy and suggesting a higher layer pruning rate. Conversely, a smaller η reflects a greater number of highly independent filters (blue line), implying lower redundancy and a smaller pruning rate. Thus, η provides a principled criterion for guiding layer-wise pruning. In the figure, the 7th and 11th layers are presented as representative examples to demonstrate variations in score distributions across layers.
A layer-specific pruning strategy can be derived from each layer’s η value. When the η values across layers are closely clustered, this suggests similar redundancy levels, warranting uniform pruning rates. Conversely, a dispersed distribution of η indicates heterogeneous redundancy, motivating non-uniform pruning rates across layers. The variance of the η sequence is used to quantify this dispersion. For instance, on CIFAR-10, the variance is 37.6463 for VGG_16 and 0.1075 for ResNet_56, implying that VGG_16 benefits from non-uniform pruning, whereas ResNet_56 is more suited to uniform pruning. It should be noted, however, that variance primarily reflects general trends and does not account for outliers.
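The indicator and the variance-based decision can be sketched as follows. Several parts are assumptions: η is taken to be the logarithm of the mid-range, (max + min) / 2, divided by the median plus ε; the two score sequences are invented; and the threshold in `choose_mode` is hypothetical, since only the two variances 37.6463 and 0.1075 are reported and no cut-off is stated.

```python
import numpy as np

def eta(scores, eps: float = 1e-8) -> float:
    """Assumed indicator: log of mid-range over (median + eps)."""
    s = np.asarray(scores, dtype=float)
    mid_range = 0.5 * (s.max() + s.min())
    return float(np.log(mid_range / (np.median(s) + eps)))

def choose_mode(eta_variance: float, threshold: float = 1.0) -> str:
    """Hypothetical rule: widely dispersed eta values call for non-uniform rates."""
    return "non-uniform" if eta_variance > threshold else "uniform"

# Invented score sequences: mostly low scores (redundant) vs. mostly high scores (independent).
redundant_layer = [0.01] * 8 + [1.0, 1.2]
independent_layer = [0.1, 0.2] + [1.0] * 8

eta_redundant = eta(redundant_layer)
eta_independent = eta(independent_layer)
dispersion = float(np.var([eta_redundant, eta_independent]))

mode_vgg16 = choose_mode(37.6463)     # variance reported for VGG_16 on CIFAR-10
mode_resnet56 = choose_mode(0.1075)   # variance reported for ResNet_56 on CIFAR-10
```

Under these assumptions the layer dominated by low scores receives the larger η, and any threshold between the two reported variances reproduces the uniform versus non-uniform assignment described above.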
4.2. Independence-Aware Structured Pruning Algorithm
Inspired by the concept of linear transformation, redundant filters in the network are analyzed by assessing filter independence using the nuclear norm. The resulting independence scores serve a dual purpose: evaluating filter importance and guiding the pruning strategy. Following this principle, a nuclear-norm-based filter pruning algorithm is developed, as illustrated in Figure 6. In the left panel, the input consists of the convolutional layer weights, and the output is the set of computed filter independence scores along with the corresponding η. In the right panel, the input is the same set of weights, and the output is the pruned weights obtained by removing redundant filters according to the computed independence scores and η.
The proposed method computes each filter’s independence score by measuring the change in the nuclear norm. Filters with low independence scores, whose features can be largely represented by other filters, are pruned with minimal impact on network performance. The sequence of independence scores within each layer is then summarized using η to guide the pruning strategy. Across the network, the η values of all layers are normalized, and layers with higher η are assigned higher pruning rates, enabling adaptive, layer-specific pruning.
The network’s overall pruning rate is defined as the weighted sum of the layer-wise pruning rates, where the weights correspond to each layer’s proportion of filters. Formally, it is expressed as $P = \sum_{i=1}^{L} w_i p_i$, where L is the total number of layers, $w_i$ denotes the proportion of filters in the i-th layer relative to the entire network, and $p_i$ represents the pruning rate of the i-th layer.
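The two steps above, normalizing η into layer-wise rates and combining them into an overall rate, can be sketched as follows. The min-max mapping and the bounds `p_min` and `p_max` are assumptions introduced for illustration, since the exact normalization is not specified here; only the weighted sum follows the stated definition.

```python
import numpy as np

def layer_rates(etas, p_min: float = 0.2, p_max: float = 0.8) -> np.ndarray:
    """Hypothetical mapping: min-max normalize eta, then scale into [p_min, p_max]."""
    e = np.asarray(etas, dtype=float)
    span = e.max() - e.min()
    if span == 0.0:
        # All layers equally redundant: fall back to a uniform rate.
        return np.full_like(e, 0.5 * (p_min + p_max))
    return p_min + (e - e.min()) / span * (p_max - p_min)

def overall_rate(n_filters, rates) -> float:
    """Weighted sum of layer rates; weights are each layer's share of all filters."""
    n = np.asarray(n_filters, dtype=float)
    return float((n / n.sum()) @ np.asarray(rates, dtype=float))

etas = [0.0, 1.0, 2.0]                 # invented eta values for a 3-layer network
rates = layer_rates(etas)              # higher eta gives a higher pruning rate
total = overall_rate([64, 128, 256], rates)
```

Because the weights are filter proportions, layers with many filters dominate the overall rate, so a high rate in a wide layer moves the network-level figure more than the same rate in a narrow one.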
5. Experimental Results
This section presents the experimental design, implementation details, and results. The primary objective is to evaluate the effectiveness of the proposed method and to compare its performance with existing approaches across multiple datasets.
5.1. Experimental Setup
Datasets. The proposed pruning method is evaluated on five benchmark datasets: FashionMNIST [38], SVHN [39], CIFAR-10 [40], CIFAR-100 [40], and ImageNet [41]. Key statistics of each dataset, including the number of training and testing samples, the number of classes, and image dimensions, are summarized in Table 2.
To accommodate the varying image sizes across datasets, the input layer of the network was adjusted accordingly. Specifically, the input dimensions were set to for FashionMNIST, for SVHN, CIFAR-10, and CIFAR-100, and for ImageNet. This configuration allows the network to process images at their native or near-native resolution with minimal distortion, while ensuring that the proposed pruning method can be applied consistently across all datasets.
Models. A variety of models with different architectures and sizes were employed for the experiments. Specifically, AlexNet [42] was used for FashionMNIST; AlexNet and VGG_11 [43] for SVHN; VGG_16 [43] and ResNet_56/110 [44] for CIFAR-10 and CIFAR-100; and ResNet50 [44] for ImageNet.
Hyperparameter Configuration. All experiments were implemented in PyTorch (1.10.0) using stochastic gradient descent (SGD) as the optimization algorithm. Each network was trained from scratch to establish the baseline model, with hyperparameters adjusted according to the dataset. For FashionMNIST and SVHN, the batch size was 128, momentum was 0.9, weight decay was 0.0001, and the initial learning rate was 0.01; networks were trained for 20 and 50 epochs, respectively. For CIFAR-10 and CIFAR-100, the batch size was 256, momentum was 0.9, weight decay was 0.005, and the initial learning rate was 0.01; training was conducted for 400 epochs. For ImageNet, the batch size was 256, momentum was 0.99, weight decay was 0.0001, and the initial learning rate was 0.01; training lasted for 200 epochs.
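For reference, the baseline settings listed above can be collected into a single configuration table. The dictionary layout and key names below are illustrative assumptions; the values are those stated in the text, and the learning rate schedule is omitted because it is not specified.

```python
# Baseline training settings per dataset (SGD in all cases), as stated in the text.
BASELINE_CONFIGS = {
    "FashionMNIST": {"batch_size": 128, "momentum": 0.9,  "weight_decay": 1e-4, "lr": 0.01, "epochs": 20},
    "SVHN":         {"batch_size": 128, "momentum": 0.9,  "weight_decay": 1e-4, "lr": 0.01, "epochs": 50},
    "CIFAR-10":     {"batch_size": 256, "momentum": 0.9,  "weight_decay": 5e-3, "lr": 0.01, "epochs": 400},
    "CIFAR-100":    {"batch_size": 256, "momentum": 0.9,  "weight_decay": 5e-3, "lr": 0.01, "epochs": 400},
    "ImageNet":     {"batch_size": 256, "momentum": 0.99, "weight_decay": 1e-4, "lr": 0.01, "epochs": 200},
}

def baseline_config(dataset: str) -> dict:
    """Return a copy so that callers can override fields without touching the table."""
    return dict(BASELINE_CONFIGS[dataset])

cifar_cfg = baseline_config("CIFAR-10")
```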
After pruning, the networks were retrained starting from the pruned weights without re-initialization. Retraining employed the same optimizer, batch size, learning rate schedule, data augmentation, and loss function as in the original training. For VGG_16, the weight decay was set to 0.0005 during retraining. The procedure followed the protocol described in [19].
All pruning experiments were conducted deterministically using fixed pretrained models and fixed independence-score rankings. Under identical settings, repeated runs yielded the same pruning configurations and nearly identical evaluation results. Therefore, only single-run results are reported, consistent with prior studies on deterministic pruning.
5.2. Performance Evaluation of Pruned Networks
The effectiveness of the proposed pruning method was evaluated from two perspectives: classification performance, measured by Top-1 accuracy, and model compression, assessed by reductions in parameters and FLOPs.
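The two compression metrics can be made concrete with a small sketch. It assumes bias-free convolutions and counts FLOPs as multiply-accumulate operations, which is a common convention but not necessarily the one used for the reported numbers; the two-layer example and its sizes are hypothetical.

```python
def conv_params(c_in: int, c_out: int, k: int) -> int:
    """Weights of a k x k convolution (bias ignored)."""
    return c_out * c_in * k * k

def conv_macs(c_in: int, c_out: int, k: int, h_out: int, w_out: int) -> int:
    """Multiply-accumulates: every weight is used once per output position."""
    return conv_params(c_in, c_out, k) * h_out * w_out

def reduction(before: float, after: float) -> float:
    """Fraction removed, e.g. 0.5 for a 50% reduction."""
    return 1.0 - after / before

# Two stacked 3x3 layers (64 -> 128 -> 128) with 32x32 outputs; prune half the filters of layer 1.
# Removing filters from layer 1 also removes the matching input channels of layer 2.
params_before = conv_params(64, 128, 3) + conv_params(128, 128, 3)
params_after = conv_params(64, 64, 3) + conv_params(64, 128, 3)

macs_before = conv_macs(64, 128, 3, 32, 32) + conv_macs(128, 128, 3, 32, 32)
macs_after = conv_macs(64, 64, 3, 32, 32) + conv_macs(64, 128, 3, 32, 32)

param_reduction = reduction(params_before, params_after)
flops_reduction = reduction(macs_before, macs_after)
```

The example also shows why structured pruning compounds across layers: each removed filter shrinks both its own layer and the input side of the next one.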
Table 3 presents the performance of pruned models across nine different pruning rates on the FashionMNIST and SVHN datasets. The proposed method substantially reduces both parameters and FLOPs, achieving up to 99% reduction in both metrics while incurring minimal Top-1 accuracy loss: 1.34% for AlexNet on FashionMNIST, 1.68% for AlexNet on SVHN, and 2.63% for VGG_11 on SVHN.
5.3. Comparison with State-of-the-Art Pruning Methods
This part of the experiment compares the classification accuracy, number of parameters, and FLOPs of pruned networks under different pruning schemes. Several recent methods were selected for comparison on CIFAR-10, all of which evaluate filter similarity or correlation using various metrics to prune redundancy. Specifically, CHIP [19] employs the nuclear norm of the feature map, VNGEP [20] uses cosine distance, Li et al. [23] adopt the Hamming distance, EACP [27] relies on filter clustering distance, and CLR-RNF [28] uses the L2 norm of the filter. Among these methods, CHIP and EACP are particularly notable. In contrast, the proposed approach leverages the nuclear norm of the filters themselves.
Table 4 presents the performance metrics of each method at different pruning rates. The numbers following each method indicate distinct pruning rates; for example, CHIP1 and CHIP2 correspond to two pruning rates for the CHIP method. Across comparable accuracy levels, the proposed method consistently retains fewer parameters and FLOPs.
Specifically, for VGG_16, at 94.09% Top-1 accuracy, the proposed method reduces parameters by 92.4% and FLOPs by 64.43%, whereas CHIP1, at 93.86% accuracy, achieves 81.6% parameter reduction and 58.1% FLOPs reduction. At 93.57% accuracy, the proposed method attains 94.2% parameter reduction and 73.15% FLOPs reduction, compared with EACP1 at 93.29% accuracy, which achieves 75.4% and 71.77%, respectively.
For ResNet_56, the proposed method reduces parameters by 74.85% and FLOPs by 74.5% at 92.44% accuracy, while CHIP2 achieves 71.8% and 72.3% at 92.05% accuracy. For ResNet_110, the proposed method attains 68.23% parameter reduction and 67.92% FLOPs reduction at 93.7% accuracy, outperforming EACP1, which reduces parameters by 61.3% and FLOPs by 65.7% at 93.39% accuracy.
Owing to the limited number of studies on CIFAR-100, comparative experiments are conducted on VGG_16 and ResNet_56 between the proposed pruning method and APRS [13] and ASTER [17]. For ResNet_110, results are reported only for the proposed method, as summarized in Table 5.
For VGG_16, the proposed method outperforms both APRS and ASTER in terms of classification accuracy and compression. Specifically, at 73.86% Top-1 accuracy, the proposed method reduces parameters by 83.9% and FLOPs by 70.7%, whereas APRS3 achieves 70.79% accuracy with 67.2% parameter reduction and 70.2% FLOPs reduction. Similarly, at 74.12% Top-1 accuracy, the proposed method reduces parameters by 81.3% and FLOPs by 58.2%, while APRS2, with 73.02% accuracy, achieves 64% and 50.1% reductions, respectively. For ResNet_56, the proposed method achieves slightly lower classification accuracy than ASTER under higher compression levels.
Finally, comparative experiments were conducted on ImageNet. As shown in Table 6, at equivalent pruning rates, the proposed method exhibits the least degradation in classification performance compared with CLR-RNF [28], VNGEP [20], and CHIP [19].
5.4. Sensitivity Analysis of the Pruning-Rate Indicator
Figure 7 presents the filter independence score sequences for all layers of the networks used in the experiments. In the proposed method, the pruning-rate indicator characterizes the distribution of each sequence, with the variance of the indicator across all sequences indicated above each plot. Networks with similar sequence distributions exhibit smaller variances, whereas networks with more heterogeneous distributions have larger variances.
For networks with large variance, non-uniform pruning is applied, while networks with small variance and no outliers adopt uniform pruning. When small-variance networks contain outlier sequences (highlighted by red dashed ellipses), the layer-specific pruning rates can be adjusted accordingly. For example, in VGG_16 on CIFAR-100, the pruning rates for layers with outliers should be slightly reduced, whereas in ResNet_56, they can be slightly increased to account for these deviations.
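The decision rule described above can be sketched as follows. This is an illustrative sketch only: the function name `choose_pruning_mode`, the use of the population variance, and the numeric threshold are assumptions, since the text does not state a cut-off between "small" and "large" variance, and the outlier adjustment is left out.

```python
from statistics import pvariance

def choose_pruning_mode(layer_indicators, variance_threshold):
    """Pick uniform or non-uniform pruning from per-layer indicator values.

    layer_indicators: one indicator value per layer (hypothetical inputs).
    variance_threshold: assumed cut-off; not specified in the text.
    Returns the chosen mode and the variance it was based on.
    """
    spread = pvariance(layer_indicators)
    mode = "non-uniform" if spread > variance_threshold else "uniform"
    return mode, spread

# Hypothetical indicator values for two small networks.
homogeneous = choose_pruning_mode([0.5, 0.5, 0.5], variance_threshold=0.01)
heterogeneous = choose_pruning_mode([0.1, 0.9, 0.5], variance_threshold=0.01)
```

Layers flagged as outliers in an otherwise homogeneous network would still need their rates adjusted by hand, as in the VGG_16 and ResNet_56 examples above.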
5.5. Layer-Wise Pruning Results for VGG_16
In this subsection, experiments are conducted to validate the effectiveness of the pruning-rate indicator in guiding layer-specific pruning strategies. By directly measuring post-pruning accuracy, the influence of variations in layer pruning rates on overall network performance is analyzed.
The accuracy and loss of VGG_16 on CIFAR-10 are evaluated after sequentially pruning individual filters in each layer. Filters within each layer are removed one by one according to their normalized independence scores, as illustrated in Figure 8, which highlights 13 convolutional layers and one fully connected layer. Filters are sorted from lowest to highest independence score (blue curve), with the horizontal axis representing filter order. The green line indicates the network accuracy without retraining after pruning up to the corresponding filter, while the orange line represents the loss. The red star marks an accuracy of 94%.
As shown in Figure 8, the distribution of independence scores (blue lines) varies across layers. Several layers contain a substantial proportion of low-sensitivity filters, making them suitable for higher pruning rates; pruning multiple filters in these layers has a relatively minor effect on accuracy, as observed in the final four layers. Conversely, layers with a larger number of high-sensitivity filters are better candidates for lower pruning ratios, a trend particularly evident in the middle layers.
Layer 0 contains 64 filters, and the network maintains 94% accuracy after pruning 29 filters. In contrast, only four filters can be pruned in Layer 1 while preserving the same accuracy. The pruning rate increases markedly from Layer 8 onward, where layers with 512 filters allow pruning of 289, 409, 482, 478, 498, and 343 filters, respectively. When considering single-layer pruning while maintaining 94% network accuracy, the layer-specific pruning rates vary substantially. These single-layer pruning rates are then combined to form a set of layer-wise pruning strategies, which are applied to prune the entire network.
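The single-layer procedure can be sketched as below. The helper `max_prunable` and the accuracy curve passed to it are hypothetical; the sketch assumes that pruning stops at the first point where accuracy falls below the target. The filter counts are the ones quoted above, converted to percentage pruning rates.

```python
def max_prunable(accuracy_after_k, target):
    """Count how many filters can be removed before accuracy drops below target.

    accuracy_after_k[k] is the accuracy (no retraining) after removing the
    k + 1 lowest-scoring filters of one layer. Assumption: pruning stops at
    the first drop below the target, even if accuracy later recovers.
    """
    count = 0
    for acc in accuracy_after_k:
        if acc < target:
            break
        count += 1
    return count

# Hypothetical accuracy curve for one layer.
example_count = max_prunable([0.95, 0.945, 0.94, 0.93, 0.95], target=0.94)

# Counts reported in the text for Layer 0 (64 filters) and the 512-filter layers.
layer0_rate = round(100 * 29 / 64, 1)
late_counts = [289, 409, 482, 478, 498, 343]
late_rates = [round(100 * c / 512, 1) for c in late_counts]
```

With the reported counts, the single-layer pruning rates of the 512-filter layers span roughly 56% to 97%, compared with about 45% for Layer 0, which illustrates how strongly the tolerable rate varies across layers.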
Table 7 presents the network accuracy and computational complexity resulting from different layer-wise pruning strategies, each of which assigns one pruning rate to each of the 13 convolutional layers and to the single fully connected layer. Layer-specific pruning strategies have a substantial impact on network accuracy, which ranges from 10% to 94.06%. Strategies whose layer pruning rates align more closely with the proposed pruning-rate indicator achieve higher accuracy, as illustrated in Figure 9.
5.6. Efficient Deployment Results on NPUs
This subsection evaluates the suitability of the pruned models for mobile deployment. Experiments were conducted on the RK3588 platform (Rockchip Electronics Co., Ltd., Fuzhou, China), with both the pruned and original models deployed on the same Neural Processing Unit (NPU), as illustrated in Figure 10. Under the constraint that the difference in test accuracy does not exceed 3%, memory usage and total execution time are compared for the same input. The pruned models consistently require less memory and shorter execution time.
To quantify the performance changes, the relative difference index (RDI) is computed for each NPU performance metric A, where A denotes either weight memory or total execution time.
Table 8 summarizes the results. For VGG_16, pruning reduced memory consumption by 94% and execution time by 84%, while for ResNet_56, memory usage and execution time were reduced by 85% and 50%, respectively.
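The percentage reductions above are consistent with reading the RDI as a relative reduction of a metric after pruning. The following minimal sketch assumes the form (A_original - A_pruned) / A_original; this form, the function name, and the input values are assumptions for illustration and are not taken from the paper's definition.

```python
def relative_difference_index(original, pruned):
    """Relative reduction of a metric A after pruning.

    Assumed form: (A_original - A_pruned) / A_original, where A is a metric
    such as weight memory or total execution time measured on the NPU.
    """
    if original <= 0:
        raise ValueError("original metric must be positive")
    return (original - pruned) / original

# Hypothetical measurements: a metric that falls from 100 units to 6 units.
memory_rdi = relative_difference_index(100.0, 6.0)
```

Under this reading, a value of 0.94 corresponds to a 94% reduction, and a negative value would indicate that the pruned model performs worse on that metric.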
6. Discussion
The experimental results demonstrate the advantages of the proposed nuclear-norm-based filter pruning strategy across multiple network architectures and datasets. Overall, the method achieves substantial reductions in model parameters and FLOPs while maintaining competitive classification accuracy, confirming its effectiveness for efficient deep network compression.
First, the proposed method efficiently reduces redundancy by quantifying filter independence using the nuclear norm. This approach accurately identifies redundant filters and removes them with minimal impact on performance, leading to higher compression ratios compared with existing pruning methods at similar accuracy levels (Table 3, Table 4, Table 5 and Table 6).
Second, the method enables layer-wise adaptive pruning. Unlike uniform pruning strategies, independence scores provide fine-grained guidance for determining pruning rates for each layer. Layers with higher redundancy are pruned more aggressively, while critical layers are preserved, yielding an improved trade-off between compression and accuracy. The heterogeneous distribution of independence scores across layers highlights that middle layers often contain high-sensitivity filters essential for maintaining representational capacity, whereas deeper layers have more low-independence filters that can be pruned with minimal accuracy loss. These observations underscore the necessity of adaptive, layer-specific pruning rates guided by the proposed indicator rather than heuristic or manually designed schemes.
Third, the proposed framework exhibits strong generality and robustness. The consistent performance gains observed across different architectures (VGG_16, ResNet_56, and ResNet_110) and datasets (CIFAR-10, CIFAR-100, and ImageNet) demonstrate that the method is architecture-agnostic and broadly applicable without requiring task- or network-specific modifications. Even under high pruning ratios, pruned networks retain most of their original accuracy after retraining, often outperforming or matching existing approaches in both compression efficiency and classification performance.
While the proposed independence-aware pruning method is primarily designed for networks containing convolutional and fully connected layers, it can also be extended to transformer-based architectures that incorporate such components. The principle of filter independence remains valid for convolutional and fully connected blocks within transformers, enabling pruning to reduce redundancy without compromising the unique contributions of distinct filters. These considerations indicate that the method is not limited to standard CNNs and can be adapted to a broader range of architectures, providing practical guidance for future extensions.
From a deployment perspective, independence-aware pruning provides significant practical benefits for resource-constrained hardware. The substantial reductions in memory usage and execution time on the RK3588 NPU demonstrate that the method effectively removes filters that disproportionately contribute to computational overhead. Its structured pruning nature aligns well with NPU execution characteristics, enabling consistent performance gains without the need for specialized hardware-aware optimization. The use of the relative difference index (RDI) further offers a hardware-agnostic metric to quantify deployment benefits, suggesting that the approach can be extended to other mobile and edge platforms.
Although the deployment experiments focus on the RK3588 NPU, the proposed pruning framework itself is inherently hardware-agnostic and can be applied to other edge computing devices with minimal adaptation. Extending evaluations to a broader range of hardware architectures, including various NPUs and mobile accelerators, constitutes an important avenue for future work.
In summary, the proposed method effectively balances model compactness and performance preservation. By combining accurate redundancy identification, adaptive layer-wise pruning, and strong generalization, it provides a principled and practical approach for efficient neural network compression and deployment across diverse scenarios.
7. Conclusions
This paper presents an independence-aware filter pruning framework for deep neural networks, designed to reduce computational complexity while preserving predictive performance. Through extensive experiments on AlexNet, VGG, and ResNet architectures across multiple datasets (CIFAR-10, CIFAR-100, SVHN, FashionMNIST, and ImageNet), the proposed method consistently achieves substantial reductions in both model parameters and FLOPs while maintaining competitive Top-1 accuracy. These results demonstrate the effectiveness of independence-aware pruning in minimizing redundancy and maximizing compression efficiency.
From a methodological perspective, this work introduces a principled approach to filter-level redundancy evaluation based on the differential nuclear norm. Unlike prior approaches that rely on feature map statistics or heuristic importance metrics, the independence score directly operates on pretrained weights, capturing both correlation structure and effective dimensionality of convolutional filters. Building upon this metric, the proposed layer-wise pruning indicator enables adaptive pruning rates across layers by reflecting the distribution characteristics of filter independence scores. This formulation allows the pruning process to explicitly account for heterogeneous redundancy patterns within a network, providing a unified, architecture-agnostic framework for structured pruning without the need for manual tuning of layer-specific rates.
Beyond algorithmic contributions, the proposed method exhibits clear practical benefits for deployment on resource-constrained hardware. The structured nature of the pruning strategy aligns with NPU execution characteristics, leading to significant reductions in memory usage and inference latency. This demonstrates that independence-aware pruning not only improves model compactness but also facilitates real-world deployment on mobile and edge devices, where computational resources and energy budgets are limited.
Despite its effectiveness, the proposed method has several limitations. First, the current framework focuses on the inference stage and does not reduce computational cost during training, as independence scores are computed from pretrained weights. Second, the pruning process is performed offline, which may not fully capture potential redundancy variations that arise during training or in dynamic deployment environments. These limitations constrain the applicability of the method in scenarios requiring training-time efficiency or adaptive network adjustment.
Future work will aim to extend independence-aware pruning to more flexible and dynamic settings. One direction is to integrate the independence evaluation into the training process, enabling training-time or online pruning with minimal overhead. Another avenue is to investigate dynamic pruning mechanisms that adaptively adjust network structures in response to evolving data distributions or changing hardware constraints. Additionally, developing computationally efficient approximations of the independence score could facilitate real-time or on-device pruning, further broadening the practical applicability of the method to a wider range of edge and embedded platforms.
In summary, the proposed independence-aware filter pruning framework provides a principled, effective, and deployment-friendly approach for deep network compression. By combining accurate redundancy identification, adaptive layer-wise pruning, and robust performance across diverse architectures and datasets, the method establishes a reliable strategy for balancing model compactness, computational efficiency, and predictive performance, offering both theoretical insight and practical value for modern deep learning applications.