Article

Filter Independence-Aware Pruning: Efficient Neural Networks for On-Device AI

1 Intelligent Media Computing Center, School of Artificial Intelligence, Beijing University of Posts and Telecommunications, Beijing 100876, China
2 School of Computer and Cyber Sciences, Communication University of China, Beijing 100024, China
* Author to whom correspondence should be addressed.
Electronics 2026, 15(4), 794; https://doi.org/10.3390/electronics15040794
Submission received: 6 January 2026 / Revised: 4 February 2026 / Accepted: 10 February 2026 / Published: 12 February 2026
(This article belongs to the Section Artificial Intelligence)

Abstract

Filter pruning is an effective approach for improving the inference efficiency of neural networks and is particularly attractive for on-device artificial intelligence (AI) applications. However, many existing methods fail to accurately identify redundant filters due to limited modeling of inter-filter dependencies. A filter pruning method based on nuclear norm analysis is proposed to quantify filter independence and guide structured pruning. By analyzing the layer-wise distribution of independence scores, a principled trade-off between pruning rate and accuracy preservation is achieved. In most evaluation scenarios, the proposed method achieves 75–95% parameter reduction and 70–80% FLOPs reduction, while substantially higher compression ratios (up to 99%) can be obtained for more redundant network architectures, with consistent performance trends observed across multiple accuracy-related metrics. Furthermore, deployment on an RK3588 neural processing unit (NPU) demonstrates substantial reductions in memory consumption and inference latency, confirming the practical effectiveness of the method for mobile and edge AI applications.

1. Introduction

Neural networks have demonstrated remarkable performance across a wide range of applications. However, their substantial computational and memory demands pose significant challenges for deployment on resource-constrained edge devices. In many practical scenarios, such as automatic modulation classification [1], smoke segmentation [2], and traffic sign recognition [3], models with efficient inference and compact architectures are critically required. Consequently, reducing computational cost and model parameters has become a key research direction for developing practical and deployable neural networks in real-world applications.
Network pruning reduces computational and storage costs by removing redundant parameters while preserving model performance. It can be applied independently or in combination with quantization to achieve effective deep model compression [4,5]. Pruning methods are typically categorized as unstructured or structured. Unstructured pruning removes individual low-importance weights, creating sparse matrices that often require specialized hardware or software. Structured pruning removes entire parameter groups, such as filters or channels, producing dense architectures compatible with standard frameworks and hardware. It is therefore better suited for deployment on resource-constrained devices.
Regardless of whether pruning is unstructured, targeting individual weights, or structured, operating on parameter groups, a fundamental step in all pruning strategies is the evaluation of parameter importance. Accurate identification of parameters that contribute minimally to model performance enables pruning with limited accuracy degradation. Existing approaches for assessing parameter importance can be broadly categorized into two main classes.
The first category evaluates filter importance by designing networks that learn importance scores either concurrently with training or in an alternating manner, where filters correspond to groups of convolutional kernels associated with output channels. Representative approaches include predicting filter importance [6,7], selecting channel saliency [8,9], evaluating subnetwork performance [10,11], and searching for optimal substructures [12]. In addition, reinforcement learning-based methods employ agents to autonomously discover effective compression strategies [13]. Despite their effectiveness, these approaches generally increase training complexity and often lack clear interpretability regarding how the learned importance scores relate to the network’s underlying representations.
The second category evaluates filter importance based on network parameters or corresponding feature maps, thereby providing an explicit measure of each filter’s contribution. Parameter-based methods are conceptually simple and include criteria such as weight magnitude [14,15], weight norms [16], Batch Normalization scaling factors [17], and filter contributions to the loss [18]. However, because correlations among filters are typically ignored, these methods may produce inaccurate importance rankings and consequently misidentify redundant filters.
In contrast, feature-map-based approaches explicitly account for inter-filter correlations [19,20], which improves redundancy detection. Nevertheless, these methods require storing feature maps during inference or training, leading to increased computational overhead and a strong dependence on the dataset, hyperparameter settings, and experimental repeatability.
A novel filter pruning method that explicitly accounts for inter-filter correlations is presented. Filter importance is quantified using an independence score, defined as the difference in the nuclear norm of the weight matrix before and after masking a given filter. Furthermore, the distribution of independence scores within each layer is analyzed to guide the selection of layer-wise pruning rates. Based on this criterion, an efficient nuclear-norm-based pruning algorithm is developed, and performance is preserved through retraining. Deployment on neural processing units (NPUs) demonstrates substantial reductions in memory consumption and inference latency, highlighting the suitability of the proposed approach for on-device AI applications.
The primary contributions of this work are summarized as follows:
1. A structured pruning framework is introduced that leverages the nuclear norm as an independence score for filter evaluation. Unlike existing nuclear-norm-based pruning methods that focus on feature map low-rankness or intra-layer redundancy, this framework explicitly quantifies inter-filter dependencies directly from network weights without requiring feature map computation.
2. Theoretical analysis indicates that the differential nuclear norm of a filter is positively correlated with its independence. On this basis, a filter independence evaluation criterion is established to identify filters that are critical for preserving the representational capacity of the network.
3. A unified redundancy analysis applicable to both convolutional and fully connected layers is developed. By modeling correlations among filters or neurons at the weight level, the proposed independence assessment enables effective identification of redundant structures that extract highly similar representations.
4. A layer-wise pruning indicator is designed by characterizing the distribution of filter independence scores. Specifically, the ratio between mid-range values and the median is utilized to estimate the proportion of low-independence filters, thereby enabling adaptive pruning rates across different layers.

2. Related Work

Filter pruning typically involves three stages: evaluating filter importance, removing less important filters, and retraining the pruned network to reduce accuracy loss. In the first stage, selecting appropriate criteria for filter evaluation is critical. In the later stages, determining layer-wise pruning rates is essential, as variations across layers can significantly affect the pruning outcome. The following discussion reviews these stages.
Feature map correlation. Feature-map-based methods assess filter importance by analyzing similarities or correlations among the corresponding feature maps. A variety of metrics have been explored to quantify these relationships, including rank estimation [21], nuclear norm analysis [19], cosine similarity [20,22], Hamming distance [23], histogram intersection [24], and correlation coefficients derived from inner products [25]. More recently, cross-sample strategies have been investigated to further improve importance estimation, among which a global channel attention-based static pruning framework learns a unified channel ranking across samples [26].
Filter similarity. Filter importance is evaluated using geometric or statistical relationships among filters. Clustering methods retain filters based on k-means or similar strategies [27], while distance-based approaches identify less important filters via pairwise distances [28]. Low-rank constraints have been used to enforce linear independence among filter groups [29]. More recent studies adopt information-theoretic or embedding-based criteria, such as entropy-guided pruning on CIFAR-10 [30].
Layer pruning rate. Early one-shot methods typically adopt fixed or manually tuned rates without explicit layer-wise analysis [19,21]. Adaptive strategies learn layer-wise rates during training using reinforcement learning or differentiable search techniques [13]. Iterative frameworks, in contrast, adjust rates progressively through repeated pruning and retraining [16]. More recently, adaptive frameworks enhanced with self-distillation have been applied to achieve substantial parameter reduction while maintaining accuracy [31].
In addition, metaheuristic-based pruning approaches have been studied, highlighting both their effectiveness and computational overhead [32]. Symbolic regression-based refinement methods have also been explored for neural network compression [33]. Structured pruning has further been combined with deployment-oriented strategies—such as knowledge distillation [34], FPGA acceleration [35], and lightweight attention mechanisms [36]—to enable efficient inference on mobile and resource-constrained platforms.
As summarized in Table 1, feature-map-based methods incur substantial computational overhead, whereas filter-based approaches often lack a principled explanation of redundancy. In addition, layer-wise pruning rate design is frequently treated as a heuristic or auxiliary optimization problem. These limitations motivate the development of a unified framework that explicitly models filter dependencies at the weight level and utilizes this information to guide both filter selection and layer-wise pruning decisions.

3. Methodology: Filter Independence Modeling

This section analyzes filter group redundancy arising from correlations induced by linear transformations and introduces a method for assessing filter independence using the nuclear norm. Filters with higher independence are considered more important and are therefore retained during the pruning process. In addition, a procedure is proposed to identify filter submatrices that exhibit high independence.

3.1. Filter Redundancy Analysis

Let $\omega_i \in \mathbb{R}^{n_i \times n_{i-1} \times k \times k}$ denote the weights of the i-th convolutional layer, where $n_i$ is the number of output channels, $n_{i-1}$ is the number of input channels, and $k$ is the size of the convolutional kernel. The weights are reshaped into a filter matrix $F = (f_1, f_2, \ldots, f_{n_i})^T \in \mathbb{R}^{n_i \times (n_{i-1} k^2)}$, whose j-th row is the flattened filter $f_j$, which allows the convolution operation to be represented as a series of matrix multiplications. The product $Fx$ performs the transformation from the input space to the feature space, with $F$ serving as the transformation matrix, as illustrated in Figure 1. The first three steps of the figure demonstrate how the convolution kernels are represented as a transformation matrix, thereby enabling a linear mapping from the input space to the feature space.
A similar representation applies to fully connected layers, which also perform matrix multiplication. In a fully connected layer with n outputs, each neuron corresponds to a row of the filter matrix F, containing the weights connecting all input units to that output. The filter matrix F is used to represent fully connected layer parameters, analogous to its role in convolutional layers. Consequently, both convolutional and fully connected layers perform feature extraction through linear transformations from the input space to the feature space. The independence assessment and pruning strategy developed for convolutional filters can thus be directly applied to neurons in fully connected layers, ensuring that the proposed framework systematically covers the entire network.
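As a minimal sketch of this reshaping (not taken from the original implementation), the NumPy snippet below flattens a randomly generated weight tensor into a filter matrix and checks that the response of every filter at one spatial location equals a single matrix-vector product. The function name `filter_matrix`, the tensor sizes, and the random data are illustrative assumptions.

```python
import numpy as np

def filter_matrix(weights):
    """Reshape conv weights of shape (n_out, n_in, k, k) into F with one filter per row."""
    n_out = weights.shape[0]
    return weights.reshape(n_out, -1)

rng = np.random.default_rng(0)
n_out, n_in, k = 4, 3, 3                      # hypothetical layer sizes
weights = rng.standard_normal((n_out, n_in, k, k))
patch = rng.standard_normal((n_in, k, k))     # one k x k input window over all input channels

F = filter_matrix(weights)                    # shape (n_out, n_in * k * k)
x = patch.reshape(-1)                         # flattened input window

# Response of each filter at this location, computed the direct way...
direct = np.array([(weights[j] * patch).sum() for j in range(n_out)])
# ...and as one matrix-vector product with the filter matrix.
as_matmul = F @ x

print(np.allclose(direct, as_matmul))
```

A fully connected layer whose weights are stored as (outputs, inputs) is already in this row-per-neuron form, so no reshaping is needed there.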
As the transformation matrix for the feature space, the filter matrix is closely related to the features it generates. Specifically, the feature space corresponds to the column space of the filter matrix F, with its column vectors spanning the space. The column space and the left null space of a matrix are orthogonal complements, and the sum of their dimensions equals the number of rows of the matrix. When the row vectors of F are correlated, the dimension of the left null space increases, thereby reducing the dimension of the column space, i.e., the feature space. Consequently, features extracted by correlated filters are redundant, as illustrated in step 4 of Figure 1.
When considering a single filter $f_j \in \mathbb{R}^{n_{i-1} k^2}$, $j \in \{1, 2, \ldots, n_i\}$, the convolution operation reduces to a vector inner product $f_j \cdot x$, compressing the input space into a one-dimensional feature space. In contrast, the filter matrix $F$ transforms the input space into a feature space of dimension $\mathrm{Rank}(F)$, which generally carries more information than $n_i$ separate one-dimensional projections. Therefore, the filter matrix encodes more comprehensive and interrelated information, making its analysis more informative than assessing the importance of individual filters alone. Redundancy within the filter matrix can thus be interpreted as structural redundancy, providing valuable guidance for pruning.
Building on the preceding analysis, filter importance is assessed by examining correlations among filters. The central idea is to rank filters according to their degree of correlation with the rest of the layer. Specifically, a filter exhibiting high correlation is likely to be linearly dependent on other filters. In this case, the filter can be represented as a linear combination of the remaining filters, and the corresponding feature map can be approximated as a weighted combination of the feature maps generated by the other filters. Consequently, pruning a highly correlated filter has minimal impact, as its extracted feature map can be largely preserved or reconstructed from the outputs of the remaining filters.
To illustrate this intuition, Figure 2 presents a simplified example. Filters $f_1$ and $f_2$ are highly similar, producing nearly identical feature maps $O_1$ and $O_2$, whereas filter $f_3$ extracts distinct features, resulting in a feature map $O_3$ that differs substantially. This example demonstrates that highly similar filters generate redundant features, which can be exploited for pruning.
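The redundancy argument of this subsection can be checked numerically. The sketch below builds a hypothetical three-filter layer in which the third filter is an exact linear combination of the first two, and confirms that its responses are the same combination of the other two responses. The coefficients, dimensions, and random inputs are assumptions chosen only for illustration.

```python
import numpy as np

rng = np.random.default_rng(1)
dim = 27                                   # flattened filter length, e.g. 3 channels * 3 * 3
f1 = rng.standard_normal(dim)
f2 = rng.standard_normal(dim)
f3 = 2.0 * f1 - 0.5 * f2                   # exactly dependent on f1 and f2 (illustrative)
F = np.stack([f1, f2, f3])                 # filter matrix, one filter per row

X = rng.standard_normal((dim, 100))        # 100 flattened input patches as columns
O = F @ X                                  # row j holds the responses of filter j

# The third response row is recoverable from the first two,
# so removing f3 loses no information about the input.
reconstructed_o3 = 2.0 * O[0] - 0.5 * O[1]
rank_F = int(np.linalg.matrix_rank(F))

print(rank_F, np.allclose(O[2], reconstructed_o3))
```

In trained networks the dependence is only approximate, which is why a graded measure is more useful than an exact rank test.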

3.2. Filter Independence Evaluation Based on Nuclear Norm

Before presenting the formal definition, it should be emphasized that the proposed method imposes no additional assumptions or constraints beyond the standard convolutional and fully connected layer settings. The formulation relies solely on layer weights, rendering it architecture-agnostic and generally applicable across different CNN models. With these general settings established, a metric for quantifying filter correlations is introduced.
Compared with the dimension of the column space, the basis vectors encode more comprehensive and richer spatial information. While the rank of F indicates only the dimensionality of the feature space, the nuclear norm of F, defined as the sum of its singular values, reflects the scale of the basis vectors within the column space. The nuclear norm of the filter matrix F is thus employed to represent the effective size of the feature space. This metric captures additional spatial information, providing a more informative measure of correlations among filters.
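To make the contrast between rank and nuclear norm concrete, the following sketch compares two hypothetical 2 × 2 filter matrices with unit-norm rows: one with orthogonal filters and one with nearly parallel filters. Both have rank 2, but the nuclear norm separates them. The matrices and the angle of 0.01 rad are assumptions for illustration.

```python
import numpy as np

def nuclear_norm(F):
    """Sum of singular values of F."""
    return float(np.linalg.svd(F, compute_uv=False).sum())

theta = 0.01                                           # small angle between the two filters
A = np.array([[1.0, 0.0],
              [0.0, 1.0]])                             # orthogonal unit-norm filters
B = np.array([[1.0, 0.0],
              [np.cos(theta), np.sin(theta)]])         # nearly parallel unit-norm filters

rank_A = int(np.linalg.matrix_rank(A))
rank_B = int(np.linalg.matrix_rank(B))
nuc_A = nuclear_norm(A)
nuc_B = nuclear_norm(B)

# Rank reports the same dimensionality for both matrices,
# while the nuclear norm is clearly smaller for the nearly parallel pair.
print(rank_A, rank_B, round(nuc_A, 4), round(nuc_B, 4))
```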
Since the filter $f_j$ corresponds to the j-th row vector of $F$, the following decomposition holds:
$$F^T F = (f_1, \ldots, f_{n_i}) \begin{pmatrix} f_1^T \\ \vdots \\ f_{n_i}^T \end{pmatrix} = f_1 f_1^T + \cdots + f_j f_j^T + \cdots + f_{n_i} f_{n_i}^T = \left( f_1 f_1^T + \cdots + f_{j-1} f_{j-1}^T + f_{j+1} f_{j+1}^T + \cdots + f_{n_i} f_{n_i}^T \right) + f_j f_j^T = (F')^T F' + f_j f_j^T$$
Here, $F' = (f_1, \ldots, f_{j-1}, f_{j+1}, \ldots, f_{n_i})^T \in \mathbb{R}^{(n_i - 1) \times (n_{i-1} k^2)}$ denotes the filter matrix obtained by excluding $f_j$. This decomposition separates the contribution of $f_j$ from that of the remaining filters, which serves as the foundation for defining the filter independence metric.
Suppose that $F'$ admits a singular value decomposition $F' = U \Sigma V^T$, where $\Sigma = \mathrm{diag}(\sigma_1, \ldots, \sigma_{n_i - 1}) \in \mathbb{R}^{(n_i - 1) \times (n_{i-1} k^2)}$. Then, $(F')^T F' = V \Sigma^2 V^T$. The columns of $V^T \in \mathbb{R}^{(n_{i-1} k^2) \times (n_{i-1} k^2)}$ form a basis for $\mathbb{R}^{n_{i-1} k^2}$. Since $f_j \in \mathbb{R}^{n_{i-1} k^2}$ lies in this space, it can be expressed as
$$f_j = V \begin{pmatrix} a_{j,1} \\ \vdots \\ a_{j, n_{i-1} k^2} \end{pmatrix}.$$
Consequently,
$$f_j f_j^T = V \begin{pmatrix} a_{j,1} \\ \vdots \\ a_{j, n_{i-1} k^2} \end{pmatrix} (a_{j,1}, \ldots, a_{j, n_{i-1} k^2}) V^T = V \Sigma_a^2 V^T$$
where $\Sigma_a^2 = \mathrm{diag}(a_{j,1}^2, \ldots, a_{j, n_{i-1} k^2}^2)$. This decomposition separates the contribution of $f_j$ along the basis defined by $V$, which underpins the subsequent definition of the filter independence score.
Let the singular values of $F$ be $\mu_1, \ldots, \mu_{n_{i-1} k^2}$. Then, the eigenvalues of $F^T F$ are given by $\mu_1^2, \ldots, \mu_{n_{i-1} k^2}^2$. According to the perturbation theorem (Matrix Theory [37]), for any eigenvalue $\mu_p^2$ of $F^T F$, there exists an eigenvalue $\sigma_q^2$ of $(F')^T F'$ such that
$$|\mu_p^2 - \sigma_q^2| \le \max_m a_{j,m}^2, \quad m \in \{1, 2, \ldots, n_{i-1} k^2\}.$$
Consequently, taking square roots yields
$$|\mu_p - \sigma_q| \le \sqrt{|\mu_p^2 - \sigma_q^2|} \le \sqrt{\max_m a_{j,m}^2} = \max_m |a_{j,m}|.$$
This analysis indicates that the singular values of $F$ and $F'$ differ by at most the largest absolute coefficient in the representation of $f_j$ with respect to the basis of $F'$. Consequently, the contribution of a highly correlated filter $f_j$ to the overall feature space, as quantified by the nuclear norm, is relatively small. This observation provides a principled justification for assigning lower importance to such filters during pruning.
When $f_j$ is uniformly projected onto the column space of $V^T$, its largest projection component is small, resulting in only a minor difference between the singular values of $F$ and $F'$. In contrast, when $f_j$ is non-uniformly projected, the magnitude of the largest projection component increases. If $f_j$ becomes aligned with a basis vector of the column space of $V^T$, the projection component attains its maximum, leading to a substantial difference between the singular values of $F$ and $F'$. This behavior provides an intuitive explanation for why highly independent filters, which are less aligned with other filters, contribute more significantly to the feature space.
Thus, the difference between the singular values of $F$ and $F'$ provides a measure of the degree of independence of $f_j$ relative to the other filters in $F'$. A filter that induces a larger difference is more closely aligned with a basis vector of the row space of $F'$, indicating greater independence from the remaining filters. Motivated by this observation, the following definition of filter independence is introduced.
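The first step of this derivation, the split of the Gram matrix into the contribution of the remaining filters plus a rank-one term for the excluded filter, can be verified numerically. The sketch below does so for a random matrix and also recovers the coordinates of the excluded filter in the basis of right singular vectors. The sizes, the index `j`, and the variable names are illustrative assumptions; the snippet does not test the perturbation bound itself.

```python
import numpy as np

rng = np.random.default_rng(2)
n_filters, dim, j = 5, 12, 2               # hypothetical sizes and excluded filter index
F = rng.standard_normal((n_filters, dim))  # one filter per row
f_j = F[j]
F_rest = np.delete(F, j, axis=0)           # filter matrix with row j excluded

# Gram-matrix split: F^T F = (F_rest)^T F_rest + f_j f_j^T
lhs = F.T @ F
rhs = F_rest.T @ F_rest + np.outer(f_j, f_j)

# Coordinates a_j of f_j in the orthonormal basis formed by the columns of V,
# where F_rest = U S V^T (full SVD so that V is square).
_, _, Vt = np.linalg.svd(F_rest, full_matrices=True)
a_j = Vt @ f_j                             # f_j = V a_j because V is orthogonal

print(np.allclose(lhs, rhs), np.allclose(Vt.T @ a_j, f_j))
```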
Definition 1.
Let $F^{(i)} = (f_1, f_2, \ldots, f_{n_i})^T \in \mathbb{R}^{n_i \times (n_{i-1} k^2)}$ denote the filter matrix of the i-th convolutional layer. The independence score of a filter $f_j \in \mathbb{R}^{n_{i-1} k^2}$, $j \in \{1, 2, \ldots, n_i\}$, relative to the other filters in $F^{(i)}$, is defined as
$$S_{F^{(i)}}(f_j) = \| F^{(i)} \|_* - \| F^{(i)} \odot M_j \|_*,$$
where $\| \cdot \|_*$ denotes the nuclear norm, $\odot$ is the Hadamard product, and $M_j \in \mathbb{R}^{n_i \times (n_{i-1} k^2)}$ is a row-wise mask matrix with the j-th row set to zeros and all other rows set to ones.
Specifically, a filter f j is masked from the filter matrix F, and the resulting change in the nuclear norm between the modified and original matrices is computed. One advantage of employing the nuclear norm difference to characterize filter correlation is its ability to capture fine-grained variations. As illustrated in Figure 3a, using the rank difference to evaluate filter independence produces only two extreme values—0 for correlated filters and 1 for independent filters—thereby limiting its sensitivity to subtle variations. In contrast, the nuclear norm difference provides a more sensitive measure, enabling a finer distinction of filter independence, as shown in Figure 3b.
The independence score $S_{F^{(i)}}(f_j)$ quantifies the correlation between a filter $f_j$ and the remaining filters in $F^{(i)}$, with larger values indicating lower substitutability. Consequently, independence scores can be computed for all filters in a network layer to accurately identify less important filters for pruning. As illustrated in Figure 4, feature maps extracted by filters sorted according to L2 norm and independence score are compared. The feature map corresponding to filter (7), which contains fine-grained details (highlighted by the red dotted box), is discarded when pruning based on the L2 norm but retained when pruning based on the independence score. Conversely, the feature map corresponding to filter (4), which primarily encodes contour information (highlighted by the purple dotted box), exhibits the opposite behavior.
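A minimal NumPy sketch of the independence score in Definition 1 is given below, together with the rank-difference alternative discussed above. It is an illustration under simple assumptions rather than the authors' implementation: the function names are hypothetical, and the toy filter matrix contains two identical filters and one orthogonal filter.

```python
import numpy as np

def nuclear_norm(F):
    """Sum of singular values of F."""
    return float(np.linalg.svd(F, compute_uv=False).sum())

def independence_scores(F):
    """Score of row j = ||F||_* - ||F with row j zeroed||_* (row-wise mask M_j)."""
    full = nuclear_norm(F)                 # shared by all filters, computed once
    scores = np.empty(F.shape[0])
    for j in range(F.shape[0]):
        masked = F.copy()
        masked[j, :] = 0.0                 # same effect as the Hadamard product with M_j
        scores[j] = full - nuclear_norm(masked)
    return scores

# Toy layer: filters 0 and 1 are identical, filter 2 is orthogonal to both.
F = np.array([[1.0, 0.0, 0.0],
              [1.0, 0.0, 0.0],
              [0.0, 1.0, 0.0]])

scores = independence_scores(F)            # approximately [0.414, 0.414, 1.0]

# Rank difference for comparison: only 0 or 1, with no gradation.
full_rank = np.linalg.matrix_rank(F)
rank_drop = []
for j in range(F.shape[0]):
    masked = F.copy()
    masked[j, :] = 0.0
    rank_drop.append(int(full_rank - np.linalg.matrix_rank(masked)))

print(np.round(scores, 3), rank_drop)      # rank_drop is [0, 0, 1]
```

The duplicated filters receive the lowest scores and would be pruned first, while the orthogonal filter receives the highest score.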

3.3. Computational Complexity

For the i-th convolutional layer, all filters are arranged into a matrix $F^{(i)} \in \mathbb{R}^{n_i \times (n_{i-1} k^2)}$. The nuclear norm is computed via singular value decomposition (SVD), which has a computational complexity of $O\left( n_i (n_{i-1} k^2) \min(n_i, n_{i-1} k^2) \right)$. Since the nuclear norm of the full filter matrix, $\| F^{(i)} \|_*$, is shared across all filters, it needs to be computed only once. In contrast, the masked nuclear norm, $\| F^{(i)} \odot M_j \|_*$, must be computed for each of the $n_i$ filters. Therefore, the total complexity for computing the independence scores of all filters in the i-th layer is
$$O\left( n_i^2 (n_{i-1} k^2) \min(n_i, n_{i-1} k^2) \right).$$
In typical convolutional neural networks, the quantity $n_{i-1} k^2$ is often much larger than $n_i$, allowing the overall complexity to be approximated as
$$O\left( n_i^3 \, n_{i-1} k^2 \right).$$
It is worth noting that this computation is performed only once per layer during the offline pruning stage and is independent of the spatial resolution of the feature maps. Consequently, unlike feature-map-based pruning methods that require repeated forward passes over large activation tensors, the proposed approach introduces no additional overhead during training or inference. In practice, its computational cost is negligible compared with the cumulative cost of convolution operations.
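For a rough sense of scale, the leading-order terms above can be evaluated for a given layer shape. The helper functions below are hypothetical and simply evaluate the expressions inside the big-O bounds. They ignore constant factors, so the result is an order-of-magnitude operation count, not a runtime prediction.

```python
def svd_cost(n_out, n_in, k):
    """Leading-order cost of one SVD of an (n_out) x (n_in * k * k) filter matrix."""
    d = n_in * k * k
    return n_out * d * min(n_out, d)

def layer_scoring_cost(n_out, n_in, k):
    """Leading-order cost of the n_out masked SVDs needed to score one layer."""
    return n_out * svd_cost(n_out, n_in, k)

# Example layer shape (an assumption): 512 output channels, 512 input channels, 3 x 3 kernels.
n_out, n_in, k = 512, 512, 3
cost = layer_scoring_cost(n_out, n_in, k)

# When n_in * k^2 >= n_out, this equals the simplified n_out^3 * n_in * k^2 form.
simplified = n_out ** 3 * n_in * k * k
print(f"{cost:.3e}", cost == simplified)
```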

3.4. Independent Submatrix Estimation

This subsection focuses on estimating the filter submatrix $\tilde{F}_{\text{sub-}c}$ with high independence among the numerous candidate submatrices $F_{\text{sub-}c} = F \odot M_{j_1, j_2, \ldots, j_c}$, when $c$ filters are to be pruned from the i-th convolutional layer. Here, $M_{j_1, j_2, \ldots, j_c} \in \mathbb{R}^{n_i \times (n_{i-1} k^2)}$ is a multi-row mask matrix in which rows $j_1, j_2, \ldots, j_c$ are set to zeros and all other rows are set to ones.
Theoretically, the filter submatrix $F_{\text{sub-}c}$ can be constructed iteratively. First, the independence scores $S_F(f_j)$ are computed for all filters. The filter with the lowest score, $f_{j_1}$, is then masked to obtain $F_{\text{sub-}1}$. Following this procedure, $S_{F_{\text{sub-}1}}(f_j)$ is computed for the remaining filters, and the filter with the lowest score, $f_{j_2}$, is masked to form $F_{\text{sub-}2}$. This process is repeated for $c$ iterations to obtain $F_{\text{sub-}c}$.
However, this iterative approach requires computing a large number of independence scores, resulting in significant computational cost. To mitigate this, $S_F(f_j)$ can be used to estimate the relative magnitudes of $S_{F_{\text{sub-}c}}(f_j)$, substantially reducing the computation in practice. Specifically, for filters $f_j, f_k$ with $j, k \notin \{j_1, j_2, \ldots, j_a\}$, if $S_F(f_j) > S_F(f_k)$, then $S_{F_{\text{sub-}a}}(f_j) > S_{F_{\text{sub-}a}}(f_k)$, $a \in \{1, 2, \ldots, c - 1\}$. For instance, in a convolutional layer containing 500 filters where 100 filters are to be pruned, the full iterative procedure scores every remaining filter in each round, i.e., 500 + 499 + ⋯ + 401 = 45,050 computations; the estimation method reduces this from roughly 45,000 to only 500.
Definition 2.
Let $F = (f_1, f_2, \ldots, f_{n_i})^T \in \mathbb{R}^{n_i \times (n_{i-1} k^2)}$ denote the filter matrix of the i-th convolutional layer, from which $c$ filters are to be pruned. The submatrix with maximal independence, $\tilde{F}_{\text{sub-}c}$, consists of the remaining $n_i - c$ filters corresponding to the highest independence scores, i.e., $f_{\max_1}, f_{\max_2}, \ldots, f_{\max_{n_i - c}}$. Formally,
$$\tilde{F}_{\text{sub-}c} = \left( f_{\max_1}, f_{\max_2}, \ldots, f_{\max_{n_i - c}} \right)^T \in \mathbb{R}^{(n_i - c) \times (n_{i-1} k^2)}.$$
Here, $f_{\max_k}$, $k \in \{1, 2, \ldots, n_i - c\}$, denotes the filter with the k-th highest independence score. Note that the independence of the filter matrix is invariant to the ordering of its filters. Definition 2 can be formally justified via mathematical induction, utilizing the previously discussed estimation method.
Proof. 
For the base case $c = 1$, we have
$$\| F_{\text{sub-}1} \|_* = \| F \|_* - S_F(f_j).$$
If $S_F(f_j)$ is minimal, then $\| F_{\text{sub-}1} \|_*$ is maximal.
Assume that for $c = n$, the submatrix
$$F_{\text{sub-}n} = \left( f_{\max_1}, f_{\max_2}, \ldots, f_{\max_{n_i - n}} \right)^T$$
holds. For $c = n + 1$, the submatrix satisfies
$$\| F_{\text{sub-}(n+1)} \|_* = \| F_{\text{sub-}n} \|_* - S_{F_{\text{sub-}n}}(f_j).$$
Similarly, if $S_{F_{\text{sub-}n}}(f_j)$ is minimal, then $\| F_{\text{sub-}(n+1)} \|_*$ is maximal. In practice, $S_F(f_{\max_{n_i - n}})$ can be used to estimate $S_{F_{\text{sub-}n}}(f_{\max_{n_i - n}})$, completing the induction step. □
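The two selection procedures of this subsection, the full iterative construction and the one-shot estimate of Definition 2, can be sketched side by side as follows. This is an illustrative sketch with hypothetical function names and random data, not the authors' code; it counts how many independence scores each procedure evaluates. The two procedures are not guaranteed to keep identical filters, since the one-shot version relies on the ordering estimate stated above.

```python
import numpy as np

def nuclear_norm(F):
    return float(np.linalg.svd(F, compute_uv=False).sum())

def scores_within(F, rows):
    """Independence score of each filter in `rows`, relative to the submatrix F[rows]."""
    sub = F[rows]
    full = nuclear_norm(sub)
    out = np.empty(len(rows))
    for t in range(len(rows)):
        masked = sub.copy()
        masked[t, :] = 0.0
        out[t] = full - nuclear_norm(masked)
    return out

def prune_one_shot(F, c):
    """Score all filters once and keep the n - c filters with the highest scores."""
    rows = np.arange(F.shape[0])
    s = scores_within(F, rows)
    keep = np.sort(np.argsort(s)[c:])      # drop the c lowest-scoring filters
    return keep, len(rows)                 # kept indices, number of scores evaluated

def prune_iterative(F, c):
    """Remove the lowest-scoring filter, rescore the remainder, and repeat c times."""
    rows = list(range(F.shape[0]))
    evaluations = 0
    for _ in range(c):
        s = scores_within(F, rows)
        evaluations += len(rows)
        rows.pop(int(np.argmin(s)))        # position in `rows` of the lowest score
    return np.array(rows), evaluations

rng = np.random.default_rng(3)
F_demo = rng.standard_normal((10, 20))     # hypothetical layer: 10 filters of length 20
keep_os, n_os = prune_one_shot(F_demo, 3)
keep_it, n_it = prune_iterative(F_demo, 3)
print(n_os, n_it)                          # 10 versus 10 + 9 + 8 = 27 score evaluations
```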

4. Independence-Aware Pruning Strategy

This section first examines layer-wise pruning rates based on the independence of filters within each layer, providing insight into how the distribution of independence scores influences pruning decisions. Building on this analysis, a pruning method is proposed that optimizes filter selection according to the computed independence scores.

4.1. Relationship Between Pruning Rate and Filter Independence

Manually assigning layer-wise pruning rates has several limitations. Balancing the pruning rate and network performance is challenging, often resulting in either overly conservative pruning or noticeable accuracy degradation. Moreover, the trial-and-error process can be computationally expensive. A more efficient approach is to adopt customized pruning rates informed by available data. Since the independence score of each filter has been computed for all layers and layers are treated independently, the relationship between the distribution of layer-wise independence scores and the corresponding pruning rates is investigated.
The analysis is performed on the sequence of independence scores S F , which contains all filter scores within a given layer. A predominance of low independence scores indicates strong correlations among the filters and high redundancy, suggesting a higher pruning rate for that layer. Conversely, layers with more filters exhibiting high independence scores warrant lower pruning rates. This approach is designed to minimize performance degradation. By assigning distinct pruning rates to each layer based on the distribution of independence scores, a favorable trade-off between compression and accuracy is achieved, enabling high overall pruning rates while preserving model performance.
To quantify the distribution of independence scores within a layer and guide the corresponding pruning rate, a ratio η is introduced to capture the relative spread of the independence scores.
Definition 3.
For a layer with independence score sequence $S_F$, the ratio $\eta$ is defined as
$$\eta = \ln \frac{\max(S_F) + \min(S_F)}{2 \, \mathrm{median}(S_F) + \varepsilon},$$
where $\varepsilon$ is a small constant added to avoid division by zero.
The logarithm is applied to compress the range of the ratio while preserving relative differences. As illustrated in Figure 5, the ratio η effectively characterizes the distribution of the independence scores within a layer. A larger η indicates a predominance of filters with low independence scores (orange line), corresponding to higher redundancy and suggesting a higher layer pruning rate. Conversely, a smaller η reflects a greater number of highly independent filters (blue line), implying lower redundancy and a smaller pruning rate. Thus, η provides a principled criterion for guiding layer-wise pruning. In the figure, the 7th and 11th layers are presented as representative examples to demonstrate variations in score distributions across layers.
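As a hedged illustration, the ratio can be computed as the logarithm of the mid-range of the scores divided by their median, which is how Definition 3 is read here. The function name `eta`, the default value of `eps`, and the two toy score sequences are assumptions introduced only for this example.

```python
import numpy as np

def eta(scores, eps=1e-8):
    """ln((max + min) / (2 * median + eps)): log-ratio of mid-range to median."""
    s = np.asarray(scores, dtype=float)
    return float(np.log((s.max() + s.min()) / (2.0 * np.median(s) + eps)))

# Scores spread evenly around the median: mid-range and median coincide, eta is near 0.
balanced = [1.0, 5.0, 9.0]
# Mostly low scores with one highly independent filter: mid-range far above the median.
mostly_low = [1.0, 1.0, 1.0, 1.0, 10.0]

eta_balanced = eta(balanced)
eta_mostly_low = eta(mostly_low)
print(round(eta_balanced, 4), round(eta_mostly_low, 4))
```

The layer dominated by low scores yields the larger value, matching the interpretation that a larger ratio signals more redundancy and supports a higher pruning rate.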
A layer-specific pruning strategy can be derived from each layer’s η value. When the η values across layers are closely clustered, it suggests similar redundancy levels, warranting uniform pruning rates. Conversely, a dispersed distribution of η indicates heterogeneous redundancy, motivating non-uniform pruning rates across layers. The variance of the sequence is commonly used to quantify distribution dispersion. For instance, on CIFAR-10, the variance is 37.6463 for VGG_16 and 0.1075 for ResNet_56, implying that VGG_16 benefits from non-uniform pruning, whereas ResNet_56 is more suited to uniform pruning. It should be noted, however, that variance primarily reflects general trends and does not account for outliers.

4.2. Independence-Aware Structured Pruning Algorithm

Inspired by the concept of linear transformation, redundant filters in the network are analyzed by assessing filter independence using the nuclear norm. The resulting independence scores serve a dual purpose: evaluating filter importance and guiding the pruning strategy. Following this principle, a nuclear-norm-based filter pruning algorithm is developed, as illustrated in Figure 6. In the left panel, the input consists of the convolutional layer weights, and the output is the set of computed filter independence scores along with the corresponding $\eta$. In the right panel, the input is the same set of weights, and the output is the pruned weights obtained by removing redundant filters according to the computed independence scores and $\eta$.
The proposed method computes each filter’s independence score by measuring the change in the nuclear norm. Filters with low independence scores, whose features can be largely represented by other filters, are pruned with minimal impact on network performance. The sequence of independence scores within each layer is then summarized using η to guide the pruning strategy. Across the network, the η values of all layers are normalized, and layers with higher η are assigned higher pruning rates, enabling adaptive, layer-specific pruning.
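Once the filters to keep in a layer have been selected, removing the others is a matter of slicing the weight tensors. The sketch below shows this bookkeeping for two consecutive convolutional layers. It is a generic structured-pruning step written under stated assumptions, not a description of the authors' code: weights are assumed to use the (out_channels, in_channels, k, k) layout, and biases, Batch Normalization parameters, and residual connections (which would need matching treatment) are ignored.

```python
import numpy as np

def prune_conv_pair(w_cur, w_next, keep):
    """Keep only the filters `keep` in the current layer and the matching
    input channels in the next layer."""
    keep = np.sort(np.asarray(keep))
    pruned_cur = w_cur[keep]               # drop output filters of layer i
    pruned_next = w_next[:, keep]          # drop the corresponding input channels of layer i+1
    return pruned_cur, pruned_next

rng = np.random.default_rng(4)
w_cur = rng.standard_normal((8, 3, 3, 3))      # layer i: 8 filters over 3 input channels
w_next = rng.standard_normal((16, 8, 3, 3))    # layer i+1: consumes the 8 channels above
keep = [0, 2, 5]                               # hypothetical indices of retained filters

p_cur, p_next = prune_conv_pair(w_cur, w_next, keep)
print(p_cur.shape, p_next.shape)
```

Because whole filters are removed, the pruned tensors remain dense and can be loaded into a smaller standard layer without sparse kernels.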
The network’s overall pruning rate is defined as the weighted sum of the layer-wise pruning rates, where the weights correspond to each layer’s proportion of filters. Formally, it is expressed as $P = \sum_{i=1}^{L} a_i p_i$, where $L$ is the total number of layers, $a_i$ denotes the proportion of filters in the i-th layer relative to the entire network, and $p_i$ represents the pruning rate of the i-th layer.
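The weighted sum can be evaluated directly from the per-layer filter counts and pruning rates. In the sketch below, the function name and the three-layer example (filter counts and rates) are hypothetical values chosen so that the result can be checked by hand.

```python
def overall_pruning_rate(filters_per_layer, layer_rates):
    """P = sum_i a_i * p_i, with a_i the share of all filters that sit in layer i."""
    total = sum(filters_per_layer)
    return sum(n / total * p for n, p in zip(filters_per_layer, layer_rates))

# Hypothetical network: three layers with 64, 128, and 256 filters.
filters = [64, 128, 256]
rates = [0.25, 0.50, 0.75]                 # assumed layer-wise pruning rates

P = overall_pruning_rate(filters, rates)

# Cross-check: pruned filters / total filters = (16 + 64 + 192) / 448.
pruned_fraction = (16 + 64 + 192) / 448
print(round(P, 4), round(pruned_fraction, 4))
```

With these weights, the overall rate is simply the fraction of all filters in the network that are removed.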

5. Experimental Results

This section presents the experimental design, implementation details, and results. The primary objective is to evaluate the effectiveness of the proposed method and to compare its performance with existing approaches across multiple datasets.

5.1. Experimental Setup

Datasets. The proposed pruning method is evaluated on five benchmark datasets: FashionMNIST [38], SVHN [39], CIFAR-10 [40], CIFAR-100 [40], and ImageNet [41]. Key statistics of each dataset, including the number of training and testing samples, the number of classes, and image dimensions, are summarized in Table 2.
To accommodate the varying image sizes across datasets, the input layer of the network was adjusted accordingly. Specifically, the input dimensions were set to 28 × 28 for FashionMNIST, 32 × 32 for SVHN, CIFAR-10, and CIFAR-100, and 224 × 224 for ImageNet. This configuration allows the network to process images at their native or near-native resolution with minimal distortion, while ensuring that the proposed pruning method can be applied consistently across all datasets.
Models. A variety of models with different architectures and sizes were employed for the experiments. Specifically, AlexNet [42] was used for FashionMNIST; AlexNet and VGG_11 [43] for SVHN; VGG_16 [43] and ResNet_56/110 [44] for CIFAR-10 and CIFAR-100; and ResNet50 [44] for ImageNet.
Hyperparameter Configuration. All experiments were implemented in PyTorch (1.10.0) using stochastic gradient descent (SGD) as the optimization algorithm. Each network was trained from scratch to establish the baseline model, with hyperparameters adjusted according to the dataset. For FashionMNIST and SVHN, the batch size was 128, momentum was 0.9, weight decay was 0.0001, and the initial learning rate was 0.01; networks were trained for 20 and 50 epochs, respectively. For CIFAR-10 and CIFAR-100, the batch size was 256, momentum was 0.9, weight decay was 0.005, and the initial learning rate was 0.01; training was conducted for 400 epochs. For ImageNet, the batch size was 256, momentum was 0.99, weight decay was 0.0001, and the initial learning rate was 0.01; training lasted for 200 epochs.
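For reference, the baseline settings listed above can be collected into a single configuration table. The sketch below only transcribes the values from the text; the dictionary layout, key names, and the helper `sgd_kwargs` are illustrative assumptions rather than the authors' code.

```python
# Baseline training settings transcribed from the text; key names are illustrative.
BASELINE_CONFIG = {
    "FashionMNIST": dict(batch_size=128, momentum=0.9, weight_decay=1e-4, lr=0.01, epochs=20),
    "SVHN": dict(batch_size=128, momentum=0.9, weight_decay=1e-4, lr=0.01, epochs=50),
    "CIFAR-10": dict(batch_size=256, momentum=0.9, weight_decay=5e-3, lr=0.01, epochs=400),
    "CIFAR-100": dict(batch_size=256, momentum=0.9, weight_decay=5e-3, lr=0.01, epochs=400),
    "ImageNet": dict(batch_size=256, momentum=0.99, weight_decay=1e-4, lr=0.01, epochs=200),
}

def sgd_kwargs(dataset):
    """Return the optimizer-related settings for one dataset as keyword arguments."""
    cfg = BASELINE_CONFIG[dataset]
    return {"lr": cfg["lr"], "momentum": cfg["momentum"], "weight_decay": cfg["weight_decay"]}
```

With PyTorch, these keyword arguments could be passed directly to the optimizer, for example `torch.optim.SGD(model.parameters(), **sgd_kwargs("CIFAR-10"))`, where `model` is whichever network is being trained.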
After pruning, the networks were retrained starting from the pruned weights without re-initialization. Retraining employed the same optimizer, batch size, learning rate schedule, data augmentation, and loss function as in the original training. For VGG_16, the weight decay was set to 0.0005 during retraining. The procedure followed the protocol described in [19].
All pruning experiments were conducted deterministically using fixed pretrained models and fixed independence-score rankings. Under identical settings, repeated runs yielded the same pruning configurations and nearly identical evaluation results. Therefore, only single-run results are reported, consistent with prior studies on deterministic pruning.

5.2. Performance Evaluation of Pruned Networks

The effectiveness of the proposed pruning method was evaluated from two perspectives: classification performance, measured by Top-1 accuracy, and model compression, assessed by reductions in parameters and FLOPs. Table 3 presents the performance of pruned models across nine different pruning rates on the FashionMNIST and SVHN datasets. The proposed method substantially reduces both parameters and FLOPs, achieving up to 99% reduction in both metrics while incurring minimal Top-1 accuracy loss: 1.34% for AlexNet on FashionMNIST, 1.68% for AlexNet on SVHN, and 2.63% for VGG_11 on SVHN.
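To make the two compression metrics concrete, the sketch below counts parameters and FLOPs for a single convolutional layer before and after pruning. It assumes FLOPs are counted as multiply-accumulate operations and ignores biases; the layer sizes are hypothetical. It also illustrates why reductions compound: removing filters in one layer removes the matching input channels of the next.

```python
def conv_cost(c_in, c_out, k, h_out, w_out):
    """Parameters and multiply-accumulates of a k x k convolution (bias ignored)."""
    params = c_in * c_out * k * k
    flops = params * h_out * w_out
    return params, flops

def reduction(original, pruned):
    """Fractional reduction relative to the original value."""
    return 1.0 - pruned / original

# Hypothetical layer: 3x3 convolution, 64 -> 128 channels, 32x32 output map.
p0, f0 = conv_cost(64, 128, 3, 32, 32)
# Prune half of this layer's filters and half of the previous layer's filters.
p1, f1 = conv_cost(32, 64, 3, 32, 32)
param_reduction = reduction(p0, p1)
flops_reduction = reduction(f0, f1)
```

Halving both the input and output channels leaves one quarter of the original cost, so a 50% filter pruning rate in two consecutive layers yields a 75% reduction for the second layer.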

5.3. Comparison with State-of-the-Art Pruning Methods

This experiment compares the classification accuracy, number of parameters, and FLOPs of pruned networks under different pruning schemes. Several recent methods were selected for comparison on CIFAR-10, all of which prune redundancy by evaluating filter similarity or correlation with different metrics. Specifically, CHIP [19] employs the nuclear norm of the feature map, VNGEP [20] uses cosine distance, Li et al. [23] adopt the Hamming distance, EACP [27] relies on filter clustering distance, and CLR-RNF [28] uses the L2 norm of the filter. In contrast, the proposed approach leverages the nuclear norm of the filters themselves. Among the compared methods, CHIP and EACP serve as the main reference points in the detailed comparison.
Table 4 reports the performance metrics of each method at different pruning rates. The number following each method name indicates a distinct pruning rate; for example, CHIP1 and CHIP2 correspond to two pruning rates for the CHIP method. At comparable accuracy levels, the proposed method consistently achieves larger reductions in parameters and FLOPs.
Specifically, for VGG_16, at 94.09% Top-1 accuracy, the proposed method reduces parameters by 92.4% and FLOPs by 64.43%, whereas CHIP1, at 93.86% accuracy, achieves 81.6% parameter reduction and 58.1% FLOPs reduction. At 93.57% accuracy, the proposed method attains 94.2% parameter reduction and 73.15% FLOPs reduction, compared with EACP1 at 93.29% accuracy, which achieves 75.4% and 71.77%, respectively.
For ResNet_56, the proposed method reduces parameters by 74.85% and FLOPs by 74.5% at 92.44% accuracy, while CHIP2 achieves 71.8% and 72.3% at 92.05% accuracy. For ResNet_110, the proposed method attains 68.23% parameter reduction and 67.92% FLOPs reduction at 93.7% accuracy, outperforming EACP1, which reduces parameters by 61.3% and FLOPs by 65.7% at 93.39% accuracy.
Owing to the limited number of studies on CIFAR-100, comparative experiments are conducted on VGG_16 and ResNet_56 between the proposed pruning method and APRS [13] and ASTER [17]. For ResNet_110, results are reported only for the proposed method, as summarized in Table 5.
For VGG_16, the proposed method outperforms both APRS and ASTER in terms of classification accuracy and compression. Specifically, at 73.86% Top-1 accuracy, the proposed method reduces parameters by 83.9% and FLOPs by 70.7%, whereas APRS3 achieves 70.79% accuracy with 67.2% parameter reduction and 70.2% FLOPs reduction. Similarly, at 74.12% Top-1 accuracy, the proposed method reduces parameters by 81.3% and FLOPs by 58.2%, while APRS2, with 73.02% accuracy, achieves 64% and 50.1% reductions, respectively. For ResNet_56, the proposed method achieves slightly lower classification accuracy than ASTER under higher compression levels.
Finally, comparative experiments were conducted on ImageNet. As shown in Table 6, at equivalent pruning rates, the proposed method exhibits the least degradation in classification performance compared with CLR-RNF [28], VNGEP [20], and CHIP [19].

5.4. Sensitivity Analysis of the Pruning-Rate Indicator η

Figure 7 presents the filter independence score sequences for all layers of the networks used in the experiments. In the proposed method, η characterizes the distribution of each sequence, with the variance of η across all sequences indicated above each plot. Networks with similar sequence distributions exhibit smaller variances, whereas networks with more heterogeneous distributions have larger variances.
For networks with large variance, non-uniform pruning is applied, while networks with small variance and no outliers adopt uniform pruning. When small-variance networks contain outlier sequences (highlighted by red dashed ellipses), the layer-specific pruning rates can be adjusted accordingly. For example, in VGG_16 on CIFAR-100, the pruning rates for layers with outliers should be slightly reduced, whereas in ResNet_56, they can be slightly increased to account for these deviations.
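The decision rule described above can be sketched as follows. The text does not state a numerical cut-off between "large" and "small" variance, nor whether population or sample variance is used, so the `var_threshold` argument, the use of population variance, and the returned labels are all assumptions made for illustration.

```python
from statistics import pvariance

def choose_strategy(etas, var_threshold, has_outliers=False):
    """Pick a pruning strategy from the spread of the per-layer eta values.

    var_threshold is a hypothetical cut-off; the paper does not specify one.
    """
    if pvariance(etas) > var_threshold:
        return "non-uniform"
    if has_outliers:
        return "uniform with per-layer adjustment"
    return "uniform"

# Two eta values reported for VGG_16 on CIFAR-10 (layers 7 and 11 in Figure 5).
heterogeneous = choose_strategy([0.2877, 13.8155], var_threshold=1.0)
# Hypothetical network whose layers have similar eta values.
homogeneous = choose_strategy([1.0, 1.1, 0.9], var_threshold=1.0)
```

In practice the outlier flag would come from inspecting the per-layer sequences, as done visually with the red dashed ellipses in Figure 7.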

5.5. Layer-Wise Pruning Results for VGG_16

In this subsection, experiments are conducted to validate the effectiveness of η in guiding layer-specific pruning strategies. By directly measuring post-pruning accuracy, the influence of variations in layer pruning rates on overall network performance is analyzed.
The accuracy and loss of VGG_16 on CIFAR-10 are evaluated after sequentially pruning individual filters in each layer. Filters within each layer are removed one by one according to their normalized independence scores, as illustrated in Figure 8, which highlights 13 convolutional layers and one fully connected layer. Filters are sorted from lowest to highest independence score (blue curve), with the horizontal axis representing filter order. The green line indicates the network accuracy without retraining after pruning up to the corresponding filter, while the orange line represents the loss. The red star marks an accuracy of 94%.
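The sequential procedure can be sketched as a simple search for the largest number of filters that can be removed from one layer before accuracy falls below the target. The evaluator `accuracy_after_pruning` is a hypothetical callback standing in for a forward pass over the test set without retraining, and stopping at the first drop below the target is an assumption about how the red star is located.

```python
def max_prunable(scores, accuracy_after_pruning, target=0.94):
    """Largest k such that pruning the k lowest-scoring filters keeps accuracy >= target.

    accuracy_after_pruning(indices) is a caller-supplied (hypothetical) evaluator
    returning accuracy without retraining when the given filters are removed.
    """
    order = sorted(range(len(scores)), key=lambda i: scores[i])  # lowest score first
    best = 0
    for k in range(1, len(order) + 1):
        if accuracy_after_pruning(order[:k]) >= target:
            best = k
        else:
            break  # assumption: stop at the first drop below the target
    return best, order[:best]

# Toy evaluator: accuracy falls in proportion to the removed independence mass.
toy_scores = [0.9, 0.1, 0.5, 0.2]

def toy_accuracy(removed):
    return 0.96 - 0.05 * sum(toy_scores[i] for i in removed)

count, removed = max_prunable(toy_scores, toy_accuracy)
```

With the toy evaluator, the two least independent filters can be removed before the accuracy drops below 94%.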
As shown in Figure 8, the distribution of independence scores (blue lines) varies across layers. Several layers contain a substantial proportion of low-sensitivity filters, making them suitable for higher pruning rates; pruning multiple filters in these layers has a relatively minor effect on accuracy, as observed in the final four layers. Conversely, layers with a larger number of high-sensitivity filters are better candidates for lower pruning ratios, a trend particularly evident in the middle layers.
Layer 0 contains 64 filters, and the network maintains 94% accuracy after pruning 29 filters. In contrast, only four filters can be pruned in Layer 1 while preserving the same accuracy. The pruning rate increases markedly from Layer 8 onward, where layers with 512 filters allow pruning of 289, 409, 482, 478, 498, and 343 filters, respectively. When considering single-layer pruning while maintaining 94% network accuracy, the layer-specific pruning rates vary substantially. These single-layer pruning rates are then combined to form a set of layer-wise pruning strategies, which are applied to prune the entire network.
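The single-layer pruning rates implied by these counts follow from simple division. The sketch below assumes that the six counts listed for 512-filter layers correspond to Layers 8 through 13 in order, and that Layer 1 has 64 filters as in the standard VGG_16 layout; neither is stated explicitly in the text.

```python
# (layer index, total filters, filters prunable at 94% accuracy), from the text.
single_layer = [
    (0, 64, 29),
    (1, 64, 4),      # 64 filters assumed (standard VGG_16 second conv layer)
    (8, 512, 289),   # assumed order: Layers 8..13
    (9, 512, 409),
    (10, 512, 482),
    (11, 512, 478),
    (12, 512, 498),
    (13, 512, 343),
]

# Single-layer pruning rate = prunable filters / total filters.
rates = {layer: pruned / total for layer, total, pruned in single_layer}
most_prunable = max(rates, key=rates.get)
least_prunable = min(rates, key=rates.get)
```

Under these assumptions the rates span from about 6% (Layer 1) to about 97% (Layer 12), which illustrates how strongly the tolerable pruning rate varies between layers.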
Table 7 presents the network accuracy and computational complexity resulting from different layer-wise pruning strategies. The pruning rates for the 13 convolutional layers and the single fully connected layer are denoted as $p_0$ to $p_{13}$. Layer-specific pruning strategies have a substantial impact on network accuracy, which ranges from 10% to 94.06%. Strategies whose layer pruning rates better align with the ratio η achieve higher accuracy, as illustrated in Figure 9.

5.6. Efficient Deployment Results on NPUs

This subsection evaluates the suitability of the pruned models for mobile deployment. Experiments were conducted on the RK3588 platform (Rockchip Electronics Co., Ltd., Fuzhou, China), with both the pruned and original models deployed on the same Neural Processing Unit (NPU), as illustrated in Figure 10. Under the constraint that the difference in test accuracy does not exceed 3%, memory usage and total execution time are compared for the same input. The pruned models consistently require less memory and shorter execution time.
To quantify the performance changes, the relative difference index (RDI) is defined as $\mathrm{RDI} = \frac{A_{\text{pruned}} - A_{\text{original}}}{A_{\text{original}}}$, where $A$ represents an NPU performance metric, including weight memory and total execution time. Table 8 summarizes the results. For VGG_16, pruning reduced memory consumption by 94% and execution time by 84%, while for ResNet_56, memory usage and execution time were reduced by 85% and 50%, respectively.
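A minimal sketch of the RDI computation is given below, assuming the difference is taken as pruned minus original and normalized by the original value, so that a reduction yields a negative index. The memory figures in the example are hypothetical and chosen only to reproduce a 94% reduction; they are not measurements from Table 8.

```python
def rdi(a_pruned, a_original):
    """Relative difference index: (A_pruned - A_original) / A_original."""
    if a_original == 0:
        raise ValueError("the original metric must be non-zero")
    return (a_pruned - a_original) / a_original

# Hypothetical weight-memory readings in MB (not values from the paper).
memory_rdi = rdi(a_pruned=6.0, a_original=100.0)   # 94% less memory
unchanged_rdi = rdi(a_pruned=50.0, a_original=50.0)
```

Because the index is a ratio, the same function applies to weight memory and total execution time regardless of their units.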

6. Discussion

The experimental results demonstrate the advantages of the proposed nuclear-norm-based filter pruning strategy across multiple network architectures and datasets. Overall, the method achieves substantial reductions in model parameters and FLOPs while maintaining competitive classification accuracy, confirming its effectiveness for efficient deep network compression.
First, the proposed method efficiently reduces redundancy by quantifying filter independence using the nuclear norm. This approach accurately identifies redundant filters and removes them with minimal impact on performance, leading to higher compression ratios compared with existing pruning methods at similar accuracy levels (Table 3, Table 4, Table 5 and Table 6).
Second, the method enables layer-wise adaptive pruning. Unlike uniform pruning strategies, independence scores provide fine-grained guidance for determining pruning rates for each layer. Layers with higher redundancy are pruned more aggressively, while critical layers are preserved, yielding an improved trade-off between compression and accuracy. The heterogeneous distribution of independence scores across layers highlights that middle layers often contain high-sensitivity filters essential for maintaining representational capacity, whereas deeper layers have more low-independence filters that can be pruned with minimal accuracy loss. These observations underscore the necessity of adaptive, layer-specific pruning rates guided by the proposed η indicator rather than heuristic or manually designed schemes.
Third, the proposed framework exhibits strong generality and robustness. The consistent performance gains observed across different architectures (VGG_16, ResNet_56, and ResNet_110) and datasets (CIFAR-10, CIFAR-100, and ImageNet) demonstrate that the method is architecture-agnostic and broadly applicable without requiring task- or network-specific modifications. Even under high pruning ratios, pruned networks retain most of their original accuracy after retraining, often outperforming or matching existing approaches in both compression efficiency and classification performance.
While the proposed independence-aware pruning method is primarily designed for networks containing convolutional and fully connected layers, it can also be extended to transformer-based architectures that incorporate such components. The principle of filter independence remains valid for convolutional and fully connected blocks within transformers, enabling pruning to reduce redundancy without compromising the unique contributions of distinct filters. These considerations indicate that the method is not limited to standard CNNs and can be adapted to a broader range of architectures, providing practical guidance for future extensions.
From a deployment perspective, independence-aware pruning provides significant practical benefits for resource-constrained hardware. The substantial reductions in memory usage and execution time on the RK3588 NPU demonstrate that the method effectively removes filters that disproportionately contribute to computational overhead. Its structured pruning nature aligns well with NPU execution characteristics, enabling consistent performance gains without the need for specialized hardware-aware optimization. The use of the relative difference index (RDI) further offers a hardware-agnostic metric to quantify deployment benefits, suggesting that the approach can be extended to other mobile and edge platforms.
Although the deployment experiments focus on the RK3588 NPU, the proposed pruning framework itself is inherently hardware-agnostic and can be applied to other edge computing devices with minimal adaptation. Extending evaluations to a broader range of hardware architectures, including various NPUs and mobile accelerators, constitutes an important avenue for future work.
In summary, the proposed method effectively balances model compactness and performance preservation. By combining accurate redundancy identification, adaptive layer-wise pruning, and strong generalization, it provides a principled and practical approach for efficient neural network compression and deployment across diverse scenarios.

7. Conclusions

This paper presents an independence-aware filter pruning framework for deep neural networks, designed to reduce computational complexity while preserving predictive performance. Through extensive experiments on AlexNet, VGG, and ResNet architectures across multiple datasets (CIFAR-10, CIFAR-100, SVHN, FashionMNIST, and ImageNet), the proposed method consistently achieves substantial reductions in both model parameters and FLOPs while maintaining competitive Top-1 accuracy. These results demonstrate the effectiveness of independence-aware pruning in minimizing redundancy and maximizing compression efficiency.
From a methodological perspective, this work introduces a principled approach to filter-level redundancy evaluation based on the differential nuclear norm. Unlike prior approaches that rely on feature map statistics or heuristic importance metrics, the independence score directly operates on pretrained weights, capturing both correlation structure and effective dimensionality of convolutional filters. Building upon this metric, the proposed layer-wise pruning indicator η enables adaptive pruning rates across layers by reflecting the distribution characteristics of filter independence scores. This formulation allows the pruning process to explicitly account for heterogeneous redundancy patterns within a network, providing a unified, architecture-agnostic framework for structured pruning without the need for manual tuning of layer-specific rates.
Beyond algorithmic contributions, the proposed method exhibits clear practical benefits for deployment on resource-constrained hardware. The structured nature of the pruning strategy aligns with NPU execution characteristics, leading to significant reductions in memory usage and inference latency. This demonstrates that independence-aware pruning not only improves model compactness but also facilitates real-world deployment on mobile and edge devices, where computational resources and energy budgets are limited.
Despite its effectiveness, the proposed method has several limitations. First, the current framework focuses on the inference stage and does not reduce computational cost during training, as independence scores are computed from pretrained weights. Second, the pruning process is performed offline, which may not fully capture potential redundancy variations that arise during training or in dynamic deployment environments. These limitations constrain the applicability of the method in scenarios requiring training-time efficiency or adaptive network adjustment.
Future work will aim to extend independence-aware pruning to more flexible and dynamic settings. One direction is to integrate the independence evaluation into the training process, enabling training-time or online pruning with minimal overhead. Another avenue is to investigate dynamic pruning mechanisms that adaptively adjust network structures in response to evolving data distributions or changing hardware constraints. Additionally, developing computationally efficient approximations of the independence score could facilitate real-time or on-device pruning, further broadening the practical applicability of the method to a wider range of edge and embedded platforms.
In summary, the proposed independence-aware filter pruning framework provides a principled, effective, and deployment-friendly approach for deep network compression. By combining accurate redundancy identification, adaptive layer-wise pruning, and robust performance across diverse architectures and datasets, the method establishes a reliable strategy for balancing model compactness, computational efficiency, and predictive performance, offering both theoretical insight and practical value for modern deep learning applications.

Author Contributions

J.W.: Conceptualization, Methodology, Writing—Original Draft, and Writing—Review and Editing. Z.J., Y.Z. and W.M.: Validation and Visualization. H.B.: Funding Acquisition and Writing—Review and Editing. Y.F.: Writing—Review and Editing. All authors have read and agreed to the published version of the manuscript.

Funding

This work is supported in part by the Science and Technology Innovation 2030 Major Projects (Grant No. 2022ZD0211603) and the Beijing Natural Science Foundation-Joint Funds of Haidian Original Innovation Project (L232056).

Institutional Review Board Statement

Not applicable.

Data Availability Statement

No datasets were generated or analyzed during the current study.

Acknowledgments

During the preparation of this manuscript, the authors used ChatGPT (version GPT-5 mini, OpenAI, San Francisco, CA, USA) for the purposes of checking and correcting grammatical and spelling errors. The authors have reviewed and edited the output and take full responsibility for the content of this publication.

Conflicts of Interest

The authors declare no conflicts of interest.

Abbreviations

The following abbreviations are used in this manuscript:
AI    Artificial Intelligence
NPU   Neural Processing Unit

References

  1. Dong, B.; Liu, Y.; Gui, G.; Fu, X.; Dong, H.; Adebisi, B.; Gacanin, H.; Sari, H. A lightweight decentralized-learning-based automatic modulation classification method for resource-constrained edge devices. IEEE Internet Things J. 2022, 9, 24708–24720.
  2. Yuan, F.; Li, K.; Wang, C.; Fang, Z. A lightweight network for smoke semantic segmentation. Pattern Recognit. 2023, 137, 109289.
  3. Wang, W.; Liu, X. Research on the Application of Pruning Algorithm Based on Local Linear Embedding Method in Traffic Sign Recognition. Appl. Sci. 2024, 14, 7184.
  4. Chen, W.; Wang, P.; Cheng, J. Towards Automatic Model Compression via a Unified Two-Stage Framework. Pattern Recognit. 2023, 140, 109527.
  5. Kirchhoffer, H.; Haase, P.; Samek, W.; Müller, K.; Rezazadegan-Tavakoli, H.; Cricri, F.; Aksu, E.B.; Hannuksela, M.M.; Jiang, W.; Wang, W.; et al. Overview of the neural network compression and representation (NNR) standard. IEEE Trans. Circuits Syst. Video Technol. 2021, 32, 3203–3216.
  6. Zu, X.; Li, Y.; Yin, B. Consecutive layer collaborative filter similarity for differentiable neural network pruning. Neurocomputing 2023, 533, 35–45.
  7. Ding, X.; Hao, T.; Tan, J.; Liu, J.; Han, J.; Guo, Y.; Ding, G. ResRep: Lossless CNN pruning via decoupling remembering and forgetting. In Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV), Montreal, QC, Canada, 11–17 October 2021; IEEE: New York, NY, USA, 2021; pp. 4510–4520.
  8. Guo, S.; Lai, B.; Yang, S.; Zhao, J.; Shen, F. Sensitivity Pruner: Filter-Level compression algorithm for deep neural networks. Pattern Recognit. 2023, 140, 109508.
  9. Guan, Y.; Liu, N.; Zhao, P.; Che, Z.; Bian, K.; Wang, Y.; Tang, J. DAIS: Automatic channel pruning via differentiable annealing indicator search. IEEE Trans. Neural Netw. Learn. Syst. 2022, 34, 9847–9858.
  10. Li, J.; Rao, X.; Xiao, S.; Zhao, B.; Liu, D. Pruner to Predictor: An efficient pruning method for neural networks compression. In 2022 14th International Conference on Advanced Computational Intelligence (ICACI); IEEE: New York, NY, USA, 2022; pp. 9–14.
  11. Gao, S.; Huang, F.; Cai, W.; Huang, H. Network pruning via performance maximization. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Nashville, TN, USA, 19–25 June 2021; IEEE: New York, NY, USA, 2021; pp. 9270–9280.
  12. Lee, S.; Song, B.C. Fast filter pruning via coarse-to-fine neural architecture search and contrastive knowledge transfer. IEEE Trans. Neural Netw. Learn. Syst. 2023, 35, 9674–9685.
  13. Xiao, H.; Wang, Y.; Liu, J.; Huo, J.; Hu, Y.; Wang, Y. APRS: Automatic pruning ratio search using Siamese network with layer-level rewards. Digit. Signal Process. 2023, 133, 103864.
  14. Wang, Z.; Li, C.; Wang, X. Convolutional neural network pruning with structural redundancy reduction. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Nashville, TN, USA, 19–25 June 2021; IEEE: New York, NY, USA, 2021; pp. 14913–14922.
  15. Meng, F.; Cheng, H.; Li, K.; Luo, H.; Guo, X.; Lu, G.; Sun, X. Pruning filter in filter. NeurIPS 2020, 33, 17629–17640.
  16. He, Y.; Dong, X.; Kang, G.; Fu, Y.; Yan, C.; Yang, Y. Asymptotic soft filter pruning for deep convolutional neural networks. IEEE Trans. Cybern. 2019, 50, 3594–3604.
  17. Zhang, Y.; Freris, N.M. Adaptive filter pruning via sensitivity feedback. IEEE Trans. Neural Netw. Learn. Syst. 2023, 35, 10996–11008.
  18. Ruan, X.; Liu, Y.; Li, B.; Yuan, C.; Hu, W. DPFPS: Dynamic and progressive filter pruning for compressing convolutional neural networks from scratch. AAAI Conf. Artif. Intell. 2021, 35, 2495–2503.
  19. Sui, Y.; Yin, M.; Xie, Y.; Phan, H.; Aliari Zonouz, S.; Yuan, B. CHIP: Channel independence-based pruning for compact neural networks. NeurIPS 2021, 34, 24604–24616.
  20. Shi, C.; Hao, Y.; Li, G.; Xu, S. VNGEP: Filter pruning based on von Neumann graph entropy. Neurocomputing 2023, 528, 113–124.
  21. Lin, M.; Ji, R.; Wang, Y.; Zhang, Y.; Zhang, B.; Tian, Y.; Shao, L. HRank: Filter pruning using high-rank feature map. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Seattle, WA, USA, 13–19 June 2020; IEEE: New York, NY, USA, 2020; pp. 1529–1538.
  22. Chang, J.; Lu, Y.; Xue, P.; Xu, Y.; Wei, Z. Iterative clustering pruning for convolutional neural networks. Knowl. Based Syst. 2023, 265, 110386.
  23. Li, J.; Shao, H.; Zhai, S.; Jiang, Y.; Deng, X. A graphical approach for filter pruning by exploring the similarity relation between feature maps. Pattern Recognit. Lett. 2023, 166, 69–75.
  24. Yao, K.; Cao, F.; Leung, Y.; Liang, J. Deep neural network compression through interpretability-based filter pruning. Pattern Recognit. 2021, 119, 108056.
  25. Geng, L.; Niu, B. Pruning convolutional neural networks via filter similarity analysis. Mach. Learn. 2022, 111, 3161–3180.
  26. Wang, Y.; Guo, S.; Guo, J.; Zhang, J.; Zhang, W.; Yan, C.; Zhang, Y. Towards performance-maximizing neural network pruning via global channel attention. Neural Netw. 2024, 171, 104–113.
  27. Liu, Y.; Wu, D.; Zhou, W.; Fan, K.; Zhou, Z. EACP: An effective automatic channel pruning for neural networks. Neurocomputing 2023, 526, 131–142.
  28. Lin, M.; Cao, L.; Zhang, Y.; Shao, L.; Lin, C.W.; Ji, R. Pruning networks with cross-layer ranking & k-reciprocal nearest filters. IEEE Trans. Neural Netw. Learn. Syst. 2022, 34, 9139–9148.
  29. Yang, L.; Gu, S.; Shen, C.; Zhao, X.; Hu, Q. Skeleton neural networks via low-rank guided filter pruning. IEEE Trans. Circuits Syst. Video Technol. 2023, 33, 7197–7211.
  30. Qiu, Y.; Niu, L.; Sha, F.; Cheng, Z.; Yanai, K. Entropy-Guided Search Space Optimization for Efficient Neural Network Pruning. Algorithms 2025, 18, 736.
  31. Diao, H.; Li, G.; Xu, S.; Kong, C.; Wang, W.; Liu, S.; He, Y. Self-distillation enhanced adaptive pruning of convolutional neural networks. Pattern Recognit. 2025, 157, 110942.
  32. Palakonda, V.; Tursunboev, J.; Kang, J.M.; Moon, S. Metaheuristics for pruning convolutional neural networks: A comparative study. Expert Syst. Appl. 2025, 268, 126326.
  33. Wei, W.; Lu, Q.; Huang, C.; Lee, D.; Luo, J. SymRefine: A symbolic regression approach for refining and compressing neural networks. Neurocomputing 2026, 672, 132719.
  34. Yue, Y.; Liu, H.; Liu, X.; da Porto, F.; Saler, E.; Cui, J.; Donà, M. Efficient on-device damage segmentation for cultural heritage using pruning and knowledge distillation. J. Cult. Herit. 2026, 77, 284–293.
  35. He, W.; Mei, S.; Hu, J.; Ma, L.; Hao, S.; Lv, Z. Filter-Wise Mask Pruning and FPGA Acceleration for Object Classification and Detection. Remote Sens. 2025, 17, 3582.
  36. Chung, Y.L. Efficient Lightweight Image Classification via Coordinate Attention and Channel Pruning for Resource-Constrained Systems. Future Internet 2025, 17, 489.
  37. Cheng, Y.; Zhang, K.; Xu, Z. Matrix Theory; Northwestern Polytechnical University Press: Xi’an, China, 2013.
  38. Xiao, H.; Rasul, K.; Vollgraf, R. Fashion-MNIST: A novel image dataset for benchmarking machine learning algorithms. arXiv 2017, arXiv:1708.07747.
  39. Netzer, Y.; Wang, T.; Coates, A.; Bissacco, A.; Wu, B.; Ng, A.Y. Reading Digits in Natural Images with Unsupervised Feature Learning; NeurIPS: Granada, Spain, 2011; Volume 2011, p. 4.
  40. Krizhevsky, A.; Hinton, G. Learning Multiple Layers of Features from Tiny Images; University of Toronto: Toronto, Canada, 2009; Available online: https://www.cs.toronto.edu/~kriz/learning-features-2009-TR.pdf (accessed on 10 November 2025).
  41. Deng, J.; Dong, W.; Socher, R.; Li, L.J.; Li, K.; Li, F.-F. ImageNet: A large-scale hierarchical image database. In 2009 IEEE Conference on Computer Vision and Pattern Recognition; IEEE: New York, NY, USA, 2009; pp. 248–255.
  42. Krizhevsky, A.; Sutskever, I.; Hinton, G.E. ImageNet Classification with Deep Convolutional Neural Networks. In Advances in Neural Information Processing Systems; Curran Associates, Inc.: Red Hook, NY, USA, 2012; Volume 25, pp. 1097–1105.
  43. Simonyan, K.; Zisserman, A. Very deep convolutional networks for large-scale image recognition. arXiv 2014, arXiv:1409.1556.
  44. He, K.; Zhang, X.; Ren, S.; Sun, J. Deep residual learning for image recognition. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Las Vegas, NV, USA, 27–30 June 2016; IEEE: New York, NY, USA, 2016; pp. 770–778.
Figure 1. Schematic illustration of the linear transformation from the input space to the feature space, where the transformation matrix is represented by the filter matrix, highlighting redundancy induced by correlated filters. The four modules from top to bottom correspond to Step 1: a schematic illustration of the convolution operation in a single convolutional layer, Step 2: reformulating the convolution computation as matrix multiplication, Step 3: constructing a linear transformation by reshaping filters into row vectors, and Step 4: redundancy induced by correlated filters in the resulting transformation.
Figure 2. Schematic illustration of filter redundancy in a convolutional layer. The same input image is processed by three filters, $f_1$, $f_2$, and $f_3$, producing the corresponding output feature maps $O_1$, $O_2$, and $O_3$ through convolution. Filters $f_1$ and $f_2$ are highly similar and generate nearly identical feature maps, indicating redundancy, whereas filter $f_3$ produces a distinct feature map.
Figure 3. (a) The $\Delta\mathrm{rank}$ corresponding to each filter in the 10th convolution layer of VGG_16. (b) The $\Delta\mathrm{nuclear\_norm}$ corresponding to each filter in the 10th convolution layer of VGG_16, which is also the independence score in this paper.
Figure 4. The feature maps corresponding to 16 filters in the first convolutional layer of the ResNet_56 trained on the CIFAR-10 are obtained from any 8 samples, with each row of feature maps coming from the same sample. The numbers in parentheses represent the filter numbers. (a) Filters sorted by independence score. (b) Filters sorted by L2 norm. The feature maps with a smaller L2 norm value but containing more detailed information will be retained, as shown in the red dashed box. The feature map with a larger L2 norm value but only containing contour information will be discarded, as shown in the purple dashed box.
Figure 5. Sorted independence score sequences for the 7th and 11th convolutional layers of VGG_16 trained on CIFAR-10. The blue line (η = 0.2877) indicates a larger number of high-independence filters, suggesting that a smaller pruning rate should be applied. In contrast, the orange line (η = 13.8155) indicates more low-independence filters, allowing a larger pruning rate. The red dashed line marks the independence score threshold of 0.4.
Figure 6. Diagram of the filter pruning process based on the nuclear norm, consisting of two steps: calculating the independence score and pruning filters based on this score.
Figure 7. Visualization of independence score distributions across layers for different network architectures and datasets. Each subfigure corresponds to a specific network–dataset combination: (a) AlexNet on FashionMNIST, (b) AlexNet on SVHN, (c) VGG_11 on SVHN, (d) VGG_16 on CIFAR-10, (e) VGG_16 on CIFAR-100, (f) ResNet_56 on CIFAR-10, (g) ResNet_56 on CIFAR-100, and (h) ResNet50 on ImageNet. For each configuration, the distribution of layer-wise independence scores is characterized by the pruning rate indicator η, with the variance of η across layers reported above each plot. Layers exhibiting outlier distributions are highlighted by red dashed ellipses.
Figure 8. Layer-wise sensitivity analysis of VGG_16 on CIFAR-10 by sequentially pruning filters. Each subfigure corresponds to one layer in the network. Filters are removed one by one in ascending order of normalized independence scores. The Top-1 accuracy (green curve) and loss (orange curve) are reported without retraining after each pruning step. The red star denotes the pruning point at which the accuracy first drops below 94%. The baseline Top-1 accuracy before pruning is 94.39%.
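The sequential procedure behind Figure 8 can be sketched as follows. This is a minimal illustration, not the authors' implementation: `evaluate` is a hypothetical callback that returns the Top-1 accuracy of the network with the given filters of one layer removed and no retraining, and `threshold` plays the role of the 94% line that defines the red star.

```python
from typing import Callable, FrozenSet, List, Optional, Sequence, Tuple


def sensitivity_curve(
    scores: Sequence[float],
    evaluate: Callable[[FrozenSet[int]], float],
    threshold: float,
) -> Tuple[List[float], Optional[int]]:
    """Prune filters one by one in ascending score order and track accuracy.

    Returns the accuracy after each pruning step and the 1-based step at which
    the accuracy first falls below `threshold` (None if it never does).
    """
    # Lowest independence score is pruned first.
    order = sorted(range(len(scores)), key=lambda i: scores[i])

    pruned: List[int] = []
    curve: List[float] = []
    first_drop: Optional[int] = None
    for step, filter_index in enumerate(order, start=1):
        pruned.append(filter_index)
        accuracy = evaluate(frozenset(pruned))  # no retraining between steps
        curve.append(accuracy)
        if first_drop is None and accuracy < threshold:
            first_drop = step
    return curve, first_drop


# Toy stand-in for a real evaluation: each pruned filter costs its own score.
toy_scores = [0.5, 0.1, 0.3, 0.05]


def toy_evaluate(pruned: FrozenSet[int]) -> float:
    return 94.39 - sum(toy_scores[i] for i in pruned)


toy_curve, toy_star = sensitivity_curve(toy_scores, toy_evaluate, threshold=94.0)
```

With a real model, `evaluate` would mask the selected filters and run the validation set, which is the expensive part; the loop itself only records where the curve crosses the threshold.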
Figure 9. Accuracy of pruned VGG_16 on CIFAR-10 with varying pruning strategies.
Figure 10. Efficient Neural Networks on NPU-based IoT Devices.
Table 1. Comparison of representative filter pruning methods for convolutional neural networks.
| Method | Methodology | Datasets | Results * | Limitations |
| --- | --- | --- | --- | --- |
| HRank (2020) [21] | Pruning filters that produce low-rank feature maps. | CIFAR-10, ImageNet | Reduces FLOPs and parameters by approximately 40–60% with minimal accuracy loss. | Requires feature map rank computation during pruning. |
| CHIP (2021) [19] | Inter-channel pruning based on filter independence. | CIFAR-10, ImageNet | Reduces model size and FLOPs by approximately 40–50% with slight accuracy improvement. | Requires correlation analysis and is mainly validated on ResNet architectures. |
| Interpretability-based (2021) [24] | Layer-wise pruning guided by activation maximization. | CIFAR-10, ImageNet | Achieves about 60% model compression with minimal accuracy degradation. | Requires per-layer visualization analysis. |
| FSABP (2022) [25] | Backward layer-wise pruning based on filter similarity. | MNIST, CIFAR-10, KAGGLE, ILSVRC2012 | Effectively reduces parameter redundancy with negligible accuracy change. | Lacks a theoretical justification for similarity-based redundancy estimation. |
| CLR-RNF (2022) [28] | Non-learning pruning via cross-layer ranking and reciprocal nearest filter selection. | CIFAR-10, ImageNet | Removes 60–95% of FLOPs and parameters with minor accuracy degradation. | Lacks theoretical explanation for the pruning criterion. |
| VNGEP (2023) [20] | Graph-based pruning using von Neumann graph entropy. | CIFAR-10, ImageNet | Reduces model size and FLOPs by approximately 40–50% while maintaining accuracy. | Graph construction and entropy computation introduce additional overhead. |
| ICP (2023) [22] | Iterative clustering-based pruning with knowledge transfer. | CIFAR-10, CIFAR-100, ImageNet, PASCAL VOC | Reduces parameters and FLOPs with minor or no accuracy loss. | Iterative pruning and distillation increase training complexity. |
| Graph-based One-shot (2023) [23] | One-shot filter pruning using graph-based similarity modeling. | CIFAR-10 | Achieves large reductions (approximately 60–90%) with minimal accuracy loss. | Graph construction introduces additional computational cost. |
| EACP (2023) [27] | Automatic channel pruning via hierarchical clustering and optimization. | CIFAR-10, ILSVRC2012 | Reduces FLOPs by over 50% with maintained or slightly improved accuracy. | Relies on clustering and iterative optimization procedures. |
| SkeletonNN (2023) [29] | Iterative pruning with low-rank regularization. | MNIST, CIFAR-10, ILSVRC2012 | Achieves high pruning rates with maintained or improved accuracy. | Alternating training and pruning increase computational cost. |
| GlobalPru (2024) [26] | Static channel pruning via global attention-based ranking. | ImageNet, SVHN, CIFAR-10/100 | Achieves high compression ratios with strong performance. | May suffer from sample bias and introduces additional computation and memory overhead. |
| EGSSO (2025) [30] | Entropy-guided layer-wise pruning with search space optimization. | COCO, VisDrone | Improves accuracy while reducing computational cost. | Entropy computation and iterative search add overhead. |
| SDAP (2025) [31] | Adaptive channel pruning integrated into training using channel gates and self-distillation. | CIFAR-10/100, ImageNet | Removes at least 75% of parameters while maintaining or improving accuracy. | Introduces additional distillation loss and trainable pruning parameters. |
| FMP (2025) [35] | Filter-wise mask pruning with structural constraints and FPGA-oriented acceleration. | CIFAR-10, Mini-ImageNet, Aerial datasets | Achieves high pruning rates (50–85%) with negligible accuracy loss and effective hardware speedup. | Requires auxiliary mask learning and hardware-specific design. |
| CA + Pruning (2025) [36] | Pruning-aware coordinate attention combined with L1-regularized channel pruning. | CIFAR-10, Fashion-MNIST, GTSRB | Reduces parameters and FLOPs while maintaining or slightly improving accuracy. | Introduces attention overhead without explicitly modeling channel redundancy. |
* Results are reported as described in the original papers.
Table 2. Summary of benchmark datasets used in our experiments.
| Dataset | Training Samples | Testing Samples | Classes | Image Size |
| --- | --- | --- | --- | --- |
| FashionMNIST | 60,000 | 10,000 | 10 | 28 × 28 (grayscale) |
| SVHN | 73,257 | 26,032 | 10 | 32 × 32 (RGB) |
| CIFAR-10 | 50,000 | 10,000 | 10 | 32 × 32 (RGB) |
| CIFAR-100 | 50,000 | 10,000 | 100 | 32 × 32 (RGB) |
| ImageNet | 1,200,000 | 50,000 | 1000 | 224 × 224 (RGB) |
Table 3. Performance of pruned networks with different pruning rates: AlexNet on FashionMNIST, and AlexNet and VGG_11 on SVHN.
| Dataset | Model | Top-1 Acc. (%) (Δ) | #Params (↓%) | #FLOPs (↓%) |
| --- | --- | --- | --- | --- |
| FashionMNIST | AlexNet | 92.38 (0) | 1.87 M (0) | 238 M (0) |
| | | 92.26 (−0.12) | 1.51 M (19) | 198 M (17) |
| | | 92.12 (−0.26) | 1.19 M (36) | 152 M (36) |
| | | 92.07 (−0.31) | 0.92 M (51) | 117 M (51) |
| | | 92.19 (−0.19) | 0.68 M (64) | 85.8 M (64) |
| | | 92.05 (−0.33) | 0.48 M (75) | 60.4 M (75) |
| | | 92.03 (−0.35) | 0.30 M (84) | 38.4 M (84) |
| | | 91.73 (−0.65) | 0.17 M (91) | 21.7 M (91) |
| | | 91.69 (−0.69) | 0.08 M (96) | 9.79 M (96) |
| | | 91.01 (−1.37) | 0.02 M (99) | 2.52 M (99) |
| SVHN | AlexNet | 93.52 (0) | 1.87 M (0) | 245 M (0) |
| | | 93.78 (0.26) | 1.51 M (19) | 198 M (19) |
| | | 93.83 (0.31) | 1.20 M (36) | 157 M (36) |
| | | 93.69 (0.17) | 0.92 M (51) | 121 M (51) |
| | | 93.79 (0.27) | 0.68 M (64) | 89.7 M (63) |
| | | 93.68 (0.16) | 0.48 M (74) | 63.7 M (74) |
| | | 93.55 (0.03) | 0.30 M (84) | 41.0 M (83) |
| | | 93.42 (−0.1) | 0.17 M (91) | 23.7 M (90) |
| | | 93.46 (−0.06) | 0.08 M (96) | 11.1 M (95) |
| | | 91.84 (−1.68) | 0.02 M (99) | 3.14 M (99) |
| | VGG_11 | 94.71 (0) | 9.23 M (0) | 153 M (0) |
| | | 94.85 (0.14) | 7.45 M (19) | 124 M (19) |
| | | 94.80 (0.09) | 5.89 M (36) | 98.0 M (36) |
| | | 94.87 (0.16) | 4.51 M (51) | 75.2 M (51) |
| | | 94.76 (0.05) | 3.32 M (64) | 55.4 M (64) |
| | | 94.61 (−0.1) | 2.31 M (75) | 38.9 M (75) |
| | | 94.73 (0.02) | 1.47 M (84) | 24.8 M (84) |
| | | 94.52 (−0.19) | 0.83 M (91) | 14.1 M (91) |
| | | 93.92 (−0.79) | 0.37 M (96) | 6.37 M (96) |
| | | 92.08 (−2.63) | 0.09 M (99) | 1.67 M (99) |
Table 4. Performance comparison of pruned VGG_16, ResNet_56, and ResNet_110 on CIFAR-10 with different pruning methods.
| Model | Method | Top-1 Baseline (%) | Top-1 Pruned (%) | Δ | #Params (↓%) | #FLOPs (↓%) |
| --- | --- | --- | --- | --- | --- | --- |
| VGG_16 | CHIP1 [19] | 93.96 | 93.86 | −0.1 | 2.76 M (81.6) | 131.17 M (58.1) |
| | VNGEP1 [20] | 93.96 | 94.33 | 0.37 | 2.76 M (81.6) | 131.17 M (58.1) |
| | Li et al.1 [23] | 93.96 | 92.59 | −1.37 | 1.38 M (90.8) | 125.25 M (60) |
| | ours (p = 0.73) | 94.39 | 94.09 | −0.3 | 1.13 M (92.4) | 111.89 M (64.43) |
| | CHIP2 [19] | 93.96 | 93.72 | −0.24 | 2.50 M (83.3) | 104.78 M (66.6) |
| | VNGEP2 [20] | 93.96 | 93.95 | −0.01 | 2.50 M (83.3) | 104.78 M (66.6) |
| | EACP1 [27] | 93.26 | 93.29 | 0.03 | 3.62 M (75.4) | 88.64 M (71.77) |
| | ours (p = 0.76) | 94.39 | 93.57 | −0.82 | 0.87 M (94.2) | 84.47 M (73.15) |
| | Li et al.2 [23] | 93.96 | 90.92 | −3.04 | 0.7 M (95.3) | 82.65 M (73.6) |
| | CLR-RNF [28] | - | 93.32 | - | 0.74 M (95.0) | 81.31 M (74.1) |
| | EACP2 [27] | 93.26 | 93.2 | −0.06 | 3.36 M (77.19) | 71.06 M (77.37) |
| | CHIP3 [19] | 93.96 | 93.18 | −0.78 | 1.90 M (87.3) | 66.95 M (78.6) |
| | VNGEP3 [20] | 93.96 | 93.3 | −0.66 | 1.90 M (87.3) | 66.95 M (78.6) |
| | ours (p = 0.76) | 94.39 | 93.16 | −1.23 | 0.65 M (95.67) | 65.9 M (79.05) |
| ResNet_56 | VNGEP1 [20] | 93.26 | 94.28 | 1.02 | 0.66 M (22.3) | 90.35 M (28) |
| | Li et al.1 [23] | 93.26 | 92.88 | −0.38 | 0.57 M (33.7) | 80.77 M (35.8) |
| | ours (p = 0.23) | 94.69 | 94.23 | −0.46 | 0.5 M (41.21) | 80.64 M (36.81) |
| | CHIP1 [19] | 93.26 | 94.16 | 0.9 | 0.48 M (42.8) | 65.94 M (47.4) |
| | VNGEP2 [20] | 93.26 | 93.65 | 0.39 | 0.48 M (42.8) | 65.94 M (47.4) |
| | ours (p = 0.33) | 94.69 | 94.14 | −0.55 | 0.38 M (55.54) | 64.34 M (49.58) |
| | Li et al.2 [23] | 93.26 | 92.42 | −0.84 | 0.48 M (44) | 62.95 M (49.9) |
| | CLR-RNF [28] | - | 93.27 | - | 0.38 M (55.5) | 54 M (57.3) |
| | EACP [27] | 93.2 | 93.11 | −0.09 | 0.31 M (63.5) | 40.44 M (68) |
| | CHIP2 [19] | 93.26 | 92.05 | −1.21 | 0.24 M (71.8) | 34.79 M (72.3) |
| | VNGEP3 [20] | 93.26 | 92.63 | −0.63 | 0.24 M (71.8) | 34.79 M (72.3) |
| | ours (p = 0.48) | 94.69 | 92.44 | −2.25 | 0.21 M (74.85) | 32.55 M (74.5) |
| ResNet_110 | CHIP1 [19] | 93.5 | 94.44 | 0.94 | 0.89 M (48.3) | 121.09 M (52.1) |
| | VNGEP1 [20] | 93.5 | 94.57 | 1.07 | 0.89 M (48.3) | 121.09 M (52.1) |
| | ours (p = 0.34) | 95.07 | 94.34 | −0.73 | 0.75 M (56.6) | 115.45 M (55.09) |
| | Li et al.1 [23] | 93.5 | 92.97 | −0.53 | 0.64 M (63.1) | 107.64 M (57.5) |
| | VNGEP2 [20] | 93.5 | 94.22 | 0.72 | 0.72 M (58.1) | 101.97 M (59.6) |
| | ours (p = 0.39) | 95.07 | 94.22 | −0.85 | 0.65 M (62.27) | 99.9 M (61.14) |
| | EACP1 [27] | 93.3 | 93.39 | 0.09 | 0.67 M (61.3) | 87.36 M (65.7) |
| | CLR-RNF [28] | - | 93.71 | - | 0.53 M (69.1) | 86.8 M (66) |
| | ours (p = 0.42) | 95.07 | 93.7 | −1.37 | 0.55 M (68.23) | 82.49 M (67.92) |
| | EACP2 [27] | 93.3 | 93.35 | 0.05 | 0.59 M (65.9) | 80.9 M (68.3) |
| | Li et al.2 [23] | 93.5 | 92.72 | −0.78 | 0.51 M (70.4) | 74.81 M (70.4) |
| | VNGEP3 [20] | 93.5 | 93.72 | 0.22 | 0.54 M (68.3) | 71.69 M (71.6) |
| | CHIP2 [19] | 93.5 | 93.63 | 0.13 | 0.54 M (68.3) | 71.69 M (71.6) |
| | ours (p = 0.48) | 95.07 | 93.44 | −1.63 | 0.43 M (74.87) | 65.43 M (74.55) |
Table 5. Performance comparison of pruned VGG_16 and ResNet_56 on CIFAR-100 with different pruning methods, and performance of pruned ResNet_110 on CIFAR-100 with different pruning rates.
| Model | Method | Top-1 Baseline (%) | Top-1 Pruned (%) | Δ | #Params (↓%) | #FLOPs (↓%) |
| --- | --- | --- | --- | --- | --- | --- |
| VGG_16 | APRS1 [13] | 73.77 | 73.71 | −0.06 | 7.21 M (52) | 220.26 M (29.8) |
| | ours (p = 0.4) | 74.74 | 74.92 | 0.18 | 5.79 M (61.5) | 191.32 M (39.2) |
| | SP [8] | 72.44 | 72.78 | 0.34 | - | 173 M (45) |
| | ASTER [17] | 73.45 | 73.95 | 0.5 | - | 161 M (48.4) |
| | APRS2 [13] | 73.77 | 73.02 | −0.75 | 5.42 M (64) | 156.44 M (50.1) |
| | ours (p = 0.61) | 74.74 | 74.12 | −0.62 | 2.81 M (81.3) | 131.42 M (58.2) |
| | APRS3 [13] | 73.77 | 70.79 | −2.98 | 4.93 M (67.2) | 93.44 M (70.2) |
| | ours (p = 0.64) | 74.74 | 73.86 | −0.88 | 2.42 M (83.9) | 92.32 M (70.7) |
| ResNet_56 | ASTER1 [17] | 72.32 | 72.23 | −0.09 | - | 92 M (26.7) |
| | ours (p = 0.12) | 72.72 | 71.7 | −1.02 | 0.68 M (21.3) | 91.88 M (28) |
| | ASTER2 [17] | 72.32 | 71.78 | −0.54 | - | 58.4 M (53.6) |
| | ICP [22] | 71.36 | 70.92 | −0.44 | - | 55.67 M (56.2) |
| | ours (p = 0.26) | 72.72 | 69.15 | −3.57 | 0.47 M (45.4) | 55.19 M (56.8) |
| ResNet_110 | ours (p = 0.07) | 74.42 | 74.08 | −0.34 | 1.49 M (13.9) | 208.05 M (19.1) |
| | ours (p = 0.14) | 74.42 | 73.03 | −1.39 | 1.3 M (25) | 169.59 M (34) |
| | ours (p = 0.21) | 74.42 | 72.46 | −1.96 | 1.05 M (39.3) | 142.09 M (44.7) |
Table 6. Performance comparison of pruned ResNet50 on ImageNet with different pruning methods.
| Method | Top-1 Baseline (%) | Top-1 Pruned (%) | Top-1 Δ | Top-5 Baseline (%) | Top-5 Pruned (%) | Top-5 Δ | #Params (↓%) | #FLOPs (↓%) |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| CLR-RNF [28] | - | 74.85 | - | - | 92.31 | - | 16.92 M (33.8) | 2.45 G (40.68) |
| VNGEP1 [20] | 76.15 | 76.4 | 0.25 | 92.87 | 92.99 | 0.12 | 15.09 M (40.8) | 2.28 G (44.8) |
| CHIP1 [19] | 76.15 | 76.3 | 0.15 | 92.87 | 93.02 | 0.15 | 15.09 M (40.8) | 2.28 G (44.8) |
| ours (p = 0.17) | 76.15 | 76.54 | 0.39 | 92.87 | 93.14 | 0.27 | 15.09 M (40.8) | 2.28 G (44.8) |
| VNGEP2 [20] | 76.15 | 75.32 | −0.83 | 92.87 | 92.51 | −0.36 | 11.05 M (56.7) | 1.54 G (62.8) |
| CHIP2 [19] | 76.15 | 75.26 | −0.89 | 92.87 | 92.53 | −0.34 | 11.05 M (56.7) | 1.54 G (62.8) |
| ours (p = 0.32) | 76.15 | 75.39 | −0.76 | 92.87 | 92.43 | −0.44 | 11.05 M (56.7) | 1.54 G (62.8) |
Table 7. Performance of VGG_16 on CIFAR-10 under different pruning strategies.
| Pruning rates p0 to p13 | Acc (%) | #Params (↓%) | #FLOPs (↓%) |
| --- | --- | --- | --- |
| 0.3, 0, 0, 0, 0, 0, 0, 0, 0.2, 0.4, 0.5, 0.8, 0.9, 0.6 | 94.06 | 6.95 M (53.67) | 251.05 M (20.19) |
| 0.56, 0, 0, 0, 0, 0, 0, 0, 0.56, 0.56, 0.56, 0.56, 0.56, 0.56 | 90.99 | 5.84 M (61.08) | 218.28 M (30.61) |
| 0.37, 0.37, 0.37, 0.37, 0.37, 0.37, 0.37, 0.37, 0.37, 0.37, 0.37, 0.37, 0.37, 0.37 | 18.95 | 5.98 M (60.13) | 126.85 M (59.68) |
| 0.24, 0.63, 0.63, 0.63, 0.63, 0.63, 0.63, 0.63, 0.24, 0.24, 0.24, 0.24, 0.24, 0.24 | 10 | 6.72 M (55.2) | 85.49 M (72.83) |
| 0.1, 0.89, 0.89, 0.89, 0.89, 0.89, 0.89, 0.89, 0.1, 0.1, 0.1, 0.1, 0.1, 0.1 | 10 | 8.16 M (45.6) | 66.1 M (78.99) |
Table 8. Performance comparison of original and pruned networks on NPU.
| Network | Weight Memory (B), Original | Weight Memory (B), Pruned | Weight Memory, RDI | Total Time (s), Original | Total Time (s), Pruned | Total Time, RDI |
| --- | --- | --- | --- | --- | --- | --- |
| VGG_16 | 28.59 M | 1.67 M | −94% | 1.67 | 0.26 | −84% |
| ResNet_56 | 1.64 M | 0.24 M | −85% | 0.38 | 0.19 | −50% |
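The RDI columns of Table 8 are consistent with a simple relative change between the original and pruned values. The sketch below assumes RDI = (pruned − original) / original, expressed as a rounded percentage; this definition and the function name are inferred from the tabulated numbers rather than stated in this section.

```python
def relative_change_percent(original: float, pruned: float) -> int:
    """Relative change from `original` to `pruned`, as a rounded percentage.

    Negative values indicate a reduction. The formula is an assumption
    inferred from the numbers in Table 8.
    """
    return round(100.0 * (pruned - original) / original)


# Values taken from Table 8 (weight memory in bytes x 1e6, total time in s).
vgg16_memory = relative_change_percent(28.59, 1.67)
vgg16_time = relative_change_percent(1.67, 0.26)
resnet56_memory = relative_change_percent(1.64, 0.24)
resnet56_time = relative_change_percent(0.38, 0.19)
```

Under this assumption the four computed values reproduce the tabulated entries of −94%, −84%, −85%, and −50%.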
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

Share and Cite

MDPI and ACS Style

Wang, J.; Bie, H.; Jing, Z.; Zhi, Y.; Fan, Y.; Ma, W. Filter Independence-Aware Pruning: Efficient Neural Networks for On-Device AI. Electronics 2026, 15, 794. https://doi.org/10.3390/electronics15040794
