Article

Differential Evolution Based Layer-Wise Weight Pruning for Compressing Deep Neural Networks

School of Electronics and Information, Northwestern Polytechnical University, 127 West Youyi Road, Xi’an 710072, China
*
Author to whom correspondence should be addressed.
Sensors 2021, 21(3), 880; https://doi.org/10.3390/s21030880
Submission received: 24 December 2020 / Revised: 25 January 2021 / Accepted: 25 January 2021 / Published: 28 January 2021
(This article belongs to the Section Intelligent Sensors)

Abstract

Deep neural networks have evolved significantly in the past decades and are now able to achieve better processing of sensor data. Nonetheless, most deep models follow the ruling maxim of deep learning, bigger is better, and therefore have very complex structures. As the models become more complex, their computational complexity and resource consumption increase significantly, making them difficult to deploy on resource-limited platforms, such as sensor platforms. In this paper, we observe that different layers often have different pruning requirements, and we propose a differential evolutionary layer-wise weight pruning method. Firstly, the pruning sensitivity of each layer is analyzed, and then the network is compressed by iterating the weight pruning process. Unlike other methods that set the pruning ratio by greedy heuristics or statistical analysis, we establish an optimization model to find the optimal pruning sensitivity of each layer. Differential evolution, an effective population-based optimization method, is used to address this task. Furthermore, we adopt a strategy that recovers some of the removed connections to increase the capacity of the pruned model during the fine-tuning phase. The effectiveness of our method is demonstrated in experimental studies. Our method compresses the number of weight parameters in LeNet-300-100, LeNet-5, AlexNet and VGG16 by 24×, 14×, 29× and 12×, respectively.

1. Introduction

Deep neural networks have achieved significant performance in computer vision, speech recognition, natural language processing, etc. For sensor data processing, deep neural networks can also achieve good results. However, we must face some problems. Deep neural networks often require huge computing and storage resources, while the resources of sensor devices are usually limited. Several studies on deep neural networks, such as [1], provide a comprehensive analysis of important metrics for practical applications: accuracy, memory footprint, parameters, operation count, inference time and power consumption. With the development of deep neural networks, the number of network parameters has increased rapidly. For example, in order to address the ImageNet Large Scale Visual Recognition Challenge (ILSVRC) on the ImageNet dataset, AlexNet [2] has about 60 million parameters, while VGG [3] has about 150 million parameters. Even when more efficient connections or modules are designed, such as residual networks [4] or inception networks [5], the number of parameters is reduced but still huge. Deep neural networks with superior performance have deeper and wider architectures, which result in expensive storage and computing costs. Especially for sensor devices or platforms, such deep neural networks are difficult to deploy. Therefore, in order to reduce the size of the model, it is necessary to simplify the neural network architecture. Neural network pruning is a simple yet efficient method, which aims at removing unimportant synapses and neurons to obtain sparse neural networks.
Neural network pruning is a traditional task that has received increasing interest with the development of neural networks. In particular, as deep neural networks play an increasingly important role in practical applications, neural network pruning has become a highly active area of research, as it helps deep neural networks to run well on mobile or embedded devices. Moreover, neural network pruning is also a technique for designing or searching neural network structures. Back in 1990, LeCun et al. proposed Optimal Brain Damage (OBD) [6], which reduces the number of connections by removing the least important ones. In [7], Hassibi et al. used a method named Optimal Brain Surgeon (OBS) to remove unimportant weights determined by second-order derivative information. Compared to OBD, OBS has better pruning effectiveness, but its computational cost is higher. Following these two methods, many neural network pruning methods based on the importance of neurons or connections have been proposed [8,9,10,11,12,13]. Moreover, some layer-wise methods [13,14,15] have been proposed for multilayer feedforward networks. For convolutional neural networks (CNNs), some structural pruning strategies, such as channel or filter pruning, have been proposed [16,17]. These structural pruning methods pay more attention to channel-wise, kernel-wise and intra-kernel strided sparsity.
In model pruning, we not only want a sufficiently sparse network, but also want the network to retain as much of its original performance as possible. Most existing neural network weight pruning methods remove unimportant weights or neurons according to a preset pruning ratio for each layer or a global pruning ratio for the whole model. A simple experiment was conducted to verify the influence of the per-layer pruning ratio on pruning performance. In the experiment, we pruned LeNet-300-100, which is a fully connected network with three layers; the total pruning rate in the different pruning schemes was always 80%, but the pruning rate of each layer was different. The experimental results are shown in Table 1. The results reveal significant differences among the pruning results obtained by the different pruning schemes. We therefore believe that a uniform ratio is not the best solution for model pruning, because different layers have different sensitivities to pruning, and this sensitivity is hard to predict. If each layer is finely pruned, the pruning of the whole network can be improved. Therefore, the challenge is to determine the pruning sensitivity of each layer. In addition, removing too many connections during pruning can cause accuracy loss, and iterative pruning (cycling pruning and fine-tuning) is effective in maintaining the accuracy of the pruned model [12,14,18,19]. However, it is difficult to obtain a high-performance network with high sparsity through a simple fine-tuning operation. Thus, we also need to improve the fine-tuning strategy after pruning to find a trade-off solution with better accuracy and sparsity.
To address the above problems, we propose a differential evolutionary neural network compression framework (DENNC). In DENNC, we use a differential evolutionary algorithm to solve the problem of layer sensitivity to pruning. The mathematical representation of model weight pruning is an optimization problem with two terms, which correspond to sparsity and network accuracy. The decision variable of this problem is the pruning mask, which can be determined by the sensitivity of each layer to pruning. After pruning the model with differential evolution, we fine-tune the pruned model. We also loop the above two steps using an iterative pruning framework to obtain better solutions. It is worth noting that at the beginning of model fine-tuning, we randomly recover a few of the removed connections to maintain the capacity of the model.
The remainder of this paper is organized as follows. Section 2 reviews the background of the proposed method and some related work. Section 3 introduces the differential evolution framework for compressing neural networks in detail. Section 4 presents the experimental studies. Finally, Section 5 gives the concluding remarks of this paper.

2. Background and Related Work

2.1. Differential Evolution

Differential evolution (DE) [20,21,22] is one of the most well-known optimization algorithms in the field of continuous optimization because of its efficient performance. Compared to traditional evolutionary algorithms, DE generates new candidates through the weighted difference of individuals in the current population. More specifically, in the mutation operation, the weighted difference of several random individuals is added to the current individual to generate a new solution. After mutation, a crossover operation is performed on the original population and its corresponding mutant individuals. Finally, greedy selection is used to form the new population.
For mutation operations, there are five popular mutation strategies [22] in DE algorithms as below:
  • DE/rand/1:
    $v_i^g = x_{p_1}^g + F \times (x_{p_2}^g - x_{p_3}^g)$  (1)
  • DE/best/1:
    $v_i^g = x_{best}^g + F \times (x_{p_1}^g - x_{p_2}^g)$  (2)
  • DE/rand-to-best/1:
    $v_i^g = x_i^g + F \times (x_{best}^g - x_i^g + x_{p_1}^g - x_{p_2}^g)$  (3)
  • DE/best/2:
    $v_i^g = x_{best}^g + F \times (x_{p_1}^g - x_{p_2}^g + x_{p_3}^g - x_{p_4}^g)$  (4)
  • DE/rand/2:
    $v_i^g = x_{p_1}^g + F \times (x_{p_2}^g - x_{p_3}^g + x_{p_4}^g - x_{p_5}^g)$  (5)
where $v_i^g$ is the mutant individual generated from the $i$-th individual $x_i^g$ of the current population at the $g$-th generation. $p_1, p_2, p_3, p_4, p_5$ are random and mutually exclusive indices within the range $[1, n]$, where $n$ is the size of the population, and $x_{best}^g$ is the best solution in the population of generation $g$. $F$ is a positive scaling factor that scales the difference vector.
Crossover is performed on each population individual $x_i^g$ and its corresponding mutant individual $v_i^g$ to generate a trial individual:

$u_{ij}^g = \begin{cases} v_{ij}^g, & \text{if } rand \le CR \\ x_{ij}^g, & \text{otherwise} \end{cases}$  (6)

where the crossover rate $CR$ controls the probability of copying a gene from the corresponding mutant individual. The above crossover operation is applied to every gene of an individual to generate the new trial solution.

Before the selection operation, all generated trial individuals are evaluated on the fitness function of the optimization problem. Let $f(x_i^g)$ and $f(u_i^g)$ be the fitness of individual $x_i^g$ and its corresponding trial individual $u_i^g$. DE then selects the better of $x_i^g$ and $u_i^g$ to enter the newly formed population, as shown in Equation (7):

$x_i^{g+1} = \begin{cases} u_i^g, & \text{if } f(u_i^g) \le f(x_i^g) \\ x_i^g, & \text{otherwise} \end{cases}$  (7)
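To make the above operators concrete, the following minimal sketch implements a DE/rand/1/bin loop (mutation, binomial crossover and greedy selection) for a generic minimization problem. The function name, default control parameters and box bounds are illustrative choices, not settings used in this paper.

```python
import numpy as np

def differential_evolution(fitness, dim, bounds, pop_size=20, F=0.5, CR=0.9, generations=100):
    """Minimal DE/rand/1/bin loop for minimizing `fitness` over a box-constrained space."""
    low, high = bounds
    pop = np.random.uniform(low, high, size=(pop_size, dim))
    fit = np.array([fitness(x) for x in pop])

    for _ in range(generations):
        for i in range(pop_size):
            # Mutation (DE/rand/1): add the weighted difference of two random individuals
            p1, p2, p3 = np.random.choice(
                [j for j in range(pop_size) if j != i], size=3, replace=False)
            v = np.clip(pop[p1] + F * (pop[p2] - pop[p3]), low, high)

            # Binomial crossover: copy each gene from the mutant with probability CR
            cross = np.random.rand(dim) < CR
            cross[np.random.randint(dim)] = True  # keep at least one mutant gene
            u = np.where(cross, v, pop[i])

            # Greedy selection between the parent and the trial individual
            fu = fitness(u)
            if fu <= fit[i]:
                pop[i], fit[i] = u, fu

    best = int(np.argmin(fit))
    return pop[best], fit[best]
```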

2.2. Neural Network Compression

There is significant redundancy in deep neural networks, which is mainly caused by over-parameterization [23]. These over-parameterized models require significant memory and computation resources and are prone to over-fitting. Therefore, it is necessary to compress the model by removing unimportant parameters to decrease the memory and computation cost. Generally, as the network compression ratio increases, the computational cost of the model decreases. For a fully connected network, the reduction in computational cost is approximately equal to the weight compression ratio, while for a convolutional network the computational reduction is about half of the compression ratio [12]. Next, we introduce several lines of research on deep model compression.

2.2.1. Neural Network Pruning

Neural network pruning is a classic technique in the field of model compression, and it can be traced back to the 1990s [6,7]. In recent years, Han et al. [12,24] proposed an iterative pruning method that removes small weights below a threshold. The method focuses on weight filtering to obtain an architecturally sparse model. However, unless special hardware computing units are designed, it is difficult to improve the inference speed of the compressed model. Therefore, some researchers design pruning against neurons to speed up model inference [16,25,26]. They define a number of neuron or filter importance metrics to rank neurons or filters, and achieve model pruning by eliminating less useful neurons or filters.

2.2.2. Parameter Quantization

Parameter quantization is another efficient method for compressing models. Recently, parameter quantization has had two main research directions: one is weight sharing, the basic idea of which is that multiple network connections share one weight, and the other is weight reduction, i.e., representing weights with low bit widths. Han et al. [24] proposed a weight-sharing method based on K-means clustering. The paper clusters the weight matrices of each layer into several clusters by the K-means algorithm and represents the weights in each cluster by the cluster center value. Chen et al. designed a new network architecture, Hashnet [27]. A hash function is used to randomly group connection weights into hash buckets, with connections in each bucket sharing the same weight parameters. Dettmers [28] compressed 32-bit gradients and activation values to 8 bits, where the 8 bits are used to represent floating-point numbers. He also used a dynamic determination of the range of exponent and mantissa bits to reduce errors. The work in [29] shows that, with stochastic rounding, 16-bit fixed-point representation can significantly reduce memory usage and floating-point operations.
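As an illustration of the weight-sharing idea, the sketch below clusters the weights of one layer with K-means and replaces every weight by its cluster centroid, so only the centroids and a small index per weight need to be stored. It is a simplified sketch of the general approach, not a reproduction of the implementations in [24,27]; the function name and the number of clusters are assumptions.

```python
import numpy as np
from sklearn.cluster import KMeans

def quantize_layer_weights(weights, n_clusters=16):
    """Share one centroid value per cluster of weights (weight-sharing quantization sketch)."""
    flat = weights.reshape(-1, 1)
    km = KMeans(n_clusters=n_clusters, n_init=10).fit(flat)
    # Each weight is replaced by the centroid of its cluster; storing the
    # centroids plus a small index per weight compresses the layer.
    shared = km.cluster_centers_[km.labels_].reshape(weights.shape)
    return shared, km.cluster_centers_.ravel(), km.labels_
```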

2.2.3. Knowledge Distillation

Knowledge distillation can be regarded as a transfer learning method in which knowledge from a trained large model (teacher model) is extracted and transferred to a small model (student model). Hinton et al. introduced this concept of knowledge in [30] through learning from soft targets. Moreover, they used a temperature parameter to control the softness of the probability distribution. Romero et al. [31] introduced intermediate-layer hints to improve the student model. The core idea is to enable the student model to learn intermediate representations from the teacher model. In [32], Yim et al. argue that the relationship between layers is a better representation of knowledge than the model output. Therefore, they calculate the flow of solution procedure (FSP) matrix to represent the relationship between layers and transfer the FSP matrix of the teacher model to the student model.
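The soft-target idea with a temperature parameter can be sketched as the following PyTorch-style loss; the weighting factor alpha, the temperature T and the function name are illustrative assumptions rather than values from [30].

```python
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, labels, T=4.0, alpha=0.7):
    """Combine a soft-target KL loss (temperature T) with the usual hard-label loss."""
    soft_teacher = F.softmax(teacher_logits / T, dim=1)
    log_soft_student = F.log_softmax(student_logits / T, dim=1)
    # The T**2 factor keeps gradient magnitudes comparable across temperatures.
    kd = F.kl_div(log_soft_student, soft_teacher, reduction="batchmean") * (T ** 2)
    ce = F.cross_entropy(student_logits, labels)
    return alpha * kd + (1.0 - alpha) * ce
```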

3. Methodology

In this section, we introduce the proposed differential evolutionary neural network compression method in detail. The proposed method adopts an iterative pruning and fine-tuning framework, and differential evolution is used to achieve neural network weight pruning. Moreover, we adjust the fine-tuning strategy in order to improve the model capacity, and a brief computational complexity analysis is given at the end of this section.

3.1. Preliminaries

We first formally introduce the symbols and notation for neural network weight pruning. A deep neural network can be parameterized by $\{W^{(i)} \in \mathbb{R}^{N_i \times N_{i+1}}$ or $W^{(i)} \in \mathbb{R}^{K \times K \times N_i \times N_{i+1}}, 1 \le i < L\}$, where $W^{(i)}$ denotes the matrix of connection weights of the $i$-th layer. $N_i$ and $N_{i+1}$ denote the number of input and output neurons (channels) of the $i$-th layer, respectively. If the weight tensor has two dimensions, the layer is a fully connected layer; otherwise it is a convolutional layer. $K$ is the kernel size and $L$ denotes the number of layers. We define $F$ as the mask that guides the pruning process, and the pruning operation for each layer can be written as follows:

$\hat{W}^{(i)} = F^{(i)} \odot W^{(i)}$,  (8)

where $\hat{W}$ denotes the pruned connection weights and $\odot$ indicates the Hadamard product. The pruning mask $F$ is constructed according to the pruning strategy. For example, when we use the threshold method to prune fully connected layers, $F$ can be calculated by

$F_{ij} = \begin{cases} 0, & \text{if } |W_{ij}| < \zeta^{(l)} \\ 1, & \text{otherwise} \end{cases}$  (9)

where $\zeta^{(l)}$ denotes the pruning threshold of the $l$-th layer.
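A minimal sketch of Equations (8) and (9) for one layer: build a binary mask from a layer-specific threshold ζ and apply it as a Hadamard product. The function and variable names are illustrative.

```python
import numpy as np

def prune_layer(weights, zeta):
    """Zero out connections whose magnitude is below the layer threshold zeta."""
    mask = (np.abs(weights) >= zeta).astype(weights.dtype)  # Eq. (9)
    pruned_weights = mask * weights                          # Eq. (8), Hadamard product
    return pruned_weights, mask
```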

3.2. Framework of Proposed Method

We follow the iterative pruning framework [12], which divides neural network pruning into two phases. In the first phase, the original dense network is pruned to a sparse network. In the second phase, the sparse network from the previous phase is fine-tuned to recover model performance. It is well known that removing connections affects model performance; especially when too many connections are removed, the performance of the model is bound to degrade. Fine-tuning the pruned model helps to restore its performance. Therefore, it is necessary to perform proper fine-tuning after differential evolutionary pruning, which leads to a better pruned model.
The detailed framework of DENNC is summarized in Figure 1. In general, a dense neural network is used as input, such as the first network in Figure 1. Iterative pruning, which alternates pruning and fine-tuning, is used to improve the final performance of neural network compression. In each iteration, we adopt a mode of fine-tuning after pruning. In the first phase, we apply differential evolutionary layer-wise weight pruning to simplify the dense network into a sparse network (shown as the second network in Figure 1), in which the best individual in the population is selected as the layer-wise pruning solution to obtain the pruned network. In the second phase, we fine-tune the previously pruned network by updating the reserved weights and re-establishing a few connections in order to promote the model capacity. An illustrative result is shown as the third network in Figure 1. The differential evolutionary neural network weight pruning and the fine-tuning strategy are described in detail below.

3.3. Differential Evolutionary Neural Network Weight Pruning

For hierarchical neural networks, different layers usually extract different features, and thus the weight distribution of each layer is different. Taking a fully connected neural network as an example, it is impossible to remove 90% of the connections in every layer without losing accuracy. On the one hand, if we prune too many connections, the compressed neural network will not perform as well as before. On the other hand, if we remove too few connections, we will not obtain a sufficiently streamlined network. Therefore, it is not reasonable to use a uniform pruning threshold for all layers. To address this problem, we use a differential evolutionary algorithm to autonomously optimize the pruning sensitivity of each layer. We design the objective function of differential evolution to directly reflect the pruning effect, and set the pruning sensitivity of each layer as the decision variable of the objective function.
Before establishing the objective function to optimize the pruning sensitivity, we first define the error ratio of the neural network $E(\cdot)$ as

$E(W) = \dfrac{FP + FN}{TP + FP + TN + FN}\Big|_{W}$,  (10)

where $W$ denotes the current weights of the model; $TP$ (True Positive) is the number of observations correctly assigned to the positive class; $TN$ (True Negative) is the number of observations correctly assigned to the negative class; $FP$ (False Positive) is the number of observations assigned by the model to the positive class that in reality belong to the negative class; and $FN$ (False Negative) is the number of observations assigned by the model to the negative class that actually belong to the positive class.
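For a trained classifier, the error ratio above is simply the fraction of misclassified samples, i.e., 1 minus accuracy. A small illustrative helper, assuming predicted and true labels are available as arrays:

```python
import numpy as np

def error_ratio(y_true, y_pred):
    """E(W) = (FP + FN) / (TP + FP + TN + FN), i.e. the misclassification rate."""
    y_true = np.asarray(y_true)
    y_pred = np.asarray(y_pred)
    return float(np.mean(y_true != y_pred))
```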
For neural network weight pruning, the evaluation of the pruning level can be stated as

$\min f = \|F\|_{\ell_0} \quad \text{s.t.} \quad E(\hat{W}) - E(W) < \epsilon$,  (11)

where $\epsilon$ is a very small number. We can transform the constrained optimization problem (11) into an unconstrained problem with a parameter $\lambda$, and the pruning problem can be rewritten as

$\min f = \lambda \|F\|_{\ell_0} + E(\hat{W}) - E(W) = \lambda \|F\|_{\ell_0} + E(F \odot W) - E(W)$,  (12)

where $F$ can be calculated by Equation (9). In order to balance the huge numerical difference between these two terms, we normalize $\|F\|_{\ell_0}$ as

$\min f = \lambda \dfrac{\|F\|_{\ell_0}}{\|F\|} + E(\hat{W}) - E(W) = \lambda \dfrac{\|F\|_{\ell_0}}{\|F\|} + E(F \odot W) - E(W)$,  (13)

where $\|F\|$ denotes the number of elements in $F$. From Equations (9) and (13), we know that the parameter $\zeta$ affects the result of pruning and is independent for each layer. Therefore, we refer to $\zeta$ as the pruning sensitivity.

Differential evolutionary algorithms have significant advantages in solving NP-hard problems. Problem (13) is clearly NP-hard and difficult to handle with general optimization methods because of the $\ell_0$-norm. Therefore, we use a differential evolutionary algorithm to handle the above problem for the purpose of analyzing the pruning sensitivity. In the differential evolutionary pruning sensitivity analysis (DEPS), we let $\zeta$ be the decision variable, $Z = (\zeta_1, \zeta_2, \ldots, \zeta_L)^T$, which measures the pruning sensitivity of each layer. According to Equation (13), we simplify it appropriately to create the fitness function as follows:

$\min f = \lambda \dfrac{\|F\|_{\ell_0}}{\|F\|} + E(F \odot W) - E(W) = \lambda \dfrac{\|F\|_{\ell_0}}{\|F\|} + E(F \odot W) - C \approx \lambda \dfrac{\|F\|_{\ell_0}}{\|F\|} + E(F \odot W)$,  (14)

where $C$ denotes a constant. In this fitness function, we are interested in the pruning effect and the accuracy of the model after pruning rather than the change in accuracy, which increases the probability of obtaining a more accurate model. The pseudocode of DEPS is shown in Algorithm 1. After running DEPS, we obtain the final pruning sensitivity $Z$, and the pruning mask $F$ can be calculated by Equation (9). Next, the model is pruned with mask $F$ according to Equation (8).
Algorithm 1: Pseudocode of DEPS
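The following sketch outlines the DEPS loop summarized in Algorithm 1, reusing the differential_evolution helper sketched in Section 2.1: each individual encodes one threshold ζ per layer, and the fitness of Equation (14) combines the normalized ℓ0-norm of the resulting masks with the error of the pruned model. All helper names, bounds and hyperparameters are illustrative assumptions, not the authors' exact settings.

```python
import numpy as np

def deps(weights, eval_error, lam=1.0, zeta_max=0.2, **de_kwargs):
    """Differential evolutionary pruning sensitivity analysis (illustrative sketch).

    weights    : list of per-layer weight arrays of the trained model.
    eval_error : callable returning the error ratio E of the model when each
                 layer is masked with the given binary masks.
    Returns the best per-layer thresholds Z and the corresponding masks.
    """
    total = sum(w.size for w in weights)

    def masks_from(z):
        # Eq. (9): one threshold zeta per layer
        return [(np.abs(w) >= z_l).astype(w.dtype) for w, z_l in zip(weights, z)]

    def fitness(z):
        masks = masks_from(z)
        kept = sum(int(m.sum()) for m in masks)
        # Eq. (14): normalized l0-norm of the masks plus the pruned model's error
        return lam * kept / total + eval_error(masks)

    z_best, _ = differential_evolution(fitness, dim=len(weights),
                                       bounds=(0.0, zeta_max), **de_kwargs)
    return z_best, masks_from(z_best)
```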

3.4. Fine-Tune Pruned Neural Network

Through the above pruning operation, we obtain a sparse neural network, as shown in the second network in Figure 1. Generally, the pruned model is sparse but suffers some loss of performance. Therefore, it is necessary to compensate for the degradation in model accuracy caused by the direct removal of connections. Fine-tuning is a common strategy in deep neural network training, which improves the performance of a model on a specific problem by adjusting a pre-trained model. In our neural network pruning framework, we use the fine-tuning strategy to recover the model performance after the pruning operation.
In the model fine-tuning stage, we not only update the remaining weights but also recover a few pruned connections, which we call the recovery connections fine-tuning strategy. Updating the remaining weights helps to improve the accuracy of the model, while restoring pruned connections promotes the capacity of the model. During fine-tuning, we first randomly recover some previously pruned connections and assign random weights to them, and then train the pruned neural network for a few epochs as usual. The detailed fine-tuning parameters are the same as those used for model training, except that the number of epochs is small.
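A hedged sketch of the recovery step before fine-tuning: a small fraction of the pruned (zeroed) positions is randomly re-enabled in the mask and given small random initial weights, after which training continues on the masked network as usual. The recovery fraction and the initialization scale are illustrative assumptions, not values reported in the paper.

```python
import numpy as np

def recover_connections(weights, mask, recover_frac=0.01, init_scale=0.01, rng=None):
    """Randomly re-enable a fraction of pruned connections and give them random weights."""
    rng = np.random.default_rng() if rng is None else rng
    flat_mask = mask.ravel().copy()
    flat_weights = weights.ravel().copy()

    pruned_idx = np.flatnonzero(flat_mask == 0)
    n_recover = int(recover_frac * pruned_idx.size)
    chosen = rng.choice(pruned_idx, size=n_recover, replace=False)

    flat_mask[chosen] = 1
    flat_weights[chosen] = rng.normal(0.0, init_scale, size=n_recover)  # random re-initialization
    return flat_weights.reshape(weights.shape), flat_mask.reshape(mask.shape)
```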

3.5. Computational Complexity of DENNC

The computational complexity of DENNC consists of two parts: the cost of DEPS for model pruning and the cost of fine-tuning the pruned model. In DEPS, the main complexity comes from fitness evaluation. Assuming the number of weight parameters is $P$, the computational cost of fitness evaluation is $O(P)$ for each individual. Therefore, for a population with $N$ individuals, the computational cost is $O(NP)$. The total computational cost of DEPS is $O(NPG)$ for $G$ generations. Assuming the computational cost of neural network training for each epoch is $O(T)$, fine-tuning costs $O(e \cdot T)$ computations, where $e$ denotes the number of fine-tuning epochs. Hence, the computational cost of DENNC in each cycle is $O(NPG + e \cdot T)$. If $K$ stands for the total number of iterations, the computational complexity of our method is $O(K(NPG + e \cdot T))$.

4. Experimental Studies

In this section, we demonstrate the performance of DENNC with experimental studies. Firstly, we briefly introduce the datasets and experimental settings. Secondly, experimental results on LeNet, AlexNet and VGG16 are presented in detail. Lastly, we analyze some parameters and the recovery connections fine-tuning strategy in the ablation studies.

4.1. Datasets and Experimental Settings

In the experimental studies, we choose four different neural networks on two datasets: LeNet-300-100 and LeNet-5 [33] on MNIST, and AlexNet [2] and VGG16 [3] on CIFAR10. A brief introduction to the four models is given below.
  • LeNet-300-100 is a fully connected neural network which has two hidden layers with 300 and 100 neurons, respectively.
  • LeNet-5 is a simple convolutional neural network with two convolutional layers and three fully connected layers.
  • AlexNet is a deep convolutional neural network, which includes five convolutional layers and three fully connected layers.
  • VGG16 is a deeper convolutional neural network with sixteen layers in total, of which thirteen are convolutional layers and three are fully connected layers.
Detailed network information, such as model type, accuracy and the structure of each layer, is shown in Table 2. MNIST [33] has a training set of 60,000 examples and a test set of 10,000 examples of handwritten digits. The digits are size-normalized and centered in 28 × 28 images. CIFAR10 [34] consists of 60,000 32 × 32 color images in 10 classes, with 6000 images per class. There are 50,000 training images and 10,000 test images in CIFAR10.
The comparison methods consist of naive cut [35], iterative pruning [12] and multi-objective neural network pruning (MONNP) [36]. Note that all of these methods prune neural networks by threshold optimization. The naive cut method removes the weights that are below a predefined global threshold. Its computational complexity is $O(P)$, the smallest among these methods, where $P$ is the number of weight parameters. The iterative pruning method in [12] also first prunes weights below a predefined global threshold, then fine-tunes the pruned model, and recursively runs these two operations until the stop criterion is satisfied. With $K$ iterations, the computational cost of iterative pruning is $O(K(P + eT))$, where $e$ and $T$ denote the number of fine-tuning epochs and the cost of fine-tuning per epoch, respectively. MONNP establishes a multi-objective neural network pruning model and uses multi-objective particle swarm optimization to handle the problem. Its computational complexity is similar to that of the proposed DENNC.
Note that all models used in the experiments were trained by ourselves because we could not obtain the original model weights. Therefore, some results differ from those reported in the reference papers.

4.2. Experimental Results

4.2.1. Overall Results

Firstly, the overall experimental results of the proposed DENNC and the comparison methods are shown in Table 3. In Table 3, we use five metrics to measure the effect of neural network pruning: the error of the model, the number of weights, the percentage of pruned weights, the memory footprint of the model and the compression ratio. It is clear that our method can efficiently prune these four models, with compression ratios of 24.33, 14.47, 29.15 and 12.55, respectively. There is no doubt that the models obtained by our method require the least memory. From the perspective of model error, our DENNC performs best on LeNet-300-100 with a smaller error, but on the other three models the error of DENNC is larger than that of the original model. Compared with the other methods, our method is at a moderate level and keeps acceptable errors. Although it does not achieve the lowest error, our method prunes the most weight parameters on all models. In terms of compression ratio, the performance of our method is undoubtedly the best. Apart from our method, the iterative pruning method performs better than the remaining baselines. However, considering error and compression ratio together, our method outperforms the iterative pruning method, because the iterative pruning method has a larger error on three models. In summary, the experimental results show that our method can efficiently compress models and outperforms the comparison pruning methods.
Secondly, we analyze the performance of the proposed DENNC on the LeNet-5 pruning task. DENNC belongs to the iterative pruning methods, and in each iteration we prune weights with the differential evolutionary layer-wise weight pruning method. The fitness curves of the differential evolutionary layer-wise weight pruning over 100 generations are shown in Figure 2. There are five curves corresponding to the fitness changes of iterations 1, 2, 5, 15 and 30, respectively. We can clearly see that as the number of iterations increases, the overall fitness decreases. For example, the green line has the smallest fitness and corresponds to the maximum of 30 iterations. This reveals that our iterative pruning is effective. For each curve, as the evolutionary generation increases, the fitness progressively decreases in general, even if it remains unchanged for some generations. We can also observe that the range of variation of each curve is very limited, which indicates that the effect of a single pruning session is limited and that iterative pruning is necessary. The detailed iterative pruning effect with respect to the number of iterations is shown in Figure 3. From the fitness curve, we can see that as the number of iterations increases, the fitness decreases in general. In the phase where the number of iterations is less than 5, the fitness decreases faster, and as the number of iterations becomes larger, the fitness decreases more slowly. The fitness does not always decrease; sometimes it may become slightly larger as the iterations increase. This may be the result of our recovery connections fine-tuning strategy.
Next, we analyze the pruning results of each model in detail.

4.2.2. LeNet on MNIST

We first prune LeNet-300-100 and LeNet-5 on MNIST. The detailed pruning results of each layer of these two neural networks are shown in Table 4 and Table 5, respectively.
For LeNet-300-100, we present the pruning results of each layer in Table 4. From Table 4, we can see that the proposed method removes 95.89% of the weights in total. The first layer contains the largest number of weights, and its percentage of removed weights is also the highest. DENNC removes 97.08% of the weights of the first layer, which accounts for 85.75% of the weights of the whole network. For the second and third layers, 89.15% and 19.87% of the connections are pruned, respectively. In order to show the pruning effect more intuitively, we plot histograms of the weight distributions before and after pruning in Figure 4. Note that the number of bins in both histograms is 1000. From the figure, it is obvious that most of the small weights close to zero are removed. We also notice a huge difference between the two vertical axes: the maximum counts in Figure 4a,b are 4500 and 90, respectively. Moreover, pruning leaves not only large weights but also a small number of weights near zero.
For the LeNet-5 model, the detailed pruning results of each layer are shown in Table 5. Overall, 93.09% of the weights are pruned by the proposed DENNC. In LeNet-5, 92.44% of the weights are in fully connected layers, and 93.94% of them are removed. Among them, 96.03%, 93.86% and 19.41% of the connections of the three fully connected layers are removed, respectively. For the two convolutional layers, the proposed method prunes 10.18% and 77.49% of the weights, respectively. We also plot histograms of the weight distributions before and after pruning for LeNet-5. In Figure 5, the weights approximately follow a zero-mean normal distribution, and most of the weights are in the range of [−0.4, 0.4] before pruning. After pruning, only a few small weights are retained, especially among the weights between [−0.2, 0.2]. From the above results, the proposed method is clearly effective for compressing LeNet-5.

4.2.3. AlexNet on CIFAR10

AlexNet is a deep neural network with five convolutional layers and three fully connected layers. We present the detailed pruning results of each layer in Table 6. In total, 96.57% of the weights of AlexNet are pruned. For the five convolutional layers, the proposed DENNC removes 24.85%, 28.28%, 35.89%, 55.54% and 27.04% of the weights, respectively. Although the overall pruning ratio for the convolutional layers is about 40%, the weights of the convolutional layers are only 4.24% of the total. Most of the weights are in the fully connected layers, especially in the first and second linear layers. For the three fully connected layers, 99.03% of the connections are removed in total, where the pruning ratio of the first layer is 99.08%, the pruning ratio of the second layer is 99.34%, and 45.15% of the connections are removed in the third layer. Furthermore, we plot histograms of the weight distributions before and after pruning in Figure 6, which shows the pruning result more vividly. From Figure 6, we can see that most of the weights are in the range [−0.15, 0.15] before pruning, and the count of weights close to zero is around 3 × 10^4. After pruning, the maximum count in Figure 6b is around 800, which is very small compared with that before pruning. It is obvious that the proposed method prunes a huge number of weights. Corresponding to our pruning strategy, a small number of weights close to zero remain after pruning. In a word, the proposed DENNC works well for pruning AlexNet on CIFAR10.

4.2.4. VGG16 on CIFAR10

VGG16 is a classic deep model with a very large number of weight parameters. In this part, we prune the VGG16 model on CIFAR10, and the results for each layer are shown in Table 7. Because of the large number of convolutional layers, the amount of convolutional weight parameters is not much different from that of the fully connected layers. In total, 83.35% of the weights of the convolutional layers are pruned, 98.73% of the connections of the fully connected layers are removed, and the proposed method prunes 92.03% of the weights overall. From Table 7, we can see that most of the convolutional weights are in layers Conv9 to Conv13, and the pruning ratios of these layers are 75.43%, 93.40%, 96.57%, 95.17% and 96.98%, respectively. For the fully connected layers, the pruning ratio of each layer is 94.60%, 99.36% and 87.19%, respectively. Moreover, we also plot histograms of the weight distributions of the VGG16 model in Figure 7. The figure shows that most of the weights are in the range [−0.1, 0.1] before pruning, and the maximum count in Figure 7a is approximately 1.5 × 10^4. After pruning, the maximum count in Figure 7b is around 1000. Comparing Figure 7a,b, it is clear that a large number of weights are removed, especially weights close to zero. However, a few small weights close to zero remain, which corresponds to the strategy of recovery connections fine-tuning.

4.3. Ablation Studies

In order to analyze the sensitivity of the balance parameter and the effectiveness of the adopted strategy, we conduct the following ablation studies. The ablation studies involve two experiments: parametric sensitivity analysis and fine-tuning strategy analysis.

4.3.1. Parametric Sensitivity Analysis

According to the definition above, the fitness function includes two terms, and these two terms are usually contradictory. Hence, the balance parameter $\lambda$ can be influential for neural network pruning. We design the following experiment to study the parametric sensitivity. In this experiment, we separately discuss the pruning effect for $\lambda$ equal to 0.1, 0.2, 0.4, 0.6, 0.8, 1, 2, 4, 6, 8 and 10. The detailed results are shown in Table 8 and Figure 8.
In Table 8, there are three indices to evaluate pruning performance. The fitness provides an overall estimation of the weight pruning, while the accuracy and sparsity measure two important characteristics of weight pruning, respectively. From the table, we can see that the parameter $\lambda$ does affect the result of neural network weight pruning. When we apply different $\lambda$, there are significant differences in these three indices, especially in the accuracy and sparsity, and the fitness generally becomes larger with increasing $\lambda$. In order to show the influence of $\lambda$ more intuitively, we plot two curves of the accuracy and sparsity in Figure 8. In Figure 8, the blue curve represents the accuracy and the orange curve represents the sparsity. It is clear that as $\lambda$ increases, the accuracy decreases, but the sparsity increases. This is consistent with Equation (14): the larger the $\lambda$, the greater the weight of the sparsity term, so the sparsity of the pruned model increases with increasing $\lambda$. Moreover, we can see that when $\lambda$ is in the range [0, 1], the sparsity changes drastically from 0.86 to 0.94, while when $\lambda$ is in the range [1, 10], the sparsity only increases from 0.94 to 0.98. The change in accuracy is smoother in general. Thus, the sparsity is more sensitive to $\lambda$ than the accuracy.

4.3.2. Fine-Tuning Strategy Analysis

In the proposed DENNC, we adjust the fine-tuning strategy to obtain better model capacity. In previous iterative pruning methods, fine-tuning is used only to update the remaining weights. In contrast, we use fine-tuning to update not only the remaining weights but also some removed weights. In this part, we analyze the effectiveness of the recovery connections fine-tuning strategy.
Firstly, we analyze the influence of the recovery connections fine-tuning strategy on the pruning fitness. As shown in Figure 9, the blue histogram (DENNC) denotes pruning with the recovery connections fine-tuning strategy and the orange histogram (DENNC-N) represents pruning with the normal fine-tuning strategy. The horizontal axis denotes the iteration and the vertical axis indicates the fitness of the pruned LeNet-300-100. From the figure, the fitness decreases as the iterations increase, and the fitness of DENNC is lower than that of DENNC-N in most iterations.
Secondly, in order to explore the impact of the fine-tuning strategy more intuitively, especially on model accuracy, we show two accuracy curves of the pruned models in Figure 10. In Figure 10, the orange and blue curves represent the accuracy of the pruned LeNet-300-100 with and without the recovery connections fine-tuning strategy, respectively. We can see that the pruned model using recovery connections fine-tuning always has better accuracy. This is in accord with the purpose for which we designed this strategy. Therefore, we can infer that the recovery connections fine-tuning strategy is effective in improving model accuracy.
Finally, we plot the weight distributions of the pruned LeNet-300-100 with and without the recovery connections fine-tuning strategy in Figure 11 to analyze the difference between these two pruned models. From the figure, it is obvious that the recovery connections fine-tuning strategy preserves more small weights. Figure 11a shows that there is a certain number of weights near zero, which does not exist in Figure 11b. Even though we assume that the smaller the weight, the less important the corresponding connection, this does not mean that small weights are useless. Sometimes small weights can capture more subtle features, which are also important for improving model performance.

5. Conclusions and Future Works

In this paper, we proposed a differential evolutionary weight pruning method to compress neural networks. It is based on an iterative pruning framework and consists of two phases: model pruning and model fine-tuning. In the model pruning phase, we analyze the pruning sensitivity of each layer by a differential evolutionary approach, and the sensitivity is then used to guide the pruning process. In the model fine-tuning phase, connections that have been removed are also considered for proper recovery, which improves model capacity. Experimental results demonstrate that our method can efficiently compress models, achieving at least a 10× compression ratio for each model and, in particular, a 29× compression ratio for AlexNet. Compared with similar threshold pruning methods, our method performs better overall. Moreover, we also conducted ablation studies for parametric sensitivity analysis and fine-tuning strategy analysis. The experiments show that the parameter λ influences the pruning result, and that a better model can be obtained by using the recovery connections fine-tuning strategy.
There are still some unresolved issues in this paper; for example, the time cost of the proposed method is high, and it is also hard to accelerate inference with the obtained sparse neural network. Therefore, in future work, we want to speed up the proposed method with parallel computing. Furthermore, we will also focus on neural network structural pruning with heuristic optimization.

Author Contributions

Conceptualization, T.W. and X.L.; methodology, T.W.; software, T.W. and X.L.; validation, T.W., X.L. and N.L.; formal analysis, T.W.; investigation, T.W.; resources, N.L.; data curation, J.S.; writing—original draft preparation, T.W.; writing—review and editing, X.L.; visualization, J.S.; supervision, D.Z.; project administration, D.Z.; funding acquisition, D.Z. All authors have read and agreed to the published version of the manuscript.

Funding

This work was supported in part by the National Natural Science Foundation of China (Grant No. 62076204), the Natural Science Foundation of Shaanxi Province (Grant nos. 2018JQ6003 and 2018JQ6030), and the China Postdoctoral Science Foundation (Grant Nos. 2017M613204 and 2017M623246).

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

Not applicable.

Conflicts of Interest

The authors declare no conflict of interest.

References

  1. Canziani, A.; Paszke, A.; Culurciello, E. An analysis of deep neural network models for practical applications. arXiv 2016, arXiv:1605.07678. [Google Scholar]
  2. Krizhevsky, A.; Sutskever, I.; Hinton, G.E. Imagenet classification with deep convolutional neural networks. In Advances in Neural Information Processing Systems; The MIT Press: Cambridge, MA, USA, 2012; pp. 1097–1105. [Google Scholar]
  3. Simonyan, K.; Zisserman, A. Very deep convolutional networks for large-scale image recognition. arXiv 2014, arXiv:1409.1556. [Google Scholar]
  4. He, K.; Zhang, X.; Ren, S.; Sun, J. Deep residual learning for image recognition. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Las Vegas, NV, USA, 27–30 June 2016; pp. 770–778. [Google Scholar]
  5. Szegedy, C.; Liu, W.; Jia, Y.; Sermanet, P.; Reed, S.; Anguelov, D.; Erhan, D.; Vanhoucke, V.; Rabinovich, A. Going deeper with convolutions. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Boston, MA, USA, 7–12 June 2015; pp. 1–9. [Google Scholar]
  6. LeCun, Y.; Denker, J.S.; Solla, S.A. Optimal brain damage. In Advances in Neural Information Processing Systems; The MIT Press: Cambridge, MA, USA, 1990; pp. 598–605. [Google Scholar]
  7. Hassibi, B.; Stork, D.G. Second order derivatives for network pruning: Optimal brain surgeon. In Advances in Neural Information Processing Systems; The MIT Press: Cambridge, MA, USA, 1993; pp. 164–171. [Google Scholar]
  8. Jorgensen, T.D.; Haynes, B.P.; Norlund, C.C. Pruning artificial neural networks using neural complexity measures. Int. J. Neural Syst. 2008, 18, 389–403. [Google Scholar] [CrossRef] [PubMed]
  9. Molchanov, P.; Mallya, A.; Tyree, S.; Frosio, I.; Kautz, J. Importance Estimation for Neural Network Pruning. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Long Beach, CA, USA, 15–20 June 2019; pp. 11264–11272. [Google Scholar]
  10. Liu, Z.; Sun, M.; Zhou, T.; Huang, G.; Darrell, T. Rethinking the value of network pruning. arXiv 2018, arXiv:1810.05270. [Google Scholar]
  11. Lee, N.; Ajanthan, T.; Torr, P.H. SNIP: Single-shot network pruning based on connection sensitivity. arXiv 2018, arXiv:1810.02340. [Google Scholar]
  12. Han, S.; Pool, J.; Tran, J.; Dally, W. Learning both weights and connections for efficient neural network. In Advances in Neural Information Processing Systems; The MIT Press: Cambridge, MA, USA, 2015; pp. 1135–1143. [Google Scholar]
  13. Dong, X.; Chen, S.; Pan, S. Learning to prune deep neural networks via layer-wise optimal brain surgeon. In Advances in Neural Information Processing Systems; The MIT Press: Cambridge, MA, USA, 2017; pp. 4857–4867. [Google Scholar]
  14. Castellano, G.; Fanelli, A.M.; Pelillo, M. An iterative pruning algorithm for feedforward neural networks. IEEE Trans. Neural Netw. 1997, 8, 519–531. [Google Scholar] [CrossRef] [PubMed]
  15. Stepniewski, S.W.; Keane, A.J. Pruning backpropagation neural networks using modern stochastic optimisation techniques. Neural Comput. Appl. 1997, 5, 76–98. [Google Scholar] [CrossRef] [Green Version]
  16. Anwar, S.; Hwang, K.; Sung, W. Structured pruning of deep convolutional neural networks. ACM J. Emerg. Technol. Comput. Syst. 2017, 13, 32:1–32:18. [Google Scholar] [CrossRef] [Green Version]
  17. He, Y.; Zhang, X.; Sun, J. Channel pruning for accelerating very deep neural networks. In Proceedings of the IEEE International Conference on Computer Vision, Venice, Italy, 22–29 October 2017; pp. 1389–1397. [Google Scholar]
  18. Reed, R. Pruning algorithms—A survey. IEEE Trans. Neural Netw. 1993, 4, 740–747. [Google Scholar] [CrossRef] [PubMed]
  19. Molchanov, P.; Tyree, S.; Karras, T.; Aila, T.; Kautz, J. Pruning convolutional neural networks for resource efficient transfer learning. arXiv 2016, arXiv:1611.06440. [Google Scholar]
  20. Storn, R.; Price, K. Differential evolution—A simple and efficient heuristic for global optimization over continuous spaces. J. Glob. Optim. 1997, 11, 341–359. [Google Scholar] [CrossRef]
  21. Price, K.; Storn, R.M.; Lampinen, J.A. Differential Evolution: A Practical Approach to Global Optimization; Springer Science & Business Media: Boston, NY, USA, 2006. [Google Scholar]
  22. Qin, A.K.; Huang, V.L.; Suganthan, P.N. Differential evolution algorithm with strategy adaptation for global numerical optimization. IEEE Trans. Evol. Comput. 2009, 13, 398–417. [Google Scholar] [CrossRef]
  23. Denil, M.; Shakibi, B.; Dinh, L.; Ranzato, M.; De Freitas, N. Predicting parameters in deep learning. In Advances in Neural Information Processing Systems; The MIT Press: Cambridge, MA, USA, 2013; pp. 2148–2156. [Google Scholar]
  24. Han, S.; Mao, H.; Dally, W.J. Deep compression: Compressing deep neural networks with pruning, trained quantization and huffman coding. arXiv 2015, arXiv:1510.00149. [Google Scholar]
  25. Hu, H.; Peng, R.; Tai, Y.W.; Tang, C.K. Network trimming: A data-driven neuron pruning approach towards efficient deep architectures. arXiv 2016, arXiv:1607.03250. [Google Scholar]
  26. Li, H.; Kadav, A.; Durdanovic, I.; Samet, H.; Graf, H.P. Pruning filters for efficient convnets. arXiv 2016, arXiv:1608.08710. [Google Scholar]
  27. Chen, W.; Wilson, J.; Tyree, S.; Weinberger, K.; Chen, Y. Compressing neural networks with the hashing trick. In Proceedings of the International Conference on Machine Learning, Lille, France, 6–11 July 2015; pp. 2285–2294. [Google Scholar]
  28. Dettmers, T. 8-bit approximations for parallelism in deep learning. arXiv 2015, arXiv:1511.04561. [Google Scholar]
  29. Gupta, S.; Agrawal, A.; Gopalakrishnan, K.; Narayanan, P. Deep learning with limited numerical precision. In Proceedings of the International Conference on Machine Learning, Lille, France, 6–11 July 2015; pp. 1737–1746. [Google Scholar]
  30. Hinton, G.; Vinyals, O.; Dean, J. Distilling the knowledge in a neural network. arXiv 2015, arXiv:1503.02531. [Google Scholar]
  31. Romero, A.; Ballas, N.; Kahou, S.E.; Chassang, A.; Gatta, C.; Bengio, Y. Fitnets: Hints for thin deep nets. arXiv 2014, arXiv:1412.6550. [Google Scholar]
  32. Yim, J.; Joo, D.; Bae, J.; Kim, J. A gift from knowledge distillation: Fast optimization, network minimization and transfer learning. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Honolulu, HI, USA, 21–26 July 2017; pp. 4133–4141. [Google Scholar]
  33. LeCun, Y.; Bottou, L.; Bengio, Y.; Haffner, P. Gradient-based learning applied to document recognition. Proc. IEEE 1998, 86, 2278–2324. [Google Scholar] [CrossRef] [Green Version]
  34. Krizhevsky, A.; Hinton, G. Learning Multiple Layers of Features from Tiny Images; Technical report; Citeseer: University Park, PA, USA, 2009. [Google Scholar]
  35. Srinivas, S.; Babu, R.V. Data-free parameter pruning for deep neural networks. arXiv 2015, arXiv:1507.06149. [Google Scholar]
  36. Wu, T.; Shi, J.; Zhou, D.; Lei, Y.; Gong, M. A Multi-objective Particle Swarm Optimization for Neural Networks Pruning. In Proceedings of the 2019 IEEE Congress on Evolutionary Computation, Wellington, New Zealand, 10–13 June 2019; pp. 570–577. [Google Scholar]
Figure 1. The iterative pruning and fine-tuning framework of differential evolutionary neural network compression. For an input dense neural network, we analyze the pruning sensitivity of each layer by differential evolution and then prune the network. Next, we fine-tune the previously pruned network with the strategy of properly recovering pruned connections. The recovered connections are shown as red dotted lines. The above operations are iterated until the stop criterion is satisfied.
Figure 2. The fitness curves of differential evolutionary neural network compression framework (DENNC) under 100 generations on LeNet-5 pruning task.
Figure 3. The fitness curve of DENNC under 30 iterations on LeNet-5 pruning.
Figure 4. The weight distributions of LeNet-300-100 before and after pruning. The number of bins in both histograms is 1000.
Figure 5. The weight distributions of LeNet5 before and after pruning. The number of bins in both histograms is 500.
Figure 6. The weight distributions of AlexNet before and after pruning. The number of bins in both histograms is 10,000.
Figure 7. The weight distributions of VGG16 model before and after pruning. The number of bins in both histograms is 10,000.
Figure 8. Changes in various indexes under different λ .
Figure 9. Changes in fitness with and without the pruning strategy of recovery connections for LeNet-300-100. The horizontal axis denotes iteration of DENNC, and the vertical axis is fitness.
Figure 10. The accuracy of pruned LeNet-300-100 with and without recovery connections fine-tuning. The orange curve indicates the accuracy of pruned model with recovery connections fine-tuning, and the blue curve means the pruning result without recovery connections fine-tuning. The accuracy of DENNC is better than that of DENNC-N.
Figure 11. The weight distributions of LeNet-300-100 with and without recovery connections fine-tuning. The number of bins in both histograms is 10,000.
Table 1. A simple test of the sensitivity of neural network layers to pruning. The test network is LeNet-300-100, which is a fully connected network. The different pruning ratios of each layer are shown in the first three columns, and the last column is the accuracy of the pruned network.
Pruning Ratio of Layer 1 | Pruning Ratio of Layer 2 | Pruning Ratio of Layer 3 | Accuracy
0.80 | 0.80 | 0.80 | 0.6701
0.82 | 0.68 | 0.10 | 0.9386
0.84 | 0.50 | 0.40 | 0.9294
0.86 | 0.35 | 0.30 | 0.9050
0.88 | 0.20 | 0.20 | 0.8764
0.90 | 0.05 | 0.00 | 0.8399
Table 2. Detailed neural networks information.
Model | LeNet-300-100 | LeNet-5 | AlexNet | VGG16
Dataset | MNIST | MNIST | CIFAR10 | CIFAR10
Type | FCN | CNN | CNN | CNN
Accuracy | 0.9774 | 0.9905 | 0.9004 | 0.8757
Structure:
  LeNet-300-100: (784,300), (300,100), (100,10)
  LeNet-5: 8@(5,5), 16@(5,5), (,120), (120,84), (84,10)
  AlexNet: 24@(3,3), 64@(5,5), 96@(3,3), 96@(3,3), 64@(5,5), (,1024), (1024,1024), (1024,10)
  VGG16: 16@(3,3), 16@(3,3), 32@(3,3), 32@(3,3), 64@(3,3), 64@(3,3), 64@(3,3), 128@(3,3), 128@(3,3), 128@(3,3), 128@(3,3), 128@(3,3), 128@(3,3), (,1024), (1024,1024), (1024,10)
Table 3. Overall results of proposed method and comparison methods. The bold numbers mean the best results.
Model | Method | Error % | # W | Pruned W % | Memory (M) | CR
LeNet-300-100 | original | 2.26 | 266,610 | - | 3.08 | -
LeNet-300-100 | Naive cut | 2.84 | 106,937 | 59.89 | 1.46 | 2.49
LeNet-300-100 | Iterative pruning | 2.37 | 26,874 | 89.92 | 0.34 | 9.92
LeNet-300-100 | MONNP | 2.2 | 44,364 | 83.36 | 0.57 | 6.01
LeNet-300-100 | DENNC | 2.07 | 10,962 | 95.89 | 0.14 | 24.33
LeNet-5 | original | 0.95 | 45,278 | - | 0.55 | -
LeNet-5 | Naive cut | 1.71 | 19,198 | 57.5 | 0.26 | 2.35
LeNet-5 | Iterative pruning | 1.72 | 3826 | 91.55 | 0.05 | 11.83
LeNet-5 | MONNP | 0.91 | 8503 | 81.22 | 0.11 | 5.32
LeNet-5 | DENNC | 1.41 | 3444 | 93.09 | 0.04 | 14.47
AlexNet | original | 9.96 | 5,488,106 | - | 41.92 | -
AlexNet | Naive cut | 11.66 | 1,153,600 | 78.98 | 9.57 | 4.76
AlexNet | Iterative pruning | 11.51 | 275,503 | 94.98 | 2.19 | 19.92
AlexNet | MONNP | 11.41 | 748,578 | 86.36 | 5.99 | 7.33
AlexNet | DENNC | 11.59 | 192,120 | 96.57 | 1.47 | 29.15
VGG16 | original | 12.43 | 2,112,730 | - | 37.29 | -
VGG16 | Naive cut | 12.94 | 972,278 | 53.98 | 18.83 | 2.17
VGG16 | Iterative pruning | 16.79 | 211,696 | 89.98 | 3.94 | 9.98
VGG16 | MONNP | 16.42 | 312,895 | 85.19 | 5.93 | 6.75
VGG16 | DENNC | 13.16 | 168,489 | 92.03 | 3.08 | 12.55
Table 4. The detailed pruning results of each layer for LeNet-300-100 on MNIST.
Layer | # Original W | # Remained W | Pruned W %
Linear1 | 235,500 | 6886 | 97.08
Linear2 | 30,100 | 3267 | 89.15
Linear3 | 1010 | 809 | 19.87
Total | 266,610 | 10,962 | 95.89
Table 5. The detailed pruning results of each layer for LeNet5 on MNIST.
Layer | # Original W | # Remained W | Pruned W %
Conv1 | 208 | 187 | 10.18
Conv2 | 3216 | 724 | 77.49
Linear1 | 30,840 | 1225 | 96.03
Linear2 | 10,164 | 624 | 93.86
Linear3 | 850 | 685 | 19.41
Total | 45,278 | 3444 | 93.09
Table 6. The detailed pruning results of each layer for AlexNet on CIFAR10.
Layer | # Original W | # Remained W | Pruned W %
Conv1 | 672 | 505 | 24.85
Conv2 | 38,464 | 27,586 | 28.28
Conv3 | 55,392 | 35,514 | 35.89
Conv4 | 83,040 | 36,918 | 55.54
Conv5 | 55,360 | 40,389 | 27.04
Linear1 | 4,195,328 | 38,709 | 99.08
Linear2 | 1,049,600 | 6876 | 99.34
Linear3 | 10,250 | 5623 | 45.15
Total | 5,488,106 | 192,120 | 96.57
Table 7. The detailed pruning results of each layer for VGG16 model on CIFAR10.
Layer | # Original W | # Remained W | Pruned W %
Conv1 | 448 | 324 | 27.59
Conv2 | 2320 | 1268 | 45.33
Conv3 | 4640 | 3099 | 33.20
Conv4 | 9248 | 4057 | 56.13
Conv5 | 18,496 | 15,200 | 17.82
Conv6 | 36,928 | 22,606 | 38.78
Conv7 | 36,928 | 21,315 | 42.28
Conv8 | 73,856 | 22,788 | 69.15
Conv9 | 147,584 | 36,257 | 75.43
Conv10 | 147,584 | 9746 | 93.40
Conv11 | 147,584 | 5066 | 96.57
Conv12 | 147,584 | 7130 | 95.17
Conv13 | 147,584 | 4455 | 96.98
Linear1 | 132,096 | 7139 | 94.60
Linear2 | 1,049,600 | 6724 | 99.36
Linear3 | 10,250 | 1313 | 87.19
Total | 2,112,730 | 168,489 | 92.03
Table 8. The pruning results of LeNet-5 under different parameters.
λ | Fitness | Accuracy | Sparsity
0.1 | 0.02382 | 0.9901 | 0.8608
0.2 | 0.02978 | 0.9896 | 0.9031
0.4 | 0.04582 | 0.9871 | 0.9177
0.6 | 0.05516 | 0.9857 | 0.9319
0.8 | 0.06866 | 0.9843 | 0.9338
1 | 0.0790 | 0.9837 | 0.9373
2 | 0.1451 | 0.9791 | 0.9379
4 | 0.2518 | 0.9726 | 0.9439
6 | 0.2516 | 0.9686 | 0.9633
8 | 0.2610 | 0.9678 | 0.9714
10 | 0.2822 | 0.9648 | 0.9753
Publisher’s Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.
