[WACV 2021] Holistic Neural Network Pruning: Holistic Filter Pruning for Efficient Deep Neural Networks


WACV 2021

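Paper link:
Main problem:
Main idea:
Implementation:
- Notation:
- Indicator function:
- Pruning loss:
Experimental results:
Contact the author:
My WeChat official account: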

Paper link:

https://arxiv.org/abs/2009.08169

Main problem:

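Most current pruning algorithms are heuristic, and their saliency analysis often requires substantial manual intervention. In addition, layer-wise pruning and iterative prune-and-retrain schemes are limited because they do not fully account for global information: a filter that is judged unimportant and pruned at one stage may turn out to be important at a later stage.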

Main idea:

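The paper proposes Holistic Filter Pruning (HFP) to slim down the structure of deep neural networks (DNNs). Because it operates on the channel scaling factors of the BN layers, it introduces sparsity naturally without adding any extra model parameters. It optimizes a holistic training objective by gradient descent, which yields a separate pruning policy for each layer while meeting the desired model size.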

Implementation:

Notation:

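The DNNs considered in the paper consist only of weighted-sum layers (convolutional or fully connected), BN layers, and nonlinear activation layers.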

The weighted sum can be written as:

$a_{l}=w_{l} * x_{l-1}+b_{l}$

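The BN layer (channel $c$ of layer $l$) is written as follows, where $\mu_{l,c}$ and $\sigma_{l,c}^2$ are the running mean and variance collected during training: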

$\hat{a}_{l,c} = \begin{cases} \dfrac{a_{l,c} - \mathbb{E}[a_{l,c}]}{\sqrt{\operatorname{Var}(a_{l,c}) + \epsilon}} \cdot \gamma_{l,c} + \beta_{l,c} & \text{during training} \\ \dfrac{a_{l,c} - \mu_{l,c}}{\sqrt{\sigma_{l,c}^2 + \epsilon}} \cdot \gamma_{l,c} + \beta_{l,c} & \text{during inference} \end{cases}$

At inference time, the BN layer can be folded into the preceding convolutional or fully connected layer to speed up computation:

$\hat{a}_{l}=\hat{w}_{l} * x_{l-1}+\hat{b}_{l}, \quad \hat{w}_{l}=\dfrac{\gamma_{l}}{\sqrt{\sigma_{l}^{2}+\epsilon}}\, w_{l}, \quad \hat{b}_{l}=\dfrac{\gamma_{l}}{\sqrt{\sigma_{l}^{2}+\epsilon}}\left(b_{l}-\mu_{l}\right)+\beta_{l}$
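To check the folding arithmetic, here is a minimal NumPy sketch. It assumes a fully connected layer (for a convolution, the same per-output-channel scale is applied to each filter), an `eps` of `1e-5`, and random illustrative values; `fold_bn` and `bn_inference` are hypothetical helper names, not code from the paper.

```python
import numpy as np


def fold_bn(w, b, gamma, beta, mu, var, eps=1e-5):
    """Fold inference-mode BN into the preceding fully connected layer.

    w: (C_out, C_in) weights, b: (C_out,) bias; gamma, beta, mu, var: (C_out,).
    Returns (w_hat, b_hat) such that w_hat @ x + b_hat equals BN(w @ x + b).
    """
    scale = gamma / np.sqrt(var + eps)   # gamma_l / sqrt(sigma_l^2 + eps), one value per output channel
    w_hat = w * scale[:, None]           # scale each output row of w
    b_hat = (b - mu) * scale + beta      # center the bias, scale it, then add beta
    return w_hat, b_hat


def bn_inference(a, gamma, beta, mu, var, eps=1e-5):
    """Reference inference-mode BN on pre-activations a of shape (C_out,)."""
    return (a - mu) / np.sqrt(var + eps) * gamma + beta


rng = np.random.default_rng(0)
c_in, c_out = 4, 3
w, b = rng.normal(size=(c_out, c_in)), rng.normal(size=c_out)
gamma, beta = rng.normal(size=c_out), rng.normal(size=c_out)
mu, var = rng.normal(size=c_out), rng.uniform(0.5, 2.0, size=c_out)
x = rng.normal(size=c_in)

w_hat, b_hat = fold_bn(w, b, gamma, beta, mu, var)
unfolded = bn_inference(w @ x + b, gamma, beta, mu, var)   # FC layer followed by BN
folded = w_hat @ x + b_hat                                 # single folded layer
print(np.allclose(unfolded, folded))  # True
```

The check works because $\operatorname{BN}(wx+b) = (wx+b-\mu)\cdot s + \beta = (s\,w)x + \big((b-\mu)s + \beta\big)$ with $s = \gamma/\sqrt{\sigma^2+\epsilon}$, so the folded layer needs no extra operations at inference.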

The holistic training objective combines the learning task and the pruning task so that both are optimized simultaneously during training:

$\mathcal{L}=\mathcal{L}_{\text {learning }}+\lambda \mathcal{L}_{\text {pruning }}$

Indicator function:

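The importance of each channel in a convolutional network is reflected by the channel scaling factor of its BN layer. The authors therefore first define a magnitude-based indicator function that tests whether the absolute value of $\gamma$ exceeds a threshold $t$: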

$\Phi(\gamma, t)=\begin{cases} 0, & |\gamma| \leq t \\ 1, & |\gamma|>t \end{cases}$

When the indicator outputs zero, the corresponding channel is regarded as inactive and is removed after training.

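Because the gradient of the indicator function is zero almost everywhere, the authors borrow the straight-through estimator (STE) from network quantization to approximate the local gradient of this step function during backpropagation: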

$\dfrac{\partial \Phi(\gamma)}{\partial \gamma} = \begin{cases} -1, & \text{if } \gamma \leq 0 \\ 1, & \text{if } \gamma > 0 \end{cases}$
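The forward indicator and its straight-through backward rule can be sketched in NumPy as two separate functions. This is an illustration under assumptions, not the authors' code: the names `indicator` and `indicator_grad_ste`, the default threshold `t=1e-4`, and the toy loss are hypothetical. In a deep learning framework, the two functions would become the forward and backward passes of a single custom autograd operation.

```python
import numpy as np


def indicator(gamma, t=1e-4):
    """Forward pass of Phi(gamma, t): 1.0 where |gamma| > t (active), else 0.0."""
    return (np.abs(gamma) > t).astype(np.float64)


def indicator_grad_ste(gamma, upstream):
    """Backward pass using the straight-through surrogate.

    The true derivative of the step function is zero almost everywhere, so it is
    replaced by -1 where gamma <= 0 and +1 where gamma > 0. `upstream` is dL/dPhi.
    """
    surrogate = np.where(gamma > 0, 1.0, -1.0)
    return upstream * surrogate


gamma = np.array([0.8, -0.5, 5e-5, -2e-5, 0.0])
phi = indicator(gamma)   # phi == [1, 1, 0, 0, 0]

# Toy loss: the number of active channels, so dL/dPhi = 1 for every channel.
grad = indicator_grad_ste(gamma, np.ones_like(gamma))   # grad == [1, -1, 1, -1, -1]

# With a step smaller than every |gamma|, one gradient-descent step
# moves each nonzero gamma toward zero, i.e. toward being pruned.
lr = 1e-5
gamma_next = gamma - lr * grad
```

Because the surrogate has the sign of $\gamma$, any loss that grows with the number of active channels shrinks $|\gamma|$ under gradient descent, which is how the pruning objective can push channels below the threshold.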

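Prior work has shown that scaling factors whose magnitude is below $10^{-4}$ can be set to zero without noticeably affecting accuracy. The channel output then reduces approximately to the BN bias $\beta_{l,c}$, which is independent of the input: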

$\hat{a}_{l,c}=\hat{w}_{l,c} * x_{l-1}+\hat{b}_{l,c} \;\underset{|\gamma_{l,c}|<10^{-4}}{\approx}\; \beta_{l,c}$
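A quick numerical check of this claim, using the inference-mode BN formula for a single channel. The channel statistics, the value `gamma_c = 5e-5`, and the input distribution are illustrative assumptions, not values from the paper.

```python
import numpy as np

rng = np.random.default_rng(1)
eps = 1e-5
gamma_c, beta_c = 5e-5, 0.3   # |gamma| < 1e-4: a channel the indicator would mark inactive
mu_c, var_c = 0.2, 1.5        # illustrative running mean / variance of this channel

# Pre-BN activations of this channel for widely varying inputs
a = rng.normal(scale=10.0, size=1000)

# Inference-mode BN output for this channel
a_hat = (a - mu_c) / np.sqrt(var_c + eps) * gamma_c + beta_c
max_dev = np.max(np.abs(a_hat - beta_c))   # tiny compared with beta_c: nearly constant

# Setting gamma exactly to zero makes the channel output the constant beta
a_hat_zeroed = (a - mu_c) / np.sqrt(var_c + eps) * 0.0 + beta_c
```

The deviation from $\beta_{l,c}$ scales linearly with $|\gamma_{l,c}|$, so the smaller the scaling factor, the closer the channel is to a constant that no longer depends on $x_{l-1}$.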

Pruning loss:

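During pruning, $\widetilde{P}$ and $\widetilde{M}$ denote the parameter count and FLOPs of the pruned model, and $P^*$ and $M^*$ denote the desired target values. The gap between the pruned model and the target size is quantified by the following loss: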

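$\mathcal{L}_{\text{pruning}}=\operatorname{relu}\left(\dfrac{\widetilde{P}-P^{*}}{P}\right)+\operatorname{relu}\left(\dfrac{\widetilde{M}-M^{*}}{M}\right)$

Here $\widetilde{P}$ and $\widetilde{M}$ are the estimated parameter count and FLOPs of the pruned model, $P^{*}$ and $M^{*}$ are the target values, and $P$ and $M$ are the parameter count and FLOPs of the original (unpruned) model, which normalize the two terms. Because of the ReLU, each term vanishes once the corresponding estimate reaches or falls below its target.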
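The loss itself takes only a few lines of plain Python. This sketch assumes that $P$ and $M$ are the parameter count and FLOPs of the unpruned model; the function name `pruning_loss` and the numbers in the example are hypothetical.

```python
def relu(z):
    """Scalar ReLU."""
    return max(z, 0.0)


def pruning_loss(p_est, m_est, p_target, m_target, p_full, m_full):
    """L_pruning = relu((P~ - P*) / P) + relu((M~ - M*) / M).

    p_est, m_est:       estimated params / FLOPs of the pruned model (P~, M~)
    p_target, m_target: desired budgets (P*, M*)
    p_full, m_full:     params / FLOPs of the unpruned model (P, M), for normalization
    """
    return relu((p_est - p_target) / p_full) + relu((m_est - m_target) / m_full)


# Hypothetical 10M-param / 2-GFLOP model with targets of 5M params and 1 GFLOP:
# only the parameter budget is exceeded, by 1M / 10M = 0.1.
print(pruning_loss(6e6, 0.8e9, 5e6, 1e9, 10e6, 2e9))  # 0.1
```

Normalizing by the unpruned size keeps both terms on a comparable scale, so neither the parameter budget nor the FLOP budget dominates simply because its raw numbers are larger.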

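where $P_l$ and $M_l$ are the parameter count and FLOPs of layer $l$ in the unpruned network, $C_l$ is the number of output channels of layer $l$, and $L$ is the number of layers: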

$\begin{aligned} \widetilde{P}=&\sum_{l=1}^{L-1} P_{l} \underbrace{\left(\frac{1}{C_{l-1} C_{l}} \sum_{c=1}^{C_{l-1}} \Phi\left(\gamma_{l-1, c}\right) \sum_{c=1}^{C_{l}} \Phi\left(\gamma_{l, c}\right)\right)}_{\text{retained ratio of intermediate layer } l} \\ &+P_{L} \underbrace{\left(\frac{1}{C_{L-1}} \sum_{c=1}^{C_{L-1}} \Phi\left(\gamma_{L-1, c}\right)\right)}_{\text{retained ratio of the last layer } L} \end{aligned}$

and:

$\begin{aligned} \widetilde{M}=&\sum_{l=1}^{L-1} M_{l} \underbrace{\left(\frac{1}{C_{l-1} C_{l}} \sum_{c=1}^{C_{l-1}} \Phi\left(\gamma_{l-1, c}\right) \sum_{c=1}^{C_{l}} \Phi\left(\gamma_{l, c}\right)\right)}_{\text{retained ratio of intermediate layer } l} \\ &+M_{L} \underbrace{\left(\frac{1}{C_{L-1}} \sum_{c=1}^{C_{L-1}} \Phi\left(\gamma_{L-1, c}\right)\right)}_{\text{retained ratio of the last layer } L} \end{aligned}$
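The two estimates share one structure, so a single NumPy helper can compute either $\widetilde{P}$ (pass per-layer parameter counts) or $\widetilde{M}$ (pass per-layer FLOPs). This is a forward-only sketch under stated assumptions: the network is a plain chain of layers, each hidden layer $l$ is followed by one BN layer with $C_l$ scaling factors, the input channels of layer 1 are never pruned (so they count as fully kept), and `estimate_size` is a hypothetical name. During training, gradients with respect to $\gamma$ would flow through $\Phi$ via the straight-through estimator.

```python
import numpy as np


def indicator(gamma, t=1e-4):
    """Phi(gamma, t): 1.0 for active channels (|gamma| > t), else 0.0."""
    return (np.abs(gamma) > t).astype(np.float64)


def estimate_size(costs, gammas, t=1e-4):
    """Estimate P~ (or M~) of the pruned model.

    costs:  per-layer costs [P_1, ..., P_L] (or [M_1, ..., M_L]) of the unpruned model
    gammas: BN scaling factors [gamma_1, ..., gamma_{L-1}], one array per hidden layer;
            the last layer has no BN, and its outputs are never pruned.
    Assumption: the network input (layer 0) keeps all channels, i.e. Phi = 1.
    """
    num_layers = len(costs)
    if len(gammas) != num_layers - 1:
        raise ValueError("expected one gamma array per hidden layer")
    # keep[l] = fraction of layer l's output channels that remain active
    keep = [1.0] + [float(indicator(g, t).mean()) for g in gammas]
    total = 0.0
    for l in range(1, num_layers):                 # intermediate layers 1..L-1
        total += costs[l - 1] * keep[l - 1] * keep[l]
    total += costs[-1] * keep[num_layers - 1]      # last layer: only its inputs shrink
    return float(total)


gamma_1 = np.array([0.9, 1e-5, -0.4, 0.0])                           # 2 of 4 channels active
gamma_2 = np.array([0.3, -0.2, 0.5, 5e-5, 0.7, -0.6, 0.1, -3e-5])   # 6 of 8 channels active
params_per_layer = [1000, 2000, 500]                                # hypothetical P_1, P_2, P_3

print(estimate_size(params_per_layer, [gamma_1, gamma_2]))  # 1625.0
```

The result is $1000 \cdot 1 \cdot 0.5 + 2000 \cdot 0.5 \cdot 0.75 + 500 \cdot 0.75 = 1625$, which shows why the estimate is holistic: pruning the outputs of layer $l$ also shrinks the inputs of layer $l+1$.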

Experimental results:

Contact the author:

WeChat ID: Sharpiless

The author's other pages:

Bilibili: https://space.bilibili.com/470550823


AI Studio：https://aistudio.baidu.com/aistudio/personalcenter/thirdview/67156

Github：https://github.com/Sharpiless

My WeChat official account:


