Knowledge Distillation by On-the-Fly Native Ensemble: Paper Notes

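1. Network structure:

The gate is a fully connected network that learns which branch matters more. Using a fully connected network to weight the importance of network components is a popular approach at the moment. As the saying goes, "three cobblers together equal one Zhuge Liang"; it feels a lot like bagging.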
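To make the gate idea concrete, here is a minimal pure-Python sketch. It assumes the gate is a single fully connected layer followed by a softmax, and that the teacher logits are the gate-weighted sum of the branch logits; the exact layers in the paper's gate may differ. The function names, toy sizes, and random values are all hypothetical.

```python
import math
import random


def softmax(values):
    """Numerically stable softmax over a list of floats."""
    peak = max(values)
    exps = [math.exp(v - peak) for v in values]
    total = sum(exps)
    return [e / total for e in exps]


def gate_weights(shared_features, weight, bias):
    """One fully connected layer plus softmax: one importance score per branch."""
    scores = [
        sum(w * x for w, x in zip(row, shared_features)) + b
        for row, b in zip(weight, bias)
    ]
    return softmax(scores)


def ensemble_logits(branch_logits, gates):
    """Gate-weighted sum of the per-branch logits (the on-the-fly teacher)."""
    num_classes = len(branch_logits[0])
    return [
        sum(g * logits[c] for g, logits in zip(gates, branch_logits))
        for c in range(num_classes)
    ]


# Toy sizes (assumed for illustration): 4 shared features, 3 branches, 5 classes.
rng = random.Random(0)
features = [rng.uniform(-1, 1) for _ in range(4)]
gate_w = [[rng.uniform(-1, 1) for _ in range(4)] for _ in range(3)]
gate_b = [0.0, 0.0, 0.0]
logits = [[rng.uniform(-2, 2) for _ in range(5)] for _ in range(3)]

gates = gate_weights(features, gate_w, gate_b)
teacher = ensemble_logits(logits, gates)
print("gate weights:", gates)
print("teacher logits:", teacher)
```

Because the gate weights are non-negative and sum to 1, each teacher logit is a convex combination of the branch logits. This is also where it differs from plain bagging: the weights are learned per input rather than being a fixed average.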

**2.** Loss function:

During training, the softmax used for distillation has temperature T=3; at test time T is restored to 1.
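The effect of the temperature can be seen in a small sketch. The function name and the example logits are hypothetical; only T=3 and T=1 come from the notes above.

```python
import math


def softmax_with_temperature(logits, temperature=1.0):
    """Softmax over logits divided by the temperature (T > 1 softens the output)."""
    scaled = [z / temperature for z in logits]
    peak = max(scaled)
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


logits = [4.0, 1.0, 0.5]  # made-up logits for one sample
train_probs = softmax_with_temperature(logits, temperature=3.0)  # distillation
test_probs = softmax_with_temperature(logits, temperature=1.0)   # inference
print("T=3:", train_probs)
print("T=1:", test_probs)
```

With T=3 the distribution is flatter, so the smaller class probabilities carry more signal for the student branches. The predicted class is the same at both temperatures.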

The final loss:
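Written out, an objective with the three terms explained in the next paragraph takes roughly the following form. The notation is an assumption for illustration: $m+1$ is the number of branches, $\mathcal{L}_{ce}^{i}$ is the cross-entropy of branch $i$, $\mathcal{L}_{ce}^{e}$ is the cross-entropy of the gated ensemble (the teacher), $\tilde{p}$ denotes a softmax at temperature $T$, and the $T^{2}$ factor is the usual rescaling used with softened targets.

```latex
\mathcal{L} \;=\; \sum_{i=0}^{m} \mathcal{L}_{ce}^{i} \;+\; \mathcal{L}_{ce}^{e} \;+\; T^{2}\,\mathcal{L}_{kl},
\qquad
\mathcal{L}_{kl} \;=\; \sum_{i=0}^{m} \sum_{j} \tilde{p}_{e}(j \mid x)\,
\log \frac{\tilde{p}_{e}(j \mid x)}{\tilde{p}_{i}(j \mid x)}
```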

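The first term is the sum of the losses of the individual branches (sub-networks), the second term is the loss of the ensemble teacher, and the third term is the KL divergence between each branch and the teacher.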
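The three terms described above can be sketched for a single sample as follows. This is an illustration under assumptions, not the authors' implementation: the teacher logits are taken as a gate-weighted sum of the branch logits, the cross-entropy terms use T=1, and the KL term is scaled by T squared. All function names are hypothetical.

```python
import math


def softmax(logits, temperature=1.0):
    """Temperature-scaled, numerically stable softmax."""
    scaled = [z / temperature for z in logits]
    peak = max(scaled)
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def cross_entropy(logits, label):
    """Negative log-probability of the true class at T=1."""
    return -math.log(softmax(logits)[label])


def kl_divergence(teacher_probs, student_probs):
    """KL(teacher || student) over the class distribution."""
    return sum(
        p * math.log(p / q)
        for p, q in zip(teacher_probs, student_probs)
        if p > 0
    )


def one_style_loss(branch_logits, gates, label, temperature=3.0):
    """Branch CE + teacher CE + T^2 * KL(teacher || each branch)."""
    num_classes = len(branch_logits[0])
    teacher_logits = [
        sum(g * z[c] for g, z in zip(gates, branch_logits))
        for c in range(num_classes)
    ]
    branch_ce = sum(cross_entropy(z, label) for z in branch_logits)
    teacher_ce = cross_entropy(teacher_logits, label)
    soft_teacher = softmax(teacher_logits, temperature)
    distill = sum(
        kl_divergence(soft_teacher, softmax(z, temperature))
        for z in branch_logits
    )
    return branch_ce + teacher_ce + temperature ** 2 * distill


# Toy example: three branches (m=2), three classes, true class 0.
toy_logits = [[2.0, 0.5, -1.0], [1.5, 1.0, -0.5], [2.5, 0.0, -1.5]]
toy_gates = [0.5, 0.2, 0.3]
print(one_style_loss(toy_logits, toy_gates, label=0))
```

When every branch produces identical logits, the KL term vanishes and the loss reduces to the cross-entropy terms alone, which is a quick sanity check for an implementation.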

**3.** Test results

My test results:

Test dataset: CIFAR-100

Test setup: the branch structure shown in Figure 1 starts at the last block of the network (three branches, m=2).

ResNet32_ori top1 bestacc: 70.69

ResNet32_ONE top1 bestacc: 73.47

ResNet32_ONE_E top1 bestacc: 75.45

ResNet110_ori top1 bestacc: 75.38

ResNet110_ONE top1 bestacc: 78.79

ResNet110_ONE_E top1 bestacc: 79.77

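Note: _ori denotes the original network;
_ONE is the model with the extra branches removed at test time;
_ONE_E is the model with the branches kept at test time (the ensemble).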
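The gains implied by the numbers above can be worked out with a few lines of arithmetic. The accuracies are copied from the results listed in this section; the dictionary layout and names are only for illustration.

```python
# Top-1 best accuracies (%) on CIFAR-100, copied from the results above.
results = {
    "ResNet32": {"ori": 70.69, "ONE": 73.47, "ONE_E": 75.45},
    "ResNet110": {"ori": 75.38, "ONE": 78.79, "ONE_E": 79.77},
}


def gains(scores):
    """Accuracy differences in percentage points, rounded to two decimals."""
    return {
        "ONE_vs_ori": round(scores["ONE"] - scores["ori"], 2),
        "ONE_E_vs_ori": round(scores["ONE_E"] - scores["ori"], 2),
        "ONE_E_vs_ONE": round(scores["ONE_E"] - scores["ONE"], 2),
    }


summary = {name: gains(scores) for name, scores in results.items()}
for name, delta in summary.items():
    print(name, delta)
```

On these numbers, keeping the branches at test time adds about 2 points over ONE for ResNet32 and about 1 point for ResNet110.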

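The authors' test results:

Accuracy on CIFAR-100 improves considerably. Compared with ONE alone, the ensemble variant (ONE-E) gives a larger gain on small networks and a more limited gain on large networks. The comparison also shows that adding ONE to the original ResNet-110 architecture further improves its performance.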

The authors' test results on ImageNet:

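On ImageNet, the branching starts at the last two blocks. There is an improvement, though the gain on CIFAR-100 is more pronounced. The reported numbers also suggest that the method carries over to newer network architectures.

Results differ markedly between datasets, so how the method performs on other classification networks still needs further testing.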
