【KD】Correlation Congruence for Knowledge Distillation
Paper: Correlation Congruence for Knowledge Distillation
1, Motivation:
In conventional KD, the teacher's feature space is distilled without considering the intra-class/inter-class distribution, so the student model will also lack the desired intra-class/inter-class structure.
Usually, the embedding space of the teacher possesses the characteristic that intra-class instances cohere together while inter-class instances separate from each other. But the embedding space of a student trained only with instance congruence would lack this desired characteristic.
2, Contribution:
- Proposes Correlation Congruence Knowledge Distillation (CCKD), which matches not only instance congruence but also correlation congruence. (Instance congruence is the usual per-sample matching of teacher and student outputs; correlation congruence is enforced by a loss on the pairwise correlations between samples i and j within a mini-batch, drawn with P-K or cluster-based sampling.)
- Turns the pairwise correlation computation within a mini-batch into a single large matrix operation over the whole batch, reducing computation (see the sketch after this list).
- Adopts different mini-batch sampler strategies.
- Experiments on CIFAR-100, ImageNet-1K, person re-identification and face recognition.
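A minimal PyTorch sketch (my own illustration, not the authors' released code) of the batched correlation computation referred to in the second bullet: the pairwise correlations of a mini-batch are obtained with one matrix product instead of an explicit loop over all (i, j) pairs.

```python
import torch
import torch.nn.functional as F

def cosine_correlation_matrix(feats: torch.Tensor) -> torch.Tensor:
    """Pairwise cosine correlations of a mini-batch.

    feats: (n, d) embeddings -> (n, n) correlation matrix, computed as a
    single matrix product rather than a double loop over sample pairs.
    """
    feats = F.normalize(feats, dim=1)   # L2-normalize each embedding
    return feats @ feats.t()            # (n, n) matrix of dot products
```

With the settings reported below (batch size 40, 256-d embeddings), this yields a 40 × 40 matrix for the teacher and for the student; the correlation-congruence loss then compares the two.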
3, Framework:

3.3. Correlation Congruence
Correlation-congruence knowledge distillation proceeds in three steps:
- Extract features (from both the teacher and the student).
- Map the features into the embedding feature space.
- Compute the correlation matrix. The mapping function can be any correlation metric; three metrics for capturing the correlation between instances are introduced in the next section.

Correlation congruence loss: L_CC = (1/n²) · Σ_{i,j} (c^t_ij − c^s_ij)², the squared difference between the teacher's and the student's correlation matrices, averaged over the mini-batch of size n.

Gaussian RBF is more flexible and powerful in capturing the complex non-linear relationships between instances. (The paper ultimately adopts the Gaussian kernel to compute correlations, but its computational cost is really high.)
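As a concrete illustration of the Gaussian-RBF variant, here is a short sketch of the correlation matrix and the congruence loss between teacher and student; the bandwidth `gamma` is an assumed placeholder value, and the exact kernel is shown rather than any cheaper approximation.

```python
import torch

def rbf_correlation_matrix(feats: torch.Tensor, gamma: float = 0.4) -> torch.Tensor:
    """Gaussian-RBF correlations k(x_i, x_j) = exp(-gamma * ||x_i - x_j||^2).

    feats: (n, d) mini-batch embeddings -> (n, n) matrix.
    `gamma` is a bandwidth hyperparameter chosen here only for illustration.
    """
    sq_dists = torch.cdist(feats, feats).pow(2)   # pairwise squared L2 distances
    return torch.exp(-gamma * sq_dists)

def correlation_congruence_loss(feats_t: torch.Tensor, feats_s: torch.Tensor) -> torch.Tensor:
    """L_CC: mean squared difference of teacher vs. student correlation matrices."""
    c_t = rbf_correlation_matrix(feats_t)
    c_s = rbf_correlation_matrix(feats_s)
    return (c_t - c_s).pow(2).mean()   # (1/n^2) * sum_{i,j} (c^t_ij - c^s_ij)^2
```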
Loss function:

(Compared with conventional KD, there is one additional constraint term: the correlation-congruence loss.)
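A hedged sketch of how the three terms could be combined in PyTorch; the weighting scheme and the hyperparameter values (T, alpha, beta) below are illustrative assumptions, not the paper's tuned settings.

```python
import torch.nn.functional as F

def cckd_loss(logits_s, logits_t, feats_s, feats_t, labels,
              T: float = 4.0, alpha: float = 0.5, beta: float = 0.01):
    """Cross-entropy + temperature-scaled KD + correlation congruence."""
    # Hard-label cross-entropy on the student logits
    ce = F.cross_entropy(logits_s, labels)

    # Instance congruence: KL divergence between softened teacher/student outputs
    kd = F.kl_div(F.log_softmax(logits_s / T, dim=1),
                  F.softmax(logits_t / T, dim=1),
                  reduction="batchmean") * (T * T)

    # Correlation congruence over the mini-batch (cosine version for brevity;
    # the Gaussian-RBF version sketched above can be substituted)
    z_t = F.normalize(feats_t, dim=1)
    z_s = F.normalize(feats_s, dim=1)
    cc = (z_t @ z_t.t() - z_s @ z_s.t()).pow(2).mean()

    return alpha * ce + (1.0 - alpha) * kd + beta * cc
```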
4, Experimental results:



It can be seen that with the correlation-congruence constraint, intra-class instances are pulled closer together while the inter-class distance becomes larger.
5, Settings:
On CIFAR-100, ImageNet-1K and MSMT17, original knowledge distillation (KD) [15] and cross-entropy (CE) are chosen as the baselines. For face recognition, the ArcFace loss [5] and the L2-mimic loss [21, 23] are adopted. We compare CCKD with several state-of-the-art distillation-related methods, including attention transfer (AT) [37], deep mutual learning (DML) [39] and the conditional adversarial network (Adv) [35]. For attention transfer, we add it to the last two blocks as suggested in [37]. For adversarial training, the discriminator consists of FC(128 × 64) + BN + ReLU + FC(64 × 2) + Sigmoid layers, and it is trained with a binary cross-entropy loss.
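For reference, the adversarial baseline's discriminator described above could be written as the following nn.Sequential sketch (the 128-d input is read off the FC(128 × 64) spec):

```python
import torch.nn as nn

# Discriminator for the adversarial (Adv) baseline as described in the text
discriminator = nn.Sequential(
    nn.Linear(128, 64),
    nn.BatchNorm1d(64),
    nn.ReLU(inplace=True),
    nn.Linear(64, 2),
    nn.Sigmoid(),
)
bce_loss = nn.BCELoss()   # trained with binary cross-entropy, as stated above
```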
ResNet-50 is used as the teacher network and ResNet-18 as the student network. The dimension of the feature representation is set to 256. We set the weight decay to 5e-4 and the batch size to 40, and use stochastic gradient descent with momentum. The learning rate is set to 0.0003 and divided by 10 at epochs 45 and 60, for 90 epochs in total.
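A minimal sketch of this training setup in PyTorch, assuming standard torchvision ResNets as stand-ins for the teacher and student backbones; the momentum value is my assumption, since the text only says "SGD with momentum".

```python
from torch.optim import SGD
from torch.optim.lr_scheduler import MultiStepLR
from torchvision.models import resnet18, resnet50

teacher = resnet50()   # teacher backbone
student = resnet18()   # student backbone

optimizer = SGD(student.parameters(), lr=3e-4,
                momentum=0.9,            # assumed value; text just says "with momentum"
                weight_decay=5e-4)
# learning rate divided by 10 at epochs 45 and 60, 90 epochs in total
scheduler = MultiStepLR(optimizer, milestones=[45, 60], gamma=0.1)

for epoch in range(90):
    # ... train one epoch with mini-batches of size 40 ...
    scheduler.step()
```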
