[CTR] Deep Interest Evolution Network for Click-Through Rate Prediction (AAAI'19)


DIEN (AAAI’19)

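This is another piece of work from Alibaba that further improves on DIN: it models not only the multiple interests in a user's behavior but also how they relate over time. It uses two GRU layers to model the temporal evolution of these interests and introduces a fine-grained auxiliary loss. As of 2018 it was arguably the most complex model deployed in industry.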

The two GRU layers are the Interest Extractor Layer and the Interest Evolving Layer. Before introducing them, let's first review how a GRU updates its state, where $\mathbf{i}_{t}$ is the input, $\mathbf{h}_{t}$ the hidden state, $\sigma$ the sigmoid function, and $\circ$ the element-wise product:
$\begin{aligned} &\mathbf{u}_{t}=\sigma\left(W^{u} \mathbf{i}_{t}+U^{u} \mathbf{h}_{t-1}+\mathbf{b}^{u}\right) \\ &\mathbf{r}_{t}=\sigma\left(W^{r} \mathbf{i}_{t}+U^{r} \mathbf{h}_{t-1}+\mathbf{b}^{r}\right) \\ &\tilde{\mathbf{h}}_{t}=\tanh \left(W^{h} \mathbf{i}_{t}+\mathbf{r}_{t} \circ U^{h} \mathbf{h}_{t-1}+\mathbf{b}^{h}\right) \\ &\mathbf{h}_{t}=\left(\mathbf{1}-\mathbf{u}_{t}\right) \circ \mathbf{h}_{t-1}+\mathbf{u}_{t} \circ \tilde{\mathbf{h}}_{t} \end{aligned}$
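To make the four update equations concrete, here is a minimal NumPy sketch of a single GRU step and of running it over a behavior sequence. The parameter dictionary, the helper names (`init_params`, `gru_step`, `run_gru`), the dimensions, and the random initialization are all illustrative assumptions, not taken from the paper or its released code.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def init_params(input_dim, hidden_dim, rng):
    # Hypothetical small random initialization; biases start at zero.
    p = {}
    for g in ("u", "r", "h"):
        p["W" + g] = rng.normal(scale=0.1, size=(hidden_dim, input_dim))
        p["U" + g] = rng.normal(scale=0.1, size=(hidden_dim, hidden_dim))
        p["b" + g] = np.zeros(hidden_dim)
    return p

def gru_step(i_t, h_prev, p):
    # Update gate u_t and reset gate r_t.
    u_t = sigmoid(p["Wu"] @ i_t + p["Uu"] @ h_prev + p["bu"])
    r_t = sigmoid(p["Wr"] @ i_t + p["Ur"] @ h_prev + p["br"])
    # Candidate state: the reset gate scales the recurrent term element-wise.
    h_tilde = np.tanh(p["Wh"] @ i_t + r_t * (p["Uh"] @ h_prev) + p["bh"])
    # Interpolate between the previous state and the candidate state.
    return (1.0 - u_t) * h_prev + u_t * h_tilde

def run_gru(inputs, p, hidden_dim):
    # Returns one hidden state per time step, shape (T, hidden_dim).
    h = np.zeros(hidden_dim)
    states = []
    for i_t in inputs:
        h = gru_step(i_t, h, p)
        states.append(h)
    return np.stack(states)

rng = np.random.default_rng(0)
params = init_params(input_dim=4, hidden_dim=3, rng=rng)
behaviors = rng.normal(size=(5, 4))  # T = 5 item embeddings of size 4
hidden_states = run_gru(behaviors, params, hidden_dim=3)
print(hidden_states.shape)
```

Because every state is a convex combination of the previous state and a `tanh` output, all entries stay strictly inside (-1, 1) when the initial state is zero.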

Interest Extractor Layer

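The embeddings of the items the user interacted with at time steps $1$ to $T$ are fed into a GRU, which outputs a hidden state at every step, so the first GRU layer produces $T$ hidden states. On top of this there is an Auxiliary Loss (the left part of the architecture figure) that emphasizes the temporal relationship.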

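One drawback of using a GRU on sequences is that the prediction depends mainly on the last hidden state, so the intermediate hidden states receive no effective supervision signal. An auxiliary loss is therefore designed to learn the user's sequential behavior: the current hidden state should be able to predict the next item. Concretely, the hidden state extracted by the GRU at the current step should score 1 against the embedding of the next clicked item in the sequence and 0 against the embedding of a randomly sampled negative item, where the score $\sigma(\cdot, \cdot)$ is the sigmoid of the inner product. With $N$ behavior sequences, $\mathbf{e}_{b}^{i}[t+1]$ the embedding of the next clicked item, and $\hat{\mathbf{e}}_{b}^{i}[t+1]$ the negative sample, the loss is:
$\begin{aligned} L_{a u x}=-\frac{1}{N} \sum_{i=1}^{N} \sum_{t}\left(\log \sigma\left(\mathbf{h}_{t}^{i}, \mathbf{e}_{b}^{i}[t+1]\right)+\log \left(1-\sigma\left(\mathbf{h}_{t}^{i}, \hat{\mathbf{e}}_{b}^{i}[t+1]\right)\right)\right) \end{aligned}$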
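The final loss function adds the auxiliary loss to the CTR prediction loss $L_{\text{target}}$, with a hyperparameter $\alpha$ balancing the two:
$L=L_{\text{target}}+\alpha \cdot L_{a u x}$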
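The auxiliary loss and the combined objective can be sketched in NumPy as follows. This is an illustrative sketch under several assumptions: the score is the sigmoid of a plain inner product as described above (so the hidden size must equal the embedding size), one negative is sampled per step, all sequences have the same length with no padding mask, and the function names and toy tensors are hypothetical.

```python
import numpy as np

def log_sigmoid(x):
    # Numerically stable log(sigmoid(x)) = -log(1 + exp(-x)).
    return -np.logaddexp(0.0, -x)

def auxiliary_loss(hidden, pos_emb, neg_emb):
    """hidden, pos_emb, neg_emb: arrays of shape (N, T, d).

    hidden[:, t] is h_t, pos_emb[:, t] the clicked item at step t,
    neg_emb[:, t] a sampled negative item for step t.
    """
    h = hidden[:, :-1, :]    # h_t for every step that has a successor
    pos = pos_emb[:, 1:, :]  # e_b[t+1], the next clicked item
    neg = neg_emb[:, 1:, :]  # sampled negative for step t+1
    pos_logit = np.sum(h * pos, axis=-1)
    neg_logit = np.sum(h * neg, axis=-1)
    # log(1 - sigmoid(x)) equals log_sigmoid(-x).
    log_lik = log_sigmoid(pos_logit) + log_sigmoid(-neg_logit)
    return -log_lik.sum() / hidden.shape[0]

def total_loss(target_loss, aux_loss, alpha):
    # L = L_target + alpha * L_aux
    return target_loss + alpha * aux_loss

N, T, d = 2, 5, 3
zeros = np.zeros((N, T, d))
ones = np.ones((N, T, d))
uninformed = auxiliary_loss(zeros, ones, -ones)  # hidden states carry no signal
aligned = auxiliary_loss(ones, ones, -ones)      # hidden states match positives
print(uninformed, aligned)
```

With all-zero hidden states every score is 0.5, so the loss equals $2(T-1)\ln 2$ per sequence; hidden states that align with the positives and oppose the negatives give a smaller value.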

Interest Evolving Layer

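Its input is the output of the previous GRU layer, $\mathbf{i}_{t}^{\prime}=\mathbf{h}_{t}$, and its output is the last hidden state of this layer's GRU, $\mathbf{h}_{T}^{\prime}$, which is concatenated with the other features and fed into the fully connected layers.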

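Doing only this would be unremarkable. Remember the weighted average in DIN? Interest Evolving here is a weighted GRU. We cover the details in two steps: first the weights, then the weighted GRU.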

Weights

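As in DIN, an attention score is computed between each output of the first GRU layer and the embedding $\mathbf{e}_{a}$ of the current candidate item: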
$a_{t}=\frac{\exp \left(\mathbf{h}_{t} W \mathbf{e}_{a}\right)}{\sum_{j=1}^{T} \exp \left(\mathbf{h}_{j} W \mathbf{e}_{a}\right)}$
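The attention formula is a softmax over the bilinear scores $\mathbf{h}_{t} W \mathbf{e}_{a}$. A minimal NumPy sketch, in which the function name, the shapes, and the toy values are assumptions chosen for illustration:

```python
import numpy as np

def attention_scores(hidden, W, e_a):
    """hidden: (T, n_h) first-layer states, W: (n_h, n_a), e_a: (n_a,)."""
    logits = hidden @ W @ e_a        # one bilinear score per time step
    logits = logits - logits.max()   # shift for numerical stability
    weights = np.exp(logits)
    return weights / weights.sum()   # softmax over the T time steps

# Toy example: with W = I, the step whose state matches e_a scores highest.
hidden = np.array([[0.0, 0.0],
                   [1.0, 0.0],
                   [0.0, 1.0]])
W = np.eye(2)
e_a = np.array([1.0, 0.0])
a = attention_scores(hidden, W, e_a)
print(a)
```

The weights are positive and sum to 1 over the sequence, so each $a_{t}$ lies in $(0, 1)$.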
So how is this attention weight used? The paper proposes three methods, from simple to complex:

Weighted GRU

GRU with attentional input (AIGRU): the simplest method, which directly multiplies the attention score with the input of the second GRU layer:

$\mathbf{i}_{t}^{\prime}=\mathbf{h}_{t} * a_{t}$

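Attention based GRU (AGRU): borrowing a method from the question answering literature, this replaces the GRU's update gate with the attention score and uses it to update the hidden state directly: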

$\mathbf{h}_{t}^{\prime}=\left(1-a_{t}\right) * \mathbf{h}_{t-1}^{\prime}+a_{t} * \tilde{\mathbf{h}}_{t}^{\prime}$

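GRU with attentional update gate (AUGRU): although AGRU uses the attention score to control the hidden state update directly, it replaces the vector $\mathbf{u}_{t}$ with the scalar $a_{t}$, which ignores the differences in importance across dimensions. The paper proposes a GRU with an attentional update gate to combine the attention mechanism with the GRU: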

$\begin{aligned} \tilde{\mathbf{u}}_{t}^{\prime} &=a_{t} * \mathbf{u}_{t}^{\prime} \\ \mathbf{h}_{t}^{\prime} &=\left(1-\tilde{\mathbf{u}}_{t}^{\prime}\right) \circ \mathbf{h}_{t-1}^{\prime}+\tilde{\mathbf{u}}_{t}^{\prime} \circ \tilde{\mathbf{h}}_{t}^{\prime} \end{aligned}$
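The three variants differ only in where the attention score $a_{t}$ enters the GRU step, which a side-by-side sketch makes easy to see. The code below is an illustrative NumPy sketch, not the authors' implementation: the `evolve` function, its `mode` argument, the parameter layout, and the toy inputs are all assumptions.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def init_params(input_dim, hidden_dim, rng):
    # Hypothetical small random initialization; biases start at zero.
    p = {}
    for g in ("u", "r", "h"):
        p["W" + g] = rng.normal(scale=0.1, size=(hidden_dim, input_dim))
        p["U" + g] = rng.normal(scale=0.1, size=(hidden_dim, hidden_dim))
        p["b" + g] = np.zeros(hidden_dim)
    return p

def gates(i_t, h_prev, p):
    # Standard GRU update gate and candidate state.
    u = sigmoid(p["Wu"] @ i_t + p["Uu"] @ h_prev + p["bu"])
    r = sigmoid(p["Wr"] @ i_t + p["Ur"] @ h_prev + p["br"])
    h_tilde = np.tanh(p["Wh"] @ i_t + r * (p["Uh"] @ h_prev) + p["bh"])
    return u, h_tilde

def evolve(hidden, attn, p, mode):
    """Run the second-layer GRU over first-layer states with attention.

    hidden: (T, n) first-layer states, attn: (T,) attention scores.
    Returns the final state h'_T.
    """
    h = np.zeros(p["bu"].shape[0])
    for h_t, a_t in zip(hidden, attn):
        if mode == "AIGRU":
            # Attention scales the input; the gate is left untouched.
            u, h_tilde = gates(a_t * h_t, h, p)
        elif mode == "AGRU":
            # The scalar attention score replaces the update gate.
            _, h_tilde = gates(h_t, h, p)
            u = a_t
        elif mode == "AUGRU":
            # Attention rescales the update gate, keeping it a vector.
            u, h_tilde = gates(h_t, h, p)
            u = a_t * u
        else:
            raise ValueError(f"unknown mode: {mode}")
        h = (1.0 - u) * h + u * h_tilde
    return h

rng = np.random.default_rng(0)
params = init_params(input_dim=3, hidden_dim=4, rng=rng)
first_layer_states = rng.normal(size=(6, 3))
attn = np.full(6, 1.0 / 6.0)
for mode in ("AIGRU", "AGRU", "AUGRU"):
    print(mode, evolve(first_layer_states, attn, params, mode))
```

Two sanity checks follow from the equations: with $a_{t}=0$ at every step, AGRU and AUGRU never move the hidden state, and with $a_{t}=1$ at every step, AIGRU and AUGRU both reduce to a plain GRU.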

Summary

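Both the offline experiments on datasets and the online A/B test show that the two components of DIEN, the fine-grained loss on the first GRU layer and the AUGRU in the second layer, each bring significant improvements.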

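The authors also mentioned in a Zhihu discussion that the reason for such a complex design is that Taobao has a large number of very long user behavior sequences; once the sequence length exceeds 100, the information gets drowned out and attention is basically no different from average pooling.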


