Deep & Cross Network for Ad Click Predictions [Paper Notes]

DCN keeps the overall Wide&Deep framework but upgrades the Wide part, replacing it with a cross network. This design lets the model learn feature interactions automatically, achieving better performance without manual feature selection. Compared with Wide&Deep, the cross network shares parameters across cross terms while remaining computationally efficient.

1 Abstract

DNNs generate all interactions implicitly, and are not necessarily efficient in learning all types of cross features.

It is not clear here which "types" are meant.

They may refer to both low-order and high-order features.

The paper states that DCN captures feature interactions of bounded degree.

DCN applies feature crossing at each layer, requires no manual feature engineering, and adds little extra complexity to the DNN model.

This is meant as a contrast to the manual feature selection in Wide&Deep; the rest is essentially what a DNN already does.

2 Introduction

Identifying frequently predictive features while also exploring unseen or rare cross features is the key to making good predictions.

The frequently predictive features here presumably correspond to the manually selected features in Wide&Deep.

2.1 Related Work

Deep Crossing builds on residual networks and achieves automatic feature learning by stacking all kinds of inputs.

DNNs are able to approximate an arbitrary function, under certain smoothness assumptions, given sufficiently many hidden units or hidden layers.

In Kaggle competitions, the manually crafted features in many winning solutions are low-degree, explicit, and demonstrably effective. This motivates models that can learn bounded-degree feature interactions more efficiently and explicitly than a universal DNN.

Wide&Deep feeds cross features into its linear part and trains it jointly with a DNN; this paper aims to generate cross features automatically rather than select them by hand.

2.2 Main Contributions

DCN captures bounded-degree feature interactions effectively and also learns highly nonlinear higher-order interactions; it requires no manual feature engineering or exhaustive search; and it is efficient in both computation and memory.

DCN achieves lower logloss than a DNN while using nearly an order of magnitude fewer parameters.

3 Network Architecture

Figure: the Deep & Cross network architecture.

3.1 Embedding and Stacking layer

To reduce dimensionality, we apply an embedding procedure that transforms these binary features into dense real-valued vectors (commonly called embedding vectors).

In the end, we stack the embedding vectors, along with the normalized dense features \mathbf{x}_{dense}, into one vector: \mathbf{x}_{0}=[\mathbf{x}_{embed,1}^{T},\ldots,\mathbf{x}_{embed,k}^{T},\mathbf{x}_{dense}^{T}]
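A minimal sketch of this layer, assuming PyTorch (the class and argument names here are hypothetical, not from the paper): each sparse categorical field gets its own embedding table, and the resulting vectors are concatenated with the normalized dense features into the single input vector x_0.

```python
import torch
import torch.nn as nn

class EmbeddingStacking(nn.Module):
    def __init__(self, vocab_sizes, embed_dims):
        super().__init__()
        # one embedding table per sparse categorical field
        self.tables = nn.ModuleList(
            nn.Embedding(v, k) for v, k in zip(vocab_sizes, embed_dims)
        )

    def forward(self, sparse_ids, dense_feats):
        # sparse_ids: (batch, num_fields) integer ids
        # dense_feats: (batch, num_dense) normalized real-valued features
        embedded = [tab(sparse_ids[:, i]) for i, tab in enumerate(self.tables)]
        return torch.cat(embedded + [dense_feats], dim=1)  # x_0
```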

3.2 Cross Network

Each cross layer has the form:

\mathbf{x}_{l+1}=\mathbf{x}_{0}\mathbf{x}_{l}^{T}\mathbf{w}_{l}+\mathbf{b}_{l}+\mathbf{x}_{l}=f(\mathbf{x}_{l},\mathbf{w}_{l},\mathbf{b}_{l})+\mathbf{x}_{l}

where \mathbf{x}_{l},\mathbf{x}_{l+1}\in\Re^{d} are column vectors denoting the outputs of the l-th and (l+1)-th cross layers, and \mathbf{w}_{l},\mathbf{b}_{l}\in\Re^{d} are the weight and bias parameters of the l-th layer. Each cross layer adds back its input after the feature crossing, so the mapping function f:\Re^{d}\rightarrow\Re^{d} fits the residual \mathbf{x}_{l+1}-\mathbf{x}_{l}.

Figure: a cross layer.
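A minimal sketch of one cross layer under the same PyTorch assumption (names hypothetical). Computing the scalar \mathbf{x}_{l}^{T}\mathbf{w}_{l} first means the d \times d matrix \mathbf{x}_{0}\mathbf{x}_{l}^{T} is never formed:

```python
import torch
import torch.nn as nn

class CrossLayer(nn.Module):
    def __init__(self, d):
        super().__init__()
        self.w = nn.Parameter(torch.randn(d) * 0.01)  # w_l, d parameters
        self.b = nn.Parameter(torch.zeros(d))         # b_l, d parameters

    def forward(self, x0, xl):
        # x0, xl: (batch, d); xw: (batch, 1), one scalar per example
        xw = (xl * self.w).sum(dim=1, keepdim=True)   # x_l^T w_l
        return x0 * xw + self.b + xl                  # crossing plus residual
```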

Let L_c denote the number of cross layers and d the input dimension. The number of parameters in the cross network is then d \times L_c \times 2, since each layer contributes a weight vector \mathbf{w}_{l} and a bias vector \mathbf{b}_{l}, each of length d.
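For a concrete sense of scale: with input dimension d = 1000 and L_c = 6 cross layers, the cross network adds only 1000 \times 6 \times 2 = 12{,}000 parameters, tiny compared with a typical deep network over the same input.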

In fact, an l-layer cross network comprises all cross terms of degree 1 up to l+1.

The time and space complexity of the cross network are linear in the input dimension. The efficiency comes from the rank-one structure of \mathbf{x}_{0}\mathbf{x}_{l}^{T}, which allows all cross terms to be generated without computing or storing the full d \times d matrix.
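A quick numeric check of that rank-one identity, (\mathbf{x}_{0}\mathbf{x}_{l}^{T})\mathbf{w}=\mathbf{x}_{0}(\mathbf{x}_{l}^{T}\mathbf{w}): the left-hand side costs O(d^2) time and memory, the right-hand side O(d).

```python
import numpy as np

d = 8
rng = np.random.default_rng(0)
x0, xl, w = rng.standard_normal(d), rng.standard_normal(d), rng.standard_normal(d)

naive = np.outer(x0, xl) @ w   # materializes the d-by-d rank-one matrix: O(d^2)
fast = x0 * (xl @ w)           # scalar x_l^T w first, then scale x_0: O(d)
assert np.allclose(naive, fast)
```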

3.3 Combination Layer

The combination layer concatenates the outputs of the two sub-networks and feeds the combined vector into a standard logits layer.

The loss function is the log loss plus an L_2 regularization term.
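A minimal sketch of the combination layer and loss, again assuming PyTorch (names hypothetical). Applying the L_2 term via the optimizer's weight_decay is an implementation assumption on my part, not something the paper specifies:

```python
import torch
import torch.nn as nn

class CombinationLayer(nn.Module):
    def __init__(self, cross_dim, deep_dim):
        super().__init__()
        self.logit = nn.Linear(cross_dim + deep_dim, 1)

    def forward(self, x_cross, x_deep):
        # concatenate both sub-network outputs, then a single logit
        return self.logit(torch.cat([x_cross, x_deep], dim=1))

loss_fn = nn.BCEWithLogitsLoss()  # log loss; set weight_decay in the optimizer for L2
```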

4 Related Work

The cross network inherits the spirit of parameter sharing from the FM model and extends it to a deeper architecture.

In both models, each feature has learned parameters independent of the other features, and the weight of a cross term is a certain combination of the corresponding parameters.

The idea of the cross network comes from FM, extending feature interactions to higher degrees.

Parameter sharing not only makes the model more efficient, but also lets it generalize to unseen feature interactions and be more robust to noise.
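To make the analogy concrete (my gloss, not a quote from the paper): in a degree-2 FM, the weight of the cross term x_i x_j is the inner product \langle \mathbf{v}_i, \mathbf{v}_j \rangle of the two features' latent vectors, so every feature carries its own parameters and each cross-term weight is a combination of them; the cross network carries the same sharing scheme to higher degrees, with each cross-term weight built from products of the per-layer parameters \mathbf{w}_{l}.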

5 Conclusion

The Deep & Cross Network handles large sets of both sparse and dense features, and learns explicit bounded-degree cross features jointly with a conventional deep network.

The degree of the cross features increases by one at each cross layer.


  1. Gregory Valiant. 2014. Learning polynomials with neural networks. ↩︎

  2. Andreas Veit, Michael J Wilber, and Serge Belongie. 2016. Residual networks behave like ensembles of relatively shallow networks. In Advances in Neural Information Processing Systems 29. ↩︎
