Notes on "DeepChannel: Salience Estimation by Contrastive Learning for Extractive Document Summarization"
A paper I presented at our group meeting, an AAAI-19 work; jotting down some notes.
Task: text summarization
Methods:
- Extractive: classification-based, deciding for each sentence in the document whether it belongs in the summary
- Abstractive: generation-based; the common framework is the encoder-decoder
Most deep summarization models aim to learn a direct mapping from a document to its summary. DeepChannel instead estimates a channel probability P(D|S): the likelihood of document D given summary S, which measures the salience of the document-summary pair, and then performs extractive summarization on top of this probability model.
Notation:
We represent a document-summary pair as $(D, S)$, where $D$ consists of $|D|$ sentences $[d_1, d_2, \dots, d_{|D|}]$ and, likewise, $S$ consists of $|S|$ sentences $[s_1, s_2, \dots, s_{|S|}]$. The $i$-th document sentence is the word sequence $d_i = [w^d_{i,1}, w^d_{i,2}, \dots, w^d_{i,|d_i|}]$, where $w^d_{i,j}$ denotes the $j$-th word of $d_i$; similarly, $s_j = [w^s_{j,1}, w^s_{j,2}, \dots, w^s_{j,|s_j|}]$.
Prior assumptions:
- Sentences within a document are conditionally independent of one another given the summary.
- Different summary sentences contribute to differing degrees when generating each document sentence $d_i$.
Conditional probability model:

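The two assumptions above pin down the shape of the model: the first factorizes $P(D \mid S)$ over document sentences, and the second turns each factor into an attention-weighted mixture over summary sentences. My reconstruction (the paper's exact notation may differ):

$$
P(D \mid S) = \prod_{i=1}^{|D|} P(d_i \mid S), \qquad
P(d_i \mid S) = \sum_{j=1}^{|S|} \alpha_{ij}\, P(d_i \mid s_j)
$$

where $\alpha_{ij}$ is the attention weight of document sentence $d_i$ on summary sentence $s_j$, with $\sum_j \alpha_{ij} = 1$.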
Formulation:

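Sketching the neural estimation rather than reproducing the paper's exact equations: each sentence is encoded into a vector by the shared GRU encoder (see the notes at the end), and the attention weights come from a softmax over document-summary sentence scores. Here $\operatorname{score}(\cdot,\cdot)$ is a placeholder of mine, not the paper's notation:

$$
\alpha_{ij} = \frac{\exp\!\big(\operatorname{score}(\mathbf{d}_i, \mathbf{s}_j)\big)}{\sum_{j'=1}^{|S|} \exp\!\big(\operatorname{score}(\mathbf{d}_i, \mathbf{s}_{j'})\big)}
$$

where $\mathbf{d}_i$ and $\mathbf{s}_j$ are the GRU embeddings of $d_i$ and $s_j$; $P(d_i \mid s_j)$ is likewise produced by a neural scorer over the two embeddings.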
Contrastive learning:

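The training signal is pairwise: a gold pair should score higher than a corrupted one. Consistent with the note at the end that one backprop pass involves one document and two sentences, the loss is a margin-based ranking loss of roughly this shape (my sketch; $\gamma$ is a margin hyperparameter, $s^{+}$ a gold summary sentence, $s^{-}$ a sampled negative):

$$
\mathcal{L}_{\text{con}} = \max\!\big(0,\; \gamma - \log P(D \mid s^{+}) + \log P(D \mid s^{-})\big)
$$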
Penalization Term:
A reasonable attention matrix $A$ (where $A_{ij} = \alpha_{ij}$) should meet two requirements:
- Each document sentence $d_i$ should concentrate its attention on the summary sentences most relevant to it. Ideally every row of $A$ is one-hot, which means all off-diagonal elements of $A^\top A$ are zero; those off-diagonal positions are therefore the ones assigned a positive penalty.
- Every summary sentence matters: each one should receive attention from at least some document sentences.
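Both requirements can be folded into one penalty on $A \in \mathbb{R}^{|D| \times |S|}$: pushing the off-diagonal of $A^\top A$ toward zero enforces the first, and keeping its diagonal away from zero enforces the second. A Frobenius-norm penalty of this shape would do (a sketch; the paper may weight the diagonal and off-diagonal parts differently):

$$
\Omega = \big\| A^{\top} A - I \big\|_F^{2}
$$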
This gives the final loss function:

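That is, the contrastive term plus the weighted attention penalty ($\lambda$ a hyperparameter):

$$
\mathcal{L} = \mathcal{L}_{\text{con}} + \lambda\, \Omega
$$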
Once the NN part (the salience estimator) is trained, the paper gives a greedy extraction algorithm:

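The extractor treats the trained network as a black-box scorer: repeatedly add the document sentence whose inclusion most raises the channel probability of the whole document. A minimal Python sketch, assuming a trained `salience(document, summary)` function that returns $\log P(D \mid S)$ (the function name and the fixed-length stopping rule are my assumptions, not the paper's):

```python
def greedy_extract(document, salience, max_sentences=3):
    """Greedily build a summary: at each step add the document sentence
    whose inclusion maximizes the salience score log P(D | S).

    document: list of sentences (strings or token lists).
    salience: trained estimator, salience(document, summary) -> float
              (assumed interface, not the paper's exact API).
    """
    summary = []
    candidates = list(document)
    for _ in range(max_sentences):
        if not candidates:
            break
        # Try each remaining sentence and keep the one that makes the
        # current summary most salient for the whole document.
        best = max(candidates, key=lambda s: salience(document, summary + [s]))
        summary.append(best)
        candidates.remove(best)
    return summary
```

All the learning lives in the salience estimator; the extraction step itself needs no further training.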
Things to note:
- Whether encoding a document or a single sentence, the same set of GRU parameters is shared.
- One backpropagation pass is driven by one document and two sentences, which therefore also share the same GRU parameter set (see the sketch below).

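To make the parameter sharing concrete, a minimal PyTorch sketch of an encoder whose single GRU is reused for every sentence, whether it comes from the document, the gold summary, or the negative sample (module and dimension names are mine, hypothetical):

```python
import torch
import torch.nn as nn

class SharedSentenceEncoder(nn.Module):
    """One GRU, reused for every sentence (document or summary)."""

    def __init__(self, vocab_size, emb_dim=300, hidden_dim=512):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, emb_dim)
        # The same GRU parameters encode document sentences, summary
        # sentences, and both sentences of a contrastive pair.
        self.gru = nn.GRU(emb_dim, hidden_dim, batch_first=True)

    def forward(self, token_ids):
        # token_ids: (batch, seq_len) word indices, one sentence per row
        emb = self.embed(token_ids)   # (batch, seq_len, emb_dim)
        _, h_n = self.gru(emb)        # h_n: (1, batch, hidden_dim)
        return h_n.squeeze(0)         # one vector per sentence


# One training step touches one document and two candidate sentences,
# all flowing through the same encoder instance (hence shared weights):
# doc_vecs = encoder(doc_sentence_ids)
# pos_vec  = encoder(pos_sentence_ids)
# neg_vec  = encoder(neg_sentence_ids)
```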