BLEU: a Method for Automatic Evaluation of Machine Translation
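BLEU is a metric for measuring the quality of machine translation output.
Original version:
Count how many of the candidate's words appear in the references, divide that count by the total number of words in the candidate, and define the result as the precision.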
However, the following situation can arise:

In this case:

Clearly, this way of computing precision is flawed.
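Here is a minimal sketch of this naive precision. It uses the well-known example from the original BLEU paper, lowercased: the candidate "the the the the the the the" scored against the references "The cat is on the mat" and "There is a cat on the mat". The function name naive_precision is hypothetical. Every candidate word occurs in some reference, so this useless translation gets a perfect 7/7.

```python
def naive_precision(candidate, references):
    """Fraction of candidate tokens that occur anywhere in any reference (no clipping)."""
    # Collect every token that appears in at least one reference
    ref_vocab = set()
    for ref in references:
        ref_vocab.update(ref)
    matches = sum(1 for tok in candidate if tok in ref_vocab)
    return matches / len(candidate)


candidate = "the the the the the the the".split()
references = ["the cat is on the mat".split(),
              "there is a cat on the mat".split()]
print(naive_precision(candidate, references))  # 1.0, i.e. 7/7
```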
Improved version:
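The improved version clips each candidate n-gram count at the largest number of times that n-gram occurs in any single reference. This is what _modified_precision in the source code does. Below is a minimal sketch using the hypothetical name clipped_precision. It runs on the classic example from the BLEU paper: the candidate "the the the the the the the" and the references "The cat is on the mat" and "There is a cat on the mat", lowercased. The count of "the" is clipped to 2, its count in the first reference, so the precision drops from 7/7 to 2/7.

```python
from collections import Counter


def clipped_precision(candidate, references, n=1):
    """Modified n-gram precision: each candidate n-gram count is clipped
    to the maximum number of times it appears in any single reference."""
    def ngram_counts(tokens):
        return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))

    cand_counts = ngram_counts(candidate)
    if not cand_counts:
        return 0.0
    # Maximum count of each n-gram over all references
    max_ref = Counter()
    for ref in references:
        for gram, cnt in ngram_counts(ref).items():
            max_ref[gram] = max(max_ref[gram], cnt)
    clipped = sum(min(cnt, max_ref[gram]) for gram, cnt in cand_counts.items())
    return clipped / sum(cand_counts.values())


candidate = "the the the the the the the".split()
references = ["the cat is on the mat".split(),
              "there is a cat on the mat".split()]
print(clipped_precision(candidate, references))  # 2/7 ≈ 0.2857
```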

This solves most of the problems, but:


Therefore, a brevity penalty is added.
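To this end, we define the following. If the candidate's length equals the length of any one of the references, the candidate is a best match, and no penalty is applied to the translation. If the candidate's length differs from the length of every reference, we pick a reference length r (the effective reference length, Reflen) and let c be the candidate's length. We then compute the penalty factor BP from r and c, as defined under the final equation below.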
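The next sketch shows why the penalty is needed and how it behaves. It assumes the best-match rule above, with ties broken toward the shorter reference as in the source code's _brevity_penalty. The names clipped_precision and brevity_penalty are hypothetical. The references are the lowercased "The cat is on the mat" and "There is a cat on the mat", and the two-word candidate "the cat" is made up for illustration. That candidate gets perfect clipped unigram and bigram precision, so only BP pulls its score down.

```python
import math
from collections import Counter


def clipped_precision(candidate, references, n):
    """Modified n-gram precision with counts clipped by the references."""
    def grams(tokens):
        return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))

    cand = grams(candidate)
    if not cand:
        return 0.0
    max_ref = Counter()
    for ref in references:
        for g, cnt in grams(ref).items():
            max_ref[g] = max(max_ref[g], cnt)
    return sum(min(cnt, max_ref[g]) for g, cnt in cand.items()) / sum(cand.values())


def brevity_penalty(c, ref_lens):
    """BP for candidate length c; r is the reference length closest to c (ties -> shorter)."""
    r = min(ref_lens, key=lambda ref_len: (abs(ref_len - c), ref_len))
    # c == r (best match) gives exp(0) = 1, i.e. no penalty
    return 1.0 if c > r else math.exp(1 - r / c)


candidate = "the cat".split()
references = ["the cat is on the mat".split(),
              "there is a cat on the mat".split()]
p1 = clipped_precision(candidate, references, 1)
p2 = clipped_precision(candidate, references, 2)
bp = brevity_penalty(len(candidate), [len(ref) for ref in references])
print(p1, p2, bp)  # 1.0 1.0 and exp(1 - 6/2) = exp(-2) ≈ 0.135
```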
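Final equation:

$$\mathrm{BLEU} = \mathrm{BP} \cdot \exp\left(\sum_{n=1}^{N} w_n \log p_n\right)$$

w_n is the weight of the n-gram precision of order n
for example: w = [0.25, 0.25, 0.25, 0.25] (N = 4, uniform weights)
p_n is the modified (clipped) n-gram precision
BP is the brevity penalty
r is the effective reference length (the reference length closest to c)
c is the length of the candidate sentence
if c > r, BP equals 1
otherwise, BP equals exp(1 - r/c)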
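With uniform weights, the exponential term is the geometric mean of p_1..p_4, so BLEU = BP × (p_1 p_2 p_3 p_4)^(1/4). The minimal sketch below combines parts that have already been computed. The name bleu_from_parts is hypothetical, and the precision values are made up purely to show the arithmetic.

```python
import math


def bleu_from_parts(precisions, weights, bp):
    """BLEU = BP * exp(sum_n w_n * log p_n); 0 if any p_n is 0 (log is undefined)."""
    if any(p == 0 for p in precisions):
        return 0.0
    return bp * math.exp(math.fsum(w * math.log(p) for w, p in zip(weights, precisions)))


weights = [0.25, 0.25, 0.25, 0.25]
# Made-up precisions p_1..p_4, for illustration only
score = bleu_from_parts([0.8, 0.6, 0.4, 0.2], weights, bp=1.0)
print(score)  # equals the geometric mean (0.8 * 0.6 * 0.4 * 0.2) ** 0.25
```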
BLEU source code:
```python
import math
from collections import Counter

from nltk.util import ngrams


def bleu(candidate, references, weights):
    """Sentence-level BLEU: BP * exp(sum_n w_n * log p_n)."""
    # Modified (clipped) n-gram precision p_n for n = 1 .. len(weights)
    p_ns = (
        _modified_precision(candidate, references, i)
        for i, _ in enumerate(weights, start=1)
    )
    try:
        s = math.fsum(w * math.log(p_n) for w, p_n in zip(weights, p_ns))
    except ValueError:
        # math.log(0) raises ValueError: some p_n is 0, so the score is 0
        return 0
    bp = _brevity_penalty(candidate, references)
    return bp * math.exp(s)


def _modified_precision(candidate, references, n):
    """Clipped n-gram precision of the candidate against all references."""
    counts = Counter(ngrams(candidate, n))
    if not counts:
        return 0
    # For each candidate n-gram, its maximum count in any single reference
    max_counts = {}
    for reference in references:
        reference_counts = Counter(ngrams(reference, n))
        for ngram in counts:
            max_counts[ngram] = max(max_counts.get(ngram, 0), reference_counts[ngram])
    # Clip each candidate count to that maximum
    clipped_counts = {ngram: min(count, max_counts[ngram]) for ngram, count in counts.items()}
    return sum(clipped_counts.values()) / sum(counts.values())


def _brevity_penalty(candidate, references):
    """Brevity penalty BP."""
    c = len(candidate)
    if c == 0:
        return 0  # avoid division by zero for an empty candidate
    # Effective reference length r: the reference length closest to c (ties -> shorter)
    ref_lens = (len(reference) for reference in references)
    r = min(ref_lens, key=lambda ref_len: (abs(ref_len - c), ref_len))
    if c > r:
        return 1
    else:
        return math.exp(1 - r / c)
```
Test:
```python
# bleu(candidate, references, weights): calculate the BLEU score (Bilingual Evaluation Understudy)
#   candidate:  a candidate sentence, list(str)
#   references: reference sentences, list(list(str))
#   weights:    weights for unigrams, bigrams, trigrams and so on, list(float)
weights = [0.25, 0.25, 0.25, 0.25]
candidate1 = ['It', 'is', 'a', 'guide', 'to', 'action', 'which',
              'ensures', 'that', 'the', 'military', 'always',
              'obeys', 'the', 'commands', 'of', 'the', 'party']
candidate2 = ['It', 'is', 'to', 'insure', 'the', 'troops',
              'forever', 'hearing', 'the', 'activity', 'guidebook',
              'that', 'party', 'direct']
reference1 = ['It', 'is', 'a', 'guide', 'to', 'action', 'that',
              'ensures', 'that', 'the', 'military', 'will', 'forever',
              'heed', 'Party', 'commands']
reference2 = ['It', 'is', 'the', 'guiding', 'principle', 'which',
              'guarantees', 'the', 'military', 'forces', 'always',
              'being', 'under', 'the', 'command', 'of', 'the',
              'Party']
reference3 = ['It', 'is', 'the', 'practical', 'guide', 'for', 'the',
              'army', 'always', 'to', 'heed', 'the', 'directions',
              'of', 'the', 'party']
print(bleu(candidate1, [reference1, reference2, reference3], weights))
```
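The test scores only candidate1. The source bleu applies no smoothing, so math.log(0) raises ValueError whenever some p_n is 0, and the function then returns 0. The short check below counts clipped 4-gram matches using a hypothetical helper, clipped_ngram_matches, which applies the same clipping rule as _modified_precision. candidate2 shares no 4-gram with any reference, so with these weights bleu(candidate2, ...) returns 0. For candidate1, 4 of its 15 4-grams are matched.

```python
from collections import Counter


def clipped_ngram_matches(candidate, references, n):
    """Return (clipped n-gram matches, total candidate n-grams) for order n."""
    def grams(tokens):
        return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))

    cand = grams(candidate)
    max_ref = Counter()
    for ref in references:
        for g, cnt in grams(ref).items():
            max_ref[g] = max(max_ref[g], cnt)
    matches = sum(min(cnt, max_ref[g]) for g, cnt in cand.items())
    return matches, sum(cand.values())


candidate1 = ['It', 'is', 'a', 'guide', 'to', 'action', 'which',
              'ensures', 'that', 'the', 'military', 'always',
              'obeys', 'the', 'commands', 'of', 'the', 'party']
candidate2 = ['It', 'is', 'to', 'insure', 'the', 'troops',
              'forever', 'hearing', 'the', 'activity', 'guidebook',
              'that', 'party', 'direct']
references = [
    ['It', 'is', 'a', 'guide', 'to', 'action', 'that', 'ensures', 'that',
     'the', 'military', 'will', 'forever', 'heed', 'Party', 'commands'],
    ['It', 'is', 'the', 'guiding', 'principle', 'which', 'guarantees',
     'the', 'military', 'forces', 'always', 'being', 'under', 'the',
     'command', 'of', 'the', 'Party'],
    ['It', 'is', 'the', 'practical', 'guide', 'for', 'the', 'army',
     'always', 'to', 'heed', 'the', 'directions', 'of', 'the', 'party'],
]
print(clipped_ngram_matches(candidate1, references, 4))  # (4, 15)
print(clipped_ngram_matches(candidate2, references, 4))  # (0, 11)
```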
