BLEU: a Method for Automatic Evaluation of Machine Translation

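BLEU is a metric for measuring the quality of machine translation output.

Original version:

To measure how often the candidate's words appear in the references, count the candidate words that occur in a reference, divide that count by the total number of words in the candidate, and define the result as precision.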

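However, the following situation can arise: every word of a degenerate candidate appears somewhere in a reference. The BLEU paper illustrates this with a candidate that repeats "the" seven times against the reference "The cat is on the mat."

In that case

p=\frac{7}{7}=1

Clearly this way of computing precision is flawed.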
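
A minimal sketch of this naive precision may make the flaw concrete. It assumes pre-tokenized, lowercased word lists, and `naive_precision` is a hypothetical helper name that does not appear in the original code.

```python
def naive_precision(candidate, references):
    """Fraction of candidate tokens that occur in any reference."""
    # Collect every word that appears in at least one reference.
    reference_vocab = set()
    for reference in references:
        reference_vocab.update(reference)
    # Count candidate tokens found in that vocabulary, with no limit on repeats.
    matches = sum(1 for token in candidate if token in reference_vocab)
    return matches / len(candidate)


candidate = ["the"] * 7
references = [["the", "cat", "is", "on", "the", "mat"]]
print(naive_precision(candidate, references))  # 1.0
```

Because repeats are never limited, a candidate made of one common word scores as a perfect translation.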

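Improved version:

Clipping each candidate word's count at the maximum number of times it occurs in any single reference solves most problems, but a candidate that is much shorter than the references can still reach

p=1

so a brevity penalty is added.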

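To that end we define the following. When the length of the candidate doc equals the length of any reference doc, we call this the best-match state, and no penalty is applied to the translation. When the length of the candidate doc differs from the length of every reference doc, a reference length is needed (denoted Reflen, the r used in BP), with the length of the candidate doc denoted c. In the source code, Reflen is the reference length closest to c, with ties going to the shorter reference. The penalty factor BP is then computed from these two lengths.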

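final equation:

p_{i}=\frac{W_{min}}{W_{sum}}

W_{min}=\sum_{g}\min\left(\mathrm{count}_{cand}(g),\ \max_{ref}\mathrm{count}_{ref}(g)\right)

W_{sum}=\sum_{g}\mathrm{count}_{cand}(g)

Here g ranges over the distinct i-grams of the candidate, so W_{sum} is the total number of i-grams in the candidate and W_{min} is the total of their clipped counts.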
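
The clipped ratio W_min / W_sum can be sketched for unigrams as follows. This is an illustration under assumptions, not the original implementation: `clipped_unigram_precision` is a hypothetical name, and tokens are assumed to be lowercased already.

```python
from collections import Counter


def clipped_unigram_precision(candidate, references):
    """Unigram precision with each word's count clipped by the references."""
    counts = Counter(candidate)
    # Largest count of each candidate word in any single reference.
    max_ref_counts = Counter()
    for reference in references:
        ref_counts = Counter(reference)
        for token in counts:
            max_ref_counts[token] = max(max_ref_counts[token], ref_counts[token])
    # W_min: clipped matches; W_sum: all candidate words.
    w_min = sum(min(count, max_ref_counts[token]) for token, count in counts.items())
    w_sum = sum(counts.values())
    return w_min / w_sum


references = [["the", "cat", "is", "on", "the", "mat"]]
print(clipped_unigram_precision(["the"] * 7, references))  # 2/7, about 0.2857
print(clipped_unigram_precision(["the"], references))      # 1.0
```

The second call shows why clipping alone is not enough: a one-word candidate still gets a precision of 1, which is the case the brevity penalty addresses.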

w_{n} is the weight of the n-gram precision p_{n}

for example: w=[0.25, 0.25, 0.25, 0.25]

BLEU=BP\cdot\exp\left(0.25\log p_{1}+0.25\log p_{2}+0.25\log p_{3}+0.25\log p_{4}\right)
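
The weighted sum of logarithms in this formula is a weighted geometric mean of the n-gram precisions, scaled by BP. A small sketch, assuming the precisions have already been computed; `combine_precisions` is a hypothetical helper name, not part of the original code.

```python
import math


def combine_precisions(precisions, weights, brevity_penalty=1.0):
    """Return BP * exp(sum_n w_n * log p_n), or 0 if any precision is 0."""
    # log(0) is undefined, so a zero precision makes the whole score 0.
    if any(p == 0 for p in precisions):
        return 0.0
    log_sum = math.fsum(w * math.log(p) for w, p in zip(weights, precisions))
    return brevity_penalty * math.exp(log_sum)


weights = [0.25, 0.25, 0.25, 0.25]
print(combine_precisions([0.5, 0.5, 0.5, 0.5], weights))  # close to 0.5
print(combine_precisions([0.9, 0.5, 0.2, 0.0], weights))  # 0.0
```

With equal weights the score is the plain geometric mean, so a single zero precision (often the 4-gram precision on short sentences) drives the whole score to 0.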

BP is the brevity penalty (brevity_penalty)

r is the length of the reference sentence

c is the length of the candidate sentence

if c > r, BP equals 1

otherwise, BP equals exp(1 - r/c)
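
The two cases above can be sketched directly from the lengths. The helper names `closest_ref_length` and `brevity_penalty` are hypothetical, and the sketch assumes c > 0 and at least one reference.

```python
import math


def closest_ref_length(c, ref_lens):
    """Pick the reference length closest to c; ties go to the shorter one."""
    return min(ref_lens, key=lambda ref_len: (abs(ref_len - c), ref_len))


def brevity_penalty(c, r):
    """BP = 1 if c > r, otherwise exp(1 - r / c). Assumes c > 0."""
    if c > r:
        return 1.0
    return math.exp(1 - r / c)


# Candidate of 18 tokens against references of 16, 18 and 16 tokens.
r = closest_ref_length(18, [16, 18, 16])
print(r, brevity_penalty(18, r))  # 18 1.0

# A candidate half as long as its reference is penalized by exp(-1).
print(brevity_penalty(9, 18))  # about 0.3679
```

When c equals r the exponent is 0, so BP is 1 and the best-match state carries no penalty.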

BLEU source code
    from __future__ import division

    import math
    from collections import Counter

    from nltk.util import ngrams


    def bleu(candidate, references, weights):
        # Modified n-gram precision for n = 1 .. len(weights)
        p_ns = (
            _modified_precision(candidate, references, i)
            for i, _ in enumerate(weights, start=1)
        )
        try:
            s = math.fsum(w * math.log(p_n) for w, p_n in zip(weights, p_ns))
        except ValueError:
            # some p_n is 0, so math.log(p_n) is undefined
            return 0
        bp = _brevity_penalty(candidate, references)
        return bp * math.exp(s)


    def _modified_precision(candidate, references, n):
        counts = Counter(ngrams(candidate, n))
        if not counts:
            return 0
        # Largest count of each candidate n-gram in any single reference
        max_counts = {}
        for reference in references:
            reference_counts = Counter(ngrams(reference, n))
            for ngram in counts:
                max_counts[ngram] = max(max_counts.get(ngram, 0), reference_counts[ngram])
        # Clip each candidate count at that maximum
        clipped_counts = dict((ngram, min(count, max_counts[ngram])) for ngram, count in counts.items())
        return sum(clipped_counts.values()) / sum(counts.values())


    def _brevity_penalty(candidate, references):  # brevity penalty BP
        c = len(candidate)
        ref_lens = (len(reference) for reference in references)
        # Reference length closest to c; ties go to the shorter reference
        r = min(ref_lens, key=lambda ref_len: (abs(ref_len - c), ref_len))
        if c > r:
            return 1
        else:
            return math.exp(1 - r / c)
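
The code relies on `nltk.util.ngrams` to turn a token list into n-token tuples before counting them. As a rough illustration of what gets counted, here is a standard-library stand-in; `simple_ngrams` is a hypothetical helper, not NLTK's implementation, and it returns a list rather than an iterator.

```python
from collections import Counter


def simple_ngrams(tokens, n):
    """Return all contiguous n-token tuples of a token list (no padding)."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


tokens = ["It", "is", "a", "guide"]
print(simple_ngrams(tokens, 2))
# [('It', 'is'), ('is', 'a'), ('a', 'guide')]

# A sentence shorter than n has no n-grams, which is the case
# _modified_precision handles with "if not counts: return 0".
print(simple_ngrams(tokens, 5))  # []

# Counting the tuples gives the per-n-gram counts that are later clipped.
print(Counter(simple_ngrams(["the", "the", "the"], 2)))
# Counter({('the', 'the'): 2})
```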

Test:
    """Calculate BLEU score (Bilingual Evaluation Understudy)

    :param candidate: a candidate sentence
    :type candidate: list(str)
    :param references: reference sentences
    :type references: list(list(str))
    :param weights: weights for unigrams, bigrams, trigrams and so on
    :type weights: list(float)
    """

    weights = [0.25, 0.25, 0.25, 0.25]

    candidate1 = ['It', 'is', 'a', 'guide', 'to', 'action', 'which',
                  'ensures', 'that', 'the', 'military', 'always',
                  'obeys', 'the', 'commands', 'of', 'the', 'party']

    candidate2 = ['It', 'is', 'to', 'insure', 'the', 'troops',
                  'forever', 'hearing', 'the', 'activity', 'guidebook',
                  'that', 'party', 'direct']

    reference1 = ['It', 'is', 'a', 'guide', 'to', 'action', 'that',
                  'ensures', 'that', 'the', 'military', 'will', 'forever',
                  'heed', 'Party', 'commands']

    reference2 = ['It', 'is', 'the', 'guiding', 'principle', 'which',
                  'guarantees', 'the', 'military', 'forces', 'always',
                  'being', 'under', 'the', 'command', 'of', 'the',
                  'Party']

    reference3 = ['It', 'is', 'the', 'practical', 'guide', 'for', 'the',
                  'army', 'always', 'to', 'heed', 'the', 'directions',
                  'of', 'the', 'party']

    # Print the scores so the script produces visible output
    print(bleu(candidate1, [reference1, reference2, reference3], weights))
    print(bleu(candidate2, [reference1, reference2, reference3], weights))
