deltaBLEU: A Discriminative Metric for Generation Tasks with Intrinsically Diverse Targets
Author(s) -
Michel Galley,
Chris Brockett,
Alessandro Sordoni,
Yangfeng Ji,
Michael Auli,
Chris Quirk,
Margaret Mitchell,
Jianfeng Gao,
Bill Dolan
Publication year - 2015
Language(s) - English
Resource type - Conference proceedings
DOI - 10.3115/v1/p15-2073
Subject(s) - metric (unit), discriminative model, volume (thermodynamics), computational linguistics, association (psychology), computer science, cognitive science, artificial intelligence, natural language processing, library science, linguistics, philosophy, engineering, physics, epistemology, psychology, operations management, quantum mechanics
We introduce Discriminative BLEU (∆BLEU), a novel metric for intrinsic evaluation of generated text in tasks that admit a diverse range of possible outputs. Reference strings are scored for quality by human raters on a scale of [−1, +1] to weight multi-reference BLEU. In tasks involving generation of conversational responses, ∆BLEU correlates reasonably with human judgments and outperforms sentence-level and IBM BLEU in terms of both Spearman’s ρ and Kendall’s τ.
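The abstract states only that human ratings in [−1, +1] are used to weight multi-reference BLEU; it does not give the formula. The following is a minimal sketch of one way such a weighting could enter the modified n-gram precision, assuming that each hypothesis n-gram is credited with the best rating-weighted clipped count over the references containing it, and normalized by the best rating-weighted hypothesis count. The function names `ngrams` and `weighted_precision` are hypothetical, and the sketch omits the brevity penalty and the geometric mean over n-gram orders that a full BLEU-style score would need.

```python
from collections import Counter


def ngrams(tokens, n):
    """Count the n-grams of order n in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def weighted_precision(hypothesis, references, n=1):
    """Rating-weighted modified n-gram precision (illustrative assumption).

    hypothesis: list of tokens.
    references: list of (tokens, weight) pairs, weight in [-1, +1].
    """
    hyp_counts = ngrams(hypothesis, n)
    ref_counts = [(ngrams(tokens, n), weight) for tokens, weight in references]

    numerator = 0.0
    denominator = 0.0
    for gram, hyp_count in hyp_counts.items():
        # Weighted clipped counts, taken only over references containing the n-gram.
        matches = [
            weight * min(hyp_count, counts[gram])
            for counts, weight in ref_counts
            if gram in counts
        ]
        if matches:
            numerator += max(matches)
        # Normalizer: the best weight available, applied to the hypothesis count.
        denominator += max(
            (weight * hyp_count for _, weight in ref_counts), default=0.0
        )

    return numerator / denominator if denominator > 0 else 0.0


references = [
    ("i like tea".split(), 1.0),   # rated as a good response
    ("i hate tea".split(), -0.5),  # rated as a poor response
]
good = weighted_precision("i like tea".split(), references)
poor = weighted_precision("i hate tea".split(), references)
```

Under these assumptions, a hypothesis that matches only a negatively rated reference on some n-gram is penalized for it, so `poor` comes out lower than `good` even though both hypotheses match one reference exactly.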