Open Access
DCR-Consistency: Divide-Conquer-Reasoning for Consistency Evaluation and Improvement of Large Language Models
Author(s): Wendi Cui, Jiaxin Zhang, Zhuohang Li, Lopez Damien, Kamalika Das, Bradley Malin, Sricharan Kumar
Publication year: 2024
Evaluating the quality and variability of text generated by Large Language Models (LLMs) poses a significant, yet unresolved, research challenge. Traditional evaluation methods such as ROUGE and BERTScore, which measure token similarity, often fail to capture holistic semantic equivalence. This results in low correlation with human judgments and intuition, which is especially problematic in high-stakes applications like healthcare and finance, where reliability, safety, and robust decision-making are critical. This work proposes DCR, an automated framework for evaluating and improving the consistency of LLM-generated texts using a divide-conquer-reasoning approach. Unlike existing LLM-based evaluators that operate at the paragraph level, our method employs a divide-and-conquer evaluator (DCE) that breaks down the paragraph-to-paragraph comparison between two generated responses into individual sentence-to-paragraph comparisons, each evaluated against predefined criteria. To facilitate this approach, we introduce an automatic metric converter (AMC) that translates the output of DCE into an interpretable numeric score. Beyond consistency evaluation, we further present a reason-assisted improver (RAI) that leverages the analytical reasons and explanations identified by DCE to generate new responses aimed at reducing these inconsistencies. Through comprehensive and systematic empirical analysis, we show that our approach outperforms state-of-the-art methods by a large margin (e.g., +19.3% and +24.3% on the SummEval dataset) in evaluating the consistency of LLM generation across multiple benchmarks covering semantic, factual, and summarization consistency tasks. Our approach also reduces output inconsistencies by nearly 90%, showing promise for effective hallucination mitigation.
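The divide-and-conquer idea described above can be sketched in a few lines of Python. This is a minimal sketch under stated assumptions, not the authors' implementation: the regex sentence splitter, the `toy_judge` word-overlap stand-in for the LLM evaluator, the scoring rule in `metric_convert` (fraction of sentences judged consistent), and the `inconsistency_reasons` helper are all hypothetical. The abstract does not specify how AMC computes its score or how RAI prompts the model.

```python
import re
from typing import Callable, List, Tuple

# Hypothetical judge signature: (sentence, reference_paragraph) -> (is_consistent, reason).
# In DCR an LLM prompted with predefined criteria plays this role; here it is injectable.
Judge = Callable[[str, str], Tuple[bool, str]]
Verdict = Tuple[str, bool, str]  # (sentence, is_consistent, reason)


def split_sentences(paragraph: str) -> List[str]:
    """Naive splitter (an assumption): break after ., ! or ? followed by whitespace."""
    parts = re.split(r"(?<=[.!?])\s+", paragraph.strip())
    return [p for p in parts if p]


def divide_and_conquer_evaluate(candidate: str, reference: str, judge: Judge) -> List[Verdict]:
    """DCE-style pass: judge each candidate sentence against the WHOLE reference paragraph."""
    return [(s, *judge(s, reference)) for s in split_sentences(candidate)]


def metric_convert(verdicts: List[Verdict]) -> float:
    """Hypothetical AMC-style converter: fraction of sentences judged consistent, in [0, 1]."""
    if not verdicts:
        return 1.0  # no sentences, so nothing is inconsistent
    return sum(1 for _, ok, _ in verdicts if ok) / len(verdicts)


def inconsistency_reasons(verdicts: List[Verdict]) -> List[str]:
    """Reasons for inconsistent sentences; an RAI-style improver could condition a rewrite on these."""
    return [reason for _, ok, reason in verdicts if not ok]


def toy_judge(sentence: str, reference: str) -> Tuple[bool, str]:
    """Toy stand-in for an LLM judge: consistent iff every word of the sentence occurs in the reference."""
    ref_words = set(re.findall(r"[a-z0-9]+", reference.lower()))
    missing = [w for w in re.findall(r"[a-z0-9]+", sentence.lower()) if w not in ref_words]
    if missing:
        return False, "unsupported words: " + ", ".join(missing)
    return True, "all words supported"


if __name__ == "__main__":
    reference = "The cat sat on the mat. It was sunny."
    candidate = "The cat sat on the mat. The dog barked."
    verdicts = divide_and_conquer_evaluate(candidate, reference, toy_judge)
    print(metric_convert(verdicts))         # 0.5
    print(inconsistency_reasons(verdicts))  # ['unsupported words: dog, barked']
```

Judging each sentence against the full reference, rather than comparing two paragraphs in one shot, is what lets every verdict carry a localized reason. A realistic setup would replace `toy_judge` with an LLM call and feed the collected reasons into a rewrite prompt.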
Language(s): English