ELWARD: Empowering Language Model With World Insights and Human-Aligned Reward Design
Author(s) - Du Yongping, Li Siyuan, Yan Rui, Hou Ying, Han Honggui
Publication year - 2025
Publication title - Expert Systems
Language(s) - English
Resource type - Journals
SCImago Journal Rank - 0.365
H-Index - 38
eISSN - 1468-0394
pISSN - 0266-4720
DOI - 10.1111/exsy.70055
ABSTRACT
Large language models (LLMs) have made significant progress across many tasks, but they may still generate biased or misleading outputs. Alignment techniques address this issue by refining models to reflect human values, yet high-quality preference datasets remain scarce. This study introduces a method for training a high-performance reward model (RM) by integrating open knowledge with human feedback. We construct the Open Knowledge and Human Feedback (OK-HF) dataset, comprising 39.8 million open preference data entries and 30,000 human feedback entries. A dual-stage aligning strategy is proposed that combines preference pre-training with domain adaptation, leveraging multi-objective optimization to learn from both preference data and fine-grained human feedback. The Open Knowledge and Human-feedback Reward Model (OKH-RM), trained with the dual-stage aligning strategy on the OK-HF dataset, demonstrates strong performance in aligning LLMs with human preferences. Experimental results show that OKH-RM outperforms Llama2-RM, Qwen-RM and Ultra-RM, notably achieving an accuracy of 85.93% on the Stanford SHP dataset. The model also shows advanced capability in detecting low-quality repetitive responses and mitigating biases related to response length.
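To make the two-signal training described in the abstract concrete, the sketch below illustrates one common way such a reward model can be trained: a Bradley-Terry pairwise loss over preference pairs (analogous to the preference pre-training stage) combined with a regression term toward fine-grained human feedback scores, summed with a fixed weight as a simple multi-objective update. This is a minimal, hypothetical illustration, not the authors' OKH-RM implementation; the model architecture, the form of the fine-grained objective, and the loss weight are all assumptions.

```python
# Minimal sketch (assumed, not the OKH-RM code): reward-model training that mixes a
# pairwise preference loss with a fine-grained feedback regression term.
import torch
import torch.nn as nn
import torch.nn.functional as F


class RewardModel(nn.Module):
    """Toy reward head: maps a pooled response embedding to a scalar reward."""

    def __init__(self, hidden_size: int = 128):
        super().__init__()
        self.backbone = nn.Sequential(nn.Linear(hidden_size, hidden_size), nn.Tanh())
        self.head = nn.Linear(hidden_size, 1)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.head(self.backbone(x)).squeeze(-1)


def preference_loss(r_chosen: torch.Tensor, r_rejected: torch.Tensor) -> torch.Tensor:
    # Bradley-Terry pairwise loss: push the chosen response's reward above the rejected one's.
    return -F.logsigmoid(r_chosen - r_rejected).mean()


def feedback_loss(r_pred: torch.Tensor, scores: torch.Tensor) -> torch.Tensor:
    # Regress predicted rewards toward fine-grained human feedback scores
    # (the exact fine-grained objective used by OKH-RM is not specified here).
    return F.mse_loss(r_pred, scores)


model = RewardModel()
opt = torch.optim.AdamW(model.parameters(), lr=1e-4)

# Fake batch: pooled embeddings for chosen/rejected responses plus scored responses.
chosen, rejected = torch.randn(8, 128), torch.randn(8, 128)
scored, scores = torch.randn(8, 128), torch.rand(8)

# Multi-objective update: weighted sum of the two losses (the 0.5 weight is assumed).
loss = preference_loss(model(chosen), model(rejected)) + 0.5 * feedback_loss(model(scored), scores)
opt.zero_grad()
loss.backward()
opt.step()
```

In a dual-stage setup of the kind the abstract outlines, the pairwise term would dominate during preference pre-training on the large open-preference portion of the data, while the fine-grained feedback term would drive the later domain-adaptation stage; the single combined update above simply shows both signals side by side.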
