ICE-GRT: Instruction Context Enhancement by Generative Reinforcement based Transformers (Open Access)
Author(s)
Chen Zheng,
Ke Sun,
Da Tang,
Yukun Ma,
Yuyu Zhang,
Chenguang Xi,
Xun Zhou
Publication year: 2024
Large Language Models (LLMs) such as ChatGPT and LLaMA encounter limitations in domain-specific tasks: these models often lack depth and accuracy in specialized areas, and exhibit a decrease in general capabilities when fine-tuned, particularly in the analysis ability of small-sized models. To address these gaps, we introduce ICE-GRT, utilizing Reinforcement Learning from Human Feedback (RLHF) grounded in Proximal Policy Optimization (PPO), demonstrating remarkable ability in in-domain scenarios without compromising general task performance. Our exploration of ICE-GRT highlights its understanding and reasoning ability to not only generate robust answers but also to provide detailed analyses of the reasons behind the answer. This capability marks a significant progression beyond the scope of Supervised Fine-Tuning models. The success of ICE-GRT is dependent on several crucial factors, including Appropriate Data, Reward Size Scaling, KL-Control, Advantage Normalization, etc. The ICE-GRT model exhibits state-of-the-art performance in domain-specific tasks and across 12 general language tasks against equivalent-size and even larger-size LLMs, highlighting the effectiveness of our approach. We provide a comprehensive analysis of ICE-GRT, underscoring the significant advancements it brings to the field of LLMs.
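Two of the training factors the abstract names, KL-Control and Advantage Normalization, can be sketched in a minimal PPO-style update. This is an illustrative assumption of how these standard RLHF components typically work, not ICE-GRT's actual implementation; all function names and parameter values (`beta`, `clip`) here are hypothetical.

```python
# Hedged sketch of KL-controlled, advantage-normalized PPO for RLHF.
# Not the paper's code: illustrative names and constants throughout.
import math

def kl_penalized_rewards(rewards, logp_policy, logp_ref, beta=0.1):
    """Subtract a per-token KL penalty so the policy stays close to the
    reference (SFT) model -- the 'KL-Control' factor."""
    return [r - beta * (lp - lr)
            for r, lp, lr in zip(rewards, logp_policy, logp_ref)]

def normalize_advantages(advs, eps=1e-8):
    """Whiten advantages to zero mean / unit variance across the batch --
    the 'Advantage Normalization' factor."""
    mean = sum(advs) / len(advs)
    var = sum((a - mean) ** 2 for a in advs) / len(advs)
    return [(a - mean) / (math.sqrt(var) + eps) for a in advs]

def ppo_clipped_loss(logp_new, logp_old, advs, clip=0.2):
    """Standard PPO clipped surrogate objective (negated for minimization)."""
    losses = []
    for ln, lo, a in zip(logp_new, logp_old, advs):
        ratio = math.exp(ln - lo)
        clipped = max(min(ratio, 1 + clip), 1 - clip)
        losses.append(-min(ratio * a, clipped * a))
    return sum(losses) / len(losses)

# Example: reward-model scores adjusted by KL penalty, then normalized.
shaped = kl_penalized_rewards([1.0, 0.0], [-1.0, -2.0], [-1.5, -1.5])
advs = normalize_advantages(shaped)
```

In full RLHF pipelines the penalized rewards feed a value-function baseline (e.g. GAE) before normalization; the direct use above is simplified for brevity.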
Language(s): English