NILE: Internal Consistency Alignment in Large Language Models
December 21, 2024
Authors: Minda Hu, Qiyuan Zhang, Yufei Wang, Bowei He, Hongru Wang, Jingyan Zhou, Liangyou Li, Yasheng Wang, Chen Ma, Irwin King
cs.AI
Abstract
As a crucial step to enhance LLMs' alignment with human intentions,
Instruction Fine-Tuning (IFT) places high demands on dataset quality. However,
existing IFT datasets often contain knowledge that is inconsistent with LLMs'
internal knowledge learned from the pre-training phase, which can greatly
affect the efficacy of IFT. To address this issue, we introduce the NILE
(iNternal consIstency aLignmEnt) framework, which optimizes IFT datasets to
further unlock LLMs' capabilities. NILE operates by eliciting the target
pre-trained LLM's
internal knowledge corresponding to the instruction data. This internal
knowledge is then leveraged to revise the answers in IFT datasets. Additionally,
we propose a novel Internal Consistency Filtering (ICF) method to filter
training samples, ensuring their high consistency with the LLM's internal
knowledge. Our experiments
demonstrate that NILE-aligned IFT datasets sharply boost LLM performance across
multiple LLM ability evaluation datasets, achieving up to a 66.6% gain on
Arena-Hard and 68.5% on Alpaca-Eval V2. Further analysis confirms that each
component of the NILE framework contributes to these substantial performance
improvements, and provides compelling evidence that dataset consistency with
pre-trained internal knowledge is pivotal for maximizing LLM potential.
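The pipeline described in the abstract can be sketched in miniature: elicit the target LLM's internal knowledge for each instruction, revise the answer with it, then apply Internal Consistency Filtering (ICF) to keep only well-aligned samples. This is an illustrative sketch, not the paper's implementation: the function names (`elicit_internal_knowledge`, `revise_answer`), the token-overlap consistency proxy, and the `threshold` value are all assumptions; `llm` stands in for any text-to-text model call.

```python
from dataclasses import dataclass

@dataclass
class Sample:
    instruction: str
    answer: str

def elicit_internal_knowledge(llm, instruction):
    # Prompt the target pre-trained LLM for what it already knows
    # about this instruction (prompt wording is illustrative).
    return llm(f"What do you know about: {instruction}")

def revise_answer(llm, answer, knowledge):
    # Revise the IFT answer so it agrees with the elicited knowledge.
    return llm(f"Revise '{answer}' using: {knowledge}")

def internal_consistency(answer, knowledge):
    # Toy proxy for consistency: Jaccard overlap of token sets.
    # The paper's actual ICF scoring is more sophisticated.
    a = set(answer.lower().split())
    k = set(knowledge.lower().split())
    return len(a & k) / max(len(a | k), 1)

def nile_pipeline(llm, dataset, threshold=0.2):
    aligned = []
    for sample in dataset:
        knowledge = elicit_internal_knowledge(llm, sample.instruction)
        revised = revise_answer(llm, sample.answer, knowledge)
        # ICF step: keep only samples whose revised answers are
        # sufficiently consistent with the internal knowledge.
        if internal_consistency(revised, knowledge) >= threshold:
            aligned.append(Sample(sample.instruction, revised))
    return aligned
```

In this sketch, the elicitation, revision, and filtering stages are decoupled, so any one of them (e.g. the consistency scorer) can be swapped out without touching the others.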