
NILE: Internal Consistency Alignment in Large Language Models

December 21, 2024
Authors: Minda Hu, Qiyuan Zhang, Yufei Wang, Bowei He, Hongru Wang, Jingyan Zhou, Liangyou Li, Yasheng Wang, Chen Ma, Irwin King
cs.AI

Abstract
As a crucial step toward enhancing LLMs' alignment with human intentions, Instruction Fine-Tuning (IFT) places high demands on dataset quality. However, existing IFT datasets often contain knowledge that is inconsistent with the internal knowledge LLMs learn during the pre-training phase, which can greatly diminish the efficacy of IFT. To address this issue, we introduce the NILE (iNternal consIstency aLignmEnt) framework, aimed at optimizing IFT datasets to further unlock LLMs' capabilities. NILE operates by eliciting the target pre-trained LLM's internal knowledge corresponding to the instruction data, then leveraging that internal knowledge to revise the answers in the IFT dataset. Additionally, we propose a novel Internal Consistency Filtering (ICF) method that filters training samples to ensure their high consistency with the LLM's internal knowledge. Our experiments demonstrate that NILE-aligned IFT datasets sharply boost LLM performance across multiple LLM ability evaluation benchmarks, achieving gains of up to 66.6% on Arena-Hard and 68.5% on Alpaca-Eval V2. Further analysis confirms that each component of the NILE framework contributes to these substantial performance improvements, and provides compelling evidence that dataset consistency with pre-trained internal knowledge is pivotal for maximizing LLM potential.
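The abstract describes a three-stage pipeline: elicit the target LLM's internal knowledge for each instruction, revise the dataset answer against that knowledge, then filter samples by internal consistency (ICF). A minimal sketch of that control flow is below; the prompt wording, the word-overlap consistency score, and the 0.3 threshold are illustrative assumptions, not the paper's actual prompts or scoring method.

```python
from dataclasses import dataclass


@dataclass
class Sample:
    instruction: str
    answer: str


def elicit_internal_knowledge(llm, instruction: str) -> str:
    # Stage 1: prompt the target pre-trained LLM for what it already
    # "knows" about the instruction (hypothetical prompt format).
    return llm(f"State what you know that is relevant to: {instruction}")


def revise_answer(llm, sample: Sample, knowledge: str) -> str:
    # Stage 2: rewrite the reference answer so it agrees with the
    # elicited knowledge (hypothetical prompt format).
    return llm(f"Revise the answer '{sample.answer}' so it is consistent "
               f"with this knowledge: {knowledge}")


def consistency_score(answer: str, knowledge: str) -> float:
    # Toy stand-in for the ICF score: fraction of answer words that
    # also appear in the elicited knowledge. The paper's actual metric
    # is not specified in the abstract.
    a = set(answer.lower().split())
    k = set(knowledge.lower().split())
    return len(a & k) / max(len(a), 1)


def nile_align(llm, dataset, threshold: float = 0.3):
    # Run all three stages; keep only samples whose revised answers are
    # sufficiently consistent with the LLM's internal knowledge.
    aligned = []
    for sample in dataset:
        knowledge = elicit_internal_knowledge(llm, sample.instruction)
        revised = revise_answer(llm, sample, knowledge)
        if consistency_score(revised, knowledge) >= threshold:  # Stage 3: ICF
            aligned.append(Sample(sample.instruction, revised))
    return aligned
```

In practice `llm` would be a call to the target pre-trained model; for a smoke test it can be any `str -> str` function (e.g. an echo stub), since the pipeline only composes prompts and filters on the score.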
