NILE: Internal Consistency Alignment in Large Language Models

Abstract

As a crucial step in aligning LLMs with human intentions, Instruction Fine-Tuning (IFT) places high demands on dataset quality. However, existing IFT datasets often contain knowledge that is inconsistent with the internal knowledge LLMs acquire during pre-training, which can greatly reduce the efficacy of IFT. To address this issue, we introduce the NILE (iNternal consIstency aLignmEnt) framework, which optimizes IFT datasets to further unlock LLMs' capabilities. NILE operates by eliciting the target pre-trained LLM's internal knowledge corresponding to the instruction data and then using that knowledge to revise the answers in IFT datasets. Additionally, we propose a novel Internal Consistency Filtering (ICF) method that filters training samples to ensure their high consistency with the LLM's internal knowledge. Our experiments demonstrate that NILE-aligned IFT datasets sharply boost LLM performance across multiple LLM ability evaluation datasets, achieving gains of up to 66.6% on Arena-Hard and 68.5% on Alpaca-Eval V2. Further analysis confirms that each component of the NILE framework contributes to these substantial performance improvements and provides compelling evidence that dataset consistency with pre-trained internal knowledge is pivotal for maximizing LLM potential.
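
The three stages named in the abstract (eliciting internal knowledge, revising answers, and filtering by consistency) can be pictured with a minimal sketch. Everything here is an illustrative assumption rather than the authors' implementation: the `Sample` type, the `elicit` and `revise` callables (which in practice would be calls to the target LLM), the `threshold` parameter, and in particular `token_overlap`, a crude word-set Jaccard score standing in for whatever consistency measure ICF actually uses.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Sample:
    """One instruction/answer pair from an IFT dataset (hypothetical type)."""
    instruction: str
    answer: str


def token_overlap(a: str, b: str) -> float:
    """Jaccard overlap of lowercase word sets.

    Assumption: a toy stand-in for a consistency score, not the ICF metric.
    """
    words_a, words_b = set(a.lower().split()), set(b.lower().split())
    if not words_a or not words_b:
        return 0.0
    return len(words_a & words_b) / len(words_a | words_b)


def nile_style_pipeline(
    samples: List[Sample],
    elicit: Callable[[str], str],
    revise: Callable[[str, str, str], str],
    threshold: float,
) -> List[Sample]:
    """Elicit knowledge, revise each answer, keep only consistent samples."""
    kept = []
    for sample in samples:
        # Stage 1: ask the (hypothetical) model what it knows about the instruction.
        knowledge = elicit(sample.instruction)
        # Stage 2: revise the dataset answer in light of that knowledge.
        revised = revise(sample.instruction, sample.answer, knowledge)
        # Stage 3: drop samples whose revised answer scores below the threshold.
        if token_overlap(revised, knowledge) >= threshold:
            kept.append(Sample(sample.instruction, revised))
    return kept
```

In this sketch a sample for which the model yields no knowledge scores 0.0 and is filtered out. Whether the actual method handles such cases this way is not stated in the abstract.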
