LOGO -- Long cOntext aliGnment via efficient preference Optimization
October 24, 2024
Authors: Zecheng Tang, Zechen Sun, Juntao Li, Qiaoming Zhu, Min Zhang
cs.AI
Abstract
Long-context models (LCMs) have shown great potential in processing long input
sequences (even more than 100M tokens) conveniently and effectively. With
significant progress, recent research has pointed out that LCMs can accurately
locate token-level salient information within the context. Yet, the generation
performance of these LCMs is far from satisfactory and can result in
misaligned responses, such as hallucinations. To enhance the generation
capability of LCMs, existing works have investigated the effects of data size
and quality on both pre-training and instruction tuning. Though achieving
meaningful improvement, previous methods fall short in either effectiveness or
efficiency. In this paper, we introduce LOGO (Long cOntext aliGnment via
efficient preference Optimization), the first training strategy to introduce
preference optimization for long-context alignment. To overcome the GPU
memory-bound issue caused by long sequences, LOGO employs a reference-free
preference optimization strategy and adopts a position synthesis method to
construct the training data. By training with only 0.3B data on a single
8×A800 GPU machine for 16 hours, LOGO allows the Llama-3-8B-Instruct-80K
model to achieve performance comparable to GPT-4 on real-world long-context
tasks while preserving the model's original capabilities on other tasks, e.g.,
language modeling and MMLU. Moreover, LOGO can extend the model's context
window size while enhancing its generation performance.
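
The abstract does not spell out the training objective. As a rough sketch only
(our assumption, not the paper's exact formulation), a reference-free
preference loss in the style of SimPO drops the frozen reference model that
standard DPO requires, which removes a second forward pass over the long
sequence and eases the GPU memory pressure the abstract mentions:

    \mathcal{L}(\theta) = -\log \sigma\!\left( \frac{\beta}{|y_w|} \log \pi_\theta(y_w \mid x) \;-\; \frac{\beta}{|y_l|} \log \pi_\theta(y_l \mid x) \;-\; \gamma \right)

Here x is the long-context prompt, y_w and y_l are the preferred and
dis-preferred responses, β scales the length-normalized log-likelihoods,
γ is a target margin, and σ is the sigmoid function. Because only the policy
π_θ appears, no reference model has to be held in memory alongside it.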
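The abstract likewise does not detail the position synthesis method. Below is
a minimal Python sketch of one plausible reading, in the spirit of PoSE-style
positional index synthesis; the function name, chunking scheme, and gap
sampling are our assumptions, not the paper's method. It remaps the position
ids of a short training sequence so the model sees relative distances drawn
from a much longer virtual context:

    import random

    def synthesize_position_ids(seq_len, target_ctx, n_chunks=2):
        """Hypothetical helper: map a sequence of length `seq_len` onto
        position ids spanning a virtual context of length `target_ctx`."""
        # Cut the real sequence into contiguous chunks.
        bounds = sorted(random.sample(range(1, seq_len), n_chunks - 1))
        chunks, prev = [], 0
        for b in bounds + [seq_len]:
            chunks.append(b - prev)
            prev = b
        # Spread the unused positions (target_ctx - seq_len) as random gaps
        # before the chunks, keeping the ids strictly increasing and in range.
        budget = target_ctx - seq_len
        raw = [random.randint(0, budget) for _ in range(n_chunks)]
        scale = budget / max(sum(raw), 1)
        gaps = [int(r * scale) for r in raw]
        pos_ids, cur = [], 0
        for length, gap in zip(chunks, gaps):
            cur += gap
            pos_ids.extend(range(cur, cur + length))
            cur += length
        return pos_ids

For example, synthesize_position_ids(4096, 65536) returns 4096 strictly
increasing ids inside [0, 65535], so the model is exposed to 64K-scale
relative positions while each training example stays only 4K tokens long.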