Large Language Models Can Self-Improve in Long-context Reasoning
November 12, 2024
Authors: Siheng Li, Cheng Yang, Zesen Cheng, Lemao Liu, Mo Yu, Yujiu Yang, Wai Lam
cs.AI
Abstract
Large language models (LLMs) have achieved substantial progress in processing long contexts but still struggle with long-context reasoning. Existing approaches typically involve fine-tuning LLMs with synthetic data, which depends on annotations from human experts or advanced models like GPT-4, thus restricting further advancements. To address this issue, we investigate the potential for LLMs to self-improve in long-context reasoning and propose an approach specifically designed for this purpose. The approach is straightforward: we sample multiple outputs for each question, score them with Minimum Bayes Risk, and then apply supervised fine-tuning or preference optimization based on these outputs. Extensive experiments on several leading LLMs demonstrate the effectiveness of our method, with an absolute improvement of 4.2 points for Llama-3.1-8B-Instruct. Furthermore, our method achieves superior performance compared to prior approaches that depend on data produced by human experts or advanced models. We anticipate that this work will open new avenues for self-improvement techniques in long-context scenarios, which are essential for the continual advancement of LLMs.
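As a concrete illustration of the pipeline the abstract describes, below is a minimal Python sketch of the Minimum Bayes Risk scoring step. It assumes a simple token-overlap similarity as the MBR utility and uses toy hard-coded outputs in place of real sampled generations; the paper's actual similarity measure, sampling setup, and training recipe may differ.

```python
from collections import Counter

def similarity(a: str, b: str) -> float:
    """Unigram-overlap F1 between two outputs. A stand-in MBR utility;
    the paper's actual similarity measure is not specified here."""
    ta, tb = Counter(a.split()), Counter(b.split())
    overlap = sum((ta & tb).values())
    if overlap == 0:
        return 0.0
    precision = overlap / sum(tb.values())
    recall = overlap / sum(ta.values())
    return 2 * precision * recall / (precision + recall)

def mbr_scores(outputs: list[str]) -> list[float]:
    """Score each sampled output by its average similarity to all other
    samples, so outputs that agree with the consensus rank highest."""
    n = len(outputs)
    return [
        sum(similarity(out, other) for j, other in enumerate(outputs) if j != i) / (n - 1)
        for i, out in enumerate(outputs)
    ]

# Toy stand-in for multiple outputs sampled from the LLM for one question.
outputs = [
    "The answer is 42 because the context states the total is 42.",
    "Based on the passage, the total equals 42, so the answer is 42.",
    "The answer is 17.",
]
scores = mbr_scores(outputs)
ranked = sorted(zip(scores, outputs), reverse=True)
print("SFT target (consensus output):", ranked[0][1])
print("Preference pair (chosen, rejected):", (ranked[0][1], ranked[-1][1]))
```

In this sketch, the top-ranked output would serve as a supervised fine-tuning target, while the top- and bottom-ranked outputs would form a chosen/rejected pair for preference optimization, matching the two training options named in the abstract.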