Language Models can Self-Lengthen to Generate Long Texts

Abstract

Recent advancements in Large Language Models (LLMs) have significantly enhanced their ability to process long contexts, yet a notable gap remains in generating long, aligned outputs. This limitation stems from a training gap: pre-training lacks effective instructions for long-text generation, and post-training data consists primarily of short query-response pairs. Current approaches, such as instruction backtranslation and behavior imitation, face challenges including data quality, copyright issues, and constraints on proprietary model usage. In this paper, we introduce an iterative training framework called Self-Lengthen that leverages only the intrinsic knowledge and skills of LLMs, without the need for auxiliary data or proprietary models. The framework consists of two roles: the Generator and the Extender. The Generator produces the initial response, which is then split and expanded by the Extender. This process yields a new, longer response, which is used to train both the Generator and the Extender iteratively. Through this process, the models are progressively trained to handle increasingly long responses. Experiments on benchmarks and human evaluations show that Self-Lengthen outperforms existing methods in long-text generation when applied to top open-source LLMs such as Qwen2 and LLaMA3. Our code is publicly available at https://github.com/QwenLM/Self-Lengthen.
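The Generator/Extender loop described above can be illustrated with a minimal toy sketch. This is not the released implementation: every name here (`Roles`, `split_in_half`, `lengthen_once`, `self_lengthen`, and the toy models) is a hypothetical placeholder, the "models" are plain Python functions, and "training" is a stand-in that simply makes the Generator imitate the longest extended response. The abstract does not specify how responses are split or how training is performed, so those parts are assumptions.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple

# Hypothetical placeholder types; not the paper's API.
Model = Callable[[str], str]          # maps a prompt to generated text
Example = Tuple[str, str, str]        # (query, initial response, extended response)


@dataclass
class Roles:
    generator: Model
    extender: Model


TrainFn = Callable[[Roles, List[Example]], Roles]


def split_in_half(text: str) -> Tuple[str, str]:
    """Split a response into two parts by word count (assumed toy heuristic)."""
    words = text.split()
    mid = len(words) // 2
    return " ".join(words[:mid]), " ".join(words[mid:])


def lengthen_once(roles: Roles, query: str) -> Tuple[str, str]:
    """Generate an initial response, then split it and expand each part."""
    initial = roles.generator(query)
    parts = split_in_half(initial)
    extended = " ".join(roles.extender(p) for p in parts if p)
    return initial, extended


def self_lengthen(roles: Roles, queries: List[str], train: TrainFn, rounds: int) -> Roles:
    """Run several rounds; each round trains on the longer responses it produced."""
    for _ in range(rounds):
        examples: List[Example] = []
        for query in queries:
            initial, extended = lengthen_once(roles, query)
            # Keep only examples where the extension is actually longer.
            if len(extended.split()) > len(initial.split()):
                examples.append((query, initial, extended))
        roles = train(roles, examples)
    return roles


# --- Toy stand-ins so the sketch runs without any real model ---

def make_toy_generator(n_words: int) -> Model:
    """A 'generator' that always emits n_words filler words."""
    return lambda query: " ".join(["word"] * n_words)


def toy_extender(part: str) -> str:
    """An 'extender' that doubles the length by emitting each word twice."""
    return " ".join(w for word in part.split() for w in (word, word))


def toy_train(roles: Roles, examples: List[Example]) -> Roles:
    """Stand-in for fine-tuning: the new generator matches the longest target."""
    if not examples:
        return roles
    target_len = max(len(ext.split()) for _, _, ext in examples)
    return Roles(generator=make_toy_generator(target_len), extender=roles.extender)


if __name__ == "__main__":
    start = Roles(generator=make_toy_generator(4), extender=toy_extender)
    final = self_lengthen(start, ["write a story"], toy_train, rounds=3)
    print(len(final.generator("write a story").split()))
```

In this toy setup the Generator's output length doubles each round (4, 8, 16, 32 words), which mirrors the idea of progressively training on longer responses. In a real system the `train` callback would fine-tune both roles on the collected examples; here only the Generator changes, to keep the sketch short.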
