Minimum Tuning to Unlock Long Output from LLMs with High Quality Data as the Key

October 14, 2024
Authors: Yingda Chen, Xingjun Wang, Jintao Huang, Yunlin Mao, Daoze Zhang, Yuze Zhao
cs.AI

Abstract

As large language models rapidly evolve to support longer context, there is a notable disparity in their capability to generate output at greater lengths. Recent studies suggest that the primary cause of this imbalance may be the lack of long-output data during alignment training. In light of this observation, attempts have been made to re-align foundation models with data that fills this gap, resulting in models capable of generating lengthy output when instructed. In this paper, we explore the impact of data quality in tuning a model for long output, and the possibility of doing so starting from human-aligned (instruct or chat) models. With careful data curation, we show that it is possible to achieve similar performance improvements in our tuned models with only a small fraction of the training data instances and compute. In addition, we assess the generalizability of such approaches by applying our tuning recipe to several models. Our findings suggest that, while out-of-the-box capacities for generating long output vary across models, our approach of tuning them with high-quality data and lightweight compute consistently yields notable improvements across all the models we experimented on. We have made public our curated dataset for tuning long-writing capability, the implementations of model tuning and evaluation, and the fine-tuned models, all of which are openly accessible.
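
To make the described recipe concrete, below is a minimal sketch of the kind of lightweight tuning the abstract outlines: supervised fine-tuning of an already human-aligned (instruct/chat) model on a small, curated set of long-output examples. The model name, data file, field names, and hyperparameters are illustrative assumptions, not the authors' released dataset or implementation.

```python
# Hypothetical sketch: lightweight SFT of an instruct model on a small curated
# long-output dataset. All names and values below are assumptions for illustration.
from datasets import load_dataset
from transformers import (
    AutoModelForCausalLM,
    AutoTokenizer,
    DataCollatorForLanguageModeling,
    Trainer,
    TrainingArguments,
)

model_name = "Qwen/Qwen2-7B-Instruct"  # assumed human-aligned starting point
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)

# Small curated dataset of (instruction, long response) pairs, e.g. a few
# thousand JSONL records with long-form writing targets.
dataset = load_dataset("json", data_files="curated_long_output.jsonl", split="train")

def tokenize(example):
    # Render each pair through the model's chat template so tuning stays
    # consistent with the alignment format, then tokenize with a long budget
    # so lengthy responses are not truncated.
    messages = [
        {"role": "user", "content": example["instruction"]},
        {"role": "assistant", "content": example["response"]},
    ]
    text = tokenizer.apply_chat_template(messages, tokenize=False)
    return tokenizer(text, truncation=True, max_length=8192)

tokenized = dataset.map(tokenize, remove_columns=dataset.column_names)

args = TrainingArguments(
    output_dir="long-writer-tuned",
    num_train_epochs=2,                 # lightweight compute: few passes over few examples
    per_device_train_batch_size=1,
    gradient_accumulation_steps=8,
    learning_rate=1e-5,
    logging_steps=10,
)

trainer = Trainer(
    model=model,
    args=args,
    train_dataset=tokenized,
    data_collator=DataCollatorForLanguageModeling(tokenizer, mlm=False),
)
trainer.train()
trainer.save_model("long-writer-tuned")
```

The key design point reflected here is that the starting checkpoint is an instruct/chat model rather than a raw foundation model, so only a small number of long-output examples and modest compute are needed to shift output-length behavior.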
