Can Models Learn Skill Composition from Examples?
September 29, 2024
Authors: Haoyu Zhao, Simran Kaur, Dingli Yu, Anirudh Goyal, Sanjeev Arora
cs.AI
Abstract
As large language models (LLMs) become increasingly advanced, their ability
to exhibit compositional generalization -- the capacity to combine learned
skills in novel ways not encountered during training -- has garnered
significant attention. This type of generalization, particularly in scenarios
beyond training data, is also of great interest in the study of AI safety and
alignment. A recent study introduced the SKILL-MIX evaluation, where models are
tasked with composing a short paragraph demonstrating the use of a specified
k-tuple of language skills. While small models struggled with composing even
with k=3, larger models like GPT-4 performed reasonably well with k=5 and
6.
In this paper, we employ a setup akin to SKILL-MIX to evaluate the capacity
of smaller models to learn compositional generalization from examples.
Utilizing a diverse set of language skills -- including rhetorical, literary,
reasoning, theory of mind, and common sense -- GPT-4 was used to generate text
samples that exhibit random subsets of k skills. Subsequent fine-tuning of 7B
and 13B parameter models on these combined skill texts, for increasing values
of k, revealed the following findings: (1) Training on combinations of k=2
and 3 skills results in noticeable improvements in the ability to compose
texts with k=4 and 5 skills, despite models never having seen such examples
during training. (2) When skill categories are split into training and held-out
groups, models significantly improve at composing texts with held-out skills
during testing despite having only seen training skills during fine-tuning,
illustrating the efficacy of the training approach even with previously unseen
skills. This study also suggests that incorporating skill-rich (potentially
synthetic) text into training can substantially enhance the compositional
capabilities of models.
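The SKILL-MIX-style setup described above — draw a random k-subset from a pool of language skills, then prompt a model to compose a short paragraph exhibiting all of them — can be sketched as follows. This is a minimal illustration, not the paper's actual pipeline: the skill names and the prompt wording here are our own placeholders, not the authors' templates.

```python
import random

# Hypothetical skill pool; the paper draws skills from rhetorical, literary,
# reasoning, theory-of-mind, and common-sense categories.
SKILLS = [
    "metaphor",
    "red herring",
    "modus ponens",
    "false belief attribution",
    "spatial reasoning",
    "understatement",
]

def sample_skill_mix(skills, k, rng=random):
    """Draw a random k-subset of skills, as in the SKILL-MIX setup."""
    return rng.sample(skills, k)

def build_prompt(topic, skill_subset):
    """Assemble an illustrative generation prompt.

    The wording is a stand-in; the paper uses its own (unpublished here)
    prompt template when querying GPT-4 to produce training texts.
    """
    skill_list = ", ".join(skill_subset)
    return (
        f"Write a short paragraph about {topic} that demonstrates "
        f"the following {len(skill_subset)} language skills: {skill_list}."
    )

if __name__ == "__main__":
    mix = sample_skill_mix(SKILLS, k=3)
    print(build_prompt("gardening", mix))
```

In the paper's training regime, prompts like these (with k=2 or 3) generate the fine-tuning corpus, while evaluation uses larger subsets (k=4 or 5) or skills held out of the training pool entirely, probing compositional generalization.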