Can Models Learn Skill Composition from Examples?

September 29, 2024
Authors: Haoyu Zhao, Simran Kaur, Dingli Yu, Anirudh Goyal, Sanjeev Arora
cs.AI

Abstract

As large language models (LLMs) become increasingly advanced, their ability to exhibit compositional generalization -- the capacity to combine learned skills in novel ways not encountered during training -- has garnered significant attention. This type of generalization, particularly in scenarios beyond the training data, is also of great interest in the study of AI safety and alignment. A recent study introduced the SKILL-MIX evaluation, in which models are tasked with composing a short paragraph demonstrating the use of a specified k-tuple of language skills. While small models struggled to compose even with k=3, larger models like GPT-4 performed reasonably well with k=5 and 6. In this paper, we employ a setup akin to SKILL-MIX to evaluate the capacity of smaller models to learn compositional generalization from examples. Drawing on a diverse set of language skills -- including rhetorical, literary, reasoning, theory-of-mind, and common-sense skills -- we used GPT-4 to generate text samples exhibiting random subsets of k skills. Fine-tuning 7B and 13B parameter models on these combined-skill texts, for increasing values of k, revealed the following findings: (1) Training on combinations of k=2 and 3 skills yields noticeable improvements in the ability to compose texts with k=4 and 5 skills, even though the models never saw such examples during training. (2) When skill categories are split into training and held-out groups, models improve significantly at composing texts with held-out skills at test time despite having seen only the training skills during fine-tuning, illustrating the efficacy of the training approach even for previously unseen skills. The study also suggests that incorporating skill-rich (potentially synthetic) text into training can substantially enhance the compositional capabilities of models.
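
To make the data-generation recipe concrete, below is a minimal Python sketch of the two ingredients the abstract describes: sampling random k-subsets of skills to build SKILL-MIX-style generation prompts (for a GPT-4 generator), and splitting the skill pool into training and held-out groups. The skill names, prompt wording, split ratio, and topics here are illustrative assumptions, not the paper's exact choices.

```python
import random

# Hypothetical skill pool. The paper draws skills from rhetorical, literary,
# reasoning, theory-of-mind, and common-sense categories; these entries are
# illustrative stand-ins, not the paper's actual skill list.
SKILLS = {
    "rhetorical": ["metaphor", "hyperbole", "rhetorical question"],
    "literary": ["foreshadowing", "irony", "alliteration"],
    "reasoning": ["modus ponens", "proof by contradiction"],
    "theory_of_mind": ["false belief", "perspective taking"],
    "common_sense": ["object permanence", "cause and effect"],
}

def split_skills(held_out_fraction=0.3, seed=0):
    """Split the flattened skill list into training and held-out groups,
    mirroring the paper's held-out-skill evaluation. The 70/30 ratio is
    an assumption for illustration."""
    rng = random.Random(seed)
    all_skills = [s for group in SKILLS.values() for s in group]
    rng.shuffle(all_skills)
    cut = int(len(all_skills) * (1 - held_out_fraction))
    return all_skills[:cut], all_skills[cut:]

def skill_mix_prompt(skills, topic):
    """Build a SKILL-MIX-style generation prompt for a k-tuple of skills.
    The exact wording is an assumption, not the paper's template."""
    skill_list = ", ".join(skills)
    return (
        f"Write a short paragraph about {topic} that naturally "
        f"demonstrates all of the following language skills: {skill_list}. "
        f"Do not name the skills explicitly."
    )

def sample_training_prompts(train_skills, topics, ks=(2, 3), n=4, seed=0):
    """Sample random k-subsets of *training* skills for k in {2, 3};
    generalization is then tested at k = 4, 5 and on held-out skills."""
    rng = random.Random(seed)
    prompts = []
    for _ in range(n):
        k = rng.choice(ks)
        combo = rng.sample(train_skills, k)
        prompts.append(skill_mix_prompt(combo, rng.choice(topics)))
    return prompts

train_skills, held_out_skills = split_skills()
for p in sample_training_prompts(train_skills, ["gardening", "sea travel"]):
    print(p)
```

In the paper's setup, text generated from prompts like these (with k=2 and 3 over training skills) forms the fine-tuning corpus for the 7B and 13B models, which are then evaluated on k=4 and 5 compositions and on the held-out skills.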
