Can Models Learn Skill Composition from Examples?
September 29, 2024
Authors: Haoyu Zhao, Simran Kaur, Dingli Yu, Anirudh Goyal, Sanjeev Arora
cs.AI
Abstract
As large language models (LLMs) become increasingly advanced, their ability
to exhibit compositional generalization -- the capacity to combine learned
skills in novel ways not encountered during training -- has garnered
significant attention. This type of generalization, particularly in scenarios
beyond training data, is also of great interest in the study of AI safety and
alignment. A recent study introduced the SKILL-MIX evaluation, where models are
tasked with composing a short paragraph demonstrating the use of a specified
k-tuple of language skills. While small models struggled with composing even
with k=3, larger models like GPT-4 performed reasonably well with k=5 and
6.
In this paper, we employ a setup akin to SKILL-MIX to evaluate the capacity
of smaller models to learn compositional generalization from examples.
Utilizing a diverse set of language skills -- including rhetorical, literary,
reasoning, theory of mind, and common sense -- GPT-4 was used to generate text
samples that exhibit random subsets of k skills. Subsequent fine-tuning of 7B
and 13B parameter models on these combined skill texts, for increasing values
of k, revealed the following findings: (1) Training on combinations of k=2
and 3 skills results in noticeable improvements in the ability to compose
texts with k=4 and 5 skills, despite models never having seen such examples
during training. (2) When skill categories are split into training and held-out
groups, models significantly improve at composing texts with held-out skills
during testing despite having only seen training skills during fine-tuning,
illustrating the efficacy of the training approach even with previously unseen
skills. This study also suggests that incorporating skill-rich (potentially
synthetic) text into training can substantially enhance the compositional
capabilities of models.
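The SKILL-MIX-style setup described above — draw a random k-subset from a pool of language skills, then prompt a model to compose a short paragraph exhibiting all of them — can be sketched as follows. This is a minimal illustration, not the paper's actual pipeline: the skill names and the prompt wording here are our own placeholders, not the authors' templates.

```python
import random

# Hypothetical skill pool; the paper draws skills from rhetorical, literary,
# reasoning, theory-of-mind, and common-sense categories.
SKILLS = [
    "metaphor",
    "red herring",
    "modus ponens",
    "false belief attribution",
    "spatial reasoning",
    "understatement",
]

def sample_skill_mix(skills, k, rng=random):
    """Draw a random k-subset of skills, as in the SKILL-MIX setup."""
    return rng.sample(skills, k)

def build_prompt(topic, skill_subset):
    """Assemble an illustrative generation prompt.

    The wording is a stand-in; the paper uses its own (unpublished here)
    prompt template when querying GPT-4 to produce training texts.
    """
    skill_list = ", ".join(skill_subset)
    return (
        f"Write a short paragraph about {topic} that demonstrates "
        f"the following {len(skill_subset)} language skills: {skill_list}."
    )

if __name__ == "__main__":
    mix = sample_skill_mix(SKILLS, k=3)
    print(build_prompt("gardening", mix))
```

In the paper's training regime, prompts like these (with k=2 or 3) generate the fine-tuning corpus, while evaluation uses larger subsets (k=4 or 5) or skills held out of the training pool entirely, probing compositional generalization.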