Stronger Models are NOT Stronger Teachers for Instruction Tuning
November 11, 2024
作者: Zhangchen Xu, Fengqing Jiang, Luyao Niu, Bill Yuchen Lin, Radha Poovendran
cs.AI
Abstract
Instruction tuning has been widely adopted to ensure large language models
(LLMs) follow user instructions effectively. The resulting
instruction-following capabilities of LLMs heavily rely on the instruction
datasets used for tuning. Recently, synthetic instruction datasets have emerged
as an economically viable solution to provide LLMs with diverse and high-quality
instructions. However, existing approaches typically assume that larger or
stronger models are stronger teachers for instruction tuning, and hence simply
adopt these models as response generators to the synthetic instructions. In
this paper, we challenge this commonly-adopted assumption. Our extensive
experiments across five base models and twenty response generators reveal that
larger and stronger models are not necessarily stronger teachers of smaller
models. We refer to this phenomenon as the Larger Models' Paradox. We observe
that existing metrics cannot precisely predict the effectiveness of response
generators since they ignore the compatibility between teachers and base models
being fine-tuned. We thus develop a novel metric, named
Compatibility-Adjusted Reward (CAR), to measure the effectiveness of response
generators. Our experiments across five base models demonstrate that CAR
outperforms almost all baselines.
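The abstract does not spell out how CAR is computed. As a minimal sketch, assuming compatibility is proxied by the base model's perplexity on a teacher's responses (lower perplexity meaning more compatible) and that the average reward is simply divided by that perplexity, the metric might look like the following; all names, the formula, and the toy numbers are illustrative assumptions, not the paper's definition:

```python
# Hypothetical sketch of a compatibility-adjusted reward (CAR-like) score.
# Assumption: compatibility is approximated by the base model's perplexity on
# each teacher response, and the score divides average reward by that perplexity.
from dataclasses import dataclass
from statistics import mean


@dataclass
class ScoredResponse:
    reward: float           # reward-model score of the teacher's response
    base_model_ppl: float   # perplexity of the response under the base model


def compatibility_adjusted_reward(responses: list[ScoredResponse]) -> float:
    """Average reward penalized by how 'foreign' the responses are to the base model."""
    avg_reward = mean(r.reward for r in responses)
    avg_ppl = mean(r.base_model_ppl for r in responses)
    return avg_reward / avg_ppl


# Toy comparison: a stronger teacher (higher raw reward) can still rank lower
# when its responses are less compatible with the base model (higher perplexity).
strong_teacher = [ScoredResponse(reward=8.0, base_model_ppl=4.0)]
weaker_teacher = [ScoredResponse(reward=7.0, base_model_ppl=2.5)]
print(compatibility_adjusted_reward(strong_teacher))  # 2.0
print(compatibility_adjusted_reward(weaker_teacher))  # 2.8
```

The toy numbers illustrate the Larger Models' Paradox described above: the nominally stronger teacher scores lower once compatibility with the base model is taken into account.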