Predicting Emergent Capabilities by Finetuning
November 25, 2024
Authors: Charlie Snell, Eric Wallace, Dan Klein, Sergey Levine
cs.AI
Abstract
A fundamental open challenge in modern LLM scaling is the lack of
understanding around emergent capabilities. In particular, language model
pretraining loss is known to be highly predictable as a function of compute.
However, downstream capabilities are far less predictable -- sometimes even
exhibiting emergent jumps -- which makes it challenging to anticipate the
capabilities of future models. In this work, we first pose the task of
emergence prediction: given access to current LLMs that have random few-shot
accuracy on a task, can we predict whether future models (GPT-N+1) will have
non-trivial accuracy on that task? We then discover a simple insight for this
problem: finetuning LLMs on a given task can shift the point in scaling at
which emergence occurs towards less capable models. To operationalize this
insight, we can finetune LLMs with varying amounts of data and fit a parametric
function that predicts when emergence will occur (i.e., "emergence laws"). We
validate this approach using four standard NLP benchmarks where large-scale
open-source LLMs already demonstrate emergence (MMLU, GSM8K, CommonsenseQA, and
CoLA). Using only small-scale LLMs, we find that, in some cases, we can
accurately predict whether models trained with up to 4x more compute have
emerged. Finally, we present a case study of two realistic uses for emergence
prediction.
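The abstract's core recipe, finetuning at several data levels and fitting a parametric curve that predicts when emergence occurs, can be sketched in a few lines. Below is a minimal sketch, assuming a sigmoid in log-compute whose midpoint shifts toward smaller models as finetuning data grows; this functional form, the variable names, and all numbers are illustrative assumptions, not the authors' actual parameterization or data.

# Hypothetical sketch of fitting an "emergence law". The sigmoid form and
# all data below are assumptions for illustration only.
import numpy as np
from scipy.optimize import curve_fit

def emergence_law(X, a, b, c, k, m):
    """Predicted accuracy given (log10 pretraining FLOPs, log(1 + n_finetune)).

    Assumed form: baseline b plus a sigmoid of height a over log-compute,
    whose midpoint c is shifted earlier by m per unit of log-finetuning-data.
    """
    log_compute, log_n = X
    midpoint = c - m * log_n  # more finetuning data => emergence at less compute
    return b + a / (1.0 + np.exp(-k * (log_compute - midpoint)))

# Synthetic observations: small models finetuned with 0, 1k, or 10k examples
# (fabricated numbers, generated from the assumed law plus noise).
rng = np.random.default_rng(0)
log_compute = np.tile(np.linspace(19.5, 21.5, 6), 3)
log_n = np.repeat(np.log1p([0.0, 1e3, 1e4]), 6)
accuracy = emergence_law((log_compute, log_n), 0.7, 0.25, 22.0, 3.0, 0.2)
accuracy += rng.normal(0.0, 0.01, size=accuracy.shape)

# Fit the parametric curve jointly across all finetuning-data levels.
params, _ = curve_fit(
    emergence_law, (log_compute, log_n), accuracy,
    p0=[0.5, 0.3, 21.0, 2.0, 0.1], maxfev=20_000,
)
a, b, c, k, m = params

# Extrapolate to log_n = 0 (no finetuning) to predict the compute scale at
# which the base model's few-shot accuracy should become non-trivial.
print(f"Predicted few-shot emergence midpoint: 10^{c:.2f} FLOPs")

Extrapolating the fitted midpoint back to the zero-finetuning setting gives a prediction of the compute at which the base model's few-shot accuracy becomes non-trivial, mirroring the emergence-prediction task posed in the abstract.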