Establishing Task Scaling Laws via Compute-Efficient Model Ladders

December 5, 2024
Authors: Akshita Bhagia, Jiacheng Liu, Alexander Wettig, David Heineman, Oyvind Tafjord, Ananya Harsh Jha, Luca Soldaini, Noah A. Smith, Dirk Groeneveld, Pang Wei Koh, Jesse Dodge, Hannaneh Hajishirzi
cs.AI

Abstract

We develop task scaling laws and model ladders to predict the individual task performance of pretrained language models (LMs) in the overtrained setting. Standard power laws for language modeling loss cannot accurately model task performance. Therefore, we leverage a two-step prediction approach: first use model and data size to predict a task-specific loss, and then use this task loss to predict task performance. We train a set of small-scale "ladder" models, collect data points to fit the parameterized functions of the two prediction steps, and make predictions for two target models: a 7B model trained to 4T tokens and a 13B model trained to 5T tokens. Training the ladder models only costs 1% of the compute used for the target models. On four multiple-choice tasks written in ranked classification format, we can predict the accuracy of both target models within 2 points of absolute error. We have higher prediction error on four other tasks (average absolute error 6.9) and find that these are often tasks with higher variance in task metrics. We also find that using less compute to train fewer ladder models tends to deteriorate predictions. Finally, we empirically show that our design choices and the two-step approach lead to superior performance in establishing scaling laws.
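
To make the two-step approach concrete, the sketch below fits an assumed Chinchilla-style power law mapping model size and data size to a task-specific loss, then a sigmoidal mapping from that task loss to task accuracy, and chains the two fits to extrapolate to a larger target model. The functional forms, ladder sizes, and data values are illustrative assumptions on synthetic data, not the paper's exact parameterization or measurements.

```python
# A minimal, self-contained sketch of the two-step fit on synthetic "ladder" data.
# All numbers below are assumptions for illustration, not the paper's measurements.
import numpy as np
from scipy.optimize import curve_fit

# Step 1: task-specific loss as a function of model size N (parameters) and data size D (tokens),
# modeled with a power law of the form L(N, D) = E + A / N^alpha + B / D^beta.
def task_loss(ND, A, alpha, B, beta, E):
    N, D = ND
    return E + A / N**alpha + B / D**beta

# Step 2: task accuracy as a sigmoidal function of task loss (a < 0, so lower loss -> higher accuracy).
def task_accuracy(L, a, L0, k, b):
    return a / (1.0 + np.exp(-k * (L - L0))) + b

# Hypothetical ladder grid: small model sizes crossed with small token budgets,
# with observations synthesized from assumed "true" parameters plus noise.
rng = np.random.default_rng(0)
sizes = np.array([190e6, 370e6, 760e6, 1.3e9])   # ladder model parameter counts
tokens = np.array([4e9, 8e9, 16e9, 32e9])        # ladder training token counts
N, D = (g.ravel() for g in np.meshgrid(sizes, tokens))
L_obs = task_loss((N, D), 200.0, 0.3, 300.0, 0.3, 0.7) + rng.normal(0, 0.01, N.size)
acc_obs = task_accuracy(L_obs, -0.65, 1.5, 4.0, 0.9) + rng.normal(0, 0.01, N.size)

# Fit step 1 on (N, D) -> task loss, then step 2 on task loss -> accuracy.
p1, _ = curve_fit(task_loss, (N, D), L_obs, p0=[100.0, 0.3, 100.0, 0.3, 1.0],
                  bounds=([0, 0, 0, 0, 0], [np.inf, 2, np.inf, 2, np.inf]), maxfev=50000)
p2, _ = curve_fit(task_accuracy, L_obs, acc_obs, p0=[-0.5, 1.5, 3.0, 0.9], maxfev=50000)

# Chain the two fitted steps to extrapolate to a target model, e.g. 7B parameters / 4T tokens.
L_pred = task_loss((7e9, 4e12), *p1)
acc_pred = task_accuracy(L_pred, *p2)
print(f"predicted task loss: {L_pred:.3f}, predicted accuracy: {acc_pred:.3f}")
```

In the paper's setting the step-1 points come from ladder-model training runs and the step-2 points from their task-loss/accuracy pairs; here both are synthesized only to show the fitting and chaining mechanics.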
