

BenTo: Benchmark Task Reduction with In-Context Transferability

October 17, 2024
Authors: Hongyu Zhao, Ming Li, Lichao Sun, Tianyi Zhou
cs.AI

Abstract

Evaluating large language models (LLMs) is costly: it requires the generation and examination of LLM outputs on a large-scale benchmark of various tasks. This paper investigates how to efficiently reduce the tasks used to benchmark LLMs without affecting the evaluation quality. Our study reveals that task transferability and relevance provide critical information for identifying the most representative subset of tasks via optimizing a facility location function. We propose a practically efficient metric for estimating the transferability between two tasks via in-context learning (ICL). By analyzing the pairwise transferability, we can reduce the tasks in a modern LLM benchmark (e.g., MMLU or FLAN) to 5% while inducing less than a 4% difference from the evaluation on the original benchmark. Compared to prior works, our method is training-free, gradient-free, and highly efficient, requiring only ICL.
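The abstract describes the selection step only at a high level, but the facility location function it names is a standard monotone submodular objective, so greedy maximization carries the usual (1 − 1/e) approximation guarantee. Below is a minimal sketch of that step, assuming a precomputed pairwise transferability matrix `T` where `T[i, j]` estimates how well task `j` transfers to task `i` (e.g., measured via ICL, per the abstract). The function name, objective details, and parameters here are illustrative, not taken from the paper.

```python
import numpy as np

def greedy_facility_location(T: np.ndarray, k: int) -> list[int]:
    """Greedily pick k tasks maximizing the facility location objective
    F(S) = sum_i max_{j in S} T[i, j], i.e., every task i should be
    well "covered" by at least one selected task j. Illustrative sketch."""
    n = T.shape[0]
    selected: list[int] = []
    best = np.zeros(n)  # best[i] = current coverage of task i by the subset
    for _ in range(k):
        # Marginal gain of adding candidate j: improved coverage summed over all tasks.
        gains = np.maximum(T, best[:, None]).sum(axis=0) - best.sum()
        gains[selected] = -np.inf  # never re-pick an already selected task
        j = int(np.argmax(gains))
        selected.append(j)
        best = np.maximum(best, T[:, j])
    return selected

# Hypothetical usage: keep ~5% of tasks from a random transferability matrix
# (n = 57 mirrors the number of MMLU subject tasks).
rng = np.random.default_rng(0)
n = 57
T = rng.random((n, n))
subset = greedy_facility_location(T, k=max(1, round(0.05 * n)))
print(subset)
```

In practice the evaluation would then be run only on the selected subset, with the subset's average score used as a proxy for the full-benchmark score; the paper reports that this proxy deviates by less than 4% on benchmarks such as MMLU and FLAN.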
