Revisiting In-Context Learning with Long Context Language Models

December 22, 2024
Authors: Jinheon Baek, Sun Jae Lee, Prakhar Gupta, Geunseob Oh, Siddharth Dalmia, Prateek Kolhar
cs.AI

Abstract

In-Context Learning (ICL) is a technique by which language models make predictions based on examples provided in their input context. Previously, their context window size imposed a limit on the number of examples that can be shown, making example selection techniques crucial for identifying the maximally effective set of examples. However, the recent advent of Long Context Language Models (LCLMs) has significantly increased the number of examples that can be included in context, raising an important question of whether ICL performance in a many-shot regime is still sensitive to the method of sample selection. To answer this, we revisit these approaches in the context of LCLMs through extensive experiments on 18 datasets spanning 4 tasks. Surprisingly, we observe that sophisticated example selection techniques do not yield significant improvements over a simple random sample selection method. Instead, we find that the advent of LCLMs has fundamentally shifted the challenge of ICL from that of selecting the most effective examples to that of collecting sufficient examples to fill the context window. Specifically, in certain datasets, including all available examples does not fully utilize the context window; however, by augmenting the examples in context with a simple data augmentation approach, we substantially improve ICL performance by 5%.
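
The random-selection baseline described in the abstract can be illustrated with a minimal sketch: shuffle the example pool and keep adding demonstrations until a token budget standing in for the context window is exhausted. Everything here is an assumption made for illustration and is not taken from the paper: the function names `count_tokens`, `format_example`, and `build_many_shot_prompt`, the `Input:`/`Output:` prompt template, and the whitespace-based token count, which is only a rough stand-in for a real tokenizer.

```python
import random


def count_tokens(text: str) -> int:
    # Crude whitespace proxy for a tokenizer (illustrative assumption only).
    return len(text.split())


def format_example(question: str, answer: str) -> str:
    # Hypothetical demonstration template; the paper's actual template may differ.
    return f"Input: {question}\nOutput: {answer}\n"


def build_many_shot_prompt(examples, query, token_budget, seed=0):
    """Shuffle the pool and add examples until the token budget is reached.

    Returns the prompt string and the number of demonstrations included.
    """
    rng = random.Random(seed)
    pool = list(examples)
    rng.shuffle(pool)  # random selection: no relevance or diversity scoring

    query_block = f"Input: {query}\nOutput:"
    used = count_tokens(query_block)  # always reserve room for the query
    chosen = []
    for question, answer in pool:
        block = format_example(question, answer)
        cost = count_tokens(block)
        if used + cost > token_budget:
            break  # context window (as approximated by the budget) is full
        chosen.append(block)
        used += cost
    return "\n".join(chosen + [query_block]), len(chosen)
```

If the loop runs out of examples before reaching the budget, the context window is underused, which is the situation the abstract says motivates data augmentation; the augmentation step itself is not sketched here because the abstract does not specify it.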
