Revisiting In-Context Learning with Long Context Language Models

December 22, 2024
Authors: Jinheon Baek, Sun Jae Lee, Prakhar Gupta, Geunseob Oh, Siddharth Dalmia, Prateek Kolhar
cs.AI

Abstract

In-Context Learning (ICL) is a technique by which language models make predictions based on examples provided in their input context. Previously, their context window size imposed a limit on the number of examples that can be shown, making example selection techniques crucial for identifying the maximally effective set of examples. However, the recent advent of Long Context Language Models (LCLMs) has significantly increased the number of examples that can be included in context, raising an important question of whether ICL performance in a many-shot regime is still sensitive to the method of sample selection. To answer this, we revisit these approaches in the context of LCLMs through extensive experiments on 18 datasets spanning 4 tasks. Surprisingly, we observe that sophisticated example selection techniques do not yield significant improvements over a simple random sample selection method. Instead, we find that the advent of LCLMs has fundamentally shifted the challenge of ICL from that of selecting the most effective examples to that of collecting sufficient examples to fill the context window. Specifically, in certain datasets, including all available examples does not fully utilize the context window; however, by augmenting the examples in context with a simple data augmentation approach, we substantially improve ICL performance by 5%.
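The random-selection baseline the abstract refers to can be sketched as follows. This is a minimal illustration only: the function name, the prompt template, and the whitespace-based token estimate are assumptions for demonstration, not the authors' implementation.

```python
import random

def build_icl_prompt(examples, query, token_budget=128_000, seed=0):
    """Fill a long context window with randomly selected ICL examples.

    Examples are drawn uniformly at random (no sophisticated selection)
    until the model's token budget is exhausted, then the test query is
    appended for the model to complete.
    """
    rng = random.Random(seed)
    pool = list(examples)
    rng.shuffle(pool)

    parts, used = [], 0
    for ex in pool:
        shot = f"Input: {ex['input']}\nOutput: {ex['output']}\n"
        cost = len(shot.split())  # crude token estimate (assumption)
        if used + cost > token_budget:
            break
        parts.append(shot)
        used += cost

    parts.append(f"Input: {query}\nOutput:")
    return "\n".join(parts)

# Toy usage: 100 available examples, a small budget to force truncation.
demo = [{"input": f"x{i}", "output": f"y{i}"} for i in range(100)]
prompt = build_icl_prompt(demo, "x_new", token_budget=50)
```

The key design point mirrors the paper's finding: once the context window is the binding constraint, the interesting question becomes how many examples fit (and how to get more, e.g. via data augmentation), not which ones to pick.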

