Revisiting In-Context Learning with Long Context Language Models

Abstract

In-Context Learning (ICL) is a technique by which language models make predictions based on examples provided in their input context. Previously, the context window size limited the number of examples that could be shown, making example selection techniques crucial for identifying the most effective set of examples. However, the recent advent of Long Context Language Models (LCLMs) has significantly increased the number of examples that can be included in context, raising the important question of whether ICL performance in a many-shot regime is still sensitive to the method of sample selection. To answer this, we revisit these approaches in the context of LCLMs through extensive experiments on 18 datasets spanning 4 tasks. Surprisingly, we observe that sophisticated example selection techniques do not yield significant improvements over a simple random sample selection method. Instead, we find that the advent of LCLMs has fundamentally shifted the challenge of ICL from selecting the most effective examples to collecting enough examples to fill the context window. Specifically, in certain datasets, including all available examples does not fully utilize the context window; however, by augmenting the examples in context with a simple data augmentation approach, we substantially improve ICL performance by 5%.
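To make the random sample selection baseline concrete, the following is a minimal sketch of how a many-shot prompt might be filled from a pool of labeled examples until a token budget is reached. It is not the authors' implementation: the names `count_tokens`, `format_example`, and `build_many_shot_prompt` are hypothetical, the prompt template is invented for illustration, and the whitespace-based token count is an assumed stand-in for a real model tokenizer.

```python
import random
from typing import List, Tuple

Example = Tuple[str, str]  # (input text, label)


def count_tokens(text: str) -> int:
    """Assumption: approximate tokens by whitespace-separated words.
    A real setup would use the target model's tokenizer instead."""
    return len(text.split())


def format_example(example: Example) -> str:
    """Hypothetical prompt template for one demonstration."""
    text, label = example
    return f"Input: {text}\nLabel: {label}\n"


def build_many_shot_prompt(
    pool: List[Example], query: str, token_budget: int, seed: int = 0
) -> Tuple[str, int]:
    """Randomly order the pool, then add demonstrations until the next one
    would exceed the token budget. Returns the prompt and the shot count."""
    rng = random.Random(seed)  # local RNG so the selection is reproducible
    shuffled = list(pool)
    rng.shuffle(shuffled)

    suffix = f"Input: {query}\nLabel:"
    used = count_tokens(suffix)  # reserve room for the query itself
    shots: List[str] = []
    for example in shuffled:
        block = format_example(example)
        cost = count_tokens(block)
        if used + cost > token_budget:
            break
        shots.append(block)
        used += cost
    return "\n".join(shots + [suffix]), len(shots)


if __name__ == "__main__":
    demo_pool = [(f"good movie {i}", "pos") for i in range(10)]
    prompt, n_shots = build_many_shot_prompt(demo_pool, "fine film", token_budget=22)
    print(n_shots)
    print(prompt)
```

If the pool is exhausted before the budget is reached, the loop simply ends with unused context. That is the situation the abstract describes, in which all available examples still do not fill the context window.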
