In-context learning and Occam's razor

Abstract

The goal of machine learning is generalization. While the No Free Lunch Theorem states that we cannot obtain theoretical guarantees for generalization without further assumptions, in practice we observe that simple models which explain the training data generalize best: a principle called Occam's razor. Despite the need for simple models, most current approaches in machine learning only minimize the training error, and at best indirectly promote simplicity through regularization or architecture design. Here, we draw a connection between Occam's razor and in-context learning: an emergent ability of certain sequence models like Transformers to learn at inference time from past observations in a sequence. In particular, we show that the next-token prediction loss used to train in-context learners is directly equivalent to a data compression technique called prequential coding, and that minimizing this loss amounts to jointly minimizing both the training error and the complexity of the model that was implicitly learned from context. Our theory and the empirical experiments we use to support it not only provide a normative account of in-context learning, but also elucidate the shortcomings of current in-context learning methods, suggesting ways in which they can be improved. We make our code available at https://github.com/3rdCore/PrequentialCode.
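To make the link between next-token prediction and prequential coding concrete, here is a minimal sketch. It is not the authors' implementation. The Laplace (rule-of-succession) estimator, the binary alphabet, and the function name `prequential_code_length` are illustrative assumptions. Each symbol is encoded using a predictor fit only on the symbols that came before it, so the total code length is the sum of the next-symbol log losses over the sequence.

```python
import math


def prequential_code_length(bits):
    """Bits needed to encode a binary sequence prequentially.

    Hypothetical helper for illustration: each symbol is coded with a
    Laplace estimator fit only on the symbols preceding it.
    """
    ones = 0          # number of 1s observed so far
    total_bits = 0.0  # accumulated code length in bits
    for t, x in enumerate(bits):
        # Predict P(next symbol = 1) from the t symbols already seen.
        p_one = (ones + 1) / (t + 2)
        p_x = p_one if x == 1 else 1.0 - p_one
        # Cost of this symbol = its next-symbol log loss, in bits.
        total_bits += -math.log2(p_x)
        # Only now does the learner get to update on x.
        ones += x
    return total_bits


regular = [1] * 32      # highly regular sequence
mixed = [0, 1] * 16     # balanced sequence, harder for this learner
print(prequential_code_length(regular))
print(prequential_code_length(mixed))
```

Under these assumptions the all-ones sequence costs log2(33), roughly 5 bits, while the balanced sequence costs about 34 bits: a learner that quickly fits the data with a simple hypothesis yields a shorter prequential code, which is the sense in which minimizing the summed prediction loss favors both low error and low complexity.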
