ICL CIPHERS: Quantifying "Learning'' in In-Context Learning via Substitution Ciphers
April 28, 2025
Authors: Zhouxiang Fang, Aayush Mishra, Muhan Gao, Anqi Liu, Daniel Khashabi
cs.AI
Abstract
Recent works have suggested that In-Context Learning (ICL) operates in dual
modes, i.e. task retrieval (recalling learned patterns from pre-training) and
task learning (inference-time ``learning'' from demonstrations). However,
disentangling these two modes remains a challenging goal. We introduce ICL
CIPHERS, a class of task reformulations based on substitution ciphers borrowed
from classic cryptography. In this approach, a subset of tokens in the
in-context inputs is substituted with other (irrelevant) tokens, rendering
English sentences less comprehensible to the human eye. However, by design,
there is a latent, fixed pattern to this substitution, making it reversible.
This bijective (reversible) cipher ensures that the task remains well-defined
in some abstract sense, despite the transformation. It is a curious question
whether LLMs can solve ICL CIPHERS with a BIJECTIVE mapping, which requires
deciphering the latent cipher. We show that LLMs are better at solving ICL
CIPHERS with BIJECTIVE mappings than the NON-BIJECTIVE (irreversible) baseline,
providing a novel approach to quantify ``learning'' in ICL. While this gap is
small, it is consistent across the board on four datasets and six models.
Finally, we examine LLMs' internal representations and identify evidence of
their ability to decode the ciphered inputs.
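To make the cipher construction concrete, here is a minimal sketch of the idea in Python. All function names (`make_bijective_cipher`, `make_non_bijective_cipher`, `apply_cipher`) and the toy vocabulary are illustrative assumptions, not the paper's actual implementation: a bijective cipher is a random permutation of the vocabulary (hence invertible), while the non-bijective baseline samples each word's replacement independently, allowing collisions.

```python
import random

def make_bijective_cipher(vocab, seed=0):
    """Bijective (reversible) substitution: a random permutation of the
    vocabulary, so every word maps to a unique word and the map inverts."""
    rng = random.Random(seed)
    shuffled = vocab[:]
    rng.shuffle(shuffled)
    return dict(zip(vocab, shuffled))

def make_non_bijective_cipher(vocab, seed=0):
    """Non-bijective baseline: each word's replacement is sampled
    independently, so collisions are possible and decoding may be ambiguous."""
    rng = random.Random(seed)
    return {w: rng.choice(vocab) for w in vocab}

def apply_cipher(tokens, cipher, rate=1.0, seed=0):
    """Substitute roughly a fraction `rate` of the tokens using `cipher`."""
    rng = random.Random(seed)
    return [cipher[t] if t in cipher and rng.random() < rate else t
            for t in tokens]

vocab = ["the", "movie", "was", "great", "plot", "boring"]
bij = make_bijective_cipher(vocab)

# Because the bijective map is invertible, decoding recovers the input.
inv = {v: k for k, v in bij.items()}
sentence = "the movie was great".split()
enc = apply_cipher(sentence, bij)
dec = [inv[t] for t in enc]
assert dec == sentence
```

The key contrast is that `inv` is only well-defined for the bijective cipher; for the non-bijective baseline, two distinct words can map to the same token, so no latent rule recovers the original sentence, which is what makes it an irreversible control.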