ICL CIPHERS: Quantifying "Learning'' in In-Context Learning via Substitution Ciphers
April 28, 2025
Authors: Zhouxiang Fang, Aayush Mishra, Muhan Gao, Anqi Liu, Daniel Khashabi
cs.AI
Abstract
Recent works have suggested that In-Context Learning (ICL) operates in dual
modes, i.e. task retrieval (recalling learned patterns from pre-training) and
task learning (inference-time ``learning'' from demonstrations). However,
disentangling these two modes remains a challenging goal. We introduce ICL
CIPHERS, a class of task reformulations based on substitution ciphers borrowed
from classic cryptography. In this approach, a subset of tokens in the
in-context inputs is substituted with other (irrelevant) tokens, rendering
English sentences less comprehensible to the human eye. However, by design,
there is a latent, fixed pattern to this substitution, making it reversible.
This bijective (reversible) cipher ensures that the task remains well-defined
in some abstract sense, despite the transformation. It is a curious question
whether LLMs can solve ICL CIPHERS with a BIJECTIVE mapping, which requires
deciphering the latent cipher. We show that LLMs are better at solving ICL
CIPHERS with BIJECTIVE mappings than the NON-BIJECTIVE (irreversible) baseline,
providing a novel approach to quantify ``learning'' in ICL. While this gap is
small, it is consistent across the board on four datasets and six models.
Finally, we examine LLMs' internal representations and identify evidence of
their ability to decode the ciphered inputs.
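To make the cipher construction concrete, here is a minimal sketch of the idea in Python. All function names (`make_bijective_cipher`, `make_non_bijective_cipher`, `apply_cipher`) and the toy vocabulary are illustrative assumptions, not the paper's actual implementation: a bijective cipher is a random permutation of the vocabulary (hence invertible), while the non-bijective baseline samples each word's replacement independently, allowing collisions.

```python
import random

def make_bijective_cipher(vocab, seed=0):
    """Bijective (reversible) substitution: a random permutation of the
    vocabulary, so every word maps to a unique word and the map inverts."""
    rng = random.Random(seed)
    shuffled = vocab[:]
    rng.shuffle(shuffled)
    return dict(zip(vocab, shuffled))

def make_non_bijective_cipher(vocab, seed=0):
    """Non-bijective baseline: each word's replacement is sampled
    independently, so collisions are possible and decoding may be ambiguous."""
    rng = random.Random(seed)
    return {w: rng.choice(vocab) for w in vocab}

def apply_cipher(tokens, cipher, rate=1.0, seed=0):
    """Substitute roughly a fraction `rate` of the tokens using `cipher`."""
    rng = random.Random(seed)
    return [cipher[t] if t in cipher and rng.random() < rate else t
            for t in tokens]

vocab = ["the", "movie", "was", "great", "plot", "boring"]
bij = make_bijective_cipher(vocab)

# Because the bijective map is invertible, decoding recovers the input.
inv = {v: k for k, v in bij.items()}
sentence = "the movie was great".split()
enc = apply_cipher(sentence, bij)
dec = [inv[t] for t in enc]
assert dec == sentence
```

The key contrast is that `inv` is only well-defined for the bijective cipher; for the non-bijective baseline, two distinct words can map to the same token, so no latent rule recovers the original sentence, which is what makes it an irreversible control.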