LLM-Microscope: Uncovering the Hidden Role of Punctuation in Context Memory of Transformers

February 20, 2025
作者: Anton Razzhigaev, Matvey Mikhalchuk, Temurbek Rahmatullaev, Elizaveta Goncharova, Polina Druzhinina, Ivan Oseledets, Andrey Kuznetsov
cs.AI

Abstract

We introduce methods to quantify how Large Language Models (LLMs) encode and store contextual information, revealing that tokens often seen as minor (e.g., determiners, punctuation) carry surprisingly rich contextual information. Notably, removing these tokens -- especially stopwords, articles, and commas -- consistently degrades performance on MMLU and BABILong-4k, even when only tokens irrelevant to the task are removed. Our analysis also shows a strong correlation between contextualization and linearity, where linearity measures how closely the transformation from one layer's embeddings to the next can be approximated by a single linear mapping. These findings underscore the hidden importance of filler tokens in maintaining context. For further exploration, we present LLM-Microscope, an open-source toolkit that assesses token-level nonlinearity, evaluates contextual memory, visualizes intermediate-layer contributions (via an adapted Logit Lens), and measures the intrinsic dimensionality of representations. This toolkit illuminates how seemingly trivial tokens can be critical for long-range understanding.
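
To make the linearity measure concrete, the sketch below fits a single linear map between consecutive layer embeddings by least squares and turns the residual into a normalized score. The choice of GPT-2, the least-squares fit, and the residual-based scoring are illustrative assumptions, not necessarily the paper's exact procedure; the authoritative implementation lives in the open-source LLM-Microscope toolkit.

```python
# Minimal sketch of a layer-wise linearity score: how well a single linear
# map A approximates the transformation from one layer's embeddings X to the
# next layer's embeddings Y. Assumptions (illustrative, not the paper's
# exact procedure): GPT-2 via Hugging Face transformers, a least-squares
# fit, and residual-based scoring.
import torch
from transformers import AutoModel, AutoTokenizer

model_name = "gpt2"  # any causal LM exposing hidden states would work
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModel.from_pretrained(model_name, output_hidden_states=True)
model.eval()

# NOTE: for a meaningful score the fit should pool embeddings from many
# tokens (far more rows than hidden dimensions); a single short sentence is
# used here only for brevity and would make the fit nearly exact.
text = "Punctuation tokens, it turns out, may carry a lot of context."
inputs = tokenizer(text, return_tensors="pt")
with torch.no_grad():
    # hidden_states: tuple of (num_layers + 1) tensors, each [1, seq, dim]
    hidden = model(**inputs).hidden_states

for layer in range(len(hidden) - 1):
    X = hidden[layer][0]      # [seq, dim], embeddings entering the block
    Y = hidden[layer + 1][0]  # [seq, dim], embeddings leaving the block
    # Fit one linear map: minimize ||X @ A - Y||_F over A.
    A = torch.linalg.lstsq(X, Y).solution
    residual = torch.linalg.norm(X @ A - Y) ** 2
    score = 1.0 - (residual / torch.linalg.norm(Y) ** 2).item()
    print(f"layer {layer:2d} -> {layer + 1:2d}: linearity = {score:.3f}")
```

Per-layer scores of this kind are what the abstract describes correlating with token-level contextualization.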
