
LaTIM: Measuring Latent Token-to-Token Interactions in Mamba Models

February 21, 2025
Authors: Hugo Pitorro, Marcos Treviso
cs.AI

Abstract

State space models (SSMs), such as Mamba, have emerged as an efficient alternative to transformers for long-context sequence modeling. However, despite their growing adoption, SSMs lack the interpretability tools that have been crucial for understanding and improving attention-based architectures. While recent efforts provide insights into Mamba's internal mechanisms, they do not explicitly decompose token-wise contributions, leaving gaps in understanding how Mamba selectively processes sequences across layers. In this work, we introduce LaTIM, a novel token-level decomposition method for both Mamba-1 and Mamba-2 that enables fine-grained interpretability. We extensively evaluate our method across diverse tasks, including machine translation, copying, and retrieval-based generation, demonstrating its effectiveness in revealing Mamba's token-to-token interaction patterns.
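To make "token-level decomposition" concrete: a diagonal selective SSM of the Mamba family computes h_t = Ā_t ⊙ h_{t−1} + B̄_t x_t and y_t = C_t · h_t, which unrolls to y_t = Σ_{s≤t} C_t · ((Π_{k=s+1..t} Ā_k) ⊙ B̄_s) x_s, so each summand can be read as an attention-like contribution of source token s to output position t. The sketch below illustrates this unrolling for a single scalar input channel; it is a minimal illustration of the general idea under these assumptions, not the paper's LaTIM implementation, and all function names and shapes are hypothetical.

```python
import numpy as np

def ssm_token_decomposition(A_bar, B_bar, C, x):
    """Unroll a discretized diagonal SSM into per-source-token contributions.

    Recurrence (per step t, elementwise over a state of size N):
        h_t = A_bar[t] * h_{t-1} + B_bar[t] * x[t]
        y_t = C[t] @ h_t
    Unrolled:
        y_t = sum_{s <= t} C[t] @ ((prod_{k=s+1..t} A_bar[k]) * B_bar[s]) * x[s]

    Hypothetical shapes, for illustration only:
        A_bar, B_bar, C: (T, N); x: (T,)
    Returns a (T, T) matrix where entry [t, s] is token s's contribution to y_t.
    """
    T, N = A_bar.shape
    contrib = np.zeros((T, T))
    for t in range(T):
        decay = np.ones(N)  # running product of A_bar[k] for k = s+1 .. t
        for s in range(t, -1, -1):
            contrib[t, s] = C[t] @ (decay * B_bar[s]) * x[s]
            decay = decay * A_bar[s]  # extend the product to include step s
    return contrib

# Sanity check: each row of the matrix sums back to the recurrent output y_t.
T, N = 6, 4
rng = np.random.default_rng(0)
A_bar = rng.uniform(0.5, 1.0, (T, N))
B_bar, C = rng.normal(size=(T, N)), rng.normal(size=(T, N))
x = rng.normal(size=T)

h, y = np.zeros(N), np.zeros(T)
for t in range(T):
    h = A_bar[t] * h + B_bar[t] * x[t]
    y[t] = C[t] @ h

assert np.allclose(ssm_token_decomposition(A_bar, B_bar, C, x).sum(axis=1), y)
```

Because the decomposition is exact (the rows sum to the model's actual outputs), the resulting matrix can be inspected much like an attention map, which is the kind of fine-grained token-to-token view the abstract describes.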

