LaTIM: Measuring Latent Token-to-Token Interactions in Mamba Models
February 21, 2025
Authors: Hugo Pitorro, Marcos Treviso
cs.AI
Abstract
State space models (SSMs), such as Mamba, have emerged as an efficient
alternative to transformers for long-context sequence modeling. However,
despite their growing adoption, SSMs lack the interpretability tools that have
been crucial for understanding and improving attention-based architectures.
While recent efforts provide insights into Mamba's internal mechanisms, they do
not explicitly decompose token-wise contributions, leaving gaps in
understanding how Mamba selectively processes sequences across layers. In this
work, we introduce LaTIM, a novel token-level decomposition method for both
Mamba-1 and Mamba-2 that enables fine-grained interpretability. We extensively
evaluate our method across diverse tasks, including machine translation,
copying, and retrieval-based generation, demonstrating its effectiveness in
revealing Mamba's token-to-token interaction patterns.
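To make "token-wise contributions" concrete, below is a minimal NumPy sketch of the general idea (an illustration under simplifying assumptions, not the paper's LaTIM formulation). A discretized selective-SSM recurrence with a scalar hidden state and a single channel, h_t = A_t * h_{t-1} + B_t * x_t with output y_t = C_t * h_t, unrolls into y_t = sum_{s<=t} M[t,s] * x_s, where M[t,s] = C_t * (prod_{r=s+1..t} A_r) * B_s. The lower-triangular matrix M acts like an attention map, attributing each output token to the input tokens that produced it. All names in the snippet are illustrative.

```python
import numpy as np

def token_contribution_matrix(A, B, C):
    """Unroll a scalar-state selective SSM into a (T, T) lower-triangular
    token-to-token matrix M, where M[t, s] weights input token s in output
    token t:
        M[t, s] = C[t] * (prod of A[r] for r in s+1..t) * B[s]
    A, B, C are the per-step (input-dependent) parameters, each of shape (T,).
    This is a simplified sketch, not the paper's LaTIM decomposition.
    """
    T = len(A)
    M = np.zeros((T, T))
    for t in range(T):
        decay = 1.0                      # empty product for s == t
        for s in range(t, -1, -1):
            M[t, s] = C[t] * decay * B[s]
            decay *= A[s]                # extend the decay product by A[s]
    return M

# Sanity check: M @ x must reproduce the recurrence run step by step.
rng = np.random.default_rng(0)
T = 6
A, B, C, x = (rng.uniform(0.1, 0.9, T) for _ in range(4))

h, y_rec = 0.0, []
for t in range(T):
    h = A[t] * h + B[t] * x[t]           # h_t = A_t h_{t-1} + B_t x_t
    y_rec.append(C[t] * h)               # y_t = C_t h_t

M = token_contribution_matrix(A, B, C)
assert np.allclose(M @ x, np.array(y_rec))
```

Real Mamba layers use vector-valued states and per-channel dynamics, so a practical decomposition must aggregate such matrices across state dimensions and channels; the sketch only shows why an attention-like token-to-token matrix can be extracted from the recurrence at all.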