LM2: Large Memory Models
February 9, 2025
Authors: Jikun Kang, Wenqi Wu, Filippos Christianos, Alex J. Chan, Fraser Greenlee, George Thomas, Marvin Purtorab, Andy Toulis
cs.AI
Abstract
This paper introduces the Large Memory Model (LM2), a decoder-only
Transformer architecture enhanced with an auxiliary memory module that aims to
address the limitations of standard Transformers in multi-step reasoning,
relational argumentation, and synthesizing information distributed over long
contexts. The proposed LM2 incorporates a memory module that acts as a
contextual representation repository, interacting with input tokens via cross-attention
and updating through gating mechanisms. To preserve the Transformer's
general-purpose capabilities, LM2 maintains the original information flow while
integrating a complementary memory pathway. Experimental results on the
BABILong benchmark demonstrate that the LM2 model outperforms both the
memory-augmented RMT model by 37.1% and the baseline Llama-3.2 model by 86.3%
on average across tasks. LM2 exhibits exceptional capabilities in multi-hop
inference, numerical reasoning, and large-context question-answering. On the
MMLU dataset, it achieves a 5.0% improvement over a pre-trained vanilla model,
demonstrating that its memory module does not degrade performance on general
tasks. Further, in our analysis, we explore the memory interpretability,
effectiveness of memory modules, and test-time behavior. Our findings emphasize
the importance of explicit memory in enhancing Transformer architectures.
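The abstract describes the mechanism only at a high level, so the following is a minimal PyTorch sketch of how such an auxiliary memory pathway could sit alongside a standard decoder block: learned memory slots are read by the token stream through cross-attention, a sigmoid gate controls how much of the read-out is injected back into the residual stream (so the original information flow is preserved when the gate closes), and a second gate controls how the slots themselves are updated. The class name `MemoryAugmentedBlock`, the slot count, the single-gate update rule, and all dimensions are illustrative assumptions, not the paper's actual LM2 formulation.

```python
import torch
import torch.nn as nn


class MemoryAugmentedBlock(nn.Module):
    """Sketch of a decoder block with a gated auxiliary memory pathway.

    All shapes, gate choices, and the update rule are illustrative
    assumptions; they are not the exact LM2 architecture.
    """

    def __init__(self, d_model: int = 512, n_heads: int = 8, n_slots: int = 16):
        super().__init__()
        # Standard decoder-style path (the "original information flow").
        self.self_attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        self.ffn = nn.Sequential(nn.Linear(d_model, 4 * d_model), nn.GELU(),
                                 nn.Linear(4 * d_model, d_model))
        self.norm1 = nn.LayerNorm(d_model)
        self.norm2 = nn.LayerNorm(d_model)

        # Auxiliary memory pathway: learned slots + cross-attention read/write.
        self.memory = nn.Parameter(torch.randn(n_slots, d_model) * 0.02)
        self.mem_read = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        self.mem_write = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        self.out_gate = nn.Linear(d_model, d_model)     # gates memory -> residual stream
        self.update_gate = nn.Linear(d_model, d_model)  # gates slot updates

    def forward(self, x: torch.Tensor):
        B, T, _ = x.shape
        # Causal self-attention with residual connection.
        mask = torch.triu(torch.ones(T, T, dtype=torch.bool, device=x.device), 1)
        attn_out, _ = self.self_attn(x, x, x, attn_mask=mask)
        h = self.norm1(x + attn_out)

        # Memory read: tokens (queries) cross-attend to memory slots (keys/values).
        mem = self.memory.unsqueeze(0).expand(B, -1, -1)
        read, _ = self.mem_read(h, mem, mem)
        # Gated injection: with the gate near zero, the original pathway is untouched.
        h = h + torch.sigmoid(self.out_gate(h)) * read

        # Memory write: slots (queries) cross-attend to tokens, gated slot update.
        write, _ = self.mem_write(mem, h, h)
        new_mem = mem + torch.sigmoid(self.update_gate(mem)) * write

        return self.norm2(h + self.ffn(h)), new_mem


if __name__ == "__main__":
    block = MemoryAugmentedBlock()
    tokens = torch.randn(2, 128, 512)      # (batch, sequence, d_model)
    out, updated_memory = block(tokens)    # out: (2, 128, 512), memory: (2, 16, 512)
    print(out.shape, updated_memory.shape)
```

In a full long-context model the updated slots would be carried across segments or layers so that information persists beyond the current window; here they are simply returned alongside the block output to keep the sketch self-contained.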