LM2: Large Memory Models
February 9, 2025
Authors: Jikun Kang, Wenqi Wu, Filippos Christianos, Alex J. Chan, Fraser Greenlee, George Thomas, Marvin Purtorab, Andy Toulis
cs.AI
Abstract
This paper introduces the Large Memory Model (LM2), a decoder-only
Transformer architecture enhanced with an auxiliary memory module that aims to
address the limitations of standard Transformers in multi-step reasoning,
relational argumentation, and synthesizing information distributed over long
contexts. The proposed LM2 incorporates a memory module that acts as a
contextual representation repository, interacting with input tokens via cross-attention
and updating through gating mechanisms. To preserve the Transformer's
general-purpose capabilities, LM2 maintains the original information flow while
integrating a complementary memory pathway. Experimental results on the
BABILong benchmark demonstrate that the LM2 model outperforms both the
memory-augmented RMT model by 37.1% and the baseline Llama-3.2 model by 86.3%
on average across tasks. LM2 exhibits exceptional capabilities in multi-hop
inference, numerical reasoning, and large-context question-answering. On the
MMLU dataset, it achieves a 5.0% improvement over a pre-trained vanilla model,
demonstrating that its memory module does not degrade performance on general
tasks. Further, in our analysis, we explore the memory interpretability,
effectiveness of memory modules, and test-time behavior. Our findings emphasize
the importance of explicit memory in enhancing Transformer architectures.
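The abstract describes the mechanism only at a high level, so the following is a minimal PyTorch sketch of how such an auxiliary memory pathway could sit alongside a standard decoder block: learned memory slots are read by the token stream through cross-attention, a sigmoid gate controls how much of the read-out is injected back into the residual stream (so the original information flow is preserved when the gate closes), and a second gate controls how the slots themselves are updated. The class name `MemoryAugmentedBlock`, the slot count, the single-gate update rule, and all dimensions are illustrative assumptions, not the paper's actual LM2 formulation.

```python
import torch
import torch.nn as nn


class MemoryAugmentedBlock(nn.Module):
    """Sketch of a decoder block with a gated auxiliary memory pathway.

    All shapes, gate choices, and the update rule are illustrative
    assumptions; they are not the exact LM2 architecture.
    """

    def __init__(self, d_model: int = 512, n_heads: int = 8, n_slots: int = 16):
        super().__init__()
        # Standard decoder-style path (the "original information flow").
        self.self_attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        self.ffn = nn.Sequential(nn.Linear(d_model, 4 * d_model), nn.GELU(),
                                 nn.Linear(4 * d_model, d_model))
        self.norm1 = nn.LayerNorm(d_model)
        self.norm2 = nn.LayerNorm(d_model)

        # Auxiliary memory pathway: learned slots + cross-attention read/write.
        self.memory = nn.Parameter(torch.randn(n_slots, d_model) * 0.02)
        self.mem_read = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        self.mem_write = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        self.out_gate = nn.Linear(d_model, d_model)     # gates memory -> residual stream
        self.update_gate = nn.Linear(d_model, d_model)  # gates slot updates

    def forward(self, x: torch.Tensor):
        B, T, _ = x.shape
        # Causal self-attention with residual connection.
        mask = torch.triu(torch.ones(T, T, dtype=torch.bool, device=x.device), 1)
        attn_out, _ = self.self_attn(x, x, x, attn_mask=mask)
        h = self.norm1(x + attn_out)

        # Memory read: tokens (queries) cross-attend to memory slots (keys/values).
        mem = self.memory.unsqueeze(0).expand(B, -1, -1)
        read, _ = self.mem_read(h, mem, mem)
        # Gated injection: with the gate near zero, the original pathway is untouched.
        h = h + torch.sigmoid(self.out_gate(h)) * read

        # Memory write: slots (queries) cross-attend to tokens, gated slot update.
        write, _ = self.mem_write(mem, h, h)
        new_mem = mem + torch.sigmoid(self.update_gate(mem)) * write

        return self.norm2(h + self.ffn(h)), new_mem


if __name__ == "__main__":
    block = MemoryAugmentedBlock()
    tokens = torch.randn(2, 128, 512)      # (batch, sequence, d_model)
    out, updated_memory = block(tokens)    # out: (2, 128, 512), memory: (2, 16, 512)
    print(out.shape, updated_memory.shape)
```

In a full long-context model the updated slots would be carried across segments or layers so that information persists beyond the current window; here they are simply returned alongside the block output to keep the sketch self-contained.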