MoM: Linear Sequence Modeling with Mixture-of-Memories
February 19, 2025
Authors: Jusen Du, Weigao Sun, Disen Lan, Jiaxi Hu, Yu Cheng
cs.AI
Abstract
Linear sequence modeling methods, such as linear attention, state space
modeling, and linear RNNs, offer significant efficiency improvements by
reducing the complexity of training and inference. However, these methods
typically compress the entire input sequence into a single fixed-size memory
state, which leads to suboptimal performance on recall-intensive downstream
tasks. Drawing inspiration from neuroscience, particularly the brain's ability
to maintain robust long-term memory while mitigating "memory interference", we
introduce a novel architecture called Mixture-of-Memories (MoM). MoM utilizes
multiple independent memory states, with a router network directing input
tokens to specific memory states. This approach greatly enhances the overall
memory capacity while minimizing memory interference. As a result, MoM performs
exceptionally well on recall-intensive tasks, surpassing existing linear
sequence modeling techniques. Despite incorporating multiple memory states, the
computation of each memory state remains linear in complexity, allowing MoM to
retain its linear-complexity advantage during training and constant complexity
during inference. Our experimental results show that MoM
significantly outperforms current linear sequence models on downstream language
tasks, particularly recall-intensive tasks, and even achieves performance
comparable to Transformer models. The code is released at
https://github.com/OpenSparseLLMs/MoM and is also released as a part of
https://github.com/OpenSparseLLMs/Linear-MoE.
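The abstract only outlines the mechanism, so below is a minimal, illustrative PyTorch-style sketch of the idea it describes: a router selects a few of several independent linear-attention memory matrices for each token, updates only those memories with the token's key-value outer product, and mixes their readouts. This is not the authors' released implementation; the class name `MoMLayer`, the parameters `num_memories` and `top_k`, and the elu-plus-one feature map are assumptions made for the sketch.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class MoMLayer(nn.Module):
    """Toy Mixture-of-Memories layer (illustrative sketch, not the released code):
    a router picks the top-k of several independent linear-attention memory
    matrices per token, updates only those memories, and mixes their readouts."""

    def __init__(self, d_model: int, num_memories: int = 4, top_k: int = 2):
        super().__init__()
        self.num_memories = num_memories
        self.top_k = top_k
        self.q_proj = nn.Linear(d_model, d_model, bias=False)
        self.k_proj = nn.Linear(d_model, d_model, bias=False)
        self.v_proj = nn.Linear(d_model, d_model, bias=False)
        self.router = nn.Linear(d_model, num_memories, bias=False)
        self.out_proj = nn.Linear(d_model, d_model, bias=False)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, seq_len, d_model). Tokens are processed recurrently, so the
        # cost per token is constant and the total cost is linear in seq_len.
        B, T, D = x.shape
        q = self.q_proj(x)
        k = F.elu(self.k_proj(x)) + 1.0  # positive feature map, as in linear attention
        v = self.v_proj(x)

        # One d_model x d_model memory matrix per expert, kept independent.
        memories = x.new_zeros(B, self.num_memories, D, D)
        outputs = []
        for t in range(T):
            # Sparse gates: softmax over the top-k router scores, zero elsewhere.
            scores = self.router(x[:, t])                              # (B, num_memories)
            topk_val, topk_idx = scores.topk(self.top_k, dim=-1)
            gates = torch.zeros_like(scores).scatter(
                -1, topk_idx, F.softmax(topk_val, dim=-1))
            routed = (gates > 0).to(x.dtype)                           # which memories receive this token

            # Update only the routed memories with the outer product k_t v_t^T.
            update = torch.einsum("bd,be->bde", k[:, t], v[:, t])      # (B, D, D)
            memories = memories + routed.view(B, -1, 1, 1) * update.unsqueeze(1)

            # Read every memory with q_t (q^T M_e) and mix the readouts with the gates.
            reads = torch.einsum("bd,bnde->bne", q[:, t], memories)    # (B, num_memories, D)
            outputs.append((gates.unsqueeze(-1) * reads).sum(dim=1))   # (B, D)

        return self.out_proj(torch.stack(outputs, dim=1))
```

A forward pass on random input, e.g. `MoMLayer(64)(torch.randn(2, 16, 64))`, returns a tensor of the same shape. Because non-selected memories receive no update for a given token, each memory state evolves from a disjoint subset of the sequence, which is the interference-reduction property the abstract highlights.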