MoM: Linear Sequence Modeling with Mixture-of-Memories
February 19, 2025
Authors: Jusen Du, Weigao Sun, Disen Lan, Jiaxi Hu, Yu Cheng
cs.AI
Abstract
Linear sequence modeling methods, such as linear attention, state space
modeling, and linear RNNs, offer significant efficiency improvements by
reducing the complexity of training and inference. However, these methods
typically compress the entire input sequence into a single fixed-size memory
state, which leads to suboptimal performance on recall-intensive downstream
tasks. Drawing inspiration from neuroscience, particularly the brain's ability
to maintain robust long-term memory while mitigating "memory interference", we
introduce a novel architecture called Mixture-of-Memories (MoM). MoM utilizes
multiple independent memory states, with a router network directing input
tokens to specific memory states. This approach greatly enhances the overall
memory capacity while minimizing memory interference. As a result, MoM performs
exceptionally well on recall-intensive tasks, surpassing existing linear
sequence modeling techniques. Despite incorporating multiple memory states, the
computation of each memory state remains linear in complexity, allowing MoM to
retain its linear-complexity advantage during training and constant complexity
during inference. Our experimental results show that MoM
significantly outperforms current linear sequence models on downstream language
tasks, particularly recall-intensive tasks, and even achieves performance
comparable to Transformer models. The code is released at
https://github.com/OpenSparseLLMs/MoM and is also released as a part of
https://github.com/OpenSparseLLMs/Linear-MoE.
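The abstract only outlines the mechanism, so below is a minimal, illustrative PyTorch-style sketch of the idea it describes: a router selects a few of several independent linear-attention memory matrices for each token, updates only those memories with the token's key-value outer product, and mixes their readouts. This is not the authors' released implementation; the class name `MoMLayer`, the parameters `num_memories` and `top_k`, and the elu-plus-one feature map are assumptions made for the sketch.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class MoMLayer(nn.Module):
    """Toy Mixture-of-Memories layer (illustrative sketch, not the released code):
    a router picks the top-k of several independent linear-attention memory
    matrices per token, updates only those memories, and mixes their readouts."""

    def __init__(self, d_model: int, num_memories: int = 4, top_k: int = 2):
        super().__init__()
        self.num_memories = num_memories
        self.top_k = top_k
        self.q_proj = nn.Linear(d_model, d_model, bias=False)
        self.k_proj = nn.Linear(d_model, d_model, bias=False)
        self.v_proj = nn.Linear(d_model, d_model, bias=False)
        self.router = nn.Linear(d_model, num_memories, bias=False)
        self.out_proj = nn.Linear(d_model, d_model, bias=False)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, seq_len, d_model). Tokens are processed recurrently, so the
        # cost per token is constant and the total cost is linear in seq_len.
        B, T, D = x.shape
        q = self.q_proj(x)
        k = F.elu(self.k_proj(x)) + 1.0  # positive feature map, as in linear attention
        v = self.v_proj(x)

        # One d_model x d_model memory matrix per expert, kept independent.
        memories = x.new_zeros(B, self.num_memories, D, D)
        outputs = []
        for t in range(T):
            # Sparse gates: softmax over the top-k router scores, zero elsewhere.
            scores = self.router(x[:, t])                              # (B, num_memories)
            topk_val, topk_idx = scores.topk(self.top_k, dim=-1)
            gates = torch.zeros_like(scores).scatter(
                -1, topk_idx, F.softmax(topk_val, dim=-1))
            routed = (gates > 0).to(x.dtype)                           # which memories receive this token

            # Update only the routed memories with the outer product k_t v_t^T.
            update = torch.einsum("bd,be->bde", k[:, t], v[:, t])      # (B, D, D)
            memories = memories + routed.view(B, -1, 1, 1) * update.unsqueeze(1)

            # Read every memory with q_t (q^T M_e) and mix the readouts with the gates.
            reads = torch.einsum("bd,bnde->bne", q[:, t], memories)    # (B, num_memories, D)
            outputs.append((gates.unsqueeze(-1) * reads).sum(dim=1))   # (B, D)

        return self.out_proj(torch.stack(outputs, dim=1))
```

A forward pass on random input, e.g. `MoMLayer(64)(torch.randn(2, 16, 64))`, returns a tensor of the same shape. Because non-selected memories receive no update for a given token, each memory state evolves from a disjoint subset of the sequence, which is the interference-reduction property the abstract highlights.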