MoM: Linear Sequence Modeling with Mixture-of-Memories

Abstract

Linear sequence modeling methods, such as linear attention, state space modeling, and linear RNNs, offer significant efficiency improvements by reducing the complexity of training and inference. However, these methods typically compress the entire input sequence into a single fixed-size memory state, which leads to suboptimal performance on recall-intensive downstream tasks. Drawing inspiration from neuroscience, particularly the brain's ability to maintain robust long-term memory while mitigating "memory interference", we introduce a novel architecture called Mixture-of-Memories (MoM). MoM uses multiple independent memory states, with a router network directing each input token to specific memory states. This approach greatly increases the overall memory capacity while minimizing memory interference. As a result, MoM performs exceptionally well on recall-intensive tasks, surpassing existing linear sequence modeling techniques. Although MoM incorporates multiple memory states, the computation of each memory state remains linear in complexity, so MoM retains linear complexity during training and constant complexity during inference. Our experimental results show that MoM significantly outperforms current linear sequence models on downstream language tasks, particularly recall-intensive tasks, and even achieves performance comparable to Transformer models. The code is released at https://github.com/OpenSparseLLMs/MoM and is also included in https://github.com/OpenSparseLLMs/Linear-MoE.
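To make the routing idea concrete, the following is a minimal sketch of one token step, not the released implementation. It assumes details the abstract does not state: top-k selection over softmax router scores, an outer-product write (M_i += k v^T) as in basic linear attention, and a gate-weighted read over the selected memories. The names `mom_step` and `new_memories` are hypothetical.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def new_memories(num_memories, d_k, d_v):
    """Create independent zero-initialized memory states, each d_k x d_v."""
    return [[[0.0] * d_v for _ in range(d_k)] for _ in range(num_memories)]


def mom_step(memories, router_logits, q, k, v, top_k=1):
    """Process one token: route it, write to the chosen memories, read out.

    Assumed (not from the abstract): top-k routing, outer-product writes,
    and a gate-weighted read. Memories that are not chosen are untouched,
    which is how the sketch avoids interference between tokens.
    """
    gates = softmax(router_logits)
    chosen = sorted(range(len(memories)), key=lambda i: gates[i], reverse=True)[:top_k]
    norm = sum(gates[i] for i in chosen)
    out = [0.0] * len(v)
    for i in chosen:
        mem = memories[i]
        # Write: M_i += k v^T (cost depends only on d_k and d_v).
        for a in range(len(k)):
            for b in range(len(v)):
                mem[a][b] += k[a] * v[b]
        # Read: add the gate-weighted q^T M_i to the output.
        weight = gates[i] / norm
        for b in range(len(v)):
            out[b] += weight * sum(q[a] * mem[a][b] for a in range(len(k)))
    return out, chosen


memories = new_memories(num_memories=2, d_k=2, d_v=2)
out_a, chosen_a = mom_step(memories, [5.0, 0.0], q=[1.0, 0.0], k=[1.0, 0.0], v=[1.0, 2.0])
out_b, chosen_b = mom_step(memories, [0.0, 5.0], q=[0.0, 1.0], k=[0.0, 1.0], v=[3.0, 4.0])
```

In this sketch the per-token cost depends only on the memory dimensions and `top_k`, not on how many tokens came before, which is consistent with the constant per-step inference cost described above. The two example tokens are written to different memories, so neither overwrites the other.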
