Taipan: Efficient and Expressive State Space Language Models with Selective Attention

Abstract

Efficient long-context language modeling remains a significant challenge in Natural Language Processing (NLP). While Transformers dominate language tasks, they struggle with long sequences due to quadratic computational complexity in training and linearly scaling memory costs during inference. Recent State Space Models (SSMs) such as Mamba offer alternatives with constant memory usage, but they underperform in tasks requiring extensive in-context retrieval. We introduce Taipan, a novel hybrid architecture that combines Mamba-2 with Selective Attention Layers (SALs). These SALs identify tokens requiring long-range interactions, remove less important features, and then augment their representations using the attention module. This approach balances Mamba's efficiency with Transformer-like performance in memory-intensive tasks. By constraining the attention budget, Taipan extends accurate predictions to context lengths of up to 1 million tokens while preserving computational efficiency. Our experiments demonstrate Taipan's superior performance across various scales and tasks, offering a promising solution for efficient long-context language modeling.
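To make the idea of a constrained attention budget concrete, here is a minimal pure-Python sketch of budgeted token selection followed by causal attention on the selected tokens only. It is an illustration under stated assumptions, not the Taipan implementation: the function name `selective_attention`, the linear scoring vector `gate_weights`, the top-k selection rule, and the residual update are all hypothetical choices, and the feature-removal step mentioned in the abstract is omitted.

```python
import math


def dot(a, b):
    """Dot product of two equal-length lists of floats."""
    return sum(x * y for x, y in zip(a, b))


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def selective_attention(hidden, gate_weights, budget):
    """Toy selective attention over a sequence of token vectors.

    hidden:       list of per-token feature vectors (stand-in for SSM outputs).
    gate_weights: hypothetical scoring vector; a token's score is
                  dot(token, gate_weights). This is an assumption.
    budget:       fraction of tokens (0 < budget <= 1) allowed to attend.

    Returns (new_hidden, selected_indices).
    """
    n = len(hidden)
    if n == 0:
        return [], []
    d = len(hidden[0])
    # Number of tokens allowed to use attention under the budget.
    k = min(n, max(1, math.ceil(budget * n)))
    scores = [dot(h, gate_weights) for h in hidden]
    # Keep the k highest-scoring tokens (assumed selection rule).
    ranked = sorted(range(n), key=lambda i: scores[i], reverse=True)
    selected = sorted(ranked[:k])

    out = [list(h) for h in hidden]  # unselected tokens pass through unchanged
    for i in selected:
        # Causal scaled dot-product attention: token i sees positions 0..i.
        logits = [dot(hidden[i], hidden[j]) / math.sqrt(d) for j in range(i + 1)]
        weights = softmax(logits)
        context = [
            sum(w * hidden[j][c] for j, w in enumerate(weights))
            for c in range(d)
        ]
        # Residual augmentation of the selected token's representation.
        out[i] = [h + c for h, c in zip(hidden[i], context)]
    return out, selected
```

In this sketch, attention cost scales with the number of selected tokens rather than with every token in the sequence, which is the intuition behind limiting the attention budget. Tokens outside the budget keep the representation they already had.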
