Ultra-Sparse Memory Network

Abstract

It is widely acknowledged that the performance of Transformer models is logarithmically related to their number of parameters and computational complexity. While approaches like Mixture of Experts (MoE) decouple parameter count from computational complexity, they still face challenges at inference time due to high memory access costs. This work introduces UltraMem, which incorporates large-scale, ultra-sparse memory layers to address these limitations. Our approach significantly reduces inference latency while maintaining model performance. We also investigate the scaling laws of this new architecture, demonstrating that it not only exhibits favorable scaling properties but also outperforms traditional models. In our experiments, we train networks with up to 20 million memory slots. The results show that our method achieves state-of-the-art inference speed and model performance within a given computational budget.
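The abstract does not describe the layer's internals. As a hedged illustration of how a sparse memory layer can hold many slots while touching only a few per token, here is a toy product-key memory lookup in the style of Lample et al. (2019, "Large Memory Layers with Product Keys"). This is an assumed, related technique shown for intuition, not UltraMem's actual layer, and every name (`ProductKeyMemory`, `lookup`, `n_sub`, `scored`) is hypothetical. With `n_sub` sub-keys per query half, the memory has `n_sub**2` slots, yet each query scores only `2 * n_sub` sub-keys and reads just `k` value vectors.

```python
import heapq
import math
import random


def dot(a, b):
    """Plain dot product of two equal-length lists."""
    return sum(x * y for x, y in zip(a, b))


def topk(scores, k):
    """Indices of the k largest entries of `scores`."""
    return heapq.nlargest(k, range(len(scores)), key=lambda i: scores[i])


class ProductKeyMemory:
    """Hypothetical toy memory: n_sub**2 value slots addressed by two sub-key tables."""

    def __init__(self, n_sub, d_key, d_value, seed=0):
        rng = random.Random(seed)
        self.n_sub = n_sub
        self.half = d_key // 2
        # Two sub-key tables, each holding n_sub keys of dimension d_key / 2.
        self.keys1 = [[rng.gauss(0, 1) for _ in range(self.half)] for _ in range(n_sub)]
        self.keys2 = [[rng.gauss(0, 1) for _ in range(self.half)] for _ in range(n_sub)]
        # One value vector per (i, j) sub-key pair: n_sub**2 slots in total.
        self.values = [[rng.gauss(0, 1) for _ in range(d_value)]
                       for _ in range(n_sub * n_sub)]
        self.scored = 0  # running count of sub-key dot products, for illustration

    def lookup(self, query, k):
        """Return (softmax-weighted sum of the k best values, their slot ids)."""
        q1, q2 = query[:self.half], query[self.half:]
        s1 = [dot(q1, key) for key in self.keys1]
        s2 = [dot(q2, key) for key in self.keys2]
        self.scored += 2 * self.n_sub
        # Slot (i, j) scores s1[i] + s2[j], so the global top-k over all n_sub**2
        # slots must lie among the k * k pairs of per-half top-k indices.
        cands = [(s1[i] + s2[j], i * self.n_sub + j)
                 for i in topk(s1, k) for j in topk(s2, k)]
        best = heapq.nlargest(k, cands)
        # Numerically stable softmax over the k selected scores.
        m = max(s for s, _ in best)
        w = [math.exp(s - m) for s, _ in best]
        z = sum(w)
        out = [0.0] * len(self.values[0])
        for wi, (_, slot) in zip(w, best):
            for d, v in enumerate(self.values[slot]):
                out[d] += (wi / z) * v
        return out, [slot for _, slot in best]


if __name__ == "__main__":
    mem = ProductKeyMemory(n_sub=64, d_key=16, d_value=8)  # 4096 slots
    rng = random.Random(1)
    query = [rng.gauss(0, 1) for _ in range(16)]
    out, slots = mem.lookup(query, k=4)
    print(len(mem.values), mem.scored, len(slots), len(out))
```

Running the script prints `4096 128 4 8`: 4,096 slots are addressable, but only 128 sub-key scores are computed and 4 values are read. Growing `n_sub` increases slot count quadratically while per-query scoring grows only linearly, which is the kind of decoupling of parameter count from compute the abstract refers to.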
