Ultra-Sparse Memory Network
November 19, 2024
Authors: Zihao Huang, Qiyang Min, Hongzhi Huang, Defa Zhu, Yutao Zeng, Ran Guo, Xun Zhou
cs.AI
Abstract
It is widely acknowledged that the performance of Transformer models is
exponentially related to their number of parameters and computational
complexity. While approaches like Mixture of Experts (MoE) decouple parameter
count from computational complexity, they still face challenges in inference
due to high memory access costs. This work introduces UltraMem, which incorporates
a large-scale, ultra-sparse memory layer to address these limitations. Our
approach significantly reduces inference latency while maintaining model
performance. We also investigate the scaling laws of this new architecture,
demonstrating that it not only exhibits favorable scaling properties but also
outperforms traditional models. In our experiments, we train networks with up
to 20 million memory slots. The results show that our method achieves
state-of-the-art inference speed and model performance within a given
computational budget.