Ultra-Sparse Memory Network

November 19, 2024
Authors: Zihao Huang, Qiyang Min, Hongzhi Huang, Defa Zhu, Yutao Zeng, Ran Guo, Xun Zhou
cs.AI

Abstract

It is widely acknowledged that the performance of Transformer models is exponentially related to their number of parameters and computational complexity. While approaches like Mixture of Experts (MoE) decouple parameter count from computational complexity, they still face challenges in inference due to high memory access costs. This work introduces UltraMem, which incorporates a large-scale, ultra-sparse memory layer to address these limitations. Our approach significantly reduces inference latency while maintaining model performance. We also investigate the scaling laws of this new architecture, demonstrating that it not only exhibits favorable scaling properties but also outperforms traditional models. In our experiments, we train networks with up to 20 million memory slots. The results show that our method achieves state-of-the-art inference speed and model performance within a given computational budget.
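To make the core idea concrete, the sketch below illustrates a large, sparsely accessed memory layer in the general spirit of the memory slots the abstract describes: each token scores a query against a key table, keeps only the top-k slots, and returns a weighted sum of the corresponding value vectors, so parameters scale with the table size while per-token compute stays small. This is not the UltraMem architecture itself; the module names, dimensions, and the naive dense key scoring are illustrative assumptions.

```python
# Minimal sketch (assumed, not the paper's implementation) of a sparsely
# accessed memory layer: a huge table of value vectors from which each token
# reads only a few slots.
import torch
import torch.nn as nn
import torch.nn.functional as F

class SparseMemoryLayer(nn.Module):
    def __init__(self, d_model: int, num_slots: int, top_k: int = 8):
        super().__init__()
        self.top_k = top_k
        # Large key/value tables ("memory slots"); parameter count grows with
        # num_slots, but each token touches only top_k of them.
        self.keys = nn.Parameter(torch.randn(num_slots, d_model) * d_model ** -0.5)
        self.values = nn.Embedding(num_slots, d_model)
        self.query_proj = nn.Linear(d_model, d_model)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, seq, d_model)
        q = self.query_proj(x)                      # per-token query
        scores = q @ self.keys.t()                  # (batch, seq, num_slots); naive, for clarity
        top_scores, top_idx = scores.topk(self.top_k, dim=-1)
        weights = F.softmax(top_scores, dim=-1)     # normalize over the selected slots only
        selected = self.values(top_idx)             # (batch, seq, top_k, d_model)
        return (weights.unsqueeze(-1) * selected).sum(dim=-2)

# Usage: behaves like a drop-in residual sublayer (e.g., alongside an FFN block).
layer = SparseMemoryLayer(d_model=512, num_slots=1 << 16, top_k=8)
x = torch.randn(2, 128, 512)
out = x + layer(x)
```

With millions of slots, densely scoring every key as above would dominate the cost; practical designs in this line of work factorize the retrieval (for example, product-key-style lookups) so that only a small fraction of the table is ever scored or read, which is the property that keeps memory access costs low at inference time.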
