Wonderful Matrices: Combining for a More Efficient and Effective Foundation Model Architecture

December 16, 2024
Authors: Jingze Shi, Bingheng Wu
cs.AI

Abstract

To make foundation models more efficient and effective, our idea is to combine sequence transformation and state transformation. First, we prove the viability of rotary position embedding in the state space duality algorithm, which reduces the perplexity of hybrid quadratic causal self-attention and state space duality by more than 4%, ensuring that the combined sequence transformations share a unified position encoding. Second, we propose dynamic mask attention, which maintains 100% accuracy on the more challenging multi-query associative recall task, an improvement of more than 150% over quadratic causal self-attention and state space duality, ensuring that the combined sequence transformations selectively filter relevant information. Third, we design a cross-domain mixture of experts, which makes expert retrieval with more than 1024 experts 8 to 10 times faster than a standard mixture of experts, ensuring that the combined state transformations retrieve and mix experts quickly. Finally, we summarize these matrix algorithms, which can form a foundation model: Wonderful Matrices, a competitive alternative to popular model architectures.
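The abstract's first claim rests on rotary position embedding (RoPE), which encodes absolute positions as rotations so that attention scores depend only on relative offsets. A minimal sketch of one common RoPE formulation (the half-split variant) is below; the paper's exact integration into the state space duality algorithm is not specified in the abstract, so this illustrates only the position-encoding mechanism itself.

```python
import numpy as np

def rotary_embedding(x, base=10000.0):
    """Apply rotary position embedding to a (seq_len, dim) array.
    Each channel pair is rotated by an angle proportional to its position,
    so dot products between rotated vectors depend only on relative offset."""
    seq_len, dim = x.shape
    half = dim // 2
    # Per-pair inverse frequencies, geometric in the channel index.
    inv_freq = base ** (-np.arange(half) / half)
    angles = np.outer(np.arange(seq_len), inv_freq)  # (seq_len, half)
    cos, sin = np.cos(angles), np.sin(angles)
    x1, x2 = x[:, :half], x[:, half:]
    # Rotate each (x1, x2) channel pair by its position-dependent angle.
    return np.concatenate([x1 * cos - x2 * sin, x1 * sin + x2 * cos], axis=-1)
```

Because each step is a pure rotation, norms are preserved, and for fixed query/key content the score between positions i and j is a function of i - j alone.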
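The second contribution, dynamic mask attention, is described only at a high level in the abstract (selectively filtering relevant information). The sketch below is a generic illustration of the idea, not the paper's definition: causal attention where a gate computed from the values decides which positions may be attended to at all. The name `gate_w` and the thresholded gate are assumptions made for the example.

```python
import numpy as np

def dynamic_mask_attention(q, k, v, gate_w):
    """Causal self-attention with a value-dependent dynamic mask.
    A per-position gate computed from the values decides whether that key
    position is attendable; the diagonal is always kept so every query has
    at least one valid key. Illustrative sketch only."""
    seq_len, dim = q.shape
    scores = q @ k.T / np.sqrt(dim)
    causal = np.tril(np.ones((seq_len, seq_len), dtype=bool))
    gate = (v @ gate_w) > 0.0                       # dynamic keep/drop per key
    mask = (causal & gate[None, :]) | np.eye(seq_len, dtype=bool)
    scores = np.where(mask, scores, -np.inf)        # masked keys get zero weight
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ v
```

The filtering is what plausibly helps on associative-recall-style tasks: irrelevant key positions are removed from the softmax entirely rather than merely down-weighted.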
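The third claim, 8-10x faster retrieval over more than 1024 experts, suggests a sublinear routing scheme. One well-known way to achieve this is product-key retrieval: score two small sub-key sets of size n instead of all n^2 experts, then combine the best halves. The paper's cross-domain mixture of experts may differ in detail; this is a sketch of the general technique, with all names hypothetical.

```python
import numpy as np

def product_key_retrieval(query, keys1, keys2, top_k=2):
    """Retrieve expert indices from an n1 * n2 grid using two sub-key sets.
    Scoring costs O(n1 + n2) comparisons instead of O(n1 * n2), which is
    where the speedup over scoring every expert directly comes from."""
    half = query.shape[0] // 2
    s1 = keys1 @ query[:half]            # scores against the first sub-keys
    s2 = keys2 @ query[half:]            # scores against the second sub-keys
    i1 = np.argsort(s1)[-top_k:]         # best candidates in each half
    i2 = np.argsort(s2)[-top_k:]
    # Each (a, b) pair of half-indices addresses one expert in the grid.
    cand = [(s1[a] + s2[b], a * keys2.shape[0] + b) for a in i1 for b in i2]
    cand.sort(reverse=True)
    return [idx for _, idx in cand[:top_k]]
```

With n1 = n2 = 32 this addresses 1024 experts while scoring only 64 sub-keys, which is consistent in spirit with the abstract's "more than 1024 experts" regime.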
