Wonderful Matrices: Combining for a More Efficient and Effective Foundation Model Architecture
December 16, 2024
Authors: Jingze Shi, Bingheng Wu
cs.AI
Abstract
To make the foundation model more efficient and effective, our idea is to combine sequence transformation and state transformation. First, we demonstrate the feasibility of rotary position embedding in the state space duality algorithm: it reduces the perplexity of the hybrid of quadratic causal self-attention and state space duality by more than 4%, ensuring that the combined sequence transformations share a unified position encoding.
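For reference, rotary position embedding (RoPE) encodes each absolute position as a rotation of channel pairs, so relative offsets surface as angle differences in attention inner products. The sketch below shows RoPE in isolation; the function name and shapes are illustrative assumptions, not the paper's SSD integration.

```python
# Minimal sketch of rotary position embedding (RoPE). Illustrative only;
# not the paper's integration of RoPE into state space duality.
import torch

def rotary_embedding(x: torch.Tensor, base: float = 10000.0) -> torch.Tensor:
    """Rotate pairs of channels of x by position-dependent angles.

    x: (batch, seq_len, dim) with dim even.
    """
    batch, seq_len, dim = x.shape
    # Per-pair frequencies: theta_i = base^(-2i / dim).
    inv_freq = base ** (-torch.arange(0, dim, 2, dtype=torch.float32) / dim)
    # Angle for every (position, channel pair): (seq_len, dim // 2).
    angles = torch.arange(seq_len, dtype=torch.float32)[:, None] * inv_freq[None, :]
    cos, sin = angles.cos(), angles.sin()
    x1, x2 = x[..., 0::2], x[..., 1::2]
    # Apply an independent 2-D rotation to each channel pair.
    out = torch.empty_like(x)
    out[..., 0::2] = x1 * cos - x2 * sin
    out[..., 1::2] = x1 * sin + x2 * cos
    return out
```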
Second, we propose dynamic mask attention, which maintains 100% accuracy on the more challenging multi-query associative recall task, an improvement of more than 150% over quadratic causal self-attention and state space duality, ensuring that the combined sequence transformations selectively filter relevant information.
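The abstract does not detail the masking mechanism. The sketch below illustrates one plausible reading, in which an input-dependent gate augments the standard causal mask so each query attends only to keys deemed relevant; the gate (a thresholded linear projection over keys) is an assumption for illustration, not the paper's exact formulation.

```python
# Hedged sketch of the idea behind dynamic mask attention: an input-dependent
# mask, combined with the causal mask, filters out irrelevant key positions
# before softmax. The gating scheme is an illustrative assumption.
import torch
import torch.nn.functional as F

def dynamic_mask_attention(q, k, v, gate_proj, threshold: float = 0.5):
    """q, k, v: (batch, seq_len, dim); gate_proj: e.g. nn.Linear(dim, 1)."""
    batch, seq_len, dim = q.shape
    scores = q @ k.transpose(-2, -1) / dim ** 0.5  # (batch, seq, seq)
    # Causal mask: position t may only attend to positions <= t.
    causal = torch.tril(torch.ones(seq_len, seq_len, dtype=torch.bool, device=q.device))
    # Dynamic mask: keep a key position only if its gate exceeds the threshold.
    keep = torch.sigmoid(gate_proj(k)).squeeze(-1) > threshold  # (batch, seq)
    mask = causal[None, :, :] & keep[:, None, :]
    # Each query keeps at least itself so softmax stays well-defined.
    mask |= torch.eye(seq_len, dtype=torch.bool, device=q.device)[None, :, :]
    scores = scores.masked_fill(~mask, float("-inf"))
    return F.softmax(scores, dim=-1) @ v
```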
Third, we design cross domain mixture of experts, which makes expert retrieval with more than 1024 experts 8 to 10 times faster than a standard mixture of experts, ensuring that the combined state transformations can quickly retrieve and mix expert outputs.
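One standard way to obtain such a speedup over a large pool is product-key retrieval, where n = m × m experts are scored through two sets of m sub-keys instead of all n. The sketch below illustrates that idea; whether cross domain mixture of experts uses this exact two-level lookup is an assumption not confirmed by the abstract.

```python
# Hedged sketch of fast top-k expert retrieval via product keys: score
# 2 * m sub-keys to rank m * m candidate experts. Illustrative assumption,
# not necessarily the paper's cross domain mixture of experts mechanism.
import torch

def product_key_retrieval(query, subkeys_a, subkeys_b, top_k: int = 4):
    """query: (dim,), split into halves; subkeys_a, subkeys_b: (m, dim // 2).

    Returns indices and scores of the top_k experts among m * m candidates.
    """
    half = query.shape[0] // 2
    scores_a = subkeys_a @ query[:half]  # (m,)
    scores_b = subkeys_b @ query[half:]  # (m,)
    # Keep only the top_k candidates per axis: top_k**2 combinations total.
    va, ia = scores_a.topk(top_k)
    vb, ib = scores_b.topk(top_k)
    combined = va[:, None] + vb[None, :]  # (top_k, top_k)
    flat_scores, flat_idx = combined.flatten().topk(top_k)
    m = subkeys_b.shape[0]
    expert_idx = ia[flat_idx // top_k] * m + ib[flat_idx % top_k]
    return expert_idx, flat_scores
```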
Finally, we summarize these matrix algorithms, which together can form a foundation model: Wonderful Matrices, a competitive alternative to popular model architectures.