Fourier Position Embedding: Enhancing Attention's Periodic Extension for Length Generalization
December 23, 2024
Authors: Ermo Hua, Che Jiang, Xingtai Lv, Kaiyan Zhang, Ning Ding, Youbang Sun, Biqing Qi, Yuchen Fan, Xue Kai Zhu, Bowen Zhou
cs.AI
Abstract
Extending the context length of Language Models (LMs) by improving Rotary
Position Embedding (RoPE) has become a trend. While existing works mainly
address RoPE's limitations within the attention mechanism, this paper provides an
analysis across nearly all parts of LMs, uncovering their adverse effects on
length generalization for RoPE-based attention. Using Discrete Signal
Processing theory, we show that RoPE enables periodic attention by implicitly
performing a Non-Uniform Discrete Fourier Transform. However, this periodicity is
undermined by spectral damage from two sources: 1) linear layers and activation
functions outside of attention; 2) insufficiently trained frequency components
introduced by time-domain truncation. Building on our observations, we propose
Fourier Position Embedding (FoPE), which enhances attention's frequency-domain
properties to improve both its periodic extension and length generalization.
FoPE constructs a Fourier series and zeroes out the destructive frequency
components, making the model more robust to spectral damage.
Experiments across various model scales show that, across varying context
windows, FoPE maintains a more stable perplexity and more consistent
accuracy on a needle-in-a-haystack task than RoPE and ALiBi. Several
analyses and ablations further support our method and theoretical
modeling.
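To make the idea of zeroing out frequency components concrete, here is a minimal sketch in plain Python. It is not the authors' implementation. It assumes the usual RoPE frequency schedule (base 10000) and a hypothetical selection rule: a component counts as insufficiently trained when its period is longer than the training context, so no full cycle was ever seen. The function names, the `train_len` parameter, and that threshold are illustrative assumptions; the abstract does not specify how FoPE selects the components, and the Fourier-series construction is not shown here.

```python
import math


def rope_frequencies(head_dim, base=10000.0):
    """Usual RoPE angular frequencies, one per 2-D rotation pair."""
    return [base ** (-2.0 * i / head_dim) for i in range(head_dim // 2)]


def zero_out_undertrained(freqs, train_len):
    """Zero every frequency whose period exceeds train_len.

    Assumed rule (not taken from the abstract): a component with
    period 2*pi/w > train_len never completes a cycle during training.
    """
    floor = 2.0 * math.pi / train_len
    return [w if w >= floor else 0.0 for w in freqs]


def rotate_pair(x, y, angle):
    """Rotate one (x, y) feature pair by `angle`, as RoPE does per position."""
    c, s = math.cos(angle), math.sin(angle)
    return (x * c - y * s, x * s + y * c)


freqs = rope_frequencies(head_dim=64)
clipped = zero_out_undertrained(freqs, train_len=512)
num_zeroed = sum(1 for w in clipped if w == 0.0)

# A zeroed component rotates by angle 0 at every position, so it
# contributes a position-independent term to the attention score.
position = 4096
unchanged = rotate_pair(0.3, -0.7, clipped[-1] * position)
```

Under this assumed rule, the lowest-frequency pairs no longer rotate with position, so they behave identically inside and beyond the training window, while the remaining components keep their periodic behaviour.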