Fourier Position Embedding: Enhancing Attention's Periodic Extension for Length Generalization

December 23, 2024
Authors: Ermo Hua, Che Jiang, Xingtai Lv, Kaiyan Zhang, Ning Ding, Youbang Sun, Biqing Qi, Yuchen Fan, Xue Kai Zhu, Bowen Zhou
cs.AI

Abstract

Extending the context length of Language Models (LMs) by improving Rotary Position Embedding (RoPE) has become a trend. While existing works mainly address RoPE's limitations within the attention mechanism, this paper provides an analysis across nearly all parts of LMs, uncovering their adverse effects on length generalization for RoPE-based attention. Using Discrete Signal Processing theory, we show that RoPE enables periodic attention by implicitly performing a Non-Uniform Discrete Fourier Transform. However, this periodicity is undermined by spectral damage caused by: 1) linear layers and activation functions outside of attention; 2) insufficiently trained frequency components introduced by time-domain truncation. Building on these observations, we propose Fourier Position Embedding (FoPE), which enhances attention's frequency-domain properties to improve both its periodic extension and its length generalization. FoPE constructs a Fourier series and zeroes out the destructive frequency components, increasing the model's robustness against spectral damage. Experiments across various model scales show that, across varying context windows, FoPE maintains a more stable perplexity and a more consistent accuracy on a needle-in-a-haystack task than RoPE and ALiBi. Several analyses and ablations provide further support for our method and theoretical modeling.
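The idea of zeroing out insufficiently trained frequency components can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes the commonly used RoPE frequency schedule (base 10000) and assumes that a frequency counts as insufficiently trained when it cannot complete one full period within the training context length. The function names, the threshold rule, and the parameter values are all hypothetical choices made for illustration.

```python
import math


def rope_frequencies(head_dim, base=10000.0):
    """Angular frequencies of the common RoPE schedule, one per dimension pair."""
    return [base ** (-2.0 * i / head_dim) for i in range(head_dim // 2)]


def zero_out_undertrained(freqs, train_len):
    """Zero every frequency whose period exceeds train_len positions.

    Assumed criterion (not taken from the abstract): a component is treated
    as insufficiently trained if it never completes a full cycle in training.
    """
    floor = 2.0 * math.pi / train_len
    return [f if f >= floor else 0.0 for f in freqs]


# Hypothetical settings: 64-dimensional heads, 512-token training context.
freqs = rope_frequencies(head_dim=64)
clipped = zero_out_undertrained(freqs, train_len=512)
num_zeroed = sum(1 for f in clipped if f == 0.0)
print(f"{num_zeroed} of {len(freqs)} frequencies zeroed")
```

A zeroed frequency yields a rotation angle of 0 at every position, so that dimension pair no longer depends on position. The sketch covers only the zeroing step and does not attempt the Fourier series construction mentioned in the abstract.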
