Fourier Position Embedding: Enhancing Attention's Periodic Extension for Length Generalization
December 23, 2024
Authors: Ermo Hua, Che Jiang, Xingtai Lv, Kaiyan Zhang, Ning Ding, Youbang Sun, Biqing Qi, Yuchen Fan, Xue Kai Zhu, Bowen Zhou
cs.AI
Abstract
Extending the context length of Language Models (LMs) by improving Rotary Position Embedding (RoPE) has become a common research direction. While existing works mainly address RoPE's limitations within the attention mechanism, this paper analyzes nearly all parts of LMs and uncovers their adverse effects on the length generalization of RoPE-based attention. Using Discrete Signal Processing theory, we show that RoPE enables periodic attention by implicitly performing a Non-Uniform Discrete Fourier Transform. However, this periodicity is undermined by spectral damage from two sources: (1) linear layers and activation functions outside attention, and (2) insufficiently trained frequency components caused by time-domain truncation. Building on these observations, we propose Fourier Position Embedding (FoPE), which enhances the frequency-domain properties of attention to improve both its periodic extension and its length generalization. FoPE constructs Fourier series and zeroes out the destructive frequency components, making the model more robust to spectral damage.
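The claim that RoPE implicitly performs a Non-Uniform Discrete Fourier Transform (NUDFT) can be checked numerically. The sketch below assumes the standard RoPE parameterization (adjacent dimension pairs rotated by the angle m·θ_i, with θ_i = 10000^(-2i/d)); the helper names `rope_frequencies`, `apply_rope`, `score_rotated`, and `score_spectral` are hypothetical and not taken from the paper's code. It shows that the rotated query-key dot product equals Re Σ_i c_i·e^{j(m-n)θ_i}: a sum of complex exponentials in the relative position m - n at geometrically spaced, hence non-uniform, frequencies, with coefficients c_i fixed by the query and key contents.

```python
import numpy as np


def rope_frequencies(d: int, base: float = 10000.0) -> np.ndarray:
    """Standard RoPE angular frequencies theta_i = base^(-2i/d), i = 0..d/2-1."""
    return base ** (-np.arange(0, d, 2) / d)


def apply_rope(x: np.ndarray, pos: int, theta: np.ndarray) -> np.ndarray:
    """Rotate each pair (x[2i], x[2i+1]) by the angle pos * theta_i."""
    x_even, x_odd = x[0::2], x[1::2]
    cos, sin = np.cos(pos * theta), np.sin(pos * theta)
    out = np.empty_like(x)
    out[0::2] = x_even * cos - x_odd * sin
    out[1::2] = x_even * sin + x_odd * cos
    return out


def score_rotated(q, k, m, n, theta) -> float:
    """Attention logit computed the usual way: rotate q and k, then take the dot product."""
    return float(apply_rope(q, m, theta) @ apply_rope(k, n, theta))


def score_spectral(q, k, m, n, theta) -> float:
    """Same logit written as a non-uniform sum of complex exponentials in (m - n)."""
    qc = q[0::2] + 1j * q[1::2]  # each dimension pair viewed as one complex number
    kc = k[0::2] + 1j * k[1::2]
    coeffs = qc * np.conj(kc)  # one spectral coefficient per frequency theta_i
    return float(np.real(np.sum(coeffs * np.exp(1j * (m - n) * theta))))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    d = 64
    theta = rope_frequencies(d)
    q, k = rng.standard_normal(d), rng.standard_normal(d)
    print(np.isclose(score_rotated(q, k, 100, 37, theta), score_spectral(q, k, 100, 37, theta)))  # True
    print(np.isclose(score_rotated(q, k, 100, 37, theta), score_rotated(q, k, 163, 100, theta)))  # True (same m - n)
```

Because the logit depends on position only through m - n and the frequencies θ_i, this per-frequency spectrum is exactly what the abstract says linear layers, activation functions, and undertrained frequency components can damage.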
Experiments across various model scales show that, compared with RoPE and ALiBi, FoPE maintains more stable perplexity and more consistent accuracy on a needle-in-a-haystack task across varying context windows. Further analyses and ablations support our method and theoretical modeling.
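The two operations the abstract attributes to FoPE, constructing Fourier series and zeroing out destructive frequency components, can be illustrated with the hypothetical NumPy sketch below. It is not the authors' implementation. The zeroing criterion (a frequency is treated as undertrained when its period 2π/θ_i exceeds the training length, so it never completes a full cycle under time-domain truncation), the number of extra Fourier terms, their N(0, σ²) weights, and all function names are assumptions made for illustration.

```python
import numpy as np


def rope_frequencies(d: int, base: float = 10000.0) -> np.ndarray:
    """Standard RoPE angular frequencies theta_i = base^(-2i/d), i = 0..d/2-1."""
    return base ** (-np.arange(0, d, 2) / d)


def zero_out_undertrained(theta: np.ndarray, train_len: int) -> np.ndarray:
    """Assumed criterion: a frequency whose period 2*pi/theta exceeds the training
    length never completes one full cycle during training, so it is set to 0 and
    its dimension pair receives no rotation from its own frequency."""
    period = 2 * np.pi / theta
    return np.where(period > train_len, 0.0, theta)


def fourier_position_factors(positions, theta, num_extra=4, sigma=0.1, seed=0):
    """Illustrative Fourier series per dimension pair (assumed settings): weight 1
    on the pair's own frequency plus `num_extra` frequencies drawn from the same
    (clipped) set with N(0, sigma^2) weights.
    Returns complex factors of shape (len(positions), d/2)."""
    rng = np.random.default_rng(seed)
    n_freq = theta.shape[0]
    extra_idx = rng.integers(0, n_freq, size=(n_freq, num_extra))
    extra_w = rng.normal(0.0, sigma, size=(n_freq, num_extra))
    pos = np.asarray(positions, dtype=float)[:, None, None]  # (P, 1, 1)
    main = np.exp(1j * pos[..., 0] * theta[None, :])  # (P, F): own frequency
    extra = (extra_w[None] * np.exp(1j * pos * theta[extra_idx][None])).sum(-1)  # (P, F)
    return main + extra


def apply_fourier_pe(x: np.ndarray, factors: np.ndarray) -> np.ndarray:
    """Multiply each (x[2i], x[2i+1]) pair, viewed as a complex number, by its factor."""
    xc = x[0::2] + 1j * x[1::2]
    out = xc * factors
    return np.stack([out.real, out.imag], axis=-1).reshape(-1)


if __name__ == "__main__":
    theta = zero_out_undertrained(rope_frequencies(64), train_len=512)
    print(np.count_nonzero(theta == 0.0))  # 16 (of the 32 frequencies)
    factors = fourier_position_factors(np.arange(1024), theta)
    q = np.random.default_rng(1).standard_normal(64)
    q_at_700 = apply_fourier_pe(q, factors[700])  # encoded query at position 700
```

With `num_extra=0` the factors reduce to unit-modulus RoPE rotations on the kept frequencies and to the constant 1 on the zeroed ones, which makes it easy to isolate the effect of each ingredient. Setting a frequency to exactly zero, rather than dropping its dimensions, keeps tensor shapes unchanged in this sketch.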