Rethinking Addressing in Language Models via Contextualized Equivariant Positional Encoding
January 1, 2025
Authors: Jiajun Zhu, Peihao Wang, Ruisi Cai, Jason D. Lee, Pan Li, Zhangyang Wang
cs.AI
Abstract
Transformers rely on both content-based and position-based addressing
mechanisms to make predictions, but existing positional encoding techniques
often diminish the effectiveness of position-based addressing. Many current
methods enforce rigid patterns in attention maps, limiting the ability to model
long-range dependencies and adapt to diverse tasks. Additionally, most
positional encodings are learned as general biases, lacking the specialization
required for different instances within a dataset. To address this, we propose
conTextualized equivariAnt Position
Embedding (TAPE), a novel framework that enhances
positional embeddings by incorporating sequence content across layers. TAPE
introduces dynamic, context-aware positional encodings, overcoming the
constraints of traditional fixed patterns. By enforcing permutation and
orthogonal equivariance, TAPE ensures the stability of positional encodings
during updates, improving robustness and adaptability. Our method can be easily
integrated into pre-trained transformers, offering parameter-efficient
fine-tuning with minimal overhead. Extensive experiments show that TAPE
achieves superior performance in language modeling, arithmetic reasoning, and
long-context retrieval tasks compared to existing positional embedding
techniques.
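To make the equivariance claims concrete, below is a minimal, hypothetical sketch (not the authors' implementation) of a content-conditioned positional-embedding update: per-token positional embeddings are mixed by an attention-like matrix computed from token content only, which makes the update permutation-equivariant over tokens and orthogonally equivariant over the positional feature dimension. The class name `ContextualPosUpdate` and all dimensions are illustrative assumptions.

```python
# Hypothetical sketch of a contextualized, equivariant positional-embedding update.
# Not the TAPE implementation; it only illustrates the two equivariance properties
# described in the abstract.
import torch
import torch.nn as nn

class ContextualPosUpdate(nn.Module):
    """Updates positional embeddings `pos` using token content `x`.

    - Permutation equivariance: permuting the tokens of (x, pos) permutes the output rows.
    - Orthogonal equivariance: right-multiplying `pos` by an orthogonal Q right-multiplies
      the output by the same Q, because `pos` only enters through left matrix products.
    """
    def __init__(self, d_model):
        super().__init__()
        self.q = nn.Linear(d_model, d_model, bias=False)
        self.k = nn.Linear(d_model, d_model, bias=False)
        self.scale = d_model ** -0.5

    def forward(self, x, pos):
        # x:   (batch, seq, d_model)  token content
        # pos: (batch, seq, d_pos)    per-token positional embeddings
        attn = torch.softmax(self.q(x) @ self.k(x).transpose(-2, -1) * self.scale, dim=-1)
        # The content-derived mixing matrix acts only on the token axis of `pos`,
        # so an orthogonal transform on the feature axis commutes with the update.
        return pos + attn @ pos

# Quick numerical check of orthogonal equivariance (toy sizes, random inputs).
x = torch.randn(1, 8, 16)
pos = torch.randn(1, 8, 4)
q, _ = torch.linalg.qr(torch.randn(4, 4))      # random orthogonal matrix
layer = ContextualPosUpdate(16)
print(torch.allclose(layer(x, pos) @ q, layer(x, pos @ q), atol=1e-5))  # True
```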