Rethinking Addressing in Language Models via Contextualized Equivariant Positional Encoding
January 1, 2025
Authors: Jiajun Zhu, Peihao Wang, Ruisi Cai, Jason D. Lee, Pan Li, Zhangyang Wang
cs.AI
Abstract
Transformers rely on both content-based and position-based addressing
mechanisms to make predictions, but existing positional encoding techniques
often diminish the effectiveness of position-based addressing. Many current
methods enforce rigid patterns in attention maps, limiting the ability to model
long-range dependencies and adapt to diverse tasks. Additionally, most
positional encodings are learned as general biases, lacking the specialization
required for different instances within a dataset. To address this, we propose
conTextualized equivariAnt Position
Embedding (TAPE), a novel framework that enhances
positional embeddings by incorporating sequence content across layers. TAPE
introduces dynamic, context-aware positional encodings, overcoming the
constraints of traditional fixed patterns. By enforcing permutation and
orthogonal equivariance, TAPE ensures the stability of positional encodings
during updates, improving robustness and adaptability. Our method can be easily
integrated into pre-trained transformers, offering parameter-efficient
fine-tuning with minimal overhead. Extensive experiments shows that TAPE
achieves superior performance in language modeling, arithmetic reasoning, and
long-context retrieval tasks compared to existing positional embedding
techniques.
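To make the equivariance constraint described above concrete, the following is a minimal, hypothetical PyTorch sketch of a content-conditioned positional update. The function name, tensor shapes, and projection matrices are illustrative assumptions, not the paper's TAPE implementation; it only demonstrates the property that mixing positional features with content-derived weights commutes with orthogonal transforms of the positional features and with token permutations.

import torch

def contextual_pos_update(x, pos, w_q, w_k):
    """Mix positional features across tokens using content-derived weights.

    x:   (seq, d_model) token content
    pos: (seq, d_pos)   positional features
    w_q, w_k: (d_model, d_head) hypothetical projection matrices
    """
    q, k = x @ w_q, x @ w_k
    attn = torch.softmax(q @ k.transpose(-2, -1) / k.shape[-1] ** 0.5, dim=-1)
    # The weights depend on content only and act on the left of `pos`, so
    # right-multiplying `pos` by any orthogonal matrix Q commutes with the
    # update: f(x, pos @ Q) == f(x, pos) @ Q (orthogonal equivariance), and
    # permuting the tokens permutes the output rows (permutation equivariance).
    return attn @ pos

# Quick numerical check of orthogonal equivariance with random inputs.
torch.manual_seed(0)
x, pos = torch.randn(6, 16), torch.randn(6, 8)
w_q, w_k = torch.randn(16, 4), torch.randn(16, 4)
q_mat, _ = torch.linalg.qr(torch.randn(8, 8))   # random orthogonal matrix
out1 = contextual_pos_update(x, pos @ q_mat, w_q, w_k)
out2 = contextual_pos_update(x, pos, w_q, w_k) @ q_mat
assert torch.allclose(out1, out2, atol=1e-5)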