Rethinking Addressing in Language Models via Contextualized Equivariant Positional Encoding

Abstract

Transformers rely on both content-based and position-based addressing mechanisms to make predictions, but existing positional encoding techniques often diminish the effectiveness of position-based addressing. Many current methods enforce rigid patterns in attention maps, which limits their ability to model long-range dependencies and to adapt to diverse tasks. Additionally, most positional encodings are learned as general biases and lack the specialization required for different instances within a dataset. To address this, we propose conTextualized equivariAnt Position Embedding (TAPE), a novel framework that enhances positional embeddings by incorporating sequence content across layers. TAPE introduces dynamic, context-aware positional encodings, overcoming the constraints of traditional fixed patterns. By enforcing permutation and orthogonal equivariance, TAPE ensures the stability of positional encodings during updates, improving robustness and adaptability. Our method can be easily integrated into pre-trained transformers, offering parameter-efficient fine-tuning with minimal overhead. Extensive experiments show that TAPE achieves superior performance in language modeling, arithmetic reasoning, and long-context retrieval tasks compared to existing positional embedding techniques.
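To make the two symmetry properties named in the abstract concrete, the following is a minimal sketch of what permutation and orthogonal equivariance mean for a positional update. It is not TAPE's actual layer: `update_positions` is a hypothetical toy update, and the inputs `x` (content features) and `p` (2-D positional embeddings) are made-up values chosen only for illustration. The toy update mixes positional rows using weights that depend on content similarity and on positional inner products, which is enough for both properties to hold.

```python
import math

def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(u * v for u, v in zip(row, col)) for col in zip(*b)] for row in a]

def transpose(a):
    return [list(col) for col in zip(*a)]

def softmax(row):
    m = max(row)
    exps = [math.exp(v - m) for v in row]
    total = sum(exps)
    return [e / total for e in exps]

def update_positions(x, p):
    """Hypothetical toy update (an assumption, not the paper's layer).

    Mixing weights depend on content similarity x x^T and on the positional
    Gram matrix p p^T, which is unchanged by any orthogonal transform of p.
    """
    scores = matmul(x, transpose(x))
    gram = matmul(p, transpose(p))
    weights = [softmax([s + g for s, g in zip(s_row, g_row)])
               for s_row, g_row in zip(scores, gram)]
    return matmul(weights, p)

def rotation(theta):
    """A 2x2 rotation matrix, used here as an example orthogonal matrix."""
    c, s = math.cos(theta), math.sin(theta)
    return [[c, -s], [s, c]]

def permute(rows, order):
    """Reorder tokens: new row k is old row order[k]."""
    return [rows[i] for i in order]

def close(a, b, tol=1e-9):
    return all(abs(u - v) < tol for ra, rb in zip(a, b) for u, v in zip(ra, rb))

# Illustrative inputs: 3 tokens, 3 content features, 2 positional dimensions.
x = [[0.2, -1.0, 0.5], [1.5, 0.3, -0.7], [-0.4, 0.8, 0.1]]
p = [[1.0, 0.0], [0.5, 0.5], [-0.3, 1.2]]
r = rotation(0.7)
order = [2, 0, 1]

# Orthogonal equivariance: f(x, p R) == f(x, p) R
orthogonal_ok = close(update_positions(x, matmul(p, r)),
                      matmul(update_positions(x, p), r))

# Permutation equivariance: f(perm(x), perm(p)) == perm(f(x, p))
permutation_ok = close(update_positions(permute(x, order), permute(p, order)),
                       permute(update_positions(x, p), order))

print(orthogonal_ok, permutation_ok)
```

In this sketch, rotating the positional embeddings before the update gives the same result as rotating them afterwards, and reordering the tokens reorders the output in the same way. This is the kind of consistency the abstract associates with stable positional encodings during updates.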
