Scaling Embedding Layers in Language Models

Abstract

We propose SCONE (Scalable, Contextualized, Offloaded, N-gram Embedding), a method for extending input embedding layers to enhance language model performance as layer size scales. To avoid increased decoding costs, SCONE retains the original vocabulary while introducing embeddings for a set of frequent n-grams. These embeddings provide a contextualized representation for each input token and are learned with a separate model during training. During inference, they are precomputed and stored in off-accelerator memory, with minimal impact on inference speed. SCONE enables two new scaling strategies: increasing the number of cached n-gram embeddings and scaling the model used to learn them, both while maintaining fixed inference-time FLOPS. We show that scaling both aspects allows SCONE to outperform a 1.9B-parameter baseline across diverse corpora while using only half the inference-time FLOPS.
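The inference-time lookup described above can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the plain `dict` standing in for the off-accelerator table, the longest-match rule, the `max_n` limit, and the fallback to the ordinary token embedding are all assumptions made for illustration. The abstract only states that embeddings for frequent n-grams are precomputed, cached, and used to give each input token a contextualized representation.

```python
from typing import Dict, List, Optional, Sequence, Tuple

Vector = List[float]
NGram = Tuple[int, ...]


def longest_cached_ngram(
    tokens: Sequence[int],
    end: int,
    ngram_table: Dict[NGram, Vector],
    max_n: int,
) -> Optional[NGram]:
    """Return the longest cached n-gram (n >= 2) ending at index `end`, or None."""
    # Try the longest candidate first; an n-gram cannot start before index 0.
    for n in range(min(max_n, end + 1), 1, -1):
        key = tuple(tokens[end - n + 1 : end + 1])
        if key in ngram_table:
            return key
    return None


def contextualized_inputs(
    tokens: Sequence[int],
    token_emb: Dict[int, Vector],
    ngram_table: Dict[NGram, Vector],
    max_n: int = 3,
) -> List[Vector]:
    """Build one input vector per token.

    Assumption of this sketch: if a cached n-gram ends at the token, its
    precomputed embedding is used; otherwise the ordinary token embedding
    from the unchanged vocabulary is used.
    """
    outputs: List[Vector] = []
    for i, tok in enumerate(tokens):
        key = longest_cached_ngram(tokens, i, ngram_table, max_n)
        if key is not None:
            # Pure table lookup: no extra accelerator FLOPS are spent here.
            outputs.append(list(ngram_table[key]))
        else:
            outputs.append(list(token_emb[tok]))
    return outputs
```

In this sketch, growing `ngram_table` adds host-memory lookups but no per-token matrix multiplications, which is the intuition behind scaling the number of cached n-gram embeddings at fixed inference-time FLOPS. For example, calling `contextualized_inputs([1, 2, 3], token_emb, ngram_table)` with a table holding `(1, 2)` and `(1, 2, 3)` would use the token embedding for the first position and the cached n-gram embeddings for the other two.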
