Smarter, Better, Faster, Longer: A Modern Bidirectional Encoder for Fast, Memory Efficient, and Long Context Finetuning and Inference

Abstract

Encoder-only transformer models such as BERT offer a great performance-size tradeoff for retrieval and classification tasks with respect to larger decoder-only models. Despite being the workhorse of numerous production pipelines, there have been limited Pareto improvements to BERT since its release. In this paper, we introduce ModernBERT, bringing modern model optimizations to encoder-only models and representing a major Pareto improvement over older encoders. Trained on 2 trillion tokens with a native 8192 sequence length, ModernBERT models exhibit state-of-the-art results on a large pool of evaluations encompassing diverse classification tasks and both single and multi-vector retrieval on different domains (including code). In addition to strong downstream performance, ModernBERT is also the most speed and memory efficient encoder and is designed for inference on common GPUs.
