Smarter, Better, Faster, Longer: A Modern Bidirectional Encoder for Fast, Memory Efficient, and Long Context Finetuning and Inference
December 18, 2024
作者: Benjamin Warner, Antoine Chaffin, Benjamin Clavié, Orion Weller, Oskar Hallström, Said Taghadouini, Alexis Gallagher, Raja Biswas, Faisal Ladhak, Tom Aarsen, Nathan Cooper, Griffin Adams, Jeremy Howard, Iacopo Poli
cs.AI
Abstract
Encoder-only transformer models such as BERT offer a great performance-size
tradeoff for retrieval and classification tasks with respect to larger
decoder-only models. Although BERT remains the workhorse of numerous
production pipelines, it has seen few Pareto improvements since its
release. In this paper, we introduce ModernBERT, bringing modern model
optimizations to encoder-only models and representing a major Pareto
improvement over older encoders. Trained on 2 trillion tokens with a native
8192 sequence length, ModernBERT models exhibit state-of-the-art results on a
large pool of evaluations encompassing diverse classification tasks and both
single and multi-vector retrieval on different domains (including code). In
addition to strong downstream performance, ModernBERT is also the most speed-
and memory-efficient encoder and is designed for inference on common GPUs.
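The abstract itself gives no usage details, but as a rough illustration of what inference with an encoder like this looks like, here is a minimal masked-language-model sketch using the Hugging Face transformers library. The checkpoint name answerdotai/ModernBERT-base and the availability of the architecture in a recent transformers release are assumptions for illustration, not claims made in the abstract.

```python
# Minimal sketch of masked-language-model inference with ModernBERT.
# Assumptions (not stated in the abstract): the checkpoint is published on the
# Hugging Face Hub as "answerdotai/ModernBERT-base" and the installed
# transformers version includes the ModernBERT architecture.
import torch
from transformers import AutoModelForMaskedLM, AutoTokenizer

model_id = "answerdotai/ModernBERT-base"  # assumed checkpoint name
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForMaskedLM.from_pretrained(model_id)
model.eval()

text = "The capital of France is [MASK]."
inputs = tokenizer(text, return_tensors="pt")

with torch.no_grad():
    logits = model(**inputs).logits

# Find the masked position and take the highest-scoring token there.
mask_pos = (inputs["input_ids"] == tokenizer.mask_token_id).nonzero(as_tuple=True)[1]
predicted_id = logits[0, mask_pos].argmax(dim=-1)
print(tokenizer.decode(predicted_id))
```

The same checkpoint can also serve as a backbone for the retrieval and classification tasks the abstract evaluates, typically by fine-tuning with a task-specific head.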