Smarter, Better, Faster, Longer: A Modern Bidirectional Encoder for Fast, Memory Efficient, and Long Context Finetuning and Inference
December 18, 2024
作者: Benjamin Warner, Antoine Chaffin, Benjamin Clavié, Orion Weller, Oskar Hallström, Said Taghadouini, Alexis Gallagher, Raja Biswas, Faisal Ladhak, Tom Aarsen, Nathan Cooper, Griffin Adams, Jeremy Howard, Iacopo Poli
cs.AI
Abstract
Encoder-only transformer models such as BERT offer a great performance-size
tradeoff for retrieval and classification tasks with respect to larger
decoder-only models. Although BERT remains the workhorse of numerous
production pipelines, it has seen few Pareto improvements since its
release. In this paper, we introduce ModernBERT, bringing modern model
optimizations to encoder-only models and representing a major Pareto
improvement over older encoders. Trained on 2 trillion tokens with a native
8192 sequence length, ModernBERT models exhibit state-of-the-art results on a
large pool of evaluations encompassing diverse classification tasks and both
single and multi-vector retrieval on different domains (including code). In
addition to strong downstream performance, ModernBERT is also the most speed-
and memory-efficient encoder and is designed for inference on common GPUs.
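The abstract itself gives no usage details, but as a rough illustration of what inference with an encoder like this looks like, here is a minimal masked-language-model sketch using the Hugging Face transformers library. The checkpoint name answerdotai/ModernBERT-base and the availability of the architecture in a recent transformers release are assumptions for illustration, not claims made in the abstract.

```python
# Minimal sketch of masked-language-model inference with ModernBERT.
# Assumptions (not stated in the abstract): the checkpoint is published on the
# Hugging Face Hub as "answerdotai/ModernBERT-base" and the installed
# transformers version includes the ModernBERT architecture.
import torch
from transformers import AutoModelForMaskedLM, AutoTokenizer

model_id = "answerdotai/ModernBERT-base"  # assumed checkpoint name
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForMaskedLM.from_pretrained(model_id)
model.eval()

text = "The capital of France is [MASK]."
inputs = tokenizer(text, return_tensors="pt")

with torch.no_grad():
    logits = model(**inputs).logits

# Find the masked position and take the highest-scoring token there.
mask_pos = (inputs["input_ids"] == tokenizer.mask_token_id).nonzero(as_tuple=True)[1]
predicted_id = logits[0, mask_pos].argmax(dim=-1)
print(tokenizer.decode(predicted_id))
```

The same checkpoint can also serve as a backbone for the retrieval and classification tasks the abstract evaluates, typically by fine-tuning with a task-specific head.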