
BitNet b1.58 2B4T Technical Report

April 16, 2025
Authors: Shuming Ma, Hongyu Wang, Shaohan Huang, Xingxing Zhang, Ying Hu, Ting Song, Yan Xia, Furu Wei
cs.AI

Abstract

We introduce BitNet b1.58 2B4T, the first open-source, native 1-bit Large Language Model (LLM) at the 2-billion parameter scale. Trained on a corpus of 4 trillion tokens, the model has been rigorously evaluated across benchmarks covering language understanding, mathematical reasoning, coding proficiency, and conversational ability. Our results demonstrate that BitNet b1.58 2B4T achieves performance on par with leading open-weight, full-precision LLMs of similar size, while offering significant advantages in computational efficiency, including substantially reduced memory footprint, energy consumption, and decoding latency. To facilitate further research and adoption, the model weights are released via Hugging Face along with open-source inference implementations for both GPU and CPU architectures.
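The report's abstract does not include code, but as a rough illustration of what "native 1-bit" (more precisely, 1.58-bit) weights mean in practice, the sketch below shows the absmean ternary quantization scheme described in the earlier BitNet b1.58 work: each weight matrix is scaled by its mean absolute value, then rounded and clipped to {-1, 0, +1}. The function name and defaults are illustrative assumptions, not the released GPU/CPU kernels.

```python
import torch

def absmean_ternary_quantize(w: torch.Tensor, eps: float = 1e-5):
    """Quantize a weight tensor to {-1, 0, +1} with a per-tensor scale.

    Follows the absmean scheme from the BitNet b1.58 line of work:
    divide by the mean absolute value, round, clip to [-1, 1].
    Illustrative sketch only; not the official inference implementation.
    """
    scale = w.abs().mean().clamp(min=eps)     # per-tensor absmean scale
    w_q = (w / scale).round().clamp_(-1, 1)   # ternary values {-1, 0, +1}
    return w_q, scale                         # dequantize as w_q * scale

# Tiny usage example on random weights
if __name__ == "__main__":
    w = torch.randn(4, 8)
    w_q, scale = absmean_ternary_quantize(w)
    print(torch.unique(w_q))                  # tensor([-1., 0., 1.])
    print((w - w_q * scale).abs().mean())     # mean quantization error
```

Storing only ternary values plus a scale is what yields the reduced memory footprint and cheaper decoding the abstract refers to, since matrix multiplications reduce largely to additions and subtractions.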
