Fully Open Source Moxin-7B Technical Report
December 8, 2024
Authors: Pu Zhao, Xuan Shen, Zhenglun Kong, Yixin Shen, Sung-En Chang, Timothy Rupprecht, Lei Lu, Enfu Nan, Changdi Yang, Yumei He, Xingchen Xu, Yu Huang, Wei Wang, Yue Chen, Yong He, Yanzhi Wang
cs.AI
Abstract
Recently, Large Language Models (LLMs) have undergone a significant
transformation, marked by a rapid rise in both their popularity and
capabilities. Leading this evolution are proprietary LLMs like GPT-4 and
GPT-o1, which have captured widespread attention in the AI community due to
their remarkable performance and versatility. Simultaneously, open-source LLMs,
such as LLaMA and Mistral, have contributed greatly to the growing
popularity of LLMs because they are easy to customize and deploy across
diverse applications. Although open-source LLMs present unprecedented
opportunities for innovation and research, the commercialization of LLMs has
raised concerns about transparency, reproducibility, and safety. Many
open-source LLMs fail to meet fundamental transparency requirements by
withholding essential components such as training code and data, and some use
restrictive licenses while claiming to be "open-source," which may hinder
further innovation in LLMs. To mitigate this issue, we introduce Moxin 7B, a
fully open-source LLM developed in accordance with the Model Openness Framework
(MOF), a ranked classification system that evaluates AI models based on model
completeness and openness, adhering to principles of open science, open source,
open data, and open access. Our model achieves the highest MOF classification
level of "open science" through the comprehensive release of pre-training code
and configurations, training and fine-tuning datasets, and intermediate and
final checkpoints. Experiments show that our model achieves superior
performance in zero-shot evaluation compared with popular 7B models and
performs competitively in few-shot evaluation.
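As a complement to the release described above, the following is a minimal sketch of how one might load a released checkpoint for zero-shot inference with Hugging Face Transformers. The repository id "moxin-org/moxin-llm-7b", the precision setting, and the prompt are illustrative assumptions, not details taken from the report.

```python
# Minimal sketch: zero-shot inference with a 7B causal LM checkpoint.
# Assumes the model is published on the Hugging Face Hub under an id such as
# "moxin-org/moxin-llm-7b" (an assumption; substitute the official id).
# Requires: transformers, torch, and accelerate (for device_map="auto").
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "moxin-org/moxin-llm-7b"  # assumed repository name
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,  # half precision keeps a 7B model around 14-16 GB
    device_map="auto",           # place weights on available GPU(s) or CPU
)

# Zero-shot prompt: no in-context examples are provided.
prompt = "Question: What is the capital of France?\nAnswer:"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=32, do_sample=False)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```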