Fully Open Source Moxin-7B Technical Report
December 8, 2024
Authors: Pu Zhao, Xuan Shen, Zhenglun Kong, Yixin Shen, Sung-En Chang, Timothy Rupprecht, Lei Lu, Enfu Nan, Changdi Yang, Yumei He, Xingchen Xu, Yu Huang, Wei Wang, Yue Chen, Yong He, Yanzhi Wang
cs.AI
Abstract
Recently, Large Language Models (LLMs) have undergone a significant
transformation, marked by a rapid rise in both their popularity and
capabilities. Leading this evolution are proprietary LLMs like GPT-4 and
GPT-o1, which have captured widespread attention in the AI community due to
their remarkable performance and versatility. Simultaneously, open-source LLMs,
such as LLaMA and Mistral, have made great contributions to the ever-increasing
popularity of LLMs due to the ease of customizing and deploying the models across
diverse applications. Although open-source LLMs present unprecedented
opportunities for innovation and research, the commercialization of LLMs has
raised concerns about transparency, reproducibility, and safety. Many
open-source LLMs fail to meet fundamental transparency requirements by
withholding essential components like training code and data, and some use
restrictive licenses while claiming to be "open-source," which may hinder
further innovation on LLMs. To mitigate this issue, we introduce Moxin 7B, a
fully open-source LLM developed in accordance with the Model Openness Framework
(MOF), a ranked classification system that evaluates AI models based on model
completeness and openness, adhering to principles of open science, open source,
open data, and open access. Our model achieves the highest MOF classification
level of "open science" through the comprehensive release of pre-training code
and configurations, training and fine-tuning datasets, and intermediate and
final checkpoints. Experiments show that our model achieves superior
performance in zero-shot evaluation compared with popular 7B models and
performs competitively in few-shot evaluation.
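The abstract reports zero-shot results against popular 7B models. As a rough illustration of how such zero-shot evaluations are typically run against released checkpoints, the sketch below scores multiple-choice candidates by log-likelihood using Hugging Face transformers. The repository id "moxin-org/moxin-llm-7b" and the toy prompt are assumptions for illustration, not details taken from the report.

```python
# Minimal sketch: zero-shot multiple-choice scoring by completion log-likelihood,
# in the style of common zero-shot benchmarks. The repo id below is an assumption;
# substitute the actual released Moxin-7B checkpoint.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL_ID = "moxin-org/moxin-llm-7b"  # assumed Hugging Face repo id

tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
model = AutoModelForCausalLM.from_pretrained(
    MODEL_ID, torch_dtype=torch.bfloat16, device_map="auto"
)
model.eval()

@torch.no_grad()
def completion_logprob(context: str, completion: str) -> float:
    """Sum of token log-probabilities of `completion` given `context`.

    Splitting at the context token count is an approximation: tokenizing the
    concatenated string may shift the boundary by a token in edge cases.
    """
    ctx_ids = tokenizer(context, return_tensors="pt").input_ids.to(model.device)
    full_ids = tokenizer(context + completion, return_tensors="pt").input_ids.to(model.device)
    logits = model(full_ids).logits                      # [1, seq_len, vocab]
    logprobs = torch.log_softmax(logits[:, :-1, :], dim=-1)
    targets = full_ids[:, 1:]                            # tokens predicted by each position
    token_lp = logprobs.gather(-1, targets.unsqueeze(-1)).squeeze(-1)
    n_ctx = ctx_ids.shape[1]
    return token_lp[0, n_ctx - 1:].sum().item()          # keep only the completion's tokens

# Toy zero-shot query (illustrative only).
question = "The capital of France is"
choices = [" Paris.", " Berlin.", " Madrid."]
scores = [completion_logprob(question, c) for c in choices]
print(choices[max(range(len(choices)), key=scores.__getitem__)])
```

In practice, standardized harnesses (e.g., lm-evaluation-harness) apply the same likelihood-scoring idea across full benchmark suites; the snippet above only shows the core mechanism.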