Fully Open Source Moxin-7B Technical Report

Abstract

Large Language Models (LLMs) have recently undergone a significant transformation, marked by a rapid rise in both popularity and capability. Leading this evolution are proprietary LLMs such as GPT-4 and GPT-o1, which have captured widespread attention in the AI community for their remarkable performance and versatility. At the same time, open-source LLMs such as LLaMA and Mistral have contributed greatly to the growing popularity of LLMs because they are easy to customize and deploy across diverse applications. Although open-source LLMs present unprecedented opportunities for innovation and research, the commercialization of LLMs has raised concerns about transparency, reproducibility, and safety. Many open-source LLMs fail to meet fundamental transparency requirements by withholding essential components such as training code and data, and some use restrictive licenses while claiming to be "open-source," which may hinder further innovation on LLMs. To mitigate this issue, we introduce Moxin 7B, a fully open-source LLM developed in accordance with the Model Openness Framework (MOF), a ranked classification system that evaluates AI models on completeness and openness and adheres to the principles of open science, open source, open data, and open access. Our model achieves the highest MOF classification level, "open science," through the comprehensive release of pre-training code and configurations, training and fine-tuning datasets, and intermediate and final checkpoints. Experiments show that our model outperforms popular 7B models in zero-shot evaluation and performs competitively in few-shot evaluation.

