Atla Selene Mini: A General Purpose Evaluation Model

Abstract

We introduce Atla Selene Mini, a state-of-the-art small language model-as-a-judge (SLMJ). Selene Mini is a general-purpose evaluator that outperforms the best SLMJs and GPT-4o-mini on overall performance across 11 out-of-distribution benchmarks, spanning absolute scoring, classification, and pairwise preference tasks. It is the highest-scoring 8B generative model on RewardBench, surpassing strong baselines like GPT-4o and specialized judges. To achieve this, we develop a principled data curation strategy that augments public datasets with synthetically generated critiques and ensures high quality through filtering and dataset ablations. We train our model on a combined direct preference optimization (DPO) and supervised fine-tuning (SFT) loss, and produce a highly promptable evaluator that excels in real-world scenarios. Selene Mini shows dramatically improved zero-shot agreement with human expert evaluations on financial and medical industry datasets. It is also robust to variations in prompt format. Preliminary results indicate that Selene Mini is the top-ranking evaluator in a live, community-driven Judge Arena. We release the model weights on HuggingFace (https://hf.co/AtlaAI/Selene-1-Mini-Llama-3.1-8B) and Ollama to encourage widespread community adoption.
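The abstract names a combined DPO and SFT loss but does not spell out its form. As a minimal sketch, assuming the common formulation in which the standard DPO term is added to a weighted negative log-likelihood on the chosen response, a per-example loss could look like the following. The function name `dpo_sft_loss`, the weights `beta` and `alpha`, and the length normalization of the SFT term are illustrative assumptions, not details taken from the abstract.

```python
import math


def dpo_sft_loss(policy_chosen_logp, policy_rejected_logp,
                 ref_chosen_logp, ref_rejected_logp,
                 chosen_num_tokens=1, beta=0.1, alpha=1.0):
    """Per-example DPO + SFT loss (hypothetical sketch, not the authors' code).

    Inputs are summed token log-probabilities of the chosen and rejected
    responses under the trained policy and a frozen reference model.
    beta and alpha are assumed hyperparameters.
    """
    # DPO margin: how much more the policy prefers the chosen response
    # over the rejected one, relative to the reference model.
    margin = ((policy_chosen_logp - ref_chosen_logp)
              - (policy_rejected_logp - ref_rejected_logp))
    x = beta * margin
    # -log(sigmoid(x)) = log(1 + exp(-x)), computed in a numerically stable way.
    dpo = max(-x, 0.0) + math.log1p(math.exp(-abs(x)))
    # SFT term: length-normalized negative log-likelihood of the chosen response.
    sft = -policy_chosen_logp / chosen_num_tokens
    return dpo + alpha * sft


# Policy identical to the reference: the DPO term equals log(2).
baseline = dpo_sft_loss(-2.0, -3.0, -2.0, -3.0)
# Policy assigns higher likelihood to the chosen response: the loss decreases.
improved = dpo_sft_loss(-1.0, -3.0, -2.0, -3.0)
```

In this sketch the SFT term keeps the likelihood of the chosen judgment from drifting downward while the DPO term widens the gap between chosen and rejected judgments; setting `alpha=0.0` recovers plain DPO.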
