Reliable, Reproducible, and Really Fast Leaderboards with Evalica

Abstract

The rapid advancement of natural language processing (NLP) technologies, such as instruction-tuned large language models (LLMs), calls for modern evaluation protocols that incorporate human and machine feedback. We introduce Evalica, an open-source toolkit that facilitates the creation of reliable and reproducible model leaderboards. This paper presents its design, evaluates its performance, and demonstrates its usability through its web interface, command-line interface, and Python API.
