From Generation to Judgment: Opportunities and Challenges of LLM-as-a-judge

Abstract

Assessment and evaluation have long been critical challenges in artificial intelligence (AI) and natural language processing (NLP). However, traditional methods, whether matching-based or embedding-based, often fall short when judging subtle attributes and fail to deliver satisfactory results. Recent advances in large language models (LLMs) have inspired the "LLM-as-a-judge" paradigm, in which LLMs are used to perform scoring, ranking, or selection across various tasks and applications. This paper provides a comprehensive survey of LLM-based judgment and assessment, offering an in-depth overview to advance this emerging field. We begin by giving detailed definitions from both input and output perspectives. We then introduce a comprehensive taxonomy that explores LLM-as-a-judge along three dimensions: what to judge, how to judge, and where to judge. Finally, we compile benchmarks for evaluating LLM-as-a-judge and highlight key challenges and promising directions, aiming to provide valuable insights and inspire future research in this area. A paper list and more resources about LLM-as-a-judge can be found at https://github.com/llm-as-a-judge/Awesome-LLM-as-a-judge and https://llm-as-a-judge.github.io.
