Preference Leakage: A Contamination Problem in LLM-as-a-judge

Abstract

Large Language Models (LLMs) as judges and LLM-based data synthesis have emerged as two fundamental LLM-driven data annotation methods in model development. While their combination significantly enhances the efficiency of model training and evaluation, little attention has been given to the potential contamination introduced by this new model development paradigm. In this work, we expose preference leakage, a contamination problem in LLM-as-a-judge caused by the relatedness between synthetic data generators and LLM-based evaluators. To study this issue, we first define three common types of relatedness between the data generator LLM and the judge LLM: being the same model, having an inheritance relationship, and belonging to the same model family. Through extensive experiments across multiple LLM baselines and benchmarks, we empirically confirm that preference leakage biases judges towards their related student models. Further analysis suggests that preference leakage is pervasive and harder to detect than previously identified biases in LLM-as-a-judge scenarios. Together, these findings indicate that preference leakage is a widespread and challenging problem in the area of LLM-as-a-judge. All code and data are available at: https://github.com/David-Li0406/Preference-Leakage.
