Preference Leakage: A Contamination Problem in LLM-as-a-judge
February 3, 2025
Authors: Dawei Li, Renliang Sun, Yue Huang, Ming Zhong, Bohan Jiang, Jiawei Han, Xiangliang Zhang, Wei Wang, Huan Liu
cs.AI
Abstract
Large Language Models (LLMs) as judges and LLM-based data synthesis have emerged as two fundamental LLM-driven data annotation methods in model development. While their combination significantly enhances the efficiency of model training and evaluation, little attention has been given to the potential contamination brought by this new model development paradigm. In this work, we expose preference leakage, a contamination problem in LLM-as-a-judge caused by the relatedness between the synthetic data generators and LLM-based evaluators. To study this issue, we first define three common types of relatedness between the data generator LLM and the judge LLM: being the same model, having an inheritance relationship, and belonging to the same model family. Through extensive experiments, we empirically confirm the bias of judges towards their related student models caused by preference leakage across multiple LLM baselines and benchmarks. Further analysis suggests that preference leakage is a pervasive issue that is harder to detect compared to previously identified biases in LLM-as-a-judge scenarios. All of these findings imply that preference leakage is a widespread and challenging problem in the area of LLM-as-a-judge. We release all code and data at: https://github.com/David-Li0406/Preference-Leakage.
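To make the described setup concrete, the following is a minimal, hypothetical Python sketch of how a judge's bias toward a "related" student model (one trained on data synthesized by the same model, an inherited model, or a same-family model) might be probed: compare the judge's win rate for that student against the win rate assigned by an unrelated judge on the same response pairs. The function names, the win-rate-gap proxy, and the model labels are illustrative assumptions, not the paper's actual metric or implementation; see the linked repository for the authors' code.

```python
from collections import Counter

def win_rate(judgments: list[str], model: str) -> float:
    """Fraction of pairwise judgments won by `model`, counting ties as half a win."""
    counts = Counter(judgments)
    wins = counts[model] + 0.5 * counts["tie"]
    return wins / len(judgments) if judgments else 0.0

def preference_gap(related_judge: list[str], unrelated_judge: list[str],
                   student: str = "student_A") -> float:
    """Rough proxy for preference leakage (illustrative, not the paper's metric):
    how much more often a judge related to `student`'s data generator prefers
    that student, compared with an unrelated judge on the same response pairs."""
    return win_rate(related_judge, student) - win_rate(unrelated_judge, student)

# Example: verdicts from two judges over the same six response pairs.
related = ["student_A", "student_A", "tie", "student_A", "student_B", "student_A"]
unrelated = ["student_A", "student_B", "tie", "student_B", "student_B", "student_A"]
print(f"win-rate gap: {preference_gap(related, unrelated):+.2f}")  # positive gap hints at leakage
```

A persistent positive gap across many benchmarks and judge-student pairings would be the kind of signal the paper's experiments examine; the actual study controls for additional biases and uses its own scoring, so this sketch is only meant to convey the intuition.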