AntiLeak-Bench: Preventing Data Contamination by Automatically Constructing Benchmarks with Updated Real-World Knowledge
December 18, 2024
作者: Xiaobao Wu, Liangming Pan, Yuxi Xie, Ruiwen Zhou, Shuai Zhao, Yubo Ma, Mingzhe Du, Rui Mao, Anh Tuan Luu, William Yang Wang
cs.AI
Abstract
Data contamination hinders fair LLM evaluation by introducing test data into newer models' training sets. Existing studies address this challenge by updating benchmarks with newly collected data. However, they fail to guarantee contamination-free evaluation, as the newly collected data may contain pre-existing knowledge, and their benchmark updates rely on intensive human labor. To address these issues, in this paper we propose AntiLeak-Bench, an automated anti-leakage benchmarking framework. Instead of simply using newly collected data, we construct samples with explicitly new knowledge absent from LLMs' training sets, which ensures strictly contamination-free evaluation. We further design a fully automated workflow to build and update our benchmark without human labor. This significantly reduces the cost of benchmark maintenance to accommodate emerging LLMs. Through extensive experiments, we highlight that data contamination likely exists before LLMs' cutoff time and demonstrate that AntiLeak-Bench effectively overcomes this challenge.