ExaGPT: Example-Based Machine-Generated Text Detection for Human Interpretability
February 17, 2025
Authors: Ryuto Koike, Masahiro Kaneko, Ayana Niwa, Preslav Nakov, Naoaki Okazaki
cs.AI
Abstract
Incorrect decisions when detecting texts generated by Large Language Models (LLMs) can cause grave harm, such as undermining students' academic dignity. LLM text detection therefore needs to make its decisions interpretable, so that users can judge how reliable a prediction is.
When humans verify whether a text is human-written or LLM-generated, they intuitively check which of the two it shares more similar spans with.
However, existing interpretable detectors are not aligned with the human
decision-making process and fail to offer evidence that users easily
understand. To bridge this gap, we introduce ExaGPT, an interpretable detection
approach grounded in the human decision-making process for verifying the origin
of a text. ExaGPT identifies a text by checking whether it shares more similar
spans with human-written or with LLM-generated texts from a datastore. As evidence, it can provide, for each span in the text, the similar span examples that contributed to the decision. Our human evaluation demonstrates that
providing similar span examples contributes more effectively to judging the
correctness of the decision than existing interpretable methods. Moreover,
extensive experiments across four domains and three generators show that ExaGPT
massively outperforms prior powerful detectors by up to +40.9 points of
accuracy at a false positive rate of 1%.
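To make the decision rule described in the abstract concrete, below is a minimal, self-contained sketch: split a text into spans, retrieve similar spans from a datastore of human-written and LLM-generated texts, and predict whichever origin dominates the retrieved neighbors, keeping the retrieved examples as per-span evidence. The fixed word-length spans, character n-gram similarity, and simple majority vote are illustrative assumptions for this sketch, not the paper's exact segmentation, retriever, or decision procedure.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass
class SpanExample:
    text: str   # a span stored in the datastore
    label: str  # "human" or "llm": the origin of the span's source document


def char_ngrams(text: str, n: int = 4) -> Counter:
    """Character n-gram profile, used here as a simple similarity signal."""
    if len(text) < n:
        return Counter([text])
    return Counter(text[i:i + n] for i in range(len(text) - n + 1))


def similarity(a: str, b: str) -> float:
    """Jaccard-style overlap between the n-gram profiles of two spans."""
    pa, pb = char_ngrams(a), char_ngrams(b)
    inter = sum((pa & pb).values())
    union = sum((pa | pb).values())
    return inter / union if union else 0.0


def classify_by_similar_spans(text: str, datastore: list[SpanExample],
                              span_len: int = 8, k: int = 5):
    """Split the input into fixed-length word spans, retrieve the k most
    similar datastore spans for each, and predict whichever origin label
    dominates the retrieved neighbors. Returns the prediction together
    with per-span evidence (the retrieved examples)."""
    words = text.split()
    spans = [" ".join(words[i:i + span_len])
             for i in range(0, len(words), span_len)]
    votes, evidence = Counter(), []
    for span in spans:
        neighbors = sorted(datastore,
                           key=lambda ex: similarity(span, ex.text),
                           reverse=True)[:k]
        votes.update(ex.label for ex in neighbors)
        evidence.append((span, neighbors))  # shown to the user as justification
    prediction = "llm" if votes["llm"] >= votes["human"] else "human"
    return prediction, evidence
```

A real system would rely on the paper's datastore and retriever rather than the toy n-gram overlap above; the sketch only mirrors the interpretable output format: a label plus, for each span, the nearest examples that drove the decision.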