Detection Avoidance Techniques for Large Language Models
March 10, 2025
作者: Sinclair Schneider, Florian Steuber, Joao A. G. Schneider, Gabi Dreo Rodosek
cs.AI
Abstract
The increasing popularity of large language models has not only led to
widespread use but has also brought various risks, including the potential for
systematically spreading fake news. Consequently, the development of
classification systems such as DetectGPT has become vital. These detectors are
vulnerable to evasion techniques, as demonstrated in an experimental series:
Systematically varying the generative model's temperature proved shallow-learning
detectors to be the least reliable. Fine-tuning the generative model
via reinforcement learning circumvented BERT-based detectors. Finally,
rephrasing led to a >90% evasion of zero-shot detectors like DetectGPT,
although texts stayed highly similar to the original. A comparison with
existing work highlights the better performance of the presented methods.
Possible implications for society and further research are discussed.
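As a minimal illustration of the first experiment (this sketch is not the paper's code), the temperature parameter rescales the model's logits before sampling: low temperatures concentrate probability on the most likely tokens, while high temperatures flatten the distribution, shifting the statistics of the generated text away from what a detector trained on default-temperature output expects.

```python
import math
import random

def sample_with_temperature(logits, temperature, rng=None):
    """Sample a token index from raw logits after temperature scaling.

    temperature < 1 sharpens the distribution (near-greedy decoding);
    temperature > 1 flattens it, producing more diverse token choices.
    """
    rng = rng or random.Random(0)  # fixed seed only for reproducibility here
    scaled = [l / temperature for l in logits]
    # Numerically stable softmax over the scaled logits.
    m = max(scaled)
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    probs = [e / total for e in exps]
    # Draw one index from the resulting categorical distribution.
    r = rng.random()
    cumulative = 0.0
    for i, p in enumerate(probs):
        cumulative += p
        if r < cumulative:
            return i
    return len(probs) - 1
```

With logits `[1.0, 5.0, 2.0]` and temperature 0.1, the sampler almost always returns index 1 (the argmax); at temperature 10 the three indices are drawn nearly uniformly, which is the kind of distributional shift the paper exploits against shallow-learning detectors.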