Measuring memorization through probabilistic discoverable extraction
October 25, 2024
Authors: Jamie Hayes, Marika Swanberg, Harsh Chaudhari, Itay Yona, Ilia Shumailov
cs.AI
Abstract
Large language models (LLMs) are susceptible to memorizing training data,
raising concerns due to the potential extraction of sensitive information.
Current methods to measure memorization rates of LLMs, primarily discoverable
extraction (Carlini et al., 2022), rely on single-sequence greedy sampling,
potentially underestimating the true extent of memorization. This paper
introduces a probabilistic relaxation of discoverable extraction that
quantifies the probability of extracting a target sequence within a set of
generated samples, considering various sampling schemes and multiple attempts.
This approach addresses the limitations of reporting memorization rates through
discoverable extraction by accounting for the probabilistic nature of LLMs and
user interaction patterns. Our experiments demonstrate that this probabilistic
measure can reveal cases of higher memorization rates compared to rates found
through discoverable extraction. We further investigate the impact of different
sampling schemes on extractability, providing a more comprehensive and
realistic assessment of LLM memorization and its associated risks. Our
contributions include a new probabilistic memorization definition, empirical
evidence of its effectiveness, and a thorough evaluation across different
models, sizes, sampling schemes, and training data repetitions.
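The intuition behind the probabilistic relaxation can be sketched in a few lines (a minimal illustration; the function name and the fixed per-sample probability are our assumptions, not the paper's notation): if each independently drawn sample reproduces the target suffix with probability p, then the probability of extracting it at least once within n attempts is 1 − (1 − p)^n, which grows quickly with n.

```python
def extraction_probability(p_match: float, n_samples: int) -> float:
    """Chance that at least one of n_samples independent generations
    reproduces the target suffix, given each one matches with
    probability p_match (complement of all n attempts failing)."""
    return 1.0 - (1.0 - p_match) ** n_samples

# A suffix that a single greedy decode misses can still be extracted
# with high probability once many samples are drawn:
p = 0.01  # illustrative per-sample match probability under some sampling scheme
print(extraction_probability(p, 1))    # single attempt: ~0.01
print(extraction_probability(p, 100))  # 100 attempts: ~0.63
```

This is why single-sequence greedy sampling can understate memorization: a sequence with even a small per-sample extraction probability becomes likely to surface once an attacker (or ordinary user) issues many queries.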