Measuring memorization through probabilistic discoverable extraction
October 25, 2024
Authors: Jamie Hayes, Marika Swanberg, Harsh Chaudhari, Itay Yona, Ilia Shumailov
cs.AI
Abstract
Large language models (LLMs) are susceptible to memorizing training data,
raising concerns due to the potential extraction of sensitive information.
Current methods to measure memorization rates of LLMs, primarily discoverable
extraction (Carlini et al., 2022), rely on single-sequence greedy sampling,
potentially underestimating the true extent of memorization. This paper
introduces a probabilistic relaxation of discoverable extraction that
quantifies the probability of extracting a target sequence within a set of
generated samples, considering various sampling schemes and multiple attempts.
This approach addresses the limitations of reporting memorization rates through
discoverable extraction by accounting for the probabilistic nature of LLMs and
user interaction patterns. Our experiments demonstrate that this probabilistic
measure can reveal cases of higher memorization rates compared to rates found
through discoverable extraction. We further investigate the impact of different
sampling schemes on extractability, providing a more comprehensive and
realistic assessment of LLM memorization and its associated risks. Our
contributions include a new probabilistic memorization definition, empirical
evidence of its effectiveness, and a thorough evaluation across different
models, sizes, sampling schemes, and training data repetitions.
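The core idea of the probabilistic relaxation can be sketched numerically. Assuming each generated sample independently matches the target sequence with some per-sample probability p (which in practice would be estimated by repeatedly sampling from the model under a chosen decoding scheme), the probability of extraction within n attempts is 1 − (1 − p)^n. The sketch below is illustrative only; the `generate` callable and helper names are hypothetical stand-ins, not the paper's actual API.

```python
def extraction_probability(p_single: float, n_samples: int) -> float:
    """P(target extracted at least once in n independent samples),
    given a per-sample match probability p_single."""
    return 1.0 - (1.0 - p_single) ** n_samples


def estimate_p_single(generate, prompt, target, trials=1000):
    """Monte Carlo estimate of the per-sample match probability.

    `generate` is any sampling-based decoder (hypothetical here):
    a callable that takes a prompt and returns one generated sequence.
    """
    hits = sum(generate(prompt) == target for _ in range(trials))
    return hits / trials
```

Under this framing, greedy single-sequence discoverable extraction is the special case n = 1 with a deterministic decoder, which is why it can understate the risk faced by a user who samples many completions.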