UPCORE: Utility-Preserving Coreset Selection for Balanced Unlearning
February 20, 2025
Authors: Vaidehi Patil, Elias Stengel-Eskin, Mohit Bansal
cs.AI
Abstract
User specifications or legal frameworks often require information to be removed from pretrained models, including large language models (LLMs). This requires deleting or "forgetting" a set of data points from an already-trained model, which typically degrades its performance on other data points. Thus, a balance must be struck between removing information and keeping the model's other abilities intact, with a failure to balance this trade-off leading to poor deletion or an unusable model. To this end, we propose UPCORE (Utility-Preserving Coreset Selection), a method-agnostic data selection framework for mitigating collateral damage during unlearning. Finding that model damage is correlated with the variance of the model's representations on the forget set, we selectively prune the forget set to remove outliers, thereby minimizing model degradation after unlearning. We evaluate UPCORE across three standard unlearning methods, consistently achieving a superior balance between the competing objectives of deletion efficacy and model preservation. To better evaluate this trade-off, we introduce a new metric measuring the area-under-the-curve (AUC) across standard metrics. We find that UPCORE improves both standard metrics and AUC, benefiting from positive transfer between the coreset and pruned points while reducing negative transfer from the forget set to points outside of it.
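As a concrete illustration of the coreset-selection step described in the abstract, the sketch below prunes representation outliers from the forget set before any off-the-shelf unlearning method is applied. This is a minimal sketch, not the paper's implementation: the choice of IsolationForest as the outlier detector, the hidden-state extraction helper, and the prune_fraction parameter are all illustrative assumptions.

```python
import numpy as np
from sklearn.ensemble import IsolationForest


def select_coreset(representations: np.ndarray, prune_fraction: float = 0.1) -> np.ndarray:
    """Return indices of forget-set points kept after pruning representation outliers.

    representations: (n_examples, hidden_dim) array of model embeddings
    prune_fraction: assumed fraction of high-variance outliers to drop
    """
    # Fit an outlier detector over the model's representations of the forget set;
    # points flagged as outliers (-1) are the high-variance examples we drop.
    detector = IsolationForest(contamination=prune_fraction, random_state=0)
    labels = detector.fit_predict(representations)
    return np.where(labels == 1)[0]


# Hypothetical usage: embed the forget set with the model, keep the core,
# then run any standard unlearning method on the pruned core set.
# reps = embed_with_model(model, forget_set)          # hypothetical helper
# core_indices = select_coreset(reps, prune_fraction=0.1)
# core_forget_set = [forget_set[i] for i in core_indices]
```

The trade-off AUC metric can be sketched in the same hedged spirit: sweep unlearning checkpoints, record a deletion-efficacy score and a utility-retention score at each, and integrate the resulting curve. The specific scores and checkpoint sweep used in the paper are not reproduced here; the helper below only shows the area computation under that assumption.

```python
import numpy as np
from sklearn.metrics import auc


def tradeoff_auc(deletion_scores, utility_scores):
    """Area under the (deletion efficacy, utility retention) curve.

    Both sequences are assumed to be measured across unlearning checkpoints;
    a larger area indicates a better balance between forgetting and utility.
    """
    # sklearn's auc requires the x-axis to be monotonic, so sort by deletion score.
    order = np.argsort(deletion_scores)
    x = np.asarray(deletion_scores, dtype=float)[order]
    y = np.asarray(utility_scores, dtype=float)[order]
    return auc(x, y)
```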