ChatPaper.ai

UPCORE: Utility-Preserving Coreset Selection for Balanced Unlearning

February 20, 2025
Authors: Vaidehi Patil, Elias Stengel-Eskin, Mohit Bansal
cs.AI

Abstract

User specifications or legal frameworks often require information to be removed from pretrained models, including large language models (LLMs). This requires deleting or "forgetting" a set of data points from an already-trained model, which typically degrades its performance on other data points. Thus, a balance must be struck between removing information and keeping the model's other abilities intact, with a failure to balance this trade-off leading to poor deletion or an unusable model. To this end, we propose UPCORE (Utility-Preserving Coreset Selection), a method-agnostic data selection framework for mitigating collateral damage during unlearning. Finding that the model damage is correlated with the variance of the model's representations on the forget set, we selectively prune the forget set to remove outliers, thereby minimizing model degradation after unlearning. We evaluate UPCORE across three standard unlearning methods, consistently achieving a superior balance between the competing objectives of deletion efficacy and model preservation. To better evaluate this trade-off, we introduce a new metric measuring the area-under-the-curve (AUC) across standard metrics. We find that UPCORE improves both standard metrics and AUC, benefiting from positive transfer between the coreset and pruned points while reducing negative transfer from the forget set to points outside of it.

Summary

AI-Generated Summary

Paper Overview

Core Contribution

  • UPCORE: A method-agnostic data selection framework for mitigating collateral damage during unlearning in large language models (LLMs).
  • Key Insight: Model damage during unlearning is correlated with the variance of the model’s representations on the forget set. UPCORE selectively prunes outliers to minimize degradation.
  • Innovation: Introduces a new metric, AUC (area-under-the-curve), to evaluate the trade-off between deletion efficacy and model preservation.

Research Context

  • Problem: Unlearning specific data points from pretrained models often degrades performance on unrelated data, leading to a trade-off between deletion and model utility.
  • Motivation: Legal frameworks (e.g., GDPR, CCPA) require the removal of sensitive data, but current unlearning methods struggle to balance deletion and model preservation.
  • Prior Work: Existing unlearning methods (e.g., Gradient Ascent, Refusal, NPO) often lead to unintended consequences, such as reduced model utility.
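Gradient Ascent, the simplest of these baselines, unlearns by maximizing the training loss on the forget set. A minimal sketch on a toy logistic-regression "model" (all data, learning rates, and step counts here are illustrative, not values from the paper):

```python
import numpy as np

rng = np.random.default_rng(0)

def loss(w, X, y):
    """Binary cross-entropy of a logistic model."""
    p = 1.0 / (1.0 + np.exp(-X @ w))
    return -np.mean(y * np.log(p + 1e-9) + (1 - y) * np.log(1 - p + 1e-9))

def grad(w, X, y):
    """Gradient of the cross-entropy w.r.t. the weights."""
    p = 1.0 / (1.0 + np.exp(-X @ w))
    return X.T @ (p - y) / len(y)

X = rng.normal(size=(200, 5))
y = (X[:, 0] > 0).astype(float)
X_forget, y_forget = X[:20], y[:20]  # points to be unlearned

# Pretrain by gradient descent on all data.
w = np.zeros(5)
for _ in range(500):
    w -= 0.5 * grad(w, X, y)

# Unlearn by gradient *ascent* on the forget set only.
loss_before = loss(w, X_forget, y_forget)
for _ in range(50):
    w += 0.1 * grad(w, X_forget, y_forget)
loss_after = loss(w, X_forget, y_forget)
# The model now fits the forget set worse (loss_after > loss_before).
```

Because nothing in the ascent step constrains the update off the forget set, the same weight change also perturbs predictions on retained points, which is exactly the collateral damage UPCORE targets.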

Keywords

  • Machine Unlearning, Coreset Selection, Utility Preservation, Collateral Damage, Isolation Forest, AUC Metric, LLMs, Data Privacy

Background

Research Gap

  • Challenge: Existing unlearning methods do not systematically control data attributes (e.g., variance) that drive collateral damage.
  • Question: What measurable attributes of the forget set drive collateral effects, and can they be controlled to optimize the trade-off between deletion and utility?

Technical Challenges

  • Overgeneralization: Unlearning updates often propagate beyond the forget set, affecting unrelated data.
  • Model Degradation: Unlearning can lead to significant performance loss on retained data.
  • Data Variance: High variance in the forget set correlates with increased collateral damage.
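The variance signal in the last bullet can be summarized as the total variance (trace of the covariance matrix) of the forget set's hidden representations. A minimal sketch with synthetic "hidden states" standing in for real model activations:

```python
import numpy as np

rng = np.random.default_rng(0)

def total_variance(H):
    """Trace of the covariance of hidden states H, shape (n_points, hidden_dim)."""
    return float(np.trace(np.cov(H, rowvar=False)))

# Synthetic hidden states: a tight cluster, plus a few high-spread outliers.
core = rng.normal(0.0, 1.0, size=(100, 16))
outliers = rng.normal(0.0, 8.0, size=(10, 16))
forget_set = np.vstack([core, outliers])

v_full = total_variance(forget_set)
v_core = total_variance(core)
# Pruning the outliers lowers the variance of what remains (v_core < v_full).
```

Under UPCORE's finding, the lower-variance core set should cause less degradation when unlearned than the full forget set.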

Prior Approaches

  • Exact Unlearning: Ensures the model is indistinguishable from one retrained without the forget data (computationally expensive).
  • Approximate Unlearning: Modifies model parameters to approximate exact unlearning (more efficient but less precise).
  • Model Editing: Directly modifies model weights to forget specific facts, but often leads to unintended consequences.

Methodology

Technical Architecture

  1. Hidden State Extraction: Extract hidden state representations from the model’s penultimate layer for each data point in the forget set.
  2. Isolation Forest Training: Train an Isolation Forest model to identify outliers in the forget set based on hidden state variance.
  3. Coreset Selection: Prune outliers to form a core forget set with lower variance.
  4. Unlearning: Apply unlearning algorithms (e.g., Gradient Ascent, Refusal, NPO) to the core forget set.

Implementation Details

  • Isolation Forest: An unsupervised learning technique that efficiently identifies anomalous data points by recursively partitioning the dataset.
  • Anomaly Score: Computed for each data point, with higher scores indicating greater contribution to variance.
  • Pruning Threshold: Determined by either coreset size control or proportional pruning.
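The pipeline above can be sketched end-to-end with scikit-learn's `IsolationForest`. Hidden states are mocked with synthetic vectors, and `contamination=0.1` stands in for the pruning threshold; both are illustrative assumptions, not values from the paper:

```python
import numpy as np
from sklearn.ensemble import IsolationForest

rng = np.random.default_rng(0)

# Step 1 (mocked): penultimate-layer hidden states for each forget-set point.
# In practice these would come from a forward pass over the LLM.
hidden = np.vstack([
    rng.normal(0.0, 1.0, size=(90, 32)),   # well-clustered points
    rng.normal(6.0, 4.0, size=(10, 32)),   # high-variance outliers
])

# Step 2: fit an Isolation Forest; contamination sets the pruned fraction.
forest = IsolationForest(contamination=0.1, random_state=0).fit(hidden)

# Step 3: keep the inliers as the core forget set (predict: +1 inlier, -1 outlier).
keep = forest.predict(hidden) == 1
core_idx = np.flatnonzero(keep)

# Step 4 would then apply any unlearning method (Gradient Ascent, Refusal, NPO)
# to hidden[core_idx] instead of the full forget set.
```

If a score-ranked proportional prune is preferred, `forest.score_samples(hidden)` gives the continuous anomaly score (lower means more anomalous), and the bottom fraction can be dropped directly.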

Innovation Points

  • Variance Minimization: Reduces collateral damage by pruning high-variance outliers.
  • Positive Transfer: Leverages overgeneralization to extend unlearning to pruned points.
  • Method-Agnostic: Can be applied to any data-driven unlearning framework.

Results

Experimental Setup

  • Unlearning Methods: Gradient Ascent, Refusal, Negative Preference Optimization (NPO).
  • Datasets: Counterfact (prompt completion) and TriviaQA (question answering).
  • Metrics: ROUGE scores, model utility, AUC for deletion effectiveness and utility retention.
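The AUC metric can be illustrated with a toy curve: at each unlearning epoch, pair deletion efficacy (e.g., 1 minus forget-set ROUGE) with retained utility, then integrate utility over deletion with the trapezoidal rule. The numbers below are made up for illustration, not results from the paper:

```python
import numpy as np

# Hypothetical per-epoch measurements (illustrative only).
deletion = np.array([0.0, 0.3, 0.6, 0.8, 0.95])  # 1 - forget-set ROUGE
utility = np.array([1.0, 0.97, 0.9, 0.8, 0.6])   # retained-set utility

# Sort by deletion efficacy (x-axis) and apply the trapezoidal rule by hand.
order = np.argsort(deletion)
x, fx = deletion[order], utility[order]
auc = float(np.sum(0.5 * (fx[1:] + fx[:-1]) * np.diff(x)))
# Higher AUC means more utility is retained at every level of deletion.
```

A method that deletes aggressively but destroys utility, or preserves utility but deletes little, both trace low-area curves; UPCORE's claim is that pruning high-variance outliers pushes the whole curve up.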

Key Findings

  • Superior Trade-off: UPCORE consistently achieves higher AUC across all unlearning methods, indicating a better balance between deletion and model utility.
  • Positive Transfer: Unlearning on the core forget set extends to pruned points, reducing collateral damage.
  • Robustness: UPCORE generalizes well to paraphrased and jailbreak prompts, maintaining high deletion efficacy.

Limitations

  • Coreset Size: Performance gains stabilize beyond 30% pruning, but further pruning may degrade forget set performance.
  • Scalability: The effectiveness of UPCORE may vary with the size and complexity of the forget set.

Conclusion

  • UPCORE: A principled framework for utility-preserving unlearning in LLMs, leveraging variance minimization to reduce collateral damage.
  • AUC Metric: Introduces a novel metric to quantify the trade-off between deletion and model utility across unlearning epochs.
  • Future Work: Explore the application of UPCORE to other machine learning models and unlearning scenarios.

Acknowledgments

  • Supported by NSF-CAREER Award 1846185, Microsoft AFMR, DARPA ECOLE, and NSF-AI Engage Institute.
