Riddle Me This! Stealthy Membership Inference for Retrieval-Augmented Generation

Abstract

Retrieval-Augmented Generation (RAG) enables Large Language Models (LLMs) to generate grounded responses by leveraging external knowledge databases without altering model parameters. Although the absence of weight tuning prevents leakage via model parameters, it introduces the risk of inference adversaries exploiting retrieved documents in the model's context. Existing methods for membership inference and data extraction often rely on jailbreaking or carefully crafted unnatural queries, which can be easily detected or thwarted with query rewriting techniques common in RAG systems. In this work, we present Interrogation Attack (IA), a membership inference technique targeting documents in the RAG datastore. By crafting natural-text queries that are answerable only with the target document's presence, our approach demonstrates successful inference with just 30 queries while remaining stealthy; straightforward detectors identify adversarial prompts from existing methods up to ~76x more frequently than those generated by our attack. We observe a 2x improvement in TPR@1%FPR over prior inference attacks across diverse RAG configurations, all while costing less than $0.02 per document inference.
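The headline metric above, TPR@1%FPR, is the true-positive rate an attack achieves when its decision threshold is set so that at most 1% of non-member documents are wrongly flagged as members. The following is a minimal sketch of how that metric can be computed from per-document attack scores. It is not the authors' evaluation code: the function name `tpr_at_fpr`, the convention that a higher score means "more likely a member", and the sample scores are all assumptions made for illustration.

```python
def tpr_at_fpr(member_scores, nonmember_scores, target_fpr=0.01):
    """Best true-positive rate among thresholds whose false-positive rate
    does not exceed target_fpr.

    Assumption (hypothetical convention): a document is flagged as a member
    when its attack score is >= the threshold.
    """
    if not member_scores or not nonmember_scores:
        raise ValueError("both score lists must be non-empty")

    best_tpr = 0.0  # a threshold above every score flags nothing: TPR = FPR = 0
    # Only observed scores need to be tried as thresholds, since TPR and FPR
    # change only when the threshold crosses one of them.
    for threshold in set(member_scores) | set(nonmember_scores):
        fpr = sum(s >= threshold for s in nonmember_scores) / len(nonmember_scores)
        if fpr <= target_fpr:
            tpr = sum(s >= threshold for s in member_scores) / len(member_scores)
            best_tpr = max(best_tpr, tpr)
    return best_tpr


# Illustrative scores only; these are not results from the paper.
example_members = [0.9, 0.8, 0.3, 0.2]
example_nonmembers = [0.1] * 247 + [0.85, 0.5, 0.4]
example_tpr = tpr_at_fpr(example_members, example_nonmembers, target_fpr=0.01)
```

Reporting the true-positive rate at a fixed low false-positive rate, rather than average accuracy, emphasizes how many members an attacker can identify while making very few false accusations.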
