SafeRAG: Benchmarking Security in Retrieval-Augmented Generation of Large Language Model

Abstract

The indexing-retrieval-generation paradigm of retrieval-augmented generation (RAG) has been highly successful in solving knowledge-intensive tasks by integrating external knowledge into large language models (LLMs). However, incorporating external, unverified knowledge makes LLMs more vulnerable, because attackers can carry out attack tasks by manipulating that knowledge. In this paper, we introduce SafeRAG, a benchmark designed to evaluate RAG security. First, we classify attack tasks into four types: silver noise, inter-context conflict, soft ad, and white Denial-of-Service. Next, we construct a RAG security evaluation dataset (the SafeRAG dataset), primarily by hand, for each task. We then use the SafeRAG dataset to simulate various attack scenarios that RAG may encounter. Experiments conducted on 14 representative RAG components demonstrate that RAG exhibits significant vulnerability to all attack tasks, and that even the most apparent attack task can easily bypass existing retrievers, filters, or advanced LLMs, degrading RAG service quality. Code is available at: https://github.com/IAAR-Shanghai/SafeRAG.
