SafeRAG: Benchmarking Security in Retrieval-Augmented Generation of Large Language Model
January 28, 2025
Authors: Xun Liang, Simin Niu, Zhiyu Li, Sensen Zhang, Hanyu Wang, Feiyu Xiong, Jason Zhaoxin Fan, Bo Tang, Shichao Song, Mengwei Wang, Jiawei Yang
cs.AI
Abstract
The indexing-retrieval-generation paradigm of retrieval-augmented generation (RAG) has been highly successful in solving knowledge-intensive tasks by integrating external knowledge into large language models (LLMs). However, incorporating external and unverified knowledge increases the vulnerability of LLMs, because attackers can carry out attack tasks by manipulating that knowledge. In this paper, we introduce SafeRAG, a benchmark designed to evaluate RAG security. First, we classify attack tasks into silver noise, inter-context conflict, soft ad, and white Denial-of-Service. Next, we construct a RAG security evaluation dataset (i.e., the SafeRAG dataset), primarily by hand, for each task. We then use the SafeRAG dataset to simulate various attack scenarios that RAG may encounter. Experiments conducted on 14 representative RAG components demonstrate that RAG exhibits significant vulnerability to all attack tasks, and that even the most obvious attack task can easily bypass existing retrievers, filters, or advanced LLMs, degrading RAG service quality. Code is available at: https://github.com/IAAR-Shanghai/SafeRAG.
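The attack setting the abstract describes, attacker-controlled text mixed into the knowledge a RAG pipeline retrieves before generation, can be pictured with a minimal sketch. The code below is a hypothetical illustration under assumed names (`retrieve`, `generate`, `attack_passages`), not the SafeRAG implementation or its API; the actual benchmark defines its own four attack tasks, dataset, and evaluation metrics.

```python
# Minimal sketch (not the SafeRAG code): injecting attacker-controlled passages
# into the retrieved context of a generic retrieve-then-generate pipeline.
# `retrieve`, `generate`, and `attack_passages` are placeholder names assumed here.
from typing import Callable, List, Optional


def answer_with_context(query: str,
                        retrieve: Callable[[str], List[str]],
                        generate: Callable[[str], str],
                        attack_passages: Optional[List[str]] = None) -> str:
    """Run a bare-bones RAG loop, optionally poisoning the context with
    attacker-crafted passages (e.g., noise, conflicting claims, soft ads,
    or refusal bait) before they reach the LLM."""
    contexts = retrieve(query)
    if attack_passages:
        # An attacker who can edit the knowledge source effectively interleaves
        # crafted passages with the genuine ones seen by the generator.
        contexts = attack_passages + contexts
    prompt = ("Answer the question using only the context below.\n\n"
              + "\n\n".join(contexts)
              + f"\n\nQuestion: {query}")
    return generate(prompt)


# Comparing the clean answer (attack_passages=None) against the attacked answer
# and scoring the difference is the kind of evaluation a RAG security benchmark
# such as SafeRAG automates across retrievers, filters, and LLMs.
```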