ChatPaper.aiChatPaper

通过设计防范提示注入攻击

Defeating Prompt Injections by Design

March 24, 2025
作者: Edoardo Debenedetti, Ilia Shumailov, Tianqi Fan, Jamie Hayes, Nicholas Carlini, Daniel Fabian, Christoph Kern, Chongyang Shi, Andreas Terzis, Florian Tramèr
cs.AI

摘要

大型语言模型(LLMs)正越来越多地部署在与外部环境交互的代理系统中。然而,LLM代理在处理不可信数据时容易受到提示注入攻击。本文提出CaMeL,一种在LLM周围构建保护系统层的鲁棒防御机制,即使底层模型可能易受攻击,也能确保其安全。CaMeL通过明确提取(可信)查询中的控制流和数据流来运行;因此,LLM检索到的不可信数据永远不会影响程序流程。为进一步提升安全性,CaMeL依赖能力概念,以防止通过未授权数据流泄露私有数据。我们在AgentDojo [NeurIPS 2024]这一最新的代理安全基准测试中,证明了CaMeL的有效性,成功解决了67%具有可证明安全性的任务。
English
Large Language Models (LLMs) are increasingly deployed in agentic systems that interact with an external environment. However, LLM agents are vulnerable to prompt injection attacks when handling untrusted data. In this paper we propose CaMeL, a robust defense that creates a protective system layer around the LLM, securing it even when underlying models may be susceptible to attacks. To operate, CaMeL explicitly extracts the control and data flows from the (trusted) query; therefore, the untrusted data retrieved by the LLM can never impact the program flow. To further improve security, CaMeL relies on a notion of a capability to prevent the exfiltration of private data over unauthorized data flows. We demonstrate effectiveness of CaMeL by solving 67% of tasks with provable security in AgentDojo [NeurIPS 2024], a recent agentic security benchmark.

Summary

AI-Generated Summary

PDF191March 25, 2025