Progent: Programmable Privilege Control for LLM Agents
April 16, 2025
Authors: Tianneng Shi, Jingxuan He, Zhun Wang, Linyu Wu, Hongwei Li, Wenbo Guo, Dawn Song
cs.AI
Abstract
LLM agents are an emerging form of AI systems where large language models
(LLMs) serve as the central component, utilizing a diverse set of tools to
complete user-assigned tasks. Despite their great potential, LLM agents pose
significant security risks. When interacting with the external world, they may
encounter malicious commands from attackers, leading to the execution of
dangerous actions. A promising way to address this is by enforcing the
principle of least privilege: allowing only essential actions for task
completion while blocking unnecessary ones. However, achieving this is
challenging, as it requires covering diverse agent scenarios while preserving
both security and utility.
We introduce Progent, the first privilege control mechanism for LLM agents.
At its core is a domain-specific language for flexibly expressing privilege
control policies applied during agent execution. These policies provide
fine-grained constraints over tool calls, deciding when tool calls are
permissible and specifying fallbacks if they are not. This enables agent
developers and users to craft suitable policies for their specific use cases
and enforce them deterministically to guarantee security. Thanks to its modular
design, integrating Progent does not alter agent internals and requires only
minimal changes to agent implementation, enhancing its practicality and
potential for widespread adoption. To automate policy writing, we leverage LLMs
to generate policies based on user queries, which are then updated dynamically
for improved security and utility. Our extensive evaluation shows that Progent
enables strong security while preserving high utility across three distinct
scenarios or benchmarks: AgentDojo, ASB, and AgentPoison. Furthermore, we
perform an in-depth analysis, showcasing the effectiveness of its core
components and the resilience of its automated policy generation against
adaptive attacks.
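The abstract does not show the policy language itself. As a purely illustrative sketch, and not Progent's actual DSL or API, the Python snippet below captures the kind of deterministic, fine-grained tool-call gating described above: each policy names a tool, constrains its arguments, and specifies a fallback when a call is not permitted, with unmatched tools denied by default in keeping with least privilege. All names here (ToolPolicy, PolicyEngine, enforce) are hypothetical.

```python
# Illustrative sketch only: a minimal, deterministic tool-call gate in the
# spirit of the privilege-control policies described in the abstract.
# Class and function names are hypothetical, NOT Progent's actual DSL or API.
from dataclasses import dataclass, field
from typing import Callable

@dataclass
class ToolPolicy:
    tool: str                         # tool this policy applies to
    allow: Callable[[dict], bool]     # fine-grained check on the call arguments
    fallback: str = "block"           # action when the call is not permitted,
                                      # e.g. "block" or "ask_user"

@dataclass
class PolicyEngine:
    policies: list[ToolPolicy] = field(default_factory=list)

    def enforce(self, tool: str, args: dict) -> tuple[bool, str]:
        """Deterministically decide whether a tool call may proceed."""
        for p in self.policies:
            if p.tool == tool:
                if p.allow(args):
                    return True, "allowed"
                return False, p.fallback
        # Least privilege: tools with no matching policy are denied by default.
        return False, "block"

# Example: only permit sending email to addresses on a trusted domain,
# and fall back to asking the user otherwise.
engine = PolicyEngine([
    ToolPolicy(
        tool="send_email",
        allow=lambda a: a.get("to", "").endswith("@example.com"),
        fallback="ask_user",
    ),
])

print(engine.enforce("send_email", {"to": "alice@example.com"}))  # (True, 'allowed')
print(engine.enforce("send_email", {"to": "attacker@evil.com"}))  # (False, 'ask_user')
print(engine.enforce("delete_file", {"path": "/tmp/x"}))          # (False, 'block')
```

In this hypothetical setup, the check runs before every tool invocation and its outcome does not depend on the LLM, which mirrors the abstract's point that policies are enforced deterministically; how Progent's real DSL expresses constraints and fallbacks is detailed in the paper itself.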