
Progent: Programmable Privilege Control for LLM Agents

April 16, 2025
Authors: Tianneng Shi, Jingxuan He, Zhun Wang, Linyu Wu, Hongwei Li, Wenbo Guo, Dawn Song
cs.AI

Abstract

LLM agents are an emerging form of AI systems where large language models (LLMs) serve as the central component, utilizing a diverse set of tools to complete user-assigned tasks. Despite their great potential, LLM agents pose significant security risks. When interacting with the external world, they may encounter malicious commands from attackers, leading to the execution of dangerous actions. A promising way to address this is by enforcing the principle of least privilege: allowing only essential actions for task completion while blocking unnecessary ones. However, achieving this is challenging, as it requires covering diverse agent scenarios while preserving both security and utility. We introduce Progent, the first privilege control mechanism for LLM agents. At its core is a domain-specific language for flexibly expressing privilege control policies applied during agent execution. These policies provide fine-grained constraints over tool calls, deciding when tool calls are permissible and specifying fallbacks if they are not. This enables agent developers and users to craft suitable policies for their specific use cases and enforce them deterministically to guarantee security. Thanks to its modular design, integrating Progent does not alter agent internals and requires only minimal changes to agent implementation, enhancing its practicality and potential for widespread adoption. To automate policy writing, we leverage LLMs to generate policies based on user queries, which are then updated dynamically for improved security and utility. Our extensive evaluation shows that it enables strong security while preserving high utility across three distinct scenarios or benchmarks: AgentDojo, ASB, and AgentPoison. Furthermore, we perform an in-depth analysis, showcasing the effectiveness of its core components and the resilience of its automated policy generation against adaptive attacks.
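To make the abstract's description more concrete, below is a minimal, hypothetical Python sketch of deterministic privilege control over tool calls: a least-privilege policy built from per-tool rules, each with an allow-predicate over the call arguments and a fallback used when the call is denied, checked before any tool executes. The `Policy`, `ToolRule`, and `guarded_call` names, the rule schema, and the example tools are illustrative assumptions only and do not reflect Progent's actual domain-specific language or API.

```python
from dataclasses import dataclass, field
from typing import Any, Callable


@dataclass
class ToolRule:
    """One per-tool rule: an allow-predicate over the call arguments plus a fallback."""
    tool: str
    allow: Callable[[dict], bool]
    fallback: str = "Tool call blocked by policy."


@dataclass
class Policy:
    """A least-privilege policy: any call without a matching rule is denied."""
    rules: list[ToolRule] = field(default_factory=list)

    def check(self, tool: str, args: dict) -> tuple[bool, str]:
        for rule in self.rules:
            if rule.tool == tool:
                return (True, "") if rule.allow(args) else (False, rule.fallback)
        return False, "No matching rule; denied by default."


def guarded_call(policy: Policy, tools: dict[str, Callable[..., Any]],
                 tool: str, args: dict) -> Any:
    """Deterministically enforce the policy before dispatching the tool call."""
    allowed, fallback = policy.check(tool, args)
    if not allowed:
        return fallback  # surface the fallback message instead of executing
    return tools[tool](**args)


# Example: the agent may read files only under /sandbox/ and may email only a
# pre-approved recipient; every other tool call is rejected.
policy = Policy(rules=[
    ToolRule("read_file", lambda a: a.get("path", "").startswith("/sandbox/")),
    ToolRule("send_email", lambda a: a.get("to") == "user@example.com",
             fallback="Refusing to email an unapproved recipient."),
])

tools = {
    "read_file": lambda path: f"<contents of {path}>",
    "send_email": lambda to, body: f"email sent to {to}",
}

print(guarded_call(policy, tools, "read_file", {"path": "/sandbox/notes.txt"}))
print(guarded_call(policy, tools, "send_email",
                   {"to": "attacker@evil.com", "body": "exfiltrate"}))
```

In the paper, such policies are expressed in a dedicated domain-specific language and can be written by agent developers or users, or generated from the user query by an LLM and updated dynamically; the sketch above only mirrors the deterministic-enforcement idea described in the abstract.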
