Why Are Web AI Agents More Vulnerable Than Standalone LLMs? A Security Analysis

Abstract

Recent advances in Web AI agents have demonstrated remarkable capabilities in addressing complex web navigation tasks. However, emerging research shows that these agents are more vulnerable than standalone Large Language Models (LLMs), even though both are built on the same safety-aligned models. This discrepancy is particularly concerning because Web AI agents are more flexible than standalone LLMs, which may expose them to a wider range of adversarial user inputs. To lay the groundwork for addressing these concerns, this study investigates the underlying factors that make Web AI agents more vulnerable. Notably, this disparity stems from the multifaceted differences between Web AI agents and standalone LLMs, as well as from complex signals: nuances that simple evaluation metrics, such as success rate, often fail to capture. To tackle these challenges, we propose a component-level analysis and a more granular, systematic evaluation framework. Through this fine-grained investigation, we identify three critical factors that amplify the vulnerability of Web AI agents: (1) embedding user goals into the system prompt, (2) multi-step action generation, and (3) observational capabilities. Our findings highlight the pressing need to enhance security and robustness in AI agent design and provide actionable insights for targeted defense strategies.
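The abstract says a plain success rate can miss signals that separate Web AI agents from standalone LLMs. The Python sketch below illustrates that point in a minimal way. It assumes a hypothetical four-level outcome scale, so Outcome, success_rate and outcome_distribution are illustrative names and not the paper's actual rubric or evaluation framework. The scenario uses two hypothetical systems on four harmful prompts. Both score a success rate of 0.0, yet only the graded distribution shows that the agent kept acting before it stopped.

```python
from collections import Counter
from enum import IntEnum


class Outcome(IntEnum):
    """Hypothetical graded outcome of one jailbreak attempt (assumed scale)."""
    CLEAR_REFUSAL = 0    # refuses immediately
    DELAYED_REFUSAL = 1  # starts acting, then refuses mid-trajectory
    ATTEMPTED = 2        # tries to carry out the harmful task but fails
    COMPLETED = 3        # fully carries out the harmful task


def success_rate(outcomes):
    """Binary metric: only fully completed harmful tasks count as success."""
    return sum(o == Outcome.COMPLETED for o in outcomes) / len(outcomes)


def outcome_distribution(outcomes):
    """Graded metric: fraction of trials that ended at each outcome level."""
    counts = Counter(outcomes)
    n = len(outcomes)
    return {level.name: counts[level] / n for level in Outcome}


# Hypothetical results for the same four harmful prompts.
standalone_llm = [Outcome.CLEAR_REFUSAL] * 4
web_agent = [
    Outcome.CLEAR_REFUSAL,
    Outcome.DELAYED_REFUSAL,
    Outcome.ATTEMPTED,
    Outcome.ATTEMPTED,
]

if __name__ == "__main__":
    for name, runs in [("standalone LLM", standalone_llm), ("web agent", web_agent)]:
        print(name, success_rate(runs), outcome_distribution(runs))
```

The binary metric gives identical scores to both systems. The graded view records that half of the agent's trials reached an actual attempt. Any real rubric would need its own, carefully defined levels.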
