Superintelligent Agents Pose Catastrophic Risks: Can Scientist AI Offer a Safer Path?

February 21, 2025
Authors: Yoshua Bengio, Michael Cohen, Damiano Fornasiere, Joumana Ghosn, Pietro Greiner, Matt MacDermott, Sören Mindermann, Adam Oberman, Jesse Richardson, Oliver Richardson, Marc-Antoine Rondeau, Pierre-Luc St-Charles, David Williams-King
cs.AI

Abstract

The leading AI companies are increasingly focused on building generalist AI agents -- systems that can autonomously plan, act, and pursue goals across almost all tasks that humans can perform. Despite how useful these systems might be, unchecked AI agency poses significant risks to public safety and security, ranging from misuse by malicious actors to a potentially irreversible loss of human control. We discuss how these risks arise from current AI training methods. Indeed, various scenarios and experiments have demonstrated the possibility of AI agents engaging in deception or pursuing goals that were not specified by human operators and that conflict with human interests, such as self-preservation. Following the precautionary principle, we see a strong need for safer, yet still useful, alternatives to the current agency-driven trajectory. Accordingly, we propose as a core building block for further advances the development of a non-agentic AI system that is trustworthy and safe by design, which we call Scientist AI. This system is designed to explain the world from observations, as opposed to taking actions in it to imitate or please humans. It comprises a world model that generates theories to explain data and a question-answering inference machine. Both components operate with an explicit notion of uncertainty to mitigate the risks of overconfident predictions. In light of these considerations, a Scientist AI could be used to assist human researchers in accelerating scientific progress, including in AI safety. In particular, our system can be employed as a guardrail against AI agents that might be created despite the risks involved. Ultimately, focusing on non-agentic AI may enable the benefits of AI innovation while avoiding the risks associated with the current trajectory. We hope these arguments will motivate researchers, developers, and policymakers to favor this safer path.
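The two components named in the abstract (a theory-generating world model and a question-answering inference machine, both carrying an explicit notion of uncertainty) can be caricatured in a few lines of code. The Python sketch below is an illustration under our own assumptions, not the paper's implementation: Theory, ScientistAISketch, observe, ask, and guardrail are hypothetical names introduced here. It keeps a posterior over candidate theories, answers questions by marginalizing over that posterior rather than committing to the single most probable theory, and uses the resulting probability as a guardrail that can veto a risky agent action.

```python
# Hypothetical sketch only: a toy Bayesian reading of the abstract's
# "world model + question-answering inference machine" design.
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class Theory:
    """One candidate explanation of the data: it scores observations and
    assigns probabilities to "yes" answers for questions."""
    name: str
    likelihood: Callable[[str], float]   # P(observation | theory)
    answer_prob: Callable[[str], float]  # P(answer is "yes" | theory, question)


class ScientistAISketch:
    """Toy world model + inference machine: a posterior over theories,
    updated by observation and queried by marginalization."""

    def __init__(self, theories: List[Theory]):
        self.theories = {t.name: t for t in theories}
        n = len(theories)
        self.posterior: Dict[str, float] = {t.name: 1.0 / n for t in theories}

    def observe(self, observation: str) -> None:
        """Bayes update: reweight every theory by how well it explains the data."""
        weights = {name: self.posterior[name] * t.likelihood(observation)
                   for name, t in self.theories.items()}
        z = sum(weights.values())
        self.posterior = {name: w / z for name, w in weights.items()}

    def ask(self, question: str) -> float:
        """Answer by marginalizing over ALL theories, weighted by posterior
        probability, instead of committing to the single best one."""
        return sum(self.posterior[name] * t.answer_prob(question)
                   for name, t in self.theories.items())

    def guardrail(self, harm_question: str, threshold: float = 0.05) -> bool:
        """Approve an action only if the marginal probability of harm stays
        below a conservative threshold."""
        return self.ask(harm_question) < threshold


# Example: two rival theories about an action's safety.
optimistic = Theory("optimistic",
                    likelihood=lambda obs: 0.9 if obs == "tests pass" else 0.1,
                    answer_prob=lambda q: 0.01)   # harm is very unlikely
pessimistic = Theory("pessimistic",
                     likelihood=lambda obs: 0.4 if obs == "tests pass" else 0.6,
                     answer_prob=lambda q: 0.30)  # harm is plausible

model = ScientistAISketch([optimistic, pessimistic])
model.observe("tests pass")                        # posterior ~ 0.69 / 0.31
risk = model.ask("will this action cause harm?")   # ~0.10 after marginalizing
print(f"marginal harm probability: {risk:.2f}")
print("approved" if model.guardrail("will this action cause harm?") else "vetoed")
```

In this toy run, committing to the single most probable theory would report a harm probability of 0.01 and wave the action through; marginalizing over the remaining uncertainty yields roughly 0.10, so the guardrail vetoes it. That gap is the point of the abstract's "explicit notion of uncertainty": overconfident point predictions are exactly what the design aims to avoid.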
