Curie: Toward Rigorous and Automated Scientific Experimentation with AI Agents
February 22, 2025
Authors: Patrick Tser Jern Kon, Jiachen Liu, Qiuyi Ding, Yiming Qiu, Zhenning Yang, Yibo Huang, Jayanth Srinivasa, Myungjin Lee, Mosharaf Chowdhury, Ang Chen
cs.AI
Abstract
Scientific experimentation, a cornerstone of human progress, demands rigor in
reliability, methodical control, and interpretability to yield meaningful
results. Despite the growing capabilities of large language models (LLMs) in
automating different aspects of the scientific process, automating rigorous
experimentation remains a significant challenge. To address this gap, we
propose Curie, an AI agent framework designed to embed rigor into the
experimentation process through three key components: an intra-agent rigor
module to enhance reliability, an inter-agent rigor module to maintain
methodical control, and an experiment knowledge module to enhance
interpretability. To evaluate Curie, we design a novel experimental benchmark
composed of 46 questions across four computer science domains, derived from
influential research papers and widely adopted open-source projects. Compared
to the strongest baseline tested, we achieve a 3.4× improvement in
correctly answering experimental questions. Curie is open-sourced at
https://github.com/Just-Curieous/Curie.