AIDE: AI-Driven Exploration in the Space of Code
February 18, 2025
Authors: Zhengyao Jiang, Dominik Schmidt, Dhruv Srikanth, Dixing Xu, Ian Kaplan, Deniss Jacenko, Yuxiang Wu
cs.AI
Abstract
Machine learning, the foundation of modern artificial intelligence, has
driven innovations that have fundamentally transformed the world. Yet behind
these advancements lies a complex and often tedious process requiring labor- and
compute-intensive iteration and experimentation. Engineers and scientists
developing machine learning models spend much of their time on trial-and-error
tasks instead of conceptualizing innovative solutions or research hypotheses.
To address this challenge, we introduce AI-Driven Exploration (AIDE), a machine
learning engineering agent powered by large language models (LLMs). AIDE frames
machine learning engineering as a code optimization problem, and formulates
trial-and-error as a tree search in the space of potential solutions. By
strategically reusing and refining promising solutions, AIDE effectively trades
computational resources for enhanced performance, achieving state-of-the-art
results on multiple machine learning engineering benchmarks, including our
Kaggle evaluations, OpenAI MLE-Bench and METR's RE-Bench.
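The abstract describes trial-and-error as a tree search over candidate solutions but gives no algorithmic detail. The block below is a minimal sketch of that idea under stated assumptions, not AIDE's actual implementation: the names `Node`, `tree_search`, `draft`, `refine`, and `evaluate` are hypothetical, a single number stands in for a piece of code, and a fixed rule stands in for the LLM that would propose refinements. The selection policy (always refine the best-scoring node so far) is also an assumption.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional


@dataclass
class Node:
    """One candidate solution in the search tree (hypothetical structure)."""
    solution: float                      # stand-in for a piece of code
    score: float                         # validation metric, higher is better
    parent: Optional["Node"] = None
    children: List["Node"] = field(default_factory=list)


def tree_search(
    draft: Callable[[], float],
    refine: Callable[[float, int], float],
    evaluate: Callable[[float], float],
    steps: int,
) -> Node:
    """Greedy tree search: at each step, refine the best-scoring node so far."""
    first = draft()
    nodes = [Node(first, evaluate(first))]
    for step in range(steps):
        # Reuse the most promising solution found so far as the parent.
        best = max(nodes, key=lambda n: n.score)
        candidate = refine(best.solution, step)
        child = Node(candidate, evaluate(candidate), parent=best)
        best.children.append(child)
        nodes.append(child)
    return max(nodes, key=lambda n: n.score)


# Toy problem (assumed for illustration): the ideal "solution" is 3.0.
def toy_draft() -> float:
    return 0.0


def toy_refine(solution: float, step: int) -> float:
    # Even steps make a helpful edit, odd steps make a harmful one.
    return solution + (1.0 if step % 2 == 0 else -0.5)


def toy_evaluate(solution: float) -> float:
    return -(solution - 3.0) ** 2


best = tree_search(toy_draft, toy_refine, toy_evaluate, steps=6)
print(best.solution, best.score)
```

In this toy run the harmful refinements become dead-end leaves, and the search returns to the better parent on the next step. Each extra step costs one more evaluation, which is the compute-for-performance trade the abstract refers to.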