AIDE: AI-Driven Exploration in the Space of Code

Abstract

Machine learning, the foundation of modern artificial intelligence, has driven innovations that have fundamentally transformed the world. Yet behind these advancements lies a complex and often tedious process that requires labor- and compute-intensive iteration and experimentation. Engineers and scientists developing machine learning models spend much of their time on trial-and-error tasks instead of conceptualizing innovative solutions or research hypotheses. To address this challenge, we introduce AI-Driven Exploration (AIDE), a machine learning engineering agent powered by large language models (LLMs). AIDE frames machine learning engineering as a code optimization problem and formulates trial-and-error as a tree search in the space of potential solutions. By strategically reusing and refining promising solutions, AIDE effectively trades computational resources for enhanced performance, achieving state-of-the-art results on multiple machine learning engineering benchmarks, including our Kaggle evaluations, OpenAI MLE-Bench, and METR's RE-Bench.
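To make the tree-search framing concrete, here is a minimal sketch under stated assumptions. It is not AIDE's actual implementation: the `Node` class, the greedy selection rule, and the `propose` and `evaluate` callables are all hypothetical stand-ins (for an LLM that edits code and for running and scoring a solution, respectively). It only illustrates the general idea of spending a fixed compute budget on refining the most promising candidate found so far.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional


@dataclass
class Node:
    """One candidate solution in the search tree (hypothetical structure)."""
    code: str
    score: float
    parent: Optional["Node"] = None
    children: List["Node"] = field(default_factory=list)


def tree_search(
    initial_code: str,
    propose: Callable[[str], str],     # assumed stand-in for an LLM editing code
    evaluate: Callable[[str], float],  # assumed stand-in for running and scoring code
    budget: int,
) -> Node:
    """Greedy tree search: repeatedly refine the best-scoring node found so far."""
    root = Node(code=initial_code, score=evaluate(initial_code))
    nodes = [root]
    for _ in range(budget):
        # Reuse the most promising solution so far as the starting point.
        parent = max(nodes, key=lambda n: n.score)
        child_code = propose(parent.code)
        child = Node(code=child_code, score=evaluate(child_code), parent=parent)
        parent.children.append(child)
        nodes.append(child)
    # Return the best solution seen anywhere in the tree.
    return max(nodes, key=lambda n: n.score)


# Toy usage: the "code" is a string, and the score peaks at length 5.
best = tree_search(
    initial_code="",
    propose=lambda code: code + "x",
    evaluate=lambda code: -abs(len(code) - 5),
    budget=8,
)
```

In this toy run, each extra unit of budget buys one more proposal and evaluation. Once the best candidate is found, later proposals branch from it again rather than from weaker descendants, which is the "reuse and refine promising solutions" behavior in its simplest form. A real agent would need a richer selection policy and handling for failed or buggy candidates.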
