AutoKaggle: A Multi-Agent Framework for Autonomous Data Science Competitions
October 27, 2024
Authors: Ziming Li, Qianbo Zang, David Ma, Jiawei Guo, Tuney Zheng, Minghao Liu, Xinyao Niu, Yue Wang, Jian Yang, Jiaheng Liu, Wanjun Zhong, Wangchunshu Zhou, Wenhao Huang, Ge Zhang
cs.AI
Abstract
Data science tasks involving tabular data present complex challenges that
require sophisticated problem-solving approaches. We propose AutoKaggle, a
powerful and user-centric framework that assists data scientists in completing
daily data pipelines through a collaborative multi-agent system. AutoKaggle
implements an iterative development process that combines code execution,
debugging, and comprehensive unit testing to ensure code correctness and logic
consistency. The framework offers highly customizable workflows, allowing users
to intervene at each phase, thus integrating automated intelligence with human
expertise. Our universal data science toolkit, comprising validated functions
for data cleaning, feature engineering, and modeling, forms the foundation of
this solution, enhancing productivity by streamlining common tasks. We selected
8 Kaggle competitions to simulate data processing workflows in real-world
application scenarios. Evaluation results demonstrate that AutoKaggle achieves
a validation submission rate of 0.85 and a comprehensive score of 0.82 in
typical data science pipelines, fully proving its effectiveness and
practicality in handling complex data science tasks.