AutoKaggle: A Multi-Agent Framework for Autonomous Data Science Competitions
October 27, 2024
Authors: Ziming Li, Qianbo Zang, David Ma, Jiawei Guo, Tuney Zheng, Minghao Liu, Xinyao Niu, Yue Wang, Jian Yang, Jiaheng Liu, Wanjun Zhong, Wangchunshu Zhou, Wenhao Huang, Ge Zhang
cs.AI
Abstract
Data science tasks involving tabular data present complex challenges that
require sophisticated problem-solving approaches. We propose AutoKaggle, a
powerful and user-centric framework that assists data scientists in completing
daily data pipelines through a collaborative multi-agent system. AutoKaggle
implements an iterative development process that combines code execution,
debugging, and comprehensive unit testing to ensure code correctness and logical
consistency. The framework offers highly customizable workflows, allowing users
to intervene at each phase, thus integrating automated intelligence with human
expertise. Our universal data science toolkit, comprising validated functions
for data cleaning, feature engineering, and modeling, forms the foundation of
this solution, enhancing productivity by streamlining common tasks. We selected
8 Kaggle competitions to simulate data processing workflows in real-world
application scenarios. Evaluation results demonstrate that AutoKaggle achieves
a validation submission rate of 0.85 and a comprehensive score of 0.82 in
typical data science pipelines, fully proving its effectiveness and
practicality in handling complex data science tasks.