CODESYNC: Synchronizing Large Language Models with Dynamic Code Evolution at Scale
February 23, 2025
Authors: Chenlong Wang, Zhaoyang Chu, Zhengxiang Cheng, Xuyi Yang, Kaiyue Qiu, Yao Wan, Zhou Zhao, Xuanhua Shi, Dongping Chen
cs.AI
Abstract
Large Language Models (LLMs) have exhibited exceptional performance in
software engineering yet face challenges in adapting to continually evolving
code knowledge, particularly regarding the frequent updates of third-party
library APIs. This limitation, stemming from static pre-training datasets,
often results in non-executable code or implementations with suboptimal safety
and efficiency. To this end, this paper introduces CODESYNC, a data engine for
identifying outdated code patterns and collecting real-time code knowledge
updates from Python third-party libraries. Building upon CODESYNC, we develop
CODESYNCBENCH, a comprehensive benchmark for assessing LLMs' ability to stay
synchronized with code evolution, which covers real-world updates for 220 APIs
from six Python libraries. Our benchmark offers 3,300 test cases across three
evaluation tasks and an update-aware instruction tuning dataset consisting of
2,200 training samples. Extensive experiments on 14 state-of-the-art LLMs
reveal that they struggle with dynamic code evolution, even with the support of
advanced knowledge updating methods (e.g., DPO, ORPO, and SimPO). We believe
that our benchmark can offer a strong foundation for the development of more
effective methods for real-time code knowledge updating in the future. The
experimental code and dataset are publicly available at:
https://github.com/Lucky-voyage/Code-Sync.
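The abstract does not say how CODESYNC detects API updates. As a minimal illustration of what an API "update" can look like at the signature level, here is a sketch that compares two versions of a function using Python's standard `inspect.signature`. It is not CODESYNC's pipeline. The helper `diff_signatures` and the functions `load_v1` and `load_v2` are hypothetical stand-ins for the "before" and "after" versions of a third-party library API.

```python
import inspect
from typing import Callable, Dict, List


def diff_signatures(old: Callable, new: Callable) -> Dict[str, List[str]]:
    """Report parameter-level differences between two versions of one API.

    Hypothetical helper for illustration only; it checks added/removed
    parameters and changed defaults, but not changes in parameter kind
    (e.g. positional -> keyword-only) or in return behavior.
    """
    old_params = inspect.signature(old).parameters
    new_params = inspect.signature(new).parameters
    # Parameters that exist only in the newer version.
    added = [name for name in new_params if name not in old_params]
    # Parameters that were dropped in the newer version.
    removed = [name for name in old_params if name not in new_params]
    # Parameters kept in both versions whose default value changed.
    changed_default = [
        name
        for name in old_params
        if name in new_params
        and old_params[name].default != new_params[name].default
    ]
    return {"added": added, "removed": removed, "changed_default": changed_default}


# Hypothetical "old" and "new" versions of the same library function.
def load_v1(path, sep=",", verbose=False):
    ...


def load_v2(path, sep=",", *, encoding="utf-8", verbose=True):
    ...


print(diff_signatures(load_v1, load_v2))
```

Here the diff reports `encoding` as newly added and `verbose` as having a changed default. A model trained only on the older version would not know either change. Real-world updates can also involve renamed or deprecated functions and changed semantics, which a signature diff alone cannot capture.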