GitChameleon: Unmasking the Version-Switching Capabilities of Code Generation Models
November 5, 2024
Authors: Nizar Islah, Justine Gehring, Diganta Misra, Eilif Muller, Irina Rish, Terry Yue Zhuo, Massimo Caccia
cs.AI
Abstract
The rapid evolution of software libraries presents a significant challenge
for code generation models, which must adapt to frequent version updates while
maintaining compatibility with previous versions. Existing code completion
benchmarks often overlook this dynamic aspect, and the one that does consider
it relies on static code prediction tasks without execution-based evaluation,
offering a limited perspective on a model's practical usability. To address
this gap, we introduce GitChameleon, a novel, manually curated
dataset comprising 116 Python code completion problems, each conditioned on
specific library versions and accompanied by executable unit tests. GitChameleon
is designed to rigorously assess the ability of modern large
language models (LLMs) to generate version-specific code that is not only
syntactically correct but also functionally accurate upon execution. Our
comprehensive evaluations reveal that state-of-the-art LLMs struggle with this
task; for instance, GPT-4o achieves a pass@10 of only 39.9% (43.7%
when provided with error feedback), highlighting the complexity of the problem
and the limitations of current models. By providing an execution-based
benchmark that emphasizes the dynamic nature of code libraries, GitChameleon
serves as a critical tool to advance the development of more adaptable and
reliable code generation models. To facilitate further exploration of
version-conditioned code generation, we make our code repository publicly
accessible at https://github.com/NizarIslah/GitChameleon.
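
The abstract reports pass@10 but does not say how it is computed. As a rough illustration only, the sketch below uses the widely used unbiased pass@k estimator, 1 - C(n-c, k)/C(n, k), where n samples are drawn per problem and c of them pass the unit tests. The function names `pass_at_k` and `mean_pass_at_k` are hypothetical, and whether GitChameleon uses exactly this estimator is an assumption.

```python
from math import comb


def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k estimate for one problem (assumed estimator).

    n: number of samples generated for the problem
    c: number of those samples that pass the unit tests
    k: budget of attempts being scored (k <= n)
    """
    if not (0 <= c <= n) or not (1 <= k <= n):
        raise ValueError("require 0 <= c <= n and 1 <= k <= n")
    # If fewer than k samples fail, every size-k subset contains a pass.
    if n - c < k:
        return 1.0
    # Probability that a random size-k subset contains no passing sample,
    # subtracted from 1.
    return 1.0 - comb(n - c, k) / comb(n, k)


def mean_pass_at_k(results: list[tuple[int, int]], k: int) -> float:
    """Average pass@k over problems; results holds (n, c) per problem."""
    if not results:
        raise ValueError("results must not be empty")
    return sum(pass_at_k(n, c, k) for n, c in results) / len(results)
```

For example, with 20 samples per problem of which one passes, `pass_at_k(20, 1, 10)` evaluates to 0.5: half of all 10-sample subsets contain the single passing sample.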