GitChameleon: Unmasking the Version-Switching Capabilities of Code Generation Models
November 5, 2024
Authors: Nizar Islah, Justine Gehring, Diganta Misra, Eilif Muller, Irina Rish, Terry Yue Zhuo, Massimo Caccia
cs.AI
Abstract
The rapid evolution of software libraries presents a significant challenge
for code generation models, which must adapt to frequent version updates while
maintaining compatibility with previous versions. Existing code completion
benchmarks often overlook this dynamic aspect, and the one that does consider
it relies on static code prediction tasks without execution-based evaluation,
offering a limited perspective on a model's practical usability. To address
this gap, we introduce GitChameleon, a novel, manually curated
dataset comprising 116 Python code completion problems, each conditioned on
specific library versions and accompanied by executable unit tests.
GitChameleon is designed to rigorously assess the ability of modern large
language models (LLMs) to generate version-specific code that is not only
syntactically correct but also functionally accurate upon execution. Our
comprehensive evaluations reveal that state-of-the-art LLMs struggle with this
task; for instance, GPT-4o achieves a pass@10 of only 39.9% (43.7%
when provided with error feedback), highlighting the complexity of the problem
and the limitations of current models. By providing an execution-based
benchmark that emphasizes the dynamic nature of code libraries, GitChameleon
serves as a critical tool to advance the development of more adaptable and
reliable code generation models. To facilitate further exploration of
version-conditioned code generation, we make our code repository publicly
accessible at https://github.com/NizarIslah/GitChameleon.
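
The abstract reports pass@10 but does not say how it is computed. As a rough illustration only, the sketch below assumes the widely used unbiased pass@k estimator, 1 - C(n-c, k) / C(n, k), where n samples are generated per problem and c of them pass the unit tests. The function names `pass_at_k` and `benchmark_pass_at_k` and the sample counts are hypothetical and are not taken from the GitChameleon repository.

```python
from math import comb


def pass_at_k(n: int, c: int, k: int) -> float:
    """Estimate pass@k for one problem (assumed estimator, not confirmed by the source).

    n: number of generated samples, c: samples that passed all unit tests,
    k: number of attempts the metric allows.
    """
    if not 0 <= c <= n:
        raise ValueError("c must satisfy 0 <= c <= n")
    if not 1 <= k <= n:
        raise ValueError("k must satisfy 1 <= k <= n")
    # Fewer than k failing samples: every draw of k contains a passing one.
    if n - c < k:
        return 1.0
    # Probability that a random draw of k samples contains no passing sample.
    return 1.0 - comb(n - c, k) / comb(n, k)


def benchmark_pass_at_k(results: list[tuple[int, int]], k: int) -> float:
    """Average the per-problem estimates over (n, c) pairs."""
    return sum(pass_at_k(n, c, k) for n, c in results) / len(results)


if __name__ == "__main__":
    # Hypothetical (n, c) counts for three problems; not real benchmark data.
    print(benchmark_pass_at_k([(20, 0), (20, 3), (20, 20)], k=10))
```

Averaging per-problem estimates, rather than pooling all samples, keeps a single easy problem with many passing samples from dominating the benchmark score.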