MutaGReP: Execution-Free Repository-Grounded Plan Search for Code-Use

Abstract

When a human asks an LLM to complete a coding task using functionality from a large code repository, how do we provide context from the repo to the LLM? One approach is to add the entire repo to the LLM's context window. However, most tasks involve only a fraction of the symbols in a repo, longer contexts are detrimental to the LLM's reasoning abilities, and context windows are not unlimited. Alternatively, we could emulate the human ability to navigate a large repo, pick out the right functionality, and form a plan to solve the task. We propose MutaGReP (Mutation-guided Grounded Repository Plan Search), an approach to search for plans that decompose a user request into natural language steps grounded in the codebase. MutaGReP performs neural tree search in plan space, exploring by mutating plans and using a symbol retriever for grounding. On the challenging LongCodeArena benchmark, our plans use less than 5% of GPT-4o's 128K context window, yet rival the coding performance of GPT-4o with a context window filled with the repo. Plans produced by MutaGReP allow Qwen 2.5 Coder 32B and 72B to match the performance of GPT-4o with full repo context and enable progress on the hardest LongCodeArena tasks. Project page: zaidkhan.me/MutaGReP
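
The search loop described above can be illustrated with a toy sketch. It assumes a plan is a sequence of natural-language steps, each paired with the repository symbols a retriever returns for it, and it runs best-first search in which each expansion mutates a plan by appending or replacing one step. Everything here is hypothetical and heavily simplified, not MutaGReP's implementation: the names `Plan`, `ground`, `score`, `mutate`, and `plan_search`, the tiny symbol index, the token-overlap retriever (a lexical stand-in for the paper's symbol retriever), the fixed list of step proposals (a stand-in for whatever proposes mutations in the real system), and the scoring heuristic are all invented for illustration.

```python
import heapq
import itertools
from dataclasses import dataclass

# Hypothetical symbol index: symbol name -> one-line docstring.
SymbolIndex = dict[str, str]


def tokens(text: str) -> set[str]:
    """Lowercased word set; underscores split identifiers into words."""
    return {w.strip(".,:()").lower() for w in text.replace("_", " ").split()}


def ground(step: str, index: SymbolIndex, k: int = 2) -> tuple[str, ...]:
    """Toy lexical retriever (stand-in): top-k symbols by token overlap."""
    query = tokens(step)
    ranked = sorted(
        ((len(query & tokens(name + " " + doc)), name) for name, doc in index.items()),
        key=lambda pair: (-pair[0], pair[1]),
    )
    return tuple(name for overlap, name in ranked[:k] if overlap > 0)


@dataclass(frozen=True)
class Plan:
    # Each step is (natural-language text, symbols it is grounded in).
    steps: tuple[tuple[str, tuple[str, ...]], ...] = ()


def score(plan: Plan, request: str) -> float:
    """Invented heuristic: share of grounded steps + share of request words covered."""
    if not plan.steps:
        return 0.0
    grounded = sum(1 for _, symbols in plan.steps if symbols) / len(plan.steps)
    covered = set().union(*(tokens(text) for text, _ in plan.steps)) & tokens(request)
    return grounded + len(covered) / max(len(tokens(request)), 1)


def mutate(plan: Plan, proposals: list[str], index: SymbolIndex):
    """Yield child plans: append one proposed step, or replace one existing step."""
    present = {text for text, _ in plan.steps}
    for text in proposals:
        if text in present:  # avoid duplicate steps
            continue
        step = (text, ground(text, index))
        yield Plan(plan.steps + (step,))
        for i in range(len(plan.steps)):
            yield Plan(plan.steps[:i] + (step,) + plan.steps[i + 1:])


def plan_search(request: str, proposals: list[str], index: SymbolIndex,
                budget: int = 20, max_steps: int = 3) -> Plan:
    """Best-first search over plans; returns the best plan expanded within budget."""
    tie = itertools.count()  # tie-breaker so heapq never compares Plan objects
    root = Plan()
    frontier = [(-score(root, request), next(tie), root)]
    seen, best = {root}, root
    for _ in range(budget):
        if not frontier:
            break
        _, _, plan = heapq.heappop(frontier)
        if score(plan, request) > score(best, request):
            best = plan
        for child in mutate(plan, proposals, index):
            if len(child.steps) <= max_steps and child not in seen:
                seen.add(child)
                heapq.heappush(frontier, (-score(child, request), next(tie), child))
    return best


if __name__ == "__main__":
    index = {
        "load_dataset": "Load a dataset from disk",
        "train_model": "Train a model on a dataset",
        "plot_curve": "Plot a training curve",
    }
    proposals = ["Load the dataset from disk", "Train the model",
                 "Plot the curve", "Sing loudly"]
    best = plan_search("load the dataset and train a model", proposals, index)
    for text, symbols in best.steps:
        print(f"{text} -> {symbols}")
```

On this example the search returns the plan "Load the dataset from disk" followed by "Train the model": adding "Plot the curve" does not raise the score, and "Sing loudly" grounds to no symbol, so plans containing it score lower. Deduplicating steps and capping plan length (`max_steps`) keep the toy search space finite; a realistic version would need a stronger scorer and retriever than these lexical stand-ins.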
