Horizon-Length Prediction: Advancing Fill-in-the-Middle Capabilities for Code Generation with Lookahead Planning
October 4, 2024
Authors: Yifeng Ding, Hantian Ding, Shiqi Wang, Qing Sun, Varun Kumar, Zijian Wang
cs.AI
Abstract
Fill-in-the-Middle (FIM) has become integral to code language models,
enabling generation of missing code given both left and right contexts.
However, the current FIM training paradigm, which reorders original training
sequences and then performs regular next-token prediction (NTP), often leads to
models struggling to generate content that aligns smoothly with the surrounding
context. Crucially, while existing works rely on rule-based post-processing to
circumvent this weakness, such methods are not practically usable in
open-domain code completion tasks as they depend on restrictive,
dataset-specific assumptions (e.g., generating the same number of lines as in
the ground truth). Moreover, model performance on FIM tasks deteriorates
significantly without these unrealistic assumptions.
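The "reorder then NTP" paradigm described above can be sketched as follows. This is a minimal illustration, not the paper's implementation: a document is split into (prefix, middle, suffix) at random points, reordered into a prefix-suffix-middle sequence, and trained with ordinary next-token prediction. The `<fim_prefix>`-style sentinel names follow common open-model conventions and are assumptions here.

```python
import random


def to_fim_psm(document: str, rng: random.Random) -> str:
    """Split a document at two random points and reorder it into the
    PSM (prefix-suffix-middle) format used for FIM training.

    Sentinel token names are illustrative, not taken from the paper.
    """
    i, j = sorted(rng.sample(range(len(document) + 1), 2))
    prefix, middle, suffix = document[:i], document[i:j], document[j:]
    # Plain next-token prediction on this reordered string means the
    # middle is generated last, conditioned on both prefix and suffix.
    return f"<fim_prefix>{prefix}<fim_suffix>{suffix}<fim_middle>{middle}"


rng = random.Random(0)
sample = to_fim_psm("def add(a, b):\n    return a + b\n", rng)
```

Because NTP supervises only one token ahead on this reordered string, nothing in the loss explicitly tells the model how far it is from the suffix boundary, which is the gap the paper's HLP objective targets.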
We hypothesize that NTP alone is insufficient for models to learn effective
planning conditioned on the distant right context, a critical factor for
successful code infilling. To overcome this, we propose Horizon-Length
Prediction (HLP), a novel training objective that teaches models to predict the
number of remaining middle tokens (i.e., horizon length) at each step. HLP
advances FIM with lookahead planning, enabling models to inherently learn
infilling boundaries for arbitrary left and right contexts without relying on
dataset-specific post-processing. Our evaluation across different models and
sizes shows that HLP significantly improves FIM performance by up to 24%
relatively on diverse benchmarks, across file-level and repository-level, and
without resorting to unrealistic post-processing methods. Furthermore, the
enhanced planning capability gained through HLP boosts model performance on
code reasoning. Importantly, HLP only incurs negligible training overhead and
no additional inference cost, ensuring its practicality for real-world
scenarios.
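As a concrete illustration, the per-step supervision signal described in the abstract could be constructed as below. This is a minimal sketch under one assumption: the target at each middle-token position is the count of middle tokens still to be generated after that step. The paper's exact target definition (e.g., whether it counts the current token or normalizes by span length) and its loss formulation may differ; in training, such targets would typically supervise an auxiliary head alongside the standard NTP loss.

```python
def horizon_length_targets(middle_tokens: list[int]) -> list[int]:
    """For each position in the middle span, return the number of middle
    tokens remaining after this step (the 'horizon length').

    Assumes the horizon excludes the current token; the paper may define
    the target differently.
    """
    n = len(middle_tokens)
    return [n - t - 1 for t in range(n)]
```

For a four-token middle span the targets count down to zero, so the model is pushed at every step to estimate how much content remains before it must meet the right context.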