Horizon-Length Prediction: Advancing Fill-in-the-Middle Capabilities for Code Generation with Lookahead Planning
October 4, 2024
Authors: Yifeng Ding, Hantian Ding, Shiqi Wang, Qing Sun, Varun Kumar, Zijian Wang
cs.AI
Abstract
Fill-in-the-Middle (FIM) has become integral to code language models,
enabling generation of missing code given both left and right contexts.
However, the current FIM training paradigm, which reorders original training
sequences and then performs regular next-token prediction (NTP), often leads to
models struggling to generate content that aligns smoothly with the surrounding
context. Crucially, while existing works rely on rule-based post-processing to
circumvent this weakness, such methods are not practically usable in
open-domain code completion tasks as they depend on restrictive,
dataset-specific assumptions (e.g., generating the same number of lines as in
the ground truth). Moreover, model performance on FIM tasks deteriorates
significantly without these unrealistic assumptions.
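The "reorder then NTP" paradigm described above can be sketched as follows. This is a minimal illustration, not the paper's implementation: a document is split into (prefix, middle, suffix) at random points, reordered into a prefix-suffix-middle sequence, and trained with ordinary next-token prediction. The `<fim_prefix>`-style sentinel names follow common open-model conventions and are assumptions here.

```python
import random


def to_fim_psm(document: str, rng: random.Random) -> str:
    """Split a document at two random points and reorder it into the
    PSM (prefix-suffix-middle) format used for FIM training.

    Sentinel token names are illustrative, not taken from the paper.
    """
    i, j = sorted(rng.sample(range(len(document) + 1), 2))
    prefix, middle, suffix = document[:i], document[i:j], document[j:]
    # Plain next-token prediction on this reordered string means the
    # middle is generated last, conditioned on both prefix and suffix.
    return f"<fim_prefix>{prefix}<fim_suffix>{suffix}<fim_middle>{middle}"


rng = random.Random(0)
sample = to_fim_psm("def add(a, b):\n    return a + b\n", rng)
```

Because NTP supervises only one token ahead on this reordered string, nothing in the loss explicitly tells the model how far it is from the suffix boundary, which is the gap the paper's HLP objective targets.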
We hypothesize that NTP alone is insufficient for models to learn effective
planning conditioned on the distant right context, a critical factor for
successful code infilling. To overcome this, we propose Horizon-Length
Prediction (HLP), a novel training objective that teaches models to predict the
number of remaining middle tokens (i.e., horizon length) at each step. HLP
advances FIM with lookahead planning, enabling models to inherently learn
infilling boundaries for arbitrary left and right contexts without relying on
dataset-specific post-processing. Our evaluation across different models and
sizes shows that HLP significantly improves FIM performance by up to 24%
relatively on diverse benchmarks, across file-level and repository-level, and
without resorting to unrealistic post-processing methods. Furthermore, the
enhanced planning capability gained through HLP boosts model performance on
code reasoning. Importantly, HLP only incurs negligible training overhead and
no additional inference cost, ensuring its practicality for real-world
scenarios.
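As a concrete illustration, the per-step supervision signal described in the abstract could be constructed as below. This is a minimal sketch under one assumption: the target at each middle-token position is the count of middle tokens still to be generated after that step. The paper's exact target definition (e.g., whether it counts the current token or normalizes by span length) and its loss formulation may differ; in training, such targets would typically supervise an auxiliary head alongside the standard NTP loss.

```python
def horizon_length_targets(middle_tokens: list[int]) -> list[int]:
    """For each position in the middle span, return the number of middle
    tokens remaining after this step (the 'horizon length').

    Assumes the horizon excludes the current token; the paper may define
    the target differently.
    """
    n = len(middle_tokens)
    return [n - t - 1 for t in range(n)]
```

For a four-token middle span the targets count down to zero, so the model is pushed at every step to estimate how much content remains before it must meet the right context.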