

Fast Controlled Generation from Language Models with Adaptive Weighted Rejection Sampling

April 7, 2025
Authors: Benjamin Lipkin, Benjamin LeBrun, Jacob Hoover Vigly, João Loula, David R. MacIver, Li Du, Jason Eisner, Ryan Cotterell, Vikash Mansinghka, Timothy J. O'Donnell, Alexander K. Lew, Tim Vieira
cs.AI

Abstract

The dominant approach to generating from language models subject to some constraint is locally constrained decoding (LCD), incrementally sampling tokens at each time step such that the constraint is never violated. Typically, this is achieved through token masking: looping over the vocabulary and excluding non-conforming tokens. There are two important problems with this approach. (i) Evaluating the constraint on every token can be prohibitively expensive -- LM vocabularies often exceed 100,000 tokens. (ii) LCD can distort the global distribution over strings, sampling tokens based only on local information, even if they lead down dead-end paths. This work introduces a new algorithm that addresses both these problems. First, to avoid evaluating a constraint on the full vocabulary at each step of generation, we propose an adaptive rejection sampling algorithm that typically requires orders of magnitude fewer constraint evaluations. Second, we show how this algorithm can be extended to produce low-variance, unbiased estimates of importance weights at a very small additional cost -- estimates that can be soundly used within previously proposed sequential Monte Carlo algorithms to correct for the myopic behavior of local constraint enforcement. Through extensive empirical evaluation in text-to-SQL, molecular synthesis, goal inference, pattern matching, and JSON domains, we show that our approach is superior to state-of-the-art baselines, supporting a broader class of constraints and improving both runtime and performance. Additional theoretical and empirical analyses show that our method's runtime efficiency is driven by its dynamic use of computation, scaling with the divergence between the unconstrained and constrained LM, and as a consequence, runtime improvements are greater for better models.
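The first contribution — replacing a full-vocabulary constraint sweep with adaptive rejection sampling — can be illustrated with a minimal sketch. Instead of checking every token up front, we repeatedly sample from the model's next-token distribution, check the constraint only on the sampled token, and permanently remove rejected tokens from the proposal so they are never drawn again. The names `probs` and `satisfies` are hypothetical, and the returned weight is a crude illustrative quantity, not the paper's low-variance unbiased estimator:

```python
import random

def adaptive_rejection_sample(probs, satisfies, rng=random.Random(0)):
    """Sample a constraint-satisfying token without scanning the full vocabulary.

    probs:     dict mapping token -> probability under the unconstrained LM.
    satisfies: callable token -> bool, the (potentially expensive) constraint check.

    Returns (token, weight, n_evals): the accepted token, a crude estimate of
    the constrained probability mass, and the number of constraint evaluations.
    """
    remaining = dict(probs)   # proposal mass not yet ruled out
    rejected_mass = 0.0       # mass discovered to violate the constraint
    n_evals = 0
    while remaining:
        # Sample a candidate from the remaining (implicitly renormalized) proposal.
        tokens, weights = zip(*remaining.items())
        tok = rng.choices(tokens, weights=weights, k=1)[0]
        n_evals += 1  # the constraint is evaluated only on sampled candidates
        if satisfies(tok):
            # Mass not yet known to be invalid; a stand-in for the importance
            # weight that the paper estimates without bias.
            return tok, 1.0 - rejected_mass, n_evals
        # Adapt: drop the rejected token so it cannot be proposed again.
        rejected_mass += remaining.pop(tok)
    raise RuntimeError("no token satisfies the constraint at this step")
```

When the unconstrained model already concentrates its mass on conforming tokens, the loop typically accepts within a handful of draws, which matches the paper's observation that runtime scales with the divergence between the unconstrained and constrained distributions and that improvements are largest for better models.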

PDF · April 10, 2025