Ensembling Large Language Models with Process Reward-Guided Tree Search for Better Complex Reasoning
December 20, 2024
Authors: Sungjin Park, Xiao Liu, Yeyun Gong, Edward Choi
cs.AI
Abstract
Despite recent advances in large language models, open-source models often
struggle to consistently perform well on complex reasoning tasks. Existing
ensemble methods, whether applied at the token or output levels, fail to
address these challenges. In response, we present Language model Ensemble with
Monte Carlo Tree Search (LE-MCTS), a novel framework for process-level
ensembling of language models. LE-MCTS formulates step-by-step reasoning with
an ensemble of language models as a Markov decision process. In this framework,
states represent intermediate reasoning paths, while actions consist of
generating the next reasoning step using one of the language models selected
from a predefined pool. Guided by a process-based reward model, LE-MCTS
performs a tree search over the reasoning steps generated by different language
models, identifying the most accurate reasoning chain. Experimental results on
five mathematical reasoning benchmarks demonstrate that our approach
outperforms both single language model decoding algorithms and language model
ensemble methods. Notably, LE-MCTS improves performance by 3.6% and 4.3% on the
MATH and MQA datasets, respectively, highlighting its effectiveness in solving
complex reasoning problems.
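To make the described search procedure concrete, below is a minimal Python sketch of process-level ensembling with Monte Carlo tree search, following the abstract's MDP framing: states are intermediate reasoning paths, and each action samples the next step from one model in a predefined pool. The model pool, `generate_step`, and `process_reward` are hypothetical stand-ins, and the selection/expansion details are generic MCTS rather than the paper's exact algorithm.

```python
# Sketch of process reward-guided tree search over an ensemble of LMs,
# in the spirit of LE-MCTS. All model and reward calls are stubs.
import math
import random

# Hypothetical pool of language models; in practice these would be LM handles.
MODEL_POOL = ["model_a", "model_b", "model_c"]

def generate_step(model: str, partial_chain: list[str]) -> str:
    """Stand-in for sampling the next reasoning step from `model`,
    conditioned on the reasoning path so far."""
    return f"[{model}] step {len(partial_chain)}"

def process_reward(partial_chain: list[str]) -> float:
    """Stand-in for a process-based reward model scoring an
    intermediate reasoning chain; here it is a random stub."""
    return random.random()

class Node:
    def __init__(self, chain, parent=None):
        self.chain = chain      # state: the intermediate reasoning path
        self.parent = parent
        self.children = []      # one child per (model -> next step) action
        self.visits = 0
        self.value = 0.0        # accumulated process rewards

    def ucb(self, c=1.4):
        # Standard UCB1; unvisited children are explored first.
        if self.visits == 0:
            return float("inf")
        return (self.value / self.visits
                + c * math.sqrt(math.log(self.parent.visits) / self.visits))

def le_mcts(question: str, iterations: int = 50, max_depth: int = 5) -> list[str]:
    root = Node(chain=[question])
    for _ in range(iterations):
        # 1. Selection: descend by UCB through fully expanded nodes.
        node = root
        while node.children and len(node.children) == len(MODEL_POOL):
            node = max(node.children, key=Node.ucb)
        # 2. Expansion: an untried model from the pool proposes the next step.
        if len(node.chain) - 1 < max_depth:
            model = MODEL_POOL[len(node.children)]
            step = generate_step(model, node.chain)
            child = Node(chain=node.chain + [step], parent=node)
            node.children.append(child)
            node = child
        # 3. Evaluation: score the partial chain with the process reward model.
        reward = process_reward(node.chain)
        # 4. Backpropagation: propagate the reward up to the root.
        while node is not None:
            node.visits += 1
            node.value += reward
            node = node.parent
    # Read out the highest mean-value reasoning chain found.
    best = root
    while best.children:
        best = max(best.children, key=lambda n: n.value / max(n.visits, 1))
    return best.chain

if __name__ == "__main__":
    print("\n".join(le_mcts("Solve: 2x + 3 = 7")))
```

With real models, steps proposed by different LMs would interleave in one chain, and the process reward model would steer the search toward the most accurate reasoning path, which is the central idea of the framework.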