Ensembling Large Language Models with Process Reward-Guided Tree Search for Better Complex Reasoning

Abstract

Despite recent advances in large language models, open-source models often struggle to consistently perform well on complex reasoning tasks. Existing ensemble methods, whether applied at the token or output levels, fail to address these challenges. In response, we present Language model Ensemble with Monte Carlo Tree Search (LE-MCTS), a novel framework for process-level ensembling of language models. LE-MCTS formulates step-by-step reasoning with an ensemble of language models as a Markov decision process. In this framework, states represent intermediate reasoning paths, while actions consist of generating the next reasoning step using one of the language models selected from a predefined pool. Guided by a process-based reward model, LE-MCTS performs a tree search over the reasoning steps generated by different language models, identifying the most accurate reasoning chain. Experimental results on five mathematical reasoning benchmarks demonstrate that our approach outperforms both single language model decoding algorithms and language model ensemble methods. Notably, LE-MCTS improves performance by 3.6% and 4.3% on the MATH and MQA datasets, respectively, highlighting its effectiveness in solving complex reasoning problems.
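The MDP framing above can be made concrete with a small sketch. A state is a partial reasoning path. An action picks one model from the pool to write the next step. A process reward model (PRM) scores each partial path, and a tree search keeps the best-scoring complete chain.

This is a minimal illustration under stated assumptions, not the authors' implementation. It uses generic UCT selection, PRM scores in place of rollouts, and mean-value backpropagation. LE-MCTS's own selection and backup rules may differ. The names `ensemble_mcts`, `scripted`, `toy_prm`, the `"ANSWER"` terminal marker, and the constants `iterations`, `max_depth`, and `c` are all hypothetical. The two "models" are scripted stand-ins for real LLMs, and the PRM is a toy string matcher.

```python
import math
from dataclasses import dataclass, field
from typing import Callable, List, Optional, Sequence

# Hypothetical interfaces: a step model maps (question, partial path) to the
# next reasoning step; a PRM maps (question, path) to a score in [0, 1].
StepModel = Callable[[str, Sequence[str]], str]
RewardModel = Callable[[str, Sequence[str]], float]


@dataclass
class Node:
    path: List[str]  # MDP state: the intermediate reasoning path
    parent: Optional["Node"] = None
    children: List["Node"] = field(default_factory=list)
    visits: int = 0
    value_sum: float = 0.0
    reward: float = 0.0  # PRM score of this node's path

    def is_terminal(self, max_depth: int) -> bool:
        # A path ends when a step emits a final answer or the depth cap is hit.
        return len(self.path) >= max_depth or (
            bool(self.path) and self.path[-1].startswith("ANSWER")
        )

    def mean_value(self) -> float:
        return self.value_sum / self.visits if self.visits else 0.0


def uct_score(child: Node, parent_visits: int, c: float) -> float:
    if child.visits == 0:
        return math.inf  # try every action (model) at least once
    return child.mean_value() + c * math.sqrt(math.log(parent_visits) / child.visits)


def ensemble_mcts(question: str, models: Sequence[StepModel], prm: RewardModel,
                  iterations: int = 50, max_depth: int = 4, c: float = 1.0) -> List[str]:
    root = Node(path=[])
    for _ in range(iterations):
        # 1. Selection: descend by UCT until reaching a leaf or terminal node.
        node = root
        while node.children and not node.is_terminal(max_depth):
            parent = node
            node = max(parent.children, key=lambda ch: uct_score(ch, parent.visits, c))
        # 2. Expansion: one child per model in the pool (action = pick a model).
        if not node.is_terminal(max_depth):
            for model in models:
                child = Node(path=node.path + [model(question, node.path)], parent=node)
                child.reward = prm(question, child.path)
                node.children.append(child)
            node = max(node.children, key=lambda ch: ch.reward)
        # 3. Evaluation: use the PRM score directly instead of a random rollout.
        value = node.reward
        # 4. Backpropagation: update visit counts and value sums up to the root.
        while node is not None:
            node.visits += 1
            node.value_sum += value
            node = node.parent
    # Return the highest-scoring complete reasoning chain found in the tree.
    best, stack = None, [root]
    while stack:
        n = stack.pop()
        stack.extend(n.children)
        if n.is_terminal(max_depth) and (best is None or n.reward > best.reward):
            best = n
    return best.path if best is not None else []


# Toy pool: each "model" replays a fixed script, indexed by current path length.
SCRIPT_A = ["3*4 = 12", "2 + 12 = 14", "ANSWER 14"]  # evaluates 2 + 3*4 correctly
SCRIPT_B = ["2 + 3 = 5", "5*4 = 20", "ANSWER 20"]    # ignores operator precedence


def scripted(script: List[str]) -> StepModel:
    return lambda question, path: script[min(len(path), len(script) - 1)]


def toy_prm(question: str, path: Sequence[str]) -> float:
    # Stand-in PRM: fraction of steps that match the known-correct script.
    return sum(step in SCRIPT_A for step in path) / len(path) if path else 0.0


chain = ensemble_mcts("2 + 3*4 = ?", [scripted(SCRIPT_A), scripted(SCRIPT_B)], toy_prm)
print(chain)  # ['3*4 = 12', '2 + 12 = 14', 'ANSWER 14']
```

Because each expansion adds one child per model, the search can mix steps from different models within a single chain. A separate stage in the PRM guides which of those mixed paths get explored further.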