Multiagent Finetuning: Self Improvement with Diverse Reasoning Chains

Abstract

Large language models (LLMs) have achieved remarkable performance in recent years but are fundamentally limited by the underlying training data. To improve models beyond the training data, recent work has explored how LLMs can be used to generate synthetic data for autonomous self-improvement. However, successive steps of self-improvement can reach a point of diminishing returns. In this work, we propose a complementary approach to self-improvement in which finetuning is applied to a multiagent society of language models. A group of language models, all starting from the same base model, are independently specialized by updating each one using data generated through multiagent interactions among the models. By training each model on independent sets of data, we illustrate how this approach enables specialization across models and diversification over the set of models. As a result, our overall system is able to preserve diverse reasoning chains and autonomously improve over many more rounds of finetuning than single-agent self-improvement methods. We quantitatively illustrate the efficacy of the approach across a wide suite of reasoning tasks.

