Multiagent Finetuning: Self Improvement with Diverse Reasoning Chains
January 10, 2025
作者: Vighnesh Subramaniam, Yilun Du, Joshua B. Tenenbaum, Antonio Torralba, Shuang Li, Igor Mordatch
cs.AI
Abstract
Large language models (LLMs) have achieved remarkable performance in recent years but are fundamentally limited by the underlying training data. To improve models beyond the training data, recent works have explored how LLMs can be used to generate synthetic data for autonomous self-improvement. However, successive steps of self-improvement can reach a point of diminishing returns. In this work, we propose a complementary approach towards self-improvement where finetuning is applied to a multiagent society of language models. A group of language models, all starting from the same base model, are independently specialized by updating each one using data generated through multiagent interactions among the models. By training each model on independent sets of data, we illustrate how this approach enables specialization across models and diversification over the set of models. As a result, our overall system is able to preserve diverse reasoning chains and autonomously improve over many more rounds of finetuning than single-agent self-improvement methods. We quantitatively illustrate the efficacy of the approach across a wide suite of reasoning tasks.
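To make the procedure concrete, the following Python sketch outlines one round-based version of the loop the abstract describes. The `Agent` stub, its `generate`/`finetune` interface, and the debate-style revision step are illustrative assumptions, not the paper's actual implementation.

```python
# Minimal sketch of multiagent finetuning, assuming a debate-style
# interaction; all names and interfaces here are illustrative.
from copy import deepcopy
from typing import List, Sequence, Tuple


class Agent:
    """Stub standing in for an LLM wrapper; replace with a real model."""

    def __init__(self, name: str):
        self.name = name

    def generate(self, task: str, context: Sequence[str] = ()) -> str:
        # Placeholder: a real agent would condition on the task and on the
        # other agents' answers ("context") to produce a reasoning chain.
        return f"{self.name} answer to {task!r} given {len(context)} peer answers"

    def finetune(self, data: List[Tuple[str, str]]) -> "Agent":
        # Placeholder: a real agent would update its weights on
        # (task, answer) pairs and return the updated model.
        return self


def multiagent_finetune(base: Agent, tasks: List[str],
                        n_agents: int = 3, n_rounds: int = 5) -> List[Agent]:
    # Every agent starts as a copy of the same base model.
    agents = [deepcopy(base) for _ in range(n_agents)]
    for _ in range(n_rounds):
        # Each agent accumulates its own, independent finetuning set.
        datasets: List[List[Tuple[str, str]]] = [[] for _ in agents]
        for task in tasks:
            # First pass: every agent answers independently.
            answers = [agent.generate(task) for agent in agents]
            # Second pass: each agent revises its answer after seeing the
            # other agents' answers (a debate-style multiagent interaction).
            for i, agent in enumerate(agents):
                peers = answers[:i] + answers[i + 1:]
                revised = agent.generate(task, context=peers)
                datasets[i].append((task, revised))
        # Updating each model only on its own data drives specialization
        # across models and diversification over the set of models.
        agents = [agent.finetune(data) for agent, data in zip(agents, datasets)]
    return agents
```

The point mirrored from the abstract is the last step: each model is finetuned only on the data it generated itself, which is what preserves diverse reasoning chains and lets the society keep improving over more rounds than a single self-improving agent.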