Multiagent Finetuning: Self Improvement with Diverse Reasoning Chains

Abstract

Large language models (LLMs) have achieved remarkable performance in recent years but are fundamentally limited by the underlying training data. To improve models beyond the training data, recent works have explored how LLMs can be used to generate synthetic data for autonomous self-improvement. However, successive steps of self-improvement can reach a point of diminishing returns. In this work, we propose a complementary approach towards self-improvement where finetuning is applied to a multiagent society of language models. A group of language models, all starting from the same base model, are independently specialized by updating each one using data generated through multiagent interactions among the models. By training each model on independent sets of data, we illustrate how this approach enables specialization across models and diversification over the set of models. As a result, our overall system is able to preserve diverse reasoning chains and autonomously improve over many more rounds of finetuning than single-agent self-improvement methods. We quantitatively illustrate the efficacy of the approach across a wide suite of reasoning tasks.
