
Symbolic Mixture-of-Experts: Adaptive Skill-based Routing for Heterogeneous Reasoning

March 7, 2025
Authors: Justin Chih-Yao Chen, Sukwon Yun, Elias Stengel-Eskin, Tianlong Chen, Mohit Bansal
cs.AI

Abstract

Combining existing pre-trained expert LLMs is a promising avenue for scalably tackling large-scale and diverse tasks. However, selecting experts at the task level is often too coarse-grained, as heterogeneous tasks may require different expertise for each instance. To enable adaptive instance-level mixing of pre-trained LLM experts, we propose Symbolic-MoE, a symbolic, text-based, and gradient-free Mixture-of-Experts framework. Symbolic-MoE takes a fine-grained approach to selection by emphasizing skills, e.g., algebra in math or molecular biology in biomedical reasoning. We propose a skill-based recruiting strategy that dynamically selects the most relevant set of expert LLMs for diverse reasoning tasks based on their strengths. Each selected expert then generates its own reasoning, resulting in k outputs from k experts, which are then synthesized into a final high-quality response by an aggregator chosen based on its ability to integrate diverse reasoning outputs. We show that Symbolic-MoE's instance-level expert selection improves performance by a large margin but -- when implemented naively -- can introduce a high computational overhead due to the need for constant model loading and offloading. To address this, we implement a batch inference strategy that groups instances based on their assigned experts, loading each model only once. This allows us to integrate 16 expert models on 1 GPU with a time cost comparable to or better than prior multi-agent baselines using 4 GPUs. Through extensive evaluations on diverse benchmarks (MMLU-Pro, GPQA, AIME, and MedMCQA), we demonstrate that Symbolic-MoE outperforms strong LLMs like GPT4o-mini, as well as multi-agent approaches, with an absolute average improvement of 8.15% over the best multi-agent baseline. Moreover, Symbolic-MoE removes the need for expensive multi-round discussions, outperforming discussion baselines with less computation.
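The abstract describes two mechanisms: skill-based recruiting of k expert LLMs per instance, and a batch inference strategy that groups instances by their assigned expert set so each model is loaded only once. The snippet below is a minimal sketch of how such routing and grouping could look; the model names, skill labels, numeric skill profiles, and function names are hypothetical placeholders, not the paper's actual implementation (which also adds an aggregator stage to synthesize the k expert outputs).

```python
from collections import defaultdict

# Hypothetical skill profiles: one score per skill for each expert model,
# e.g. estimated from accuracy on skill-tagged validation problems.
EXPERT_SKILL_PROFILES = {
    "expert-math-7b":    {"algebra": 0.82, "geometry": 0.74, "molecular_biology": 0.31},
    "expert-bio-8b":     {"algebra": 0.35, "geometry": 0.30, "molecular_biology": 0.88},
    "expert-general-7b": {"algebra": 0.60, "geometry": 0.58, "molecular_biology": 0.55},
}

def recruit_experts(instance_skills, k=2):
    """Pick the k experts whose skill profiles best match this instance's skills."""
    scores = {
        name: sum(profile.get(skill, 0.0) for skill in instance_skills)
        for name, profile in EXPERT_SKILL_PROFILES.items()
    }
    return sorted(scores, key=scores.get, reverse=True)[:k]

def group_by_experts(instances):
    """Group instances by their assigned expert set, so that during inference
    each expert model only needs to be loaded once for its whole batch."""
    groups = defaultdict(list)
    for inst in instances:
        experts = tuple(sorted(recruit_experts(inst["skills"])))
        groups[experts].append(inst)
    return groups

if __name__ == "__main__":
    instances = [
        {"question": "Solve x^2 - 5x + 6 = 0.", "skills": ["algebra"]},
        {"question": "Which enzyme unwinds DNA during replication?", "skills": ["molecular_biology"]},
        {"question": "Factor x^3 - 1.", "skills": ["algebra"]},
    ]
    for experts, batch in group_by_experts(instances).items():
        # In the full pipeline, each expert in `experts` would be loaded once,
        # run on every instance in `batch`, and its outputs passed to an aggregator.
        print(experts, [inst["question"] for inst in batch])
```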
