NeKo: Toward Post Recognition Generative Correction Large Language Models with Task-Oriented Experts
November 8, 2024
Authors: Yen-Ting Lin, Chao-Han Huck Yang, Zhehuai Chen, Piotr Zelasko, Xuesong Yang, Zih-Ching Chen, Krishna C Puvvada, Szu-Wei Fu, Ke Hu, Jun Wei Chiu, Jagadeesh Balam, Boris Ginsburg, Yu-Chiang Frank Wang
cs.AI
Abstract
Construction of a general-purpose post-recognition error corrector poses a
crucial question: how can we most effectively train a model on a large mixture
of domain datasets? The answer lies in learning dataset-specific features
and consolidating their knowledge into a single model. Previous methods achieve this
by having separate correction language models, resulting in a significant
increase in parameters. In this work, we present Mixture-of-Experts as a
solution, highlighting that MoEs are much more than a scalability tool. We
propose a Multi-Task Correction MoE, where we train the experts to become an
"expert" in speech-to-text, language-to-text, and vision-to-text datasets by
learning to route each dataset's tokens to its mapped expert. Experiments on
the Open ASR Leaderboard show that we set a new state of the art by
achieving an average relative 5.0% WER reduction and substantial
improvements in BLEU scores for speech and translation tasks. On
zero-shot evaluation, NeKo outperforms GPT-3.5 and Claude-Opus with 15.5% to
27.6% relative WER reduction on the Hyporadise benchmark. NeKo performs
competitively on grammar and post-OCR correction as a multi-task model.
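The abstract's core mechanism is routing each dataset's tokens to a mapped expert inside an MoE feed-forward layer. Below is a minimal, hypothetical PyTorch sketch of that idea, not the authors' implementation: during training, a dataset/task id selects the mapped expert and supervises the gating network, while at inference the learned gate routes tokens on its own. The class and attribute names (`TaskOrientedMoE`, `last_aux_loss`) and the top-1 inference gating are illustrative assumptions.

```python
from typing import Optional

import torch
import torch.nn as nn
import torch.nn.functional as F


class TaskOrientedMoE(nn.Module):
    """Sketch of an MoE feed-forward layer with task-oriented expert routing."""

    def __init__(self, d_model: int, d_ff: int, num_experts: int):
        super().__init__()
        self.experts = nn.ModuleList(
            nn.Sequential(nn.Linear(d_model, d_ff), nn.GELU(), nn.Linear(d_ff, d_model))
            for _ in range(num_experts)
        )
        self.router = nn.Linear(d_model, num_experts)  # learned gating network
        self.last_aux_loss = torch.tensor(0.0)         # routing loss, added to the LM loss outside

    def forward(self, x: torch.Tensor, task_id: Optional[int] = None) -> torch.Tensor:
        # x: (batch, seq_len, d_model); task_id identifies the source dataset.
        logits = self.router(x)  # (batch, seq_len, num_experts)
        if task_id is not None:
            # Training: send every token of this dataset to its mapped expert,
            # and supervise the router to prefer that expert so it can route
            # on its own at inference time, when no task id is available.
            target = torch.full(
                (x.size(0) * x.size(1),), task_id, dtype=torch.long, device=x.device
            )
            self.last_aux_loss = F.cross_entropy(
                logits.reshape(-1, logits.size(-1)), target
            )
            return self.experts[task_id](x)
        # Inference: no task id; use top-1 routing from the learned gate.
        probs = logits.softmax(dim=-1)
        top_p, top_e = probs.max(dim=-1)  # (batch, seq_len)
        out = torch.zeros_like(x)
        for e, expert in enumerate(self.experts):
            mask = top_e == e
            if mask.any():
                out[mask] = expert(x[mask]) * top_p[mask].unsqueeze(-1)
        return out
```

In this sketch, a training step would call `layer(x, task_id=dataset_index)` for a batch drawn from one dataset (e.g., ASR, speech translation, or OCR hypotheses), whereas inference calls `layer(x)` and relies on the learned gate alone.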