
NeKo: Toward Post Recognition Generative Correction Large Language Models with Task-Oriented Experts

November 8, 2024
作者: Yen-Ting Lin, Chao-Han Huck Yang, Zhehuai Chen, Piotr Zelasko, Xuesong Yang, Zih-Ching Chen, Krishna C Puvvada, Szu-Wei Fu, Ke Hu, Jun Wei Chiu, Jagadeesh Balam, Boris Ginsburg, Yu-Chiang Frank Wang
cs.AI

Abstract

Construction of a general-purpose post-recognition error corrector poses a crucial question: how can we most effectively train a model on a large mixture of domain datasets? The answer lies in learning dataset-specific features and digesting their knowledge in a single model. Previous methods achieve this by maintaining separate correction language models, resulting in a significant increase in parameters. In this work, we present Mixture-of-Experts as a solution, highlighting that MoEs are much more than a scalability tool. We propose a Multi-Task Correction MoE, in which we train the experts to become an "expert" in speech-to-text, language-to-text, and vision-to-text datasets by learning to route each dataset's tokens to its mapped expert. Experiments on the Open ASR Leaderboard show that we achieve new state-of-the-art performance, with an average relative 5.0% WER reduction and substantial improvements in BLEU scores on speech and translation tasks. On zero-shot evaluation, NeKo outperforms GPT-3.5 and Claude-Opus with a 15.5% to 27.6% relative WER reduction on the Hyporadise benchmark. As a multi-task model, NeKo also performs competitively on grammar and post-OCR correction.
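The routing scheme the abstract describes, sending each dataset's tokens to a task-mapped expert during training while a learned gate is supervised to reproduce that mapping for unlabeled inference, can be illustrated with a short sketch. The following is a minimal, hypothetical PyTorch implementation, not the authors' code; all names (TaskOrientedMoE, task_id, aux_loss) and the gate-supervision loss are illustrative assumptions.

```python
from typing import Optional

import torch
import torch.nn as nn
import torch.nn.functional as F


class TaskOrientedMoE(nn.Module):
    """Feed-forward MoE layer with task-oriented routing (illustrative sketch)."""

    def __init__(self, d_model: int, d_ff: int, num_experts: int):
        super().__init__()
        # One feed-forward expert per task/dataset group.
        self.experts = nn.ModuleList(
            nn.Sequential(nn.Linear(d_model, d_ff), nn.GELU(), nn.Linear(d_ff, d_model))
            for _ in range(num_experts)
        )
        self.gate = nn.Linear(d_model, num_experts)  # learned token router

    def forward(self, x: torch.Tensor, task_id: Optional[torch.Tensor] = None):
        # x: (batch, seq, d_model); task_id: (batch,) long task indices, or None.
        logits = self.gate(x)  # (batch, seq, num_experts)
        if task_id is not None:
            # Training: hard-route every token to its task's mapped expert,
            # and supervise the gate to imitate that assignment so it can
            # route unlabeled tokens on its own at inference time.
            routes = task_id.unsqueeze(1).expand(-1, x.size(1))  # (batch, seq)
            aux_loss = F.cross_entropy(
                logits.reshape(-1, logits.size(-1)), routes.reshape(-1)
            )
        else:
            # Inference: the learned gate picks one expert per token.
            routes = logits.argmax(dim=-1)
            aux_loss = None

        out = torch.zeros_like(x)
        for e, expert in enumerate(self.experts):
            mask = routes == e  # boolean mask of tokens assigned to expert e
            if mask.any():
                out[mask] = expert(x[mask])
        return out, aux_loss
```

In this sketch, aux_loss would be added to the main correction loss during training so the gate learns the task-to-expert mapping; at inference time task_id is omitted and the gate routes each token itself.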
