NeKo: Toward Post Recognition Generative Correction Large Language Models with Task-Oriented Experts
November 8, 2024
Authors: Yen-Ting Lin, Chao-Han Huck Yang, Zhehuai Chen, Piotr Zelasko, Xuesong Yang, Zih-Ching Chen, Krishna C Puvvada, Szu-Wei Fu, Ke Hu, Jun Wei Chiu, Jagadeesh Balam, Boris Ginsburg, Yu-Chiang Frank Wang
cs.AI
Abstract
Construction of a general-purpose post-recognition error corrector poses a
crucial question: how can we most effectively train a model on a large mixture
of domain datasets? The answer would lie in learning dataset-specific features
and digesting their knowledge in a single model. Previous methods achieve this
by having separate correction language models, resulting in a significant
increase in parameters. In this work, we present Mixture-of-Experts as a
solution, highlighting that MoEs are much more than a scalability tool. We
propose a Multi-Task Correction MoE, where we train the experts to become an
``expert'' of speech-to-text, language-to-text and vision-to-text datasets by
learning to route each dataset's tokens to its mapped expert. Experiments on
the Open ASR Leaderboard show that NeKo sets a new state of the art, achieving
an average relative 5.0% WER reduction and
substantial improvements in BLEU scores for speech and translation tasks. On
zero-shot evaluation, NeKo outperforms GPT-3.5 and Claude-Opus with 15.5% to
27.6% relative WER reduction in the Hyporadise benchmark. NeKo performs
competitively on grammar and post-OCR correction as a multi-task model.
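The task-oriented routing described in the abstract can be pictured as a standard MoE feed-forward layer whose gate is supervised with each dataset's task label, so that speech-to-text, language-to-text, and vision-to-text tokens are pushed toward their mapped experts. The sketch below is a minimal, hypothetical PyTorch illustration of that idea; the class name `TaskOrientedMoE`, the layer sizes, and the cross-entropy routing supervision are assumptions made for clarity, not the authors' released implementation.

```python
# Minimal sketch of task-oriented expert routing in an MoE feed-forward layer.
# Hypothetical illustration only: names, dimensions, and the routing loss are
# assumptions, not the NeKo authors' implementation.
from typing import Optional

import torch
import torch.nn as nn
import torch.nn.functional as F

TASKS = ["speech-to-text", "language-to-text", "vision-to-text"]


class TaskOrientedMoE(nn.Module):
    def __init__(self, d_model: int = 512, d_ff: int = 2048, n_experts: int = len(TASKS)):
        super().__init__()
        self.experts = nn.ModuleList(
            nn.Sequential(nn.Linear(d_model, d_ff), nn.GELU(), nn.Linear(d_ff, d_model))
            for _ in range(n_experts)
        )
        self.router = nn.Linear(d_model, n_experts)  # learned token-to-expert gate

    def forward(self, x: torch.Tensor, task_id: Optional[torch.Tensor] = None):
        # x: (batch, seq, d_model); task_id: (batch,) dataset/task label during training.
        logits = self.router(x)  # (batch, seq, n_experts)
        if task_id is not None:
            # Training: supervise the gate so each dataset's tokens are routed
            # to the expert mapped to that dataset.
            target = task_id[:, None].expand(-1, x.size(1))  # (batch, seq)
            route_loss = F.cross_entropy(logits.flatten(0, 1), target.flatten())
            chosen = target
        else:
            # Inference: route each token to the expert the learned gate prefers.
            route_loss = None
            chosen = logits.argmax(dim=-1)
        out = torch.zeros_like(x)
        for e, expert in enumerate(self.experts):
            mask = chosen == e
            if mask.any():
                out[mask] = expert(x[mask])
        return out, route_loss


# Usage: tokens from a speech-to-text batch are pushed toward expert 0 during training.
layer = TaskOrientedMoE()
x = torch.randn(2, 8, 512)
task_id = torch.full((2,), TASKS.index("speech-to-text"))
y, aux_loss = layer(x, task_id)
```

Under this reading, the gate itself is what is trained to recognize which domain a token comes from, so at inference time no task label is needed and the same single model serves all correction tasks.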