Failing Forward: Improving Generative Error Correction for ASR with Synthetic Data and Retrieval Augmentation
October 17, 2024
Authors: Sreyan Ghosh, Mohammad Sadegh Rasooli, Michael Levit, Peidong Wang, Jian Xue, Dinesh Manocha, Jinyu Li
cs.AI
Abstract
Generative Error Correction (GEC) has emerged as a powerful post-processing
method to enhance the performance of Automatic Speech Recognition (ASR)
systems. However, we show that GEC models struggle to generalize beyond the
specific types of errors encountered during training, limiting their ability to
correct new, unseen errors at test time, particularly in out-of-domain (OOD)
scenarios. This phenomenon is amplified for named entities (NEs), where, in
addition to insufficient contextual information or knowledge about the NEs,
novel NEs keep emerging. To address these issues, we propose DARAG (Data- and
Retrieval-Augmented Generative Error Correction), a novel approach designed to
improve GEC for ASR in in-domain (ID) and OOD scenarios. We augment the GEC
training dataset with synthetic data generated by prompting LLMs and
text-to-speech models, thereby simulating additional errors from which the
model can learn. For OOD scenarios, we simulate test-time errors from new
domains similarly and in an unsupervised fashion. Additionally, to better
handle named entities, we introduce retrieval-augmented correction by
augmenting the input with entities retrieved from a database. Our approach is
simple, scalable, and both domain- and language-agnostic. We experiment on
multiple datasets and settings, showing that DARAG outperforms all our
baselines, achieving 8%-30% relative WER improvements in ID and 10%-33%
improvements in OOD settings.
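The retrieval-augmented correction idea can be illustrated with a minimal sketch: retrieve database entities that fuzzily match spans of the ASR hypothesis, then prepend them to the correction prompt given to the GEC model. The entity database, similarity measure (character-level fuzzy matching here), and prompt format below are all illustrative assumptions, not the paper's actual retriever or prompt.

```python
# Minimal sketch of retrieval-augmented GEC input construction.
# ENTITY_DB, the fuzzy-matching retriever, and the prompt template are
# hypothetical stand-ins; the paper does not specify these details.
from difflib import SequenceMatcher

ENTITY_DB = ["Dinesh Manocha", "Jinyu Li", "Sreyan Ghosh", "Microsoft Research"]

def retrieve_entities(hypothesis: str, db, top_k: int = 2):
    """Rank database entities by best fuzzy match against any span of the
    hypothesis with the same number of words as the entity."""
    words = hypothesis.lower().split()

    def best_score(entity: str) -> float:
        n = len(entity.split())
        spans = [" ".join(words[i:i + n])
                 for i in range(max(1, len(words) - n + 1))]
        return max(SequenceMatcher(None, entity.lower(), s).ratio()
                   for s in spans)

    ranked = sorted(((best_score(e), e) for e in db), reverse=True)
    return [e for _, e in ranked[:top_k]]

def build_prompt(hypothesis: str, entities) -> str:
    """Augment the GEC input with retrieved entities (illustrative format)."""
    return ("Correct the ASR hypothesis. Relevant entities: "
            + "; ".join(entities)
            + "\nHypothesis: " + hypothesis
            + "\nCorrection:")

hyp = "a talk by denesh manoka on speech recognition"
prompt = build_prompt(hyp, retrieve_entities(hyp, ENTITY_DB))
```

Here the misrecognized "denesh manoka" surfaces the entity "Dinesh Manocha" in the prompt, giving the correction model the knowledge it would otherwise lack for novel or OOD named entities.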