Failing Forward: Improving Generative Error Correction for ASR with Synthetic Data and Retrieval Augmentation
October 17, 2024
Authors: Sreyan Ghosh, Mohammad Sadegh Rasooli, Michael Levit, Peidong Wang, Jian Xue, Dinesh Manocha, Jinyu Li
cs.AI
Abstract
Generative Error Correction (GEC) has emerged as a powerful post-processing
method to enhance the performance of Automatic Speech Recognition (ASR)
systems. However, we show that GEC models struggle to generalize beyond the
specific types of errors encountered during training, limiting their ability to
correct new, unseen errors at test time, particularly in out-of-domain (OOD)
scenarios. This phenomenon is amplified for named entities (NEs), where, in
addition to insufficient contextual information or knowledge about the NEs,
novel NEs keep emerging. To address these issues, we propose DARAG (Data- and
Retrieval-Augmented Generative Error Correction), a novel approach designed to
improve GEC for ASR in in-domain (ID) and OOD scenarios. We augment the GEC
training dataset with synthetic data generated by prompting LLMs and
text-to-speech models, thereby simulating additional errors from which the
model can learn. For OOD scenarios, we simulate test-time errors from new
domains similarly and in an unsupervised fashion. Additionally, to better
handle named entities, we introduce retrieval-augmented correction by
augmenting the input with entities retrieved from a database. Our approach is
simple, scalable, and both domain- and language-agnostic. We experiment on
multiple datasets and settings, showing that DARAG outperforms all our
baselines, achieving 8%-30% relative WER improvements in ID and 10%-33%
improvements in OOD settings.
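The retrieval-augmented correction idea can be illustrated with a minimal sketch: retrieve database entities that fuzzily match spans of the ASR hypothesis, then prepend them to the correction prompt given to the GEC model. The entity database, similarity measure (character-level fuzzy matching here), and prompt format below are all illustrative assumptions, not the paper's actual retriever or prompt.

```python
# Minimal sketch of retrieval-augmented GEC input construction.
# ENTITY_DB, the fuzzy-matching retriever, and the prompt template are
# hypothetical stand-ins; the paper does not specify these details.
from difflib import SequenceMatcher

ENTITY_DB = ["Dinesh Manocha", "Jinyu Li", "Sreyan Ghosh", "Microsoft Research"]

def retrieve_entities(hypothesis: str, db, top_k: int = 2):
    """Rank database entities by best fuzzy match against any span of the
    hypothesis with the same number of words as the entity."""
    words = hypothesis.lower().split()

    def best_score(entity: str) -> float:
        n = len(entity.split())
        spans = [" ".join(words[i:i + n])
                 for i in range(max(1, len(words) - n + 1))]
        return max(SequenceMatcher(None, entity.lower(), s).ratio()
                   for s in spans)

    ranked = sorted(((best_score(e), e) for e in db), reverse=True)
    return [e for _, e in ranked[:top_k]]

def build_prompt(hypothesis: str, entities) -> str:
    """Augment the GEC input with retrieved entities (illustrative format)."""
    return ("Correct the ASR hypothesis. Relevant entities: "
            + "; ".join(entities)
            + "\nHypothesis: " + hypothesis
            + "\nCorrection:")

hyp = "a talk by denesh manoka on speech recognition"
prompt = build_prompt(hyp, retrieve_entities(hyp, ENTITY_DB))
```

Here the misrecognized "denesh manoka" surfaces the entity "Dinesh Manocha" in the prompt, giving the correction model the knowledge it would otherwise lack for novel or OOD named entities.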