LLM-Powered Grapheme-to-Phoneme Conversion: Benchmark and Case Study
September 13, 2024
Authors: Mahta Fetrat Qharabagh, Zahra Dehghanian, Hamid R. Rabiee
cs.AI
Abstract
Grapheme-to-phoneme (G2P) conversion is critical in speech processing,
particularly for applications like speech synthesis. G2P systems must possess
linguistic understanding and contextual awareness of languages with polyphone
words and context-dependent phonemes. Large language models (LLMs) have
recently demonstrated significant potential in various language tasks,
suggesting that their phonetic knowledge could be leveraged for G2P. In this
paper, we evaluate the performance of LLMs in G2P conversion and introduce
prompting and post-processing methods that enhance LLM outputs without
additional training or labeled data. We also present a benchmarking dataset
designed to assess G2P performance on sentence-level phonetic challenges of the
Persian language. Our results show that by applying the proposed methods, LLMs
can outperform traditional G2P tools, even in an underrepresented language like
Persian, highlighting the potential of developing LLM-aided G2P systems.