Towards Cross-Lingual Audio Abuse Detection in Low-Resource Settings with Few-Shot Learning

December 2, 2024
Authors: Aditya Narayan Sankaran, Reza Farahbaksh, Noel Crespi
cs.AI

Abstract

Online abusive content detection, particularly in low-resource settings and within the audio modality, remains underexplored. We investigate the potential of pre-trained audio representations for detecting abusive language in low-resource languages, in this case Indian languages, using Few-Shot Learning (FSL). Leveraging powerful representations from models such as Wav2Vec and Whisper, we explore cross-lingual abuse detection on the ADIMA dataset with FSL. Our approach integrates these representations within the Model-Agnostic Meta-Learning (MAML) framework to classify abusive language in 10 languages. We experiment with various shot sizes (50-200), evaluating the impact of limited data on performance. Additionally, we conduct a feature visualization study to better understand model behaviour. This study highlights the generalization ability of pre-trained models in low-resource scenarios and offers valuable insights into detecting abusive language in multilingual contexts.
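
The abstract does not give training details, so the following is only a minimal sketch of how pre-extracted audio embeddings could be plugged into a MAML-style loop. It rests on several assumptions that are not from the paper: each language is treated as one task; every clip is already encoded as a fixed-size vector (for example mean-pooled Wav2Vec or Whisper hidden states, replaced here by synthetic Gaussian data from a hypothetical make_language helper); the classifier is a logistic-regression head; k_shot counts examples per class; and the meta-update uses the first-order MAML approximation instead of full second-order MAML. All function names (sigmoid, loss_and_grad, sample_episode, adapt, fomaml, make_language) and hyperparameters are illustrative.

```python
import numpy as np

def sigmoid(z):
    # Clip logits so np.exp never overflows.
    return 1.0 / (1.0 + np.exp(-np.clip(z, -30.0, 30.0)))

def loss_and_grad(params, X, y):
    """Binary cross-entropy of a logistic-regression head and its gradient."""
    w, b = params
    p = sigmoid(X @ w + b)
    eps = 1e-12
    loss = -np.mean(y * np.log(p + eps) + (1 - y) * np.log(1 - p + eps))
    err = (p - y) / len(y)
    return loss, (X.T @ err, err.sum())

def sample_episode(X, y, k_shot, k_query, rng):
    """Split one language into a support set and a query set,
    drawing k_shot / k_query clips per class (0 = non-abusive, 1 = abusive)."""
    sup, qry = [], []
    for c in (0, 1):
        idx = rng.permutation(np.flatnonzero(y == c))
        sup.extend(idx[:k_shot])
        qry.extend(idx[k_shot:k_shot + k_query])
    sup, qry = np.array(sup), np.array(qry)
    return X[sup], y[sup], X[qry], y[qry]

def adapt(params, Xs, ys, inner_lr, inner_steps):
    """Inner loop: a few plain gradient steps on the support set."""
    w, b = params
    for _ in range(inner_steps):
        _, (gw, gb) = loss_and_grad((w, b), Xs, ys)
        w, b = w - inner_lr * gw, b - inner_lr * gb
    return w, b

def fomaml(tasks, dim, k_shot, k_query=10, meta_steps=200, meta_batch=4,
           inner_lr=0.5, inner_steps=5, outer_lr=0.1, seed=0):
    """Outer loop (first-order MAML): move the shared initialisation along the
    query-set gradient evaluated at each task's adapted parameters."""
    rng = np.random.default_rng(seed)
    w, b = np.zeros(dim), 0.0
    for _ in range(meta_steps):
        gw_sum, gb_sum = np.zeros(dim), 0.0
        for t in rng.choice(len(tasks), size=meta_batch):
            Xs, ys, Xq, yq = sample_episode(*tasks[t], k_shot, k_query, rng)
            fast = adapt((w, b), Xs, ys, inner_lr, inner_steps)
            _, (gw, gb) = loss_and_grad(fast, Xq, yq)
            gw_sum += gw
            gb_sum += gb
        w -= outer_lr * gw_sum / meta_batch
        b -= outer_lr * gb_sum / meta_batch
    return w, b

def make_language(rng, dim, n_per_class=300, sep=1.5):
    """Hypothetical synthetic stand-in for one language's clip embeddings:
    a shared 'abuse' direction (feature 0) plus a language-specific offset."""
    offset = rng.normal(scale=0.5, size=dim)
    offset[0] = 0.0
    X0 = rng.normal(size=(n_per_class, dim)) + offset
    X1 = rng.normal(size=(n_per_class, dim)) + offset
    X0[:, 0] -= sep
    X1[:, 0] += sep
    X = np.vstack([X0, X1])
    y = np.r_[np.zeros(n_per_class), np.ones(n_per_class)]
    return X, y

# Meta-train on 9 synthetic "languages", then adapt to an unseen 10th one.
rng = np.random.default_rng(42)
langs = [make_language(rng, dim=16) for _ in range(10)]
w, b = fomaml(langs[:9], dim=16, k_shot=50)
Xs, ys, Xq, yq = sample_episode(*langs[9], 50, 100, rng)
fast_w, fast_b = adapt((w, b), Xs, ys, inner_lr=0.5, inner_steps=5)
acc = np.mean((sigmoid(Xq @ fast_w + fast_b) > 0.5) == yq)
print(f"held-out language accuracy: {acc:.3f}")
```

The cross-lingual step here is the last few lines: the initialisation learned on nine languages is adapted with 50 labelled clips per class from a held-out language and scored on that language's query set. The first-order variant is used only because it avoids second derivatives and keeps the sketch short; full MAML would backpropagate through the inner-loop updates, and the paper's actual classifier, encoder layers and hyperparameters may differ.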

December 3, 2024