ReCLAP: Improving Zero Shot Audio Classification by Describing Sounds

September 13, 2024
Authors: Sreyan Ghosh, Sonal Kumar, Chandra Kiran Reddy Evuru, Oriol Nieto, Ramani Duraiswami, Dinesh Manocha
cs.AI

Abstract

Open-vocabulary audio-language models, like CLAP, offer a promising approach for zero-shot audio classification (ZSAC) by enabling classification with any arbitrary set of categories specified with natural language prompts. In this paper, we propose a simple but effective method to improve ZSAC with CLAP. Specifically, we shift from the conventional method of using prompts with abstract category labels (e.g., "Sound of an organ") to prompts that describe sounds using their inherent descriptive features in a diverse context (e.g., "The organ's deep and resonant tones filled the cathedral."). To achieve this, we first propose ReCLAP, a CLAP model trained with rewritten audio captions for improved understanding of sounds in the wild. These rewritten captions describe each sound event in the original caption using its unique discriminative characteristics. ReCLAP outperforms all baselines on both multi-modal audio-text retrieval and ZSAC. Next, to improve zero-shot audio classification with ReCLAP, we propose prompt augmentation. In contrast to the traditional method of employing hand-written template prompts, we generate custom prompts for each unique label in the dataset. These custom prompts first describe the sound event in the label and then place it in diverse scenes. Our proposed method improves ReCLAP's performance on ZSAC by 1%-18% and outperforms all baselines by 1%-55%.
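The zero-shot classification step described in the abstract can be illustrated with a minimal sketch. It assumes that each label has several custom prompts, that an audio clip and each prompt have already been mapped to embedding vectors by a CLAP-style model, and that a label's score is the mean cosine similarity over its prompts. The abstract does not state how scores from multiple prompts are combined, so the averaging is an assumption, and the toy vectors and the names `cosine`, `classify`, and `PROMPT_EMBS` are hypothetical stand-ins rather than anything taken from the paper.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def classify(audio_emb, prompt_embs_by_label):
    """Pick the label whose prompts are, on average, closest to the audio.

    Averaging over prompts is an assumption made for this sketch.
    """
    scores = {}
    for label, prompt_embs in prompt_embs_by_label.items():
        sims = [cosine(audio_emb, p) for p in prompt_embs]
        scores[label] = sum(sims) / len(sims)
    best = max(scores, key=scores.get)
    return best, scores


# Toy embeddings standing in for a real text encoder's output
# (hypothetical values; two descriptive prompts per label).
PROMPT_EMBS = {
    "organ": [[0.9, 0.1, 0.0], [0.8, 0.2, 0.1]],
    "dog bark": [[0.0, 0.9, 0.2], [0.1, 0.8, 0.3]],
}

# Toy embedding standing in for an audio encoder's output.
audio_emb = [0.85, 0.15, 0.05]

predicted, scores = classify(audio_emb, PROMPT_EMBS)
print(predicted, scores)
```

With a real model, the vectors would come from the audio and text encoders; only the scoring logic is shown here.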
