AAD-LLM: Neural Attention-Driven Auditory Scene Understanding

February 24, 2025
Authors: Xilin Jiang, Sukru Samet Dindar, Vishal Choudhari, Stephan Bickel, Ashesh Mehta, Guy M McKhann, Adeen Flinker, Daniel Friedman, Nima Mesgarani
cs.AI

Abstract

Auditory foundation models, including auditory large language models (LLMs), process all sound inputs equally, independent of listener perception. However, human auditory perception is inherently selective: listeners focus on specific speakers while ignoring others in complex auditory scenes. Existing models do not incorporate this selectivity, limiting their ability to generate perception-aligned responses. To address this, we introduce Intention-Informed Auditory Scene Understanding (II-ASU) and present Auditory Attention-Driven LLM (AAD-LLM), a prototype system that integrates brain signals to infer listener attention. AAD-LLM extends an auditory LLM by incorporating intracranial electroencephalography (iEEG) recordings to decode which speaker a listener is attending to and refine responses accordingly. The model first predicts the attended speaker from neural activity, then conditions response generation on this inferred attentional state. We evaluate AAD-LLM on speaker description, speech transcription and extraction, and question answering in multitalker scenarios, with both objective and subjective ratings showing improved alignment with listener intention. By taking a first step toward intention-aware auditory AI, this work explores a new paradigm where listener perception informs machine listening, paving the way for future listener-centered auditory systems. Demo and code available: https://aad-llm.github.io.
