AAD-LLM: Neural Attention-Driven Auditory Scene Understanding
February 24, 2025
Authors: Xilin Jiang, Sukru Samet Dindar, Vishal Choudhari, Stephan Bickel, Ashesh Mehta, Guy M McKhann, Adeen Flinker, Daniel Friedman, Nima Mesgarani
cs.AI
Abstract
Auditory foundation models, including auditory large language models (LLMs),
process all sound inputs equally, independent of listener perception. However,
human auditory perception is inherently selective: listeners focus on specific
speakers while ignoring others in complex auditory scenes. Existing models do
not incorporate this selectivity, limiting their ability to generate
perception-aligned responses. To address this, we introduce Intention-Informed
Auditory Scene Understanding (II-ASU) and present Auditory Attention-Driven LLM
(AAD-LLM), a prototype system that integrates brain signals to infer listener
attention. AAD-LLM extends an auditory LLM by incorporating intracranial
electroencephalography (iEEG) recordings to decode which speaker a listener is
attending to and refine responses accordingly. The model first predicts the
attended speaker from neural activity, then conditions response generation on
this inferred attentional state. We evaluate AAD-LLM on speaker description,
speech transcription and extraction, and question answering in multitalker
scenarios, with both objective and subjective ratings showing improved
alignment with listener intention. By taking a first step toward
intention-aware auditory AI, this work explores a new paradigm where listener
perception informs machine listening, paving the way for future
listener-centered auditory systems. Demo and code available:
https://aad-llm.github.io.
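The two-stage design described above (first decode the attended speaker from neural activity, then condition response generation on that inferred attention state) can be sketched as follows. This is a hypothetical simplification, not the authors' implementation: the attention decoder here uses plain envelope correlation (a classic auditory-attention-decoding baseline) in place of a learned iEEG decoder, and the "conditioning" step is illustrated as prompt construction; the function names and toy data are invented for illustration.

```python
# Hypothetical sketch of intention-informed auditory scene understanding:
# Stage 1 decodes the attended speaker; Stage 2 conditions the response on it.
from statistics import mean

def pearson(x, y):
    """Pearson correlation between two equal-length sequences."""
    mx, my = mean(x), mean(y)
    num = sum((a - mx) * (b - my) for a, b in zip(x, y))
    den = (sum((a - mx) ** 2 for a in x) * sum((b - my) ** 2 for b in y)) ** 0.5
    return num / den if den else 0.0

def decode_attended_speaker(neural_envelope, speaker_envelopes):
    """Stage 1: pick the speaker whose speech envelope best matches an
    envelope reconstructed from neural activity (stand-in for iEEG decoding)."""
    scores = {spk: pearson(neural_envelope, env)
              for spk, env in speaker_envelopes.items()}
    return max(scores, key=scores.get), scores

def build_conditioned_prompt(question, transcripts, attended):
    """Stage 2: condition response generation on the inferred attentional
    state by foregrounding the attended speaker's speech."""
    lines = [f"[attended] {attended}: {transcripts[attended]}"]
    lines += [f"[ignored] {spk}: {txt}"
              for spk, txt in transcripts.items() if spk != attended]
    return "\n".join(lines) + f"\nQuestion: {question}"

# Toy multitalker scene: the neural envelope tracks speaker B more closely.
envelopes = {"A": [0.1, 0.9, 0.2, 0.8, 0.1],
             "B": [0.7, 0.2, 0.8, 0.1, 0.9]}
neural = [0.6, 0.3, 0.7, 0.2, 0.8]  # made-up reconstructed envelope
attended, scores = decode_attended_speaker(neural, envelopes)
prompt = build_conditioned_prompt(
    "What was said?",
    {"A": "let's meet at noon", "B": "the train is delayed"},
    attended)
print(attended)  # speaker whose envelope correlates best with neural activity
```

In the paper's actual system, Stage 1 is performed by decoding iEEG recordings rather than by direct envelope correlation, and Stage 2 conditions an auditory LLM's generation internally rather than through a text prompt; the sketch only mirrors the stated control flow.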