FocusedAD: Character-centric Movie Audio Description
April 16, 2025
Authors: Xiaojun Ye, Chun Wang, Yiren Song, Sheng Zhou, Liangcheng Li, Jiajun Bu
cs.AI
Abstract
Movie Audio Description (AD) aims to narrate visual content during
dialogue-free segments, particularly benefiting blind and visually impaired
(BVI) audiences. Compared with general video captioning, AD demands
plot-relevant narration with explicit character name references, posing unique
challenges in movie understanding. To identify active main characters and focus
on storyline-relevant regions, we propose FocusedAD, a novel framework that
delivers character-centric movie audio descriptions. It includes: (i) a
Character Perception Module (CPM) for tracking character regions and linking
them to names; (ii) a Dynamic Prior Module (DPM) that injects contextual cues
from prior ADs and subtitles via learnable soft prompts; and (iii) a Focused
Caption Module (FCM) that generates narrations enriched with plot-relevant
details and named characters. To overcome limitations in character
identification, we also introduce an automated pipeline for building character
query banks. FocusedAD achieves state-of-the-art performance on multiple
benchmarks, including strong zero-shot results on MAD-eval-Named and our newly
proposed Cinepile-AD dataset. Code and data will be released at
https://github.com/Thorin215/FocusedAD.
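The abstract does not say how the CPM links a tracked character region to a name. As a rough illustration only, the sketch below assumes one common approach: nearest-neighbour matching of a region embedding against a character query bank by cosine similarity. The names `cosine`, `name_character`, `query_bank`, the toy three-dimensional embeddings, and the 0.5 threshold are all hypothetical and are not taken from the FocusedAD code.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def name_character(region_embedding, query_bank, threshold=0.5):
    """Return the bank name most similar to the region, or None if nothing exceeds the threshold.

    Assumption: query_bank maps a character name to one reference embedding.
    """
    best_name, best_score = None, threshold
    for name, reference in query_bank.items():
        score = cosine(region_embedding, reference)
        if score > best_score:
            best_name, best_score = name, score
    return best_name


# Toy query bank with made-up embeddings (hypothetical data).
query_bank = {"Alice": [1.0, 0.0, 0.0], "Bob": [0.0, 1.0, 0.0]}
```

Calling `name_character([0.9, 0.1, 0.0], query_bank)` selects the entry closest to the region embedding, while an embedding unlike every bank entry yields `None`, so an unrecognised character can be left unnamed rather than mislabelled.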