FocusedAD: Character-centric Movie Audio Description
April 16, 2025
Authors: Xiaojun Ye, Chun Wang, Yiren Song, Sheng Zhou, Liangcheng Li, Jiajun Bu
cs.AI
Abstract
Movie Audio Description (AD) aims to narrate visual content during
dialogue-free segments, particularly benefiting blind and visually impaired
(BVI) audiences. Compared with general video captioning, AD demands
plot-relevant narration with explicit character name references, posing unique
challenges in movie understanding. To identify active main characters and focus
on storyline-relevant regions, we propose FocusedAD, a novel framework that
delivers character-centric movie audio descriptions. It includes: (i) a
Character Perception Module (CPM) for tracking character regions and linking
them to names; (ii) a Dynamic Prior Module (DPM) that injects contextual cues
from prior ADs and subtitles via learnable soft prompts; and (iii) a Focused
Caption Module (FCM) that generates narrations enriched with plot-relevant
details and named characters. To overcome limitations in character
identification, we also introduce an automated pipeline for building character
query banks. FocusedAD achieves state-of-the-art performance on multiple
benchmarks, including strong zero-shot results on MAD-eval-Named and our newly
proposed Cinepile-AD dataset. Code and data will be released at
https://github.com/Thorin215/FocusedAD.
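
To make the data flow between the three modules concrete, the following is a minimal Python sketch of how their inputs and outputs could fit together. It is not the authors' implementation. Every name in it (`Region`, `name_regions`, `build_context`, `focused_caption_input`) is hypothetical, and so are the cosine-similarity matching against the character query bank and the 0.5 threshold, since the abstract does not specify how regions are linked to names. The DPM stand-in simply keeps recent text, whereas the paper describes learnable soft prompts.

```python
import math
from dataclasses import dataclass


@dataclass
class Region:
    """A tracked character region (hypothetical structure)."""
    frame: int
    box: tuple        # (x1, y1, x2, y2)
    embedding: list   # assumed appearance feature for the region


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def name_regions(regions, query_bank, threshold=0.5):
    """CPM stand-in: link each region to the best-matching name in the
    character query bank. Matching rule and threshold are assumptions."""
    named = []
    for region in regions:
        best_name, best_score = None, threshold
        for name, reference in query_bank.items():
            score = cosine(region.embedding, reference)
            if score > best_score:
                best_name, best_score = name, score
        if best_name is not None:
            named.append((best_name, region))
    return named


def build_context(prior_ads, subtitles, max_items=3):
    """DPM stand-in: keep the most recent prior ADs and subtitles as plain
    text. The paper instead injects them via learnable soft prompts."""
    return {
        "prior_ads": prior_ads[-max_items:],
        "subtitles": subtitles[-max_items:],
    }


def focused_caption_input(named, context):
    """FCM stand-in: assemble what a captioning model would receive,
    namely the named characters, their regions, and contextual cues."""
    return {
        "characters": sorted({name for name, _ in named}),
        "regions": [(name, r.frame, r.box) for name, r in named],
        **context,
    }
```

In this sketch a region whose embedding matches no query-bank entry above the threshold is dropped, so only identified characters reach the captioning step. Whether FocusedAD discards or keeps unnamed regions is not stated in the abstract.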