FocusedAD: Character-centric Movie Audio Description

Abstract

Movie Audio Description (AD) aims to narrate visual content during dialogue-free segments, particularly benefiting blind and visually impaired (BVI) audiences. Compared with general video captioning, AD demands plot-relevant narration with explicit character name references, posing unique challenges in movie understanding. To identify active main characters and focus on storyline-relevant regions, we propose FocusedAD, a novel framework that generates character-centric movie audio descriptions. It includes: (i) a Character Perception Module (CPM) for tracking character regions and linking them to names; (ii) a Dynamic Prior Module (DPM) that injects contextual cues from prior ADs and subtitles via learnable soft prompts; and (iii) a Focused Caption Module (FCM) that generates narrations enriched with plot-relevant details and named characters. To overcome limitations in character identification, we also introduce an automated pipeline for building character query banks. FocusedAD achieves state-of-the-art performance on multiple benchmarks, including strong zero-shot results on MAD-eval-Named and our newly proposed Cinepile-AD dataset. Code and data will be released at https://github.com/Thorin215/FocusedAD .
