FocusedAD: Character-centric Movie Audio Description

Abstract

Movie Audio Description (AD) narrates visual content during dialogue-free segments, primarily for the benefit of blind and visually impaired (BVI) audiences. Compared with general video captioning, AD demands plot-relevant narration with explicit character name references, which poses unique challenges for movie understanding. To identify active main characters and focus on storyline-relevant regions, we propose FocusedAD, a novel framework that delivers character-centric movie audio descriptions. It includes: (i) a Character Perception Module (CPM) that tracks character regions and links them to names; (ii) a Dynamic Prior Module (DPM) that injects contextual cues from prior ADs and subtitles via learnable soft prompts; and (iii) a Focused Caption Module (FCM) that generates narrations enriched with plot-relevant details and named characters. To overcome limitations in character identification, we also introduce an automated pipeline for building character query banks. FocusedAD achieves state-of-the-art performance on multiple benchmarks, including strong zero-shot results on MAD-eval-Named and our newly proposed Cinepile-AD dataset. Code and data will be released at https://github.com/Thorin215/FocusedAD.
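
To make the idea of linking tracked character regions to names through a query bank more concrete, the following is a minimal illustrative sketch, not the FocusedAD implementation. It assumes that each tracked region and each query-bank entry is represented by an embedding vector and that names are assigned by cosine similarity above a threshold; the abstract does not specify the matching method, so the function names (`cosine`, `name_regions`), the threshold value, and the example character names are all hypothetical.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def name_regions(region_embeddings, query_bank, threshold=0.5):
    """Assign a character name to each region embedding.

    query_bank maps a character name to one reference embedding (an assumed
    layout, chosen only for illustration). A region is left unnamed (None)
    when no reference exceeds the similarity threshold.
    """
    names = []
    for emb in region_embeddings:
        best_name, best_score = None, threshold
        for name, ref in query_bank.items():
            score = cosine(emb, ref)
            if score > best_score:
                best_name, best_score = name, score
        names.append(best_name)
    return names


# Hypothetical two-character query bank with toy 2-D embeddings.
query_bank = {"Alice": [1.0, 0.0], "Bob": [0.0, 1.0]}
regions = [[0.9, 0.1], [0.1, 0.9], [0.0, 0.0]]
labels = name_regions(regions, query_bank)
```

In this toy setup, the first two regions lie close to one reference each, while the all-zero region matches nothing and stays unnamed. A real system would use learned visual embeddings and could keep several references per character.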
