WikiVideo: Article Generation from Multiple Videos
April 1, 2025
Authors: Alexander Martin, Reno Kriz, William Gantt Walden, Kate Sanders, Hannah Recknor, Eugene Yang, Francis Ferraro, Benjamin Van Durme
cs.AI
Abstract
We present the challenging task of automatically creating a high-level
Wikipedia-style article that aggregates information from multiple diverse
videos about real-world events, such as natural disasters or political
elections. Videos are intuitive sources for retrieval-augmented generation
(RAG), but most contemporary RAG workflows focus heavily on text, and existing
methods for video-based summarization focus on low-level scene understanding
rather than high-level event semantics. To close this gap, we introduce
WikiVideo, a benchmark consisting of expert-written articles and densely
annotated videos that provide evidence for articles' claims, facilitating the
integration of video into RAG pipelines and enabling the creation of in-depth
content that is grounded in multimodal sources. We further propose
Collaborative Article Generation (CAG), a novel interactive method for article
creation from multiple videos. CAG leverages an iterative interaction between
an r1-style reasoning model and a VideoLLM to draw higher-level inferences
about the target event than is possible with VideoLLMs alone, which fixate on
low-level visual features. We benchmark state-of-the-art VideoLLMs and CAG in
both oracle retrieval and RAG settings and find that CAG consistently
outperforms alternative methods, while suggesting intriguing avenues for future
work.
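The abstract describes CAG only at a high level: an r1-style reasoning model and a VideoLLM interact iteratively so that the reasoner can draw event-level inferences from the VideoLLM's lower-level observations. The sketch below is a minimal, hypothetical illustration of that kind of loop, not the authors' implementation. The function names (`collaborative_article_sketch`, `toy_reasoner`, `toy_video_llm`, `toy_writer`), the question-and-answer protocol, the `max_rounds` budget, and the stopping rule are all assumptions, and toy callables stand in for real models.

```python
from typing import Callable, Dict, List

# Hypothetical interfaces (assumptions, not from the paper):
#   Reasoner: (event, notes per video) -> follow-up questions; [] means "done".
#   VideoLLM: (video_id, question) -> textual answer grounded in that video.
#   Writer:   (event, notes per video) -> final article text.
Notes = Dict[str, List[str]]
Reasoner = Callable[[str, Notes], List[str]]
VideoLLM = Callable[[str, str], str]
Writer = Callable[[str, Notes], str]


def collaborative_article_sketch(event: str, video_ids: List[str],
                                 reasoner: Reasoner, video_llm: VideoLLM,
                                 writer: Writer, max_rounds: int = 3) -> str:
    """Toy reasoner/VideoLLM loop; max_rounds is an assumed iteration budget."""
    # Start from a generic, low-level description of every video.
    notes: Notes = {vid: [video_llm(vid, "Describe this video.")] for vid in video_ids}
    for _ in range(max_rounds):
        # The reasoner reads all notes so far and asks event-level follow-ups.
        questions = reasoner(event, notes)
        if not questions:
            break
        # The VideoLLM answers each follow-up against each video.
        for vid in video_ids:
            for question in questions:
                notes[vid].append(video_llm(vid, question))
    # A final pass turns the accumulated evidence into an article.
    return writer(event, notes)


# --- Toy stand-ins for real models (hypothetical) ---
def toy_video_llm(video_id: str, question: str) -> str:
    return f"{video_id}: answer to '{question}'"


def toy_reasoner(event: str, notes: Notes) -> List[str]:
    # Keep asking until every video has three notes.
    if all(len(n) >= 3 for n in notes.values()):
        return []
    return ["Which event-level details does this video show?"]


def toy_writer(event: str, notes: Notes) -> str:
    lines = [f"# {event}"]
    for vid_notes in notes.values():
        lines.extend(vid_notes)
    return "\n".join(lines)


article = collaborative_article_sketch("Example event", ["v1", "v2"],
                                       toy_reasoner, toy_video_llm, toy_writer)
print(article)
```

In this sketch the reasoner never sees raw video; it only reads the VideoLLM's text answers, which is one plausible way to separate high-level inference from low-level perception. The actual CAG prompts, stopping criterion, and article-writing step are not specified in the abstract.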