Towards Universal Soccer Video Understanding
December 2, 2024
Authors: Jiayuan Rao, Haoning Wu, Hao Jiang, Ya Zhang, Yanfeng Wang, Weidi Xie
cs.AI
Abstract
As a globally celebrated sport, soccer has attracted widespread interest from
fans all over the world. This paper aims to develop a comprehensive multi-modal
framework for soccer video understanding. Specifically, we make the following
contributions in this paper: (i) we introduce SoccerReplay-1988, the largest
multi-modal soccer dataset to date, featuring videos and detailed annotations
from 1,988 complete matches, with an automated annotation pipeline; (ii) we
present the first visual-language foundation model in the soccer domain,
MatchVision, which leverages spatiotemporal information across soccer videos
and excels in various downstream tasks; (iii) we conduct extensive experiments
and ablation studies on event classification, commentary generation, and
multi-view foul recognition. MatchVision demonstrates state-of-the-art
performance on all of them, substantially outperforming existing models, which
highlights the superiority of our proposed data and model. We believe that this
work will offer a standard paradigm for sports understanding research.