MotionBench: Benchmarking and Improving Fine-grained Video Motion Understanding for Vision Language Models

January 6, 2025
Authors: Wenyi Hong, Yean Cheng, Zhuoyi Yang, Weihan Wang, Lefan Wang, Xiaotao Gu, Shiyu Huang, Yuxiao Dong, Jie Tang
cs.AI

Abstract

In recent years, vision language models (VLMs) have made significant advancements in video understanding. However, a crucial capability, fine-grained motion comprehension, remains under-explored in current benchmarks. To address this gap, we propose MotionBench, a comprehensive evaluation benchmark designed to assess the fine-grained motion comprehension of video understanding models. MotionBench evaluates models' motion-level perception through six primary categories of motion-oriented question types and includes data collected from diverse sources, ensuring a broad representation of real-world video content. Experimental results reveal that existing VLMs perform poorly in understanding fine-grained motions. To enhance VLMs' ability to perceive fine-grained motion within the limited sequence length of the LLM, we conduct extensive experiments reviewing VLM architectures optimized for video feature compression and propose a novel and efficient Through-Encoder (TE) Fusion method. Experiments show that higher-frame-rate inputs and TE Fusion yield improvements in motion understanding, yet there is still substantial room for enhancement. Our benchmark aims to guide and motivate the development of more capable video understanding models, emphasizing the importance of fine-grained motion comprehension. Project page: https://motion-bench.github.io
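The trade-off between input frame rate and the LLM's limited sequence length can be made concrete with simple arithmetic. The following is a minimal sketch built on hypothetical numbers: a 4,096-token visual budget, 256 visual tokens per frame, and a 10-second clip. It models compression generically as k consecutive frames sharing one frame's token cost. It does not implement TE Fusion or any specific architecture from the paper, and the function names `max_frames` and `effective_fps` are illustrative only.

```python
# Hypothetical back-of-the-envelope calculation: how many frames of a clip fit
# into a fixed LLM token budget, and what sampling rate (fps) that implies.
# All numbers below are illustrative assumptions, not values from MotionBench.


def max_frames(token_budget: int, tokens_per_frame: int, compression: int = 1) -> int:
    """Frames that fit when every `compression` frames share one frame's worth of tokens."""
    if compression < 1:
        raise ValueError("compression must be >= 1")
    # Each group of `compression` frames costs `tokens_per_frame` tokens.
    groups = token_budget // tokens_per_frame
    return groups * compression


def effective_fps(frames: int, clip_seconds: float) -> float:
    """Sampling rate achieved if `frames` are spread evenly over the clip."""
    return frames / clip_seconds


budget = 4096      # hypothetical token budget reserved for visual tokens
per_frame = 256    # hypothetical visual tokens per frame
clip = 10.0        # hypothetical clip length in seconds

for k in (1, 2, 4):
    n = max_frames(budget, per_frame, k)
    print(f"compression x{k}: {n} frames, {effective_fps(n, clip):.1f} fps")
```

With these assumed numbers, the uncompressed setting fits only 16 frames (1.6 fps), while a 4x temporal compression fits 64 frames (6.4 fps) in the same budget. This illustrates why compressing video features is one way to raise the effective frame rate without exceeding the context length.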

