VBench++: Comprehensive and Versatile Benchmark Suite for Video Generative Models

November 20, 2024
Authors: Ziqi Huang, Fan Zhang, Xiaojie Xu, Yinan He, Jiashuo Yu, Ziyue Dong, Qianli Ma, Nattapol Chanpaisit, Chenyang Si, Yuming Jiang, Yaohui Wang, Xinyuan Chen, Ying-Cong Chen, Limin Wang, Dahua Lin, Yu Qiao, Ziwei Liu
cs.AI

Abstract

Video generation has witnessed significant advancements, yet evaluating these models remains a challenge. A comprehensive evaluation benchmark for video generation is indispensable for two reasons: 1) Existing metrics do not fully align with human perception; 2) An ideal evaluation system should provide insights to inform future developments in video generation. To this end, we present VBench, a comprehensive benchmark suite that dissects "video generation quality" into specific, hierarchical, and disentangled dimensions, each with tailored prompts and evaluation methods. VBench has several appealing properties: 1) Comprehensive Dimensions: VBench comprises 16 dimensions in video generation (e.g., subject identity inconsistency, motion smoothness, temporal flickering, and spatial relationship). Fine-grained evaluation metrics reveal individual models' strengths and weaknesses. 2) Human Alignment: We also provide a dataset of human preference annotations to validate our benchmark's alignment with human perception, separately for each evaluation dimension. 3) Valuable Insights: We examine current models' abilities across evaluation dimensions and content types. We also investigate the gaps between video and image generation models. 4) Versatile Benchmarking: VBench++ supports evaluating both text-to-video and image-to-video generation. We introduce a high-quality Image Suite with an adaptive aspect ratio to enable fair evaluations across different image-to-video generation settings. Beyond assessing technical quality, VBench++ evaluates the trustworthiness of video generative models, providing a more holistic view of model performance. 5) Full Open-Sourcing: We fully open-source VBench++ and continually add new video generation models to our leaderboard to drive forward the field of video generation.
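
The per-dimension human-alignment check described in point 2 can be illustrated with a minimal sketch. This is not the VBench++ implementation: it assumes, purely for illustration, that each dimension yields one automatic score per model and one human-preference win ratio per model, and that alignment is summarized with a Pearson correlation. The function names, model names, and all numbers below are hypothetical placeholders.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / math.sqrt(var_x * var_y)


def alignment_by_dimension(metric_scores, human_win_ratios):
    """Correlate automatic scores with human win ratios, one dimension at a time.

    Both arguments map dimension name -> {model name -> value}.
    (Hypothetical data layout, chosen for this sketch only.)
    """
    result = {}
    for dimension, scores in metric_scores.items():
        models = sorted(scores)  # fixed order so the two lists line up
        xs = [scores[m] for m in models]
        ys = [human_win_ratios[dimension][m] for m in models]
        result[dimension] = pearson(xs, ys)
    return result


# Made-up placeholder values, not results from the paper.
metric_scores = {
    "motion_smoothness": {"model_a": 0.91, "model_b": 0.95, "model_c": 0.97},
    "temporal_flickering": {"model_a": 0.98, "model_b": 0.93, "model_c": 0.96},
}
human_win_ratios = {
    "motion_smoothness": {"model_a": 0.30, "model_b": 0.52, "model_c": 0.68},
    "temporal_flickering": {"model_a": 0.61, "model_b": 0.35, "model_c": 0.54},
}

for dimension, r in alignment_by_dimension(metric_scores, human_win_ratios).items():
    print(f"{dimension}: r = {r:.3f}")
```

Reporting one correlation per dimension, rather than a single pooled number, mirrors the abstract's emphasis on disentangled dimensions: a metric that tracks human judgment well for one dimension is not assumed to do so for another.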
