VBench++: Comprehensive and Versatile Benchmark Suite for Video Generative Models
November 20, 2024
Authors: Ziqi Huang, Fan Zhang, Xiaojie Xu, Yinan He, Jiashuo Yu, Ziyue Dong, Qianli Ma, Nattapol Chanpaisit, Chenyang Si, Yuming Jiang, Yaohui Wang, Xinyuan Chen, Ying-Cong Chen, Limin Wang, Dahua Lin, Yu Qiao, Ziwei Liu
cs.AI
Abstract
Video generation has witnessed significant advancements, yet evaluating these
models remains a challenge. A comprehensive evaluation benchmark for video
generation is indispensable for two reasons: 1) Existing metrics do not fully
align with human perceptions; 2) An ideal evaluation system should provide
insights to inform future developments of video generation. To this end, we
present VBench, a comprehensive benchmark suite that dissects "video generation
quality" into specific, hierarchical, and disentangled dimensions, each with
tailored prompts and evaluation methods. VBench has several appealing
properties: 1) Comprehensive Dimensions: VBench comprises 16 dimensions in
video generation (e.g., subject identity inconsistency, motion smoothness,
temporal flickering, and spatial relationship). The evaluation metrics
with fine-grained levels reveal individual models' strengths and weaknesses. 2)
Human Alignment: We also provide a dataset of human preference annotations to
validate the benchmark's alignment with human perception for each evaluation
dimension. 3) Valuable Insights: We look into current models'
abilities across various evaluation dimensions and content types. We
also investigate the gaps between video and image generation models. 4)
Versatile Benchmarking: VBench++ supports evaluating text-to-video and
image-to-video. We introduce a high-quality Image Suite with an adaptive aspect
ratio to enable fair evaluations across different image-to-video generation
settings. Beyond assessing technical quality, VBench++ evaluates the
trustworthiness of video generative models, providing a more holistic view of
model performance. 5) Full Open-Sourcing: We fully open-source VBench++ and
continually add new video generation models to our leaderboard to drive forward
the field of video generation.