Do generative video models learn physical principles from watching videos?

January 14, 2025
Authors: Saman Motamed, Laura Culp, Kevin Swersky, Priyank Jaini, Robert Geirhos
cs.AI

Abstract

AI video generation is undergoing a revolution, with quality and realism advancing rapidly. These advances have led to a passionate scientific debate: Do video models learn "world models" that discover laws of physics -- or, alternatively, are they merely sophisticated pixel predictors that achieve visual realism without understanding the physical principles of reality? We address this question by developing Physics-IQ, a comprehensive benchmark dataset that can only be solved by acquiring a deep understanding of various physical principles, like fluid dynamics, optics, solid mechanics, magnetism and thermodynamics. We find that across a range of current models (Sora, Runway, Pika, Lumiere, Stable Video Diffusion, and VideoPoet), physical understanding is severely limited, and unrelated to visual realism. At the same time, some test cases can already be successfully solved. This indicates that acquiring certain physical principles from observation alone may be possible, but significant challenges remain. While we expect rapid advances ahead, our work demonstrates that visual realism does not imply physical understanding. Our project page is at https://physics-iq.github.io; code at https://github.com/google-deepmind/physics-IQ-benchmark.
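
The abstract does not spell out the evaluation protocol; purely as an illustrative sketch (not the paper's actual Physics-IQ metric), one way to quantify physical plausibility is to compare a model's generated continuation of a scene against the real continuation of the same scene, for example with a simple per-pixel error. The `continue_video` call below is a hypothetical placeholder, not an API from the released benchmark code.

```python
import numpy as np

def frame_mse(generated: np.ndarray, ground_truth: np.ndarray) -> float:
    """Mean squared error between two videos of shape (T, H, W, C)."""
    assert generated.shape == ground_truth.shape, "videos must be aligned in time and resolution"
    diff = generated.astype(np.float64) - ground_truth.astype(np.float64)
    return float(np.mean(diff ** 2))

# Hypothetical usage: condition a video model on the first seconds of a clip,
# then score how closely its continuation matches the real physical outcome.
# generated = model.continue_video(conditioning_frames)        # hypothetical API
# score = frame_mse(generated, ground_truth_continuation)
```

Note that a raw pixel error rewards visual similarity rather than physical correctness per se; the benchmark's point is precisely that these two can diverge, so any real evaluation needs physics-aware measures beyond this sketch.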
