Do generative video models learn physical principles from watching videos?
January 14, 2025
Authors: Saman Motamed, Laura Culp, Kevin Swersky, Priyank Jaini, Robert Geirhos
cs.AI
Abstract
AI video generation is undergoing a revolution, with quality and realism
advancing rapidly. These advances have led to a passionate scientific debate:
Do video models learn "world models" that discover laws of physics -- or,
alternatively, are they merely sophisticated pixel predictors that achieve
visual realism without understanding the physical principles of reality? We
address this question by developing Physics-IQ, a comprehensive benchmark
dataset that can only be solved by acquiring a deep understanding of various
physical principles, like fluid dynamics, optics, solid mechanics, magnetism
and thermodynamics. We find that across a range of current models (Sora,
Runway, Pika, Lumiere, Stable Video Diffusion, and VideoPoet), physical
understanding is severely limited, and unrelated to visual realism. At the same
time, some test cases can already be successfully solved. This indicates that
acquiring certain physical principles from observation alone may be possible,
but significant challenges remain. While we expect rapid advances ahead, our
work demonstrates that visual realism does not imply physical understanding.
Our project page is at https://physics-iq.github.io; code at
https://github.com/google-deepmind/physics-IQ-benchmark.