PhysGen: Rigid-Body Physics-Grounded Image-to-Video Generation
September 27, 2024
Authors: Shaowei Liu, Zhongzheng Ren, Saurabh Gupta, Shenlong Wang
cs.AI
Abstract
We present PhysGen, a novel image-to-video generation method that converts a
single image and an input condition (e.g., force and torque applied to an
object in the image) to produce a realistic, physically plausible, and
temporally consistent video. Our key insight is to integrate model-based
physical simulation with a data-driven video generation process, enabling
plausible image-space dynamics. At the heart of our system are three core
components: (i) an image understanding module that effectively captures the
geometry, materials, and physical parameters of the image; (ii) an image-space
dynamics simulation model that utilizes rigid-body physics and inferred
parameters to simulate realistic behaviors; and (iii) an image-based rendering
and refinement module that leverages generative video diffusion to produce
realistic video footage featuring the simulated motion. The resulting videos
are realistic in both physics and appearance and are even precisely
controllable; quantitative comparisons and a comprehensive user study show
superior results over existing data-driven image-to-video generation
methods. PhysGen's resulting videos can be used for various
downstream applications, such as turning an image into a realistic animation or
allowing users to interact with the image and create various dynamics. Project
page: https://stevenlsw.github.io/physgen/
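The image-space dynamics component described above drives an object with user-supplied forces and torques under rigid-body physics. The paper does not specify its simulator, but the core idea can be illustrated with a minimal, self-contained 2D rigid-body integrator: the function name, state layout, and semi-implicit Euler scheme below are illustrative assumptions, not details from PhysGen.

```python
import math

def simulate_rigid_body(mass, inertia, force, torque, dt=1.0 / 30.0, steps=90):
    """Integrate one 2D rigid body under time-varying force and torque.

    Illustrative sketch (not PhysGen's actual simulator):
    - mass (kg) and inertia (kg*m^2) would come from the image
      understanding module in a PhysGen-like pipeline;
    - force(t) -> (fx, fy) and torque(t) -> tau are the input conditions;
    - semi-implicit Euler: update velocities first, then positions.
    Returns a trajectory of (x, y, theta) poses, one per simulated frame.
    """
    x = y = theta = 0.0          # pose: position and orientation
    vx = vy = omega = 0.0        # linear and angular velocity
    trajectory = []
    for i in range(steps):
        t = i * dt
        fx, fy = force(t)
        vx += (fx / mass) * dt           # Newton's second law, linear
        vy += (fy / mass) * dt
        omega += (torque(t) / inertia) * dt  # and angular
        x += vx * dt                     # advance pose with new velocities
        y += vy * dt
        theta += omega * dt
        trajectory.append((x, y, theta))
    return trajectory

# Example: a 1 kg body pushed with a constant 1 N horizontal force.
traj = simulate_rigid_body(
    mass=1.0, inertia=1.0,
    force=lambda t: (1.0, 0.0),
    torque=lambda t: 0.0,
    dt=0.01, steps=100,
)
```

In a full pipeline, each pose in the trajectory would be used to warp the segmented object in image space before the video-diffusion refinement stage renders the final frames.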