WHAC: World-grounded Humans and Cameras
March 19, 2024
Authors: Wanqi Yin, Zhongang Cai, Ruisi Wang, Fanzhou Wang, Chen Wei, Haiyi Mei, Weiye Xiao, Zhitao Yang, Qingping Sun, Atsushi Yamashita, Ziwei Liu, Lei Yang
cs.AI
Abstract
Estimating human and camera trajectories with accurate scale in the world
coordinate system from a monocular video is a highly desirable yet challenging
and ill-posed problem. In this study, we aim to recover expressive parametric
human models (i.e., SMPL-X) and corresponding camera poses jointly, by
leveraging the synergy between three critical players: the world, the human,
and the camera. Our approach is founded on two key observations. Firstly,
camera-frame SMPL-X estimation methods readily recover absolute human depth.
Secondly, human motions inherently provide absolute spatial cues. By
integrating these insights, we introduce a novel framework, referred to as
WHAC, to facilitate world-grounded expressive human pose and shape estimation
(EHPS) alongside camera pose estimation, without relying on traditional
optimization techniques. Additionally, we present a new synthetic dataset,
WHAC-A-Mole, which includes accurately annotated humans and cameras, and
features diverse interactive human motions as well as realistic camera
trajectories. Extensive experiments on both standard and newly established
benchmarks highlight the superiority and efficacy of our framework. We will
make the code and dataset publicly available.
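The synergy described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function name, the closed-form least-squares formulation, and the assumption of a single global scale factor are all illustrative simplifications. The idea it demonstrates is the one stated above: a camera-frame SMPL-X estimator provides metric human root positions, a visual-odometry camera trajectory is correct only up to scale, and metric displacements predicted from human motion pin down that scale.

```python
import numpy as np

def recover_scale(root_cam, cam_t, cam_R, motion_disp):
    """Closed-form least-squares scale for an up-to-scale camera path.

    root_cam:    (T, 3) metric human root positions in the camera frame
                 (absolute depth from a camera-frame SMPL-X estimator)
    cam_t:       (T, 3) camera positions, correct only up to a global scale s
    cam_R:       (T, 3, 3) camera-to-world rotations
    motion_disp: (T-1, 3) metric per-frame root displacements predicted
                 from human motion (the absolute spatial cue)
    """
    # Human root in world coordinates: p_w(t) = s * cam_t[t] + cam_R[t] @ root_cam[t]
    rotated = np.einsum("tij,tj->ti", cam_R, root_cam)  # R_t @ p_t per frame
    a = np.diff(cam_t, axis=0)               # scale-dependent part of p_w's motion
    b = motion_disp - np.diff(rotated, axis=0)  # scale-free residual
    # argmin_s sum ||s * a - b||^2  has the closed form  s = <a, b> / <a, a>
    return float((a * b).sum() / (a * a).sum())
```

With the scale recovered, the camera trajectory and the human's world trajectory are both metric, which is the world-grounded output the abstract describes; the full framework replaces this toy alignment with learned components rather than post-hoc optimization.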