IDOL: Instant Photorealistic 3D Human Creation from a Single Image
December 19, 2024
Authors: Yiyu Zhuang, Jiaxi Lv, Hao Wen, Qing Shuai, Ailing Zeng, Hao Zhu, Shifeng Chen, Yujiu Yang, Xun Cao, Wei Liu
cs.AI
Abstract
Creating a high-fidelity, animatable 3D full-body avatar from a single image
is a challenging task due to the diverse appearance and poses of humans and the
limited availability of high-quality training data. To achieve fast and
high-quality human reconstruction, this work rethinks the task from the
perspectives of dataset, model, and representation. First, we introduce a
large-scale HUman-centric GEnerated dataset, HuGe100K, consisting of 100K
diverse, photorealistic sets of human images. Each set contains 24-view frames
in specific human poses, generated using a pose-controllable
image-to-multi-view model. Next, leveraging the diversity in views, poses, and
appearances within HuGe100K, we develop a scalable feed-forward transformer
model to predict a 3D human Gaussian representation in a uniform space from a
given human image. This model is trained to disentangle human pose, body shape,
clothing geometry, and texture. The estimated Gaussians can be animated without
post-processing. We conduct comprehensive experiments to validate the
effectiveness of the proposed dataset and method. Our model demonstrates the
ability to efficiently reconstruct photorealistic humans at 1K resolution from
a single input image using a single GPU instantly. Additionally, it seamlessly
supports various applications, as well as shape and texture editing tasks.
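To make the abstract's core idea concrete, the sketch below illustrates a single feed-forward pass that maps an input image to per-Gaussian parameters (position, scale, rotation, color, opacity) in a canonical space. This is a minimal toy illustration, not the authors' model: the encoder, the linear prediction head, the token count, and all function names are assumptions, and random projections stand in for learned transformer weights.

```python
import numpy as np

rng = np.random.default_rng(0)

NUM_GAUSSIANS = 1024      # illustrative; a real model predicts far more
TOKEN_DIM = 64            # per-token feature size (assumed)
# per-Gaussian parameters: 3 pos + 3 scale + 4 quat + 3 rgb + 1 opacity = 14
PARAMS_PER_GAUSSIAN = 14

def encode_image(image: np.ndarray, num_tokens: int = NUM_GAUSSIANS) -> np.ndarray:
    """Stand-in image encoder: project pixels to one feature, tile into tokens."""
    flat = image.reshape(-1)
    # fixed random projection standing in for a learned vision backbone
    W = rng.standard_normal((flat.size, TOKEN_DIM)) / np.sqrt(flat.size)
    token = flat @ W                          # (TOKEN_DIM,)
    return np.tile(token, (num_tokens, 1))    # (num_tokens, TOKEN_DIM)

def predict_gaussians(tokens: np.ndarray) -> dict:
    """Stand-in feed-forward head: one linear map from each token to 14 params."""
    W = rng.standard_normal((TOKEN_DIM, PARAMS_PER_GAUSSIAN)) / np.sqrt(TOKEN_DIM)
    raw = tokens @ W                          # (N, 14)
    quat = raw[:, 6:10]
    return {
        "position": raw[:, 0:3],                                  # canonical xyz
        "scale":    np.exp(raw[:, 3:6]),                          # positive scales
        "rotation": quat / np.linalg.norm(quat, axis=1, keepdims=True),
        "color":    1.0 / (1.0 + np.exp(-raw[:, 10:13])),         # rgb in [0, 1]
        "opacity":  1.0 / (1.0 + np.exp(-raw[:, 13:14])),         # alpha in [0, 1]
    }

image = rng.random((64, 64, 3))               # dummy input portrait
gaussians = predict_gaussians(encode_image(image))
print(gaussians["position"].shape)            # (1024, 3)
```

Predicting Gaussians in one uniform (canonical) space is what lets the output be animated without post-processing: a driving pose only has to transform the predicted positions and rotations, leaving shape, clothing geometry, and texture parameters untouched.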