

OmniHuman-1: Rethinking the Scaling-Up of One-Stage Conditioned Human Animation Models

February 3, 2025
Authors: Gaojie Lin, Jianwen Jiang, Jiaqi Yang, Zerong Zheng, Chao Liang
cs.AI

Abstract

End-to-end human animation, such as audio-driven talking human generation, has undergone notable advancements in recent years. However, existing methods still struggle to scale up like large general-purpose video generation models, limiting their potential in real applications. In this paper, we propose OmniHuman, a Diffusion Transformer-based framework that scales up data by mixing motion-related conditions into the training phase. To this end, we introduce two training principles for these mixed conditions, along with the corresponding model architecture and inference strategy. These designs enable OmniHuman to fully leverage data-driven motion generation, ultimately achieving highly realistic human video generation. More importantly, OmniHuman supports various portrait contents (face close-up, portrait, half-body, full-body), supports both talking and singing, handles human-object interactions and challenging body poses, and accommodates different image styles. Compared to existing end-to-end audio-driven methods, OmniHuman not only produces more realistic videos but also offers greater flexibility in inputs. It also supports multiple driving modalities (audio-driven, video-driven, and combined driving signals). Video samples are provided on the project page (https://omnihuman-lab.github.io).
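The abstract describes mixing motion-related conditions (e.g., audio, pose) into the training of a Diffusion Transformer. Below is a minimal, hypothetical PyTorch sketch of that general idea: each training step randomly keeps or drops the available conditions so that data annotated with only some modalities can still contribute. The module names, tensor shapes, noising schedule, and keep ratios are all assumptions made for illustration and are not taken from the OmniHuman implementation.

```python
# Illustrative sketch of mixed-condition diffusion training.
# All names, shapes, and drop ratios are assumptions, NOT the OmniHuman code.
import torch
import torch.nn as nn

class ToyConditionedDenoiser(nn.Module):
    """Toy denoiser that fuses optional audio/pose conditions with the noisy latent."""
    def __init__(self, latent_dim=64, audio_dim=32, pose_dim=16):
        super().__init__()
        self.audio_proj = nn.Linear(audio_dim, latent_dim)
        self.pose_proj = nn.Linear(pose_dim, latent_dim)
        self.backbone = nn.Sequential(
            nn.Linear(latent_dim, 128), nn.SiLU(), nn.Linear(128, latent_dim)
        )

    def forward(self, noisy_latent, audio=None, pose=None):
        h = noisy_latent
        if audio is not None:          # a condition is injected only when present
            h = h + self.audio_proj(audio)
        if pose is not None:
            h = h + self.pose_proj(pose)
        return self.backbone(h)        # predicts the noise

def training_step(model, batch, audio_keep_p=0.5, pose_keep_p=0.25):
    """One denoising step where each available condition is randomly kept or dropped.
    The keep probabilities are made-up values that only gesture at the idea of
    sampling stronger conditions (pose) less often than weaker ones (audio)."""
    x0, audio, pose = batch["latent"], batch["audio"], batch["pose"]
    noise = torch.randn_like(x0)
    t = torch.rand(x0.size(0), 1)                 # continuous timestep in [0, 1]
    noisy = (1 - t) * x0 + t * noise              # simple linear noising schedule
    use_audio = torch.rand(()) < audio_keep_p
    use_pose = torch.rand(()) < pose_keep_p
    pred = model(noisy,
                 audio=audio if use_audio else None,
                 pose=pose if use_pose else None)
    return nn.functional.mse_loss(pred, noise)

if __name__ == "__main__":
    model = ToyConditionedDenoiser()
    batch = {"latent": torch.randn(4, 64),
             "audio": torch.randn(4, 32),
             "pose": torch.randn(4, 16)}
    loss = training_step(model, batch)
    loss.backward()
    print(f"toy mixed-condition loss: {loss.item():.4f}")
```

In such a scheme, the per-modality keep probabilities are the main knob: conditions that constrain motion more tightly can be sampled less often, so weakly conditioned data still dominates training and the model does not overfit to the strongest signal.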
