

MEMO: Memory-Guided Diffusion for Expressive Talking Video Generation

December 5, 2024
作者: Longtao Zheng, Yifan Zhang, Hanzhong Guo, Jiachun Pan, Zhenxiong Tan, Jiahao Lu, Chuanxin Tang, Bo An, Shuicheng Yan
cs.AI

Abstract

Recent advances in video diffusion models have unlocked new potential for realistic audio-driven talking video generation. However, achieving seamless audio-lip synchronization, maintaining long-term identity consistency, and producing natural, audio-aligned expressions in generated talking videos remain significant challenges. To address these challenges, we propose Memory-guided EMOtion-aware diffusion (MEMO), an end-to-end audio-driven portrait animation approach to generate identity-consistent and expressive talking videos. Our approach is built around two key modules: (1) a memory-guided temporal module, which enhances long-term identity consistency and motion smoothness by developing memory states to store information from a longer past context to guide temporal modeling via linear attention; and (2) an emotion-aware audio module, which replaces traditional cross attention with multi-modal attention to enhance audio-video interaction, while detecting emotions from audio to refine facial expressions via emotion adaptive layer norm. Extensive quantitative and qualitative results demonstrate that MEMO generates more realistic talking videos across diverse image and audio types, outperforming state-of-the-art methods in overall quality, audio-lip synchronization, identity consistency, and expression-emotion alignment.
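
The abstract describes the two modules only at a high level. The sketch below is a minimal, illustrative PyTorch example of what an emotion-adaptive layer norm (affine parameters predicted from an emotion embedding) and a linear-attention step carrying a running memory state across video chunks could look like. The module names, dimensions, positive feature map, and decay-based memory update are assumptions made for illustration; this is not the authors' released implementation.

```python
# Illustrative sketch only, not MEMO's official code. Shows two ideas from the
# abstract: emotion-adaptive layer norm and memory-state linear attention.
import torch
import torch.nn as nn
import torch.nn.functional as F


class EmotionAdaptiveLayerNorm(nn.Module):
    """LayerNorm whose scale and shift are predicted from an emotion embedding."""

    def __init__(self, dim: int, emo_dim: int):
        super().__init__()
        self.norm = nn.LayerNorm(dim, elementwise_affine=False)
        self.to_scale_shift = nn.Linear(emo_dim, 2 * dim)

    def forward(self, x: torch.Tensor, emo: torch.Tensor) -> torch.Tensor:
        # x: (batch, tokens, dim), emo: (batch, emo_dim)
        scale, shift = self.to_scale_shift(emo).chunk(2, dim=-1)
        return self.norm(x) * (1 + scale.unsqueeze(1)) + shift.unsqueeze(1)


class MemoryLinearAttention(nn.Module):
    """Linear attention that accumulates key-value statistics in a memory state."""

    def __init__(self, dim: int):
        super().__init__()
        self.to_q = nn.Linear(dim, dim)
        self.to_k = nn.Linear(dim, dim)
        self.to_v = nn.Linear(dim, dim)

    def forward(self, x, memory_kv=None, memory_k=None, decay: float = 0.9):
        # x: (batch, tokens, dim); memory_kv: (batch, dim, dim); memory_k: (batch, dim)
        q = F.elu(self.to_q(x)) + 1  # positive feature map for linear attention
        k = F.elu(self.to_k(x)) + 1
        v = self.to_v(x)
        kv = torch.einsum("btd,bte->bde", k, v)   # summed key-value outer products
        k_sum = k.sum(dim=1)
        if memory_kv is not None:
            kv = decay * memory_kv + kv           # blend in stored past context
            k_sum = decay * memory_k + k_sum
        out = torch.einsum("btd,bde->bte", q, kv)
        denom = torch.einsum("btd,bd->bt", q, k_sum).clamp(min=1e-6).unsqueeze(-1)
        return out / denom, kv, k_sum             # return updated memory state


if __name__ == "__main__":
    x = torch.randn(2, 16, 64)           # a chunk of 16 frame tokens
    emo = torch.randn(2, 32)              # pooled emotion embedding from audio
    x = EmotionAdaptiveLayerNorm(64, 32)(x, emo)
    attn = MemoryLinearAttention(64)
    out, mem_kv, mem_k = attn(x)                  # first chunk: no memory yet
    out2, mem_kv, mem_k = attn(x, mem_kv, mem_k)  # next chunk reuses memory
    print(out.shape, out2.shape)
```

In this hypothetical recurrence, each generated chunk updates the memory state, so later chunks can attend to a summary of the longer past context at constant cost, which is the role the abstract assigns to the memory-guided temporal module.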

