V-Express: Conditional Dropout for Progressive Training of Portrait Video Generation

June 4, 2024
Authors: Cong Wang, Kuan Tian, Jun Zhang, Yonghang Guan, Feng Luo, Fei Shen, Zhiwei Jiang, Qing Gu, Xiao Han, Wei Yang
cs.AI

Abstract

In portrait video generation, using a single image to generate a portrait video has become increasingly prevalent. A common approach involves leveraging generative models to enhance adapters for controlled generation. However, control signals (e.g., text, audio, reference image, pose, depth map) can vary in strength. Among these, weaker conditions often struggle to take effect because of interference from stronger conditions, which makes balancing the conditions a challenge. In our work on portrait video generation, we identified audio signals as particularly weak, often overshadowed by stronger signals such as facial pose and reference image. Training directly with weak signals, in turn, often leads to convergence difficulties. To address this, we propose V-Express, a simple method that balances different control signals through progressive training and a conditional dropout operation. Our method gradually enables effective control by weak conditions, achieving generation that simultaneously takes into account the facial pose, the reference image, and the audio. Experimental results demonstrate that our method can effectively generate portrait videos controlled by audio. The method also offers a potential solution for using conditions of varying strengths simultaneously and effectively.
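The abstract does not spell out how the conditional dropout operation is applied, so the following is only a minimal sketch of the general idea, not the authors' implementation. It assumes that each control signal is randomly withheld (replaced by `None`) during a training step with its own probability, so that a weak signal such as audio is sometimes the only guidance available. The function name `apply_conditional_dropout`, the condition names, and the probabilities in the usage note are all hypothetical.

```python
import random


def apply_conditional_dropout(conditions, drop_probs, rng):
    """Return a copy of `conditions` with some entries replaced by None.

    conditions: dict mapping a condition name to its value (any object).
    drop_probs: dict mapping a condition name to a drop probability in [0, 1];
                names missing from this dict are never dropped.
    rng:        a random.Random instance, passed in for reproducibility.

    Illustrative only: the per-condition probabilities are an assumption,
    not values taken from the paper.
    """
    kept = {}
    for name, value in conditions.items():
        p = drop_probs.get(name, 0.0)
        # rng.random() is in [0.0, 1.0), so p = 1.0 always drops
        # and p = 0.0 never drops.
        kept[name] = None if rng.random() < p else value
    return kept
```

In a training loop, one might call this once per step, for example `apply_conditional_dropout(batch_conditions, {"reference_image": 0.3, "pose": 0.5}, rng)`, and raise or lower the probabilities from one training stage to the next. Those values and that staging are placeholders for illustration; the abstract states only that progressive training and conditional dropout are combined.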
