DisCoRD: Discrete Tokens to Continuous Motion via Rectified Flow Decoding
November 29, 2024
Authors: Jungbin Cho, Junwan Kim, Jisoo Kim, Minseo Kim, Mingu Kang, Sungeun Hong, Tae-Hyun Oh, Youngjae Yu
cs.AI
Abstract
Human motion, inherently continuous and dynamic, presents significant
challenges for generative models. Despite their dominance, discrete
quantization methods, such as VQ-VAEs, suffer from inherent limitations,
including restricted expressiveness and frame-wise noise artifacts. Continuous
approaches, while producing smoother and more natural motions, often falter due
to high-dimensional complexity and limited training data. To resolve this
"discord" between discrete and continuous representations, we introduce
DisCoRD: Discrete Tokens to Continuous Motion via Rectified Flow Decoding, a
novel method that decodes discrete motion tokens into continuous motion through
rectified flow. By employing an iterative refinement process in the continuous
space, DisCoRD captures fine-grained dynamics and ensures smoother and more
natural motions. Compatible with any discrete-based framework, our method
enhances naturalness without compromising faithfulness to the conditioning
signals. Extensive evaluations demonstrate that DisCoRD achieves
state-of-the-art performance, with FID of 0.032 on HumanML3D and 0.169 on
KIT-ML. These results solidify DisCoRD as a robust solution for bridging the
divide between discrete efficiency and continuous realism. Our project page is
available at: https://whwjdqls.github.io/discord.github.io/.
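The abstract describes decoding as an iterative refinement in continuous space via rectified flow. The sketch below illustrates the general rectified-flow recipe only, not the authors' implementation: a straight-line path between noise and data, a velocity regression target, and Euler integration at decode time. All names here (`interpolate`, `velocity_target`, `euler_decode`, `toy_velocity`) are hypothetical, and `toy_velocity` is a closed-form stand-in for what would in practice be a trained network conditioned on features derived from discrete motion tokens.

```python
import numpy as np


def interpolate(x0, x1, t):
    """Point on the straight path from noise x0 (t=0) to data x1 (t=1)."""
    return (1.0 - t) * x0 + t * x1


def velocity_target(x0, x1):
    """Regression target for the velocity field along the straight path."""
    return x1 - x0


def euler_decode(velocity_fn, x0, cond, num_steps=16):
    """Integrate dx/dt = velocity_fn(x, t, cond) from t=0 to t=1 with Euler steps."""
    x = np.array(x0, dtype=np.float64)
    dt = 1.0 / num_steps
    for k in range(num_steps):
        t = k * dt  # t stays strictly below 1, so toy_velocity never divides by zero
        x = x + dt * velocity_fn(x, t, cond)
    return x


def toy_velocity(x, t, cond):
    # Assumption: stand-in for a learned model. It points straight at `cond`,
    # which here plays the role of the clean motion for illustration only.
    return (cond - x) / (1.0 - t)


rng = np.random.default_rng(0)
# Toy "motion": 8 frames with 3 features each (shapes chosen arbitrarily).
target = np.sin(np.linspace(0.0, np.pi, 8))[:, None] * np.ones((1, 3))
noise = rng.standard_normal(target.shape)

decoded = euler_decode(toy_velocity, noise, target, num_steps=16)
print(np.abs(decoded - target).max())
```

With this idealized velocity field the path is exactly straight, so Euler integration lands on the target regardless of the step count. A learned field is only approximately straight, so the number of steps then becomes a trade-off between decoding speed and quality.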