DisCoRD: Discrete Tokens to Continuous Motion via Rectified Flow Decoding
November 29, 2024
Authors: Jungbin Cho, Junwan Kim, Jisoo Kim, Minseo Kim, Mingu Kang, Sungeun Hong, Tae-Hyun Oh, Youngjae Yu
cs.AI
Abstract
Human motion, inherently continuous and dynamic, presents significant
challenges for generative models. Despite their dominance, discrete
quantization methods, such as VQ-VAEs, suffer from inherent limitations,
including restricted expressiveness and frame-wise noise artifacts. Continuous
approaches, while producing smoother and more natural motions, often falter due
to high-dimensional complexity and limited training data. To resolve this
"discord" between discrete and continuous representations, we introduce
DisCoRD: Discrete Tokens to Continuous Motion via Rectified Flow Decoding, a
novel method that decodes discrete motion tokens into continuous motion through
rectified flow. By employing an iterative refinement process in the continuous
space, DisCoRD captures fine-grained dynamics and ensures smoother and more
natural motions. Compatible with any discrete-based framework, our method
enhances naturalness without compromising faithfulness to the conditioning
signals. Extensive evaluations demonstrate that DisCoRD achieves
state-of-the-art performance, with FID of 0.032 on HumanML3D and 0.169 on
KIT-ML. These results solidify DisCoRD as a robust solution for bridging the
divide between discrete efficiency and continuous realism. Our project page is
available at: https://whwjdqls.github.io/discord.github.io/.
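
The decoding idea described in the abstract (start from noise and iteratively refine toward a continuous motion, conditioned on discrete tokens) can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the one-token-per-frame conditioning, the Euler integrator, and the closed-form `toy_velocity` that stands in for a learned network are all assumptions made for illustration.

```python
import numpy as np

def decode_tokens_rectified_flow(token_ids, codebook, velocity_fn,
                                 motion_dim, num_steps=16, seed=0):
    """Euler-integrate dx/dt = v(x, t, cond) from t=0 (noise) to t=1 (motion).

    Hypothetical helper: assumes one token conditions one output frame.
    """
    rng = np.random.default_rng(seed)
    cond = codebook[token_ids]                       # (frames, code_dim) conditioning
    x = rng.standard_normal((len(token_ids), motion_dim))  # Gaussian starting point
    dt = 1.0 / num_steps
    for k in range(num_steps):
        t = k * dt                                   # t stays strictly below 1
        x = x + dt * velocity_fn(x, t, cond)         # one refinement step
    return x

def toy_velocity(x, t, cond):
    """Stand-in for a learned velocity network (assumption, not from the paper).

    Points straight at a target along the rectified-flow path; here the
    target is simply the conditioning vector, so code_dim == motion_dim.
    """
    target = cond
    return (target - x) / (1.0 - t)

# Tiny example: a 4-entry codebook of 3-dimensional codes and a 4-token sequence.
codebook = np.arange(12, dtype=np.float64).reshape(4, 3)
token_ids = np.array([0, 2, 2, 3])
motion = decode_tokens_rectified_flow(token_ids, codebook, toy_velocity,
                                      motion_dim=3, num_steps=8)
print(motion.shape)
```

In a real system the velocity function would be a trained neural network and the number of integration steps would trade decoding speed against quality; the toy field here only shows the control flow of iterative refinement in continuous space.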