MixerMDM: Learnable Composition of Human Motion Diffusion Models

Abstract

Generating human motion guided by conditions such as textual descriptions is challenging because it requires datasets that pair high-quality motion with the corresponding conditions. The difficulty increases when finer control over the generation is desired. To that end, prior work has proposed combining several motion diffusion models pre-trained on datasets with different types of conditions, thereby enabling control with multiple conditions. However, the proposed merging strategies overlook that the optimal way to combine the generation processes may depend on the particularities of each pre-trained generative model and on the specific textual descriptions. In this context, we introduce MixerMDM, the first learnable model composition technique for combining pre-trained text-conditioned human motion diffusion models. Unlike previous approaches, MixerMDM provides a dynamic mixing strategy that is trained adversarially to learn how to combine the denoising processes of the models depending on the set of conditions driving the generation. By using MixerMDM to combine single- and multi-person motion diffusion models, we achieve fine-grained control over the dynamics of each person individually, as well as over the overall interaction. Furthermore, we propose a new evaluation technique that, for the first time in this task, measures interaction and individual quality by computing the alignment between the mixed generated motions and their conditions, as well as the ability of MixerMDM to adapt the mixing throughout the denoising process depending on the motions to mix.
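To make the idea of a dynamic mixing strategy more concrete, the following is a minimal sketch of blending the per-step predictions of two denoisers with weights that depend on the predictions, the timestep, and the condition. It is not the authors' implementation: the abstract does not describe the mixer architecture, so `ToyMixer`, its randomly initialized parameters, the condition embedding, and the toy prediction vectors are all hypothetical, and the adversarial training mentioned in the abstract is not reproduced.

```python
import math
import random


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


class ToyMixer:
    """Hypothetical mixer that outputs one blending weight per motion feature.

    The weight depends on both predictions, the diffusion step t, and a
    condition embedding. Parameters are random here; in a learned setting
    they would be trained (assumption: the training loop is omitted).
    """

    def __init__(self, cond_dim: int, seed: int = 0) -> None:
        rng = random.Random(seed)
        self.w_a = rng.uniform(-1, 1)
        self.w_b = rng.uniform(-1, 1)
        self.w_t = rng.uniform(-1, 1)
        self.w_c = [rng.uniform(-1, 1) for _ in range(cond_dim)]

    def weights(self, pred_a, pred_b, t, cond):
        # Shared contribution of the condition embedding to every weight.
        cond_term = sum(w * c for w, c in zip(self.w_c, cond))
        return [
            sigmoid(self.w_a * a + self.w_b * b + self.w_t * t + cond_term)
            for a, b in zip(pred_a, pred_b)
        ]


def mix_predictions(pred_a, pred_b, weights):
    """Convex combination of two denoised predictions, feature by feature."""
    return [w * a + (1.0 - w) * b for a, b, w in zip(pred_a, pred_b, weights)]


# Stand-ins for the outputs of two pre-trained models at one denoising step.
pred_individual = [0.2, -0.5, 1.0, 0.3]
pred_interaction = [0.6, 0.1, -0.4, 0.3]
cond = [0.5, -0.2, 0.1]  # hypothetical text-condition embedding

mixer = ToyMixer(cond_dim=len(cond))
for t in (1.0, 0.5, 0.0):  # normalized timestep, from noisy to clean
    w = mixer.weights(pred_individual, pred_interaction, t, cond)
    mixed = mix_predictions(pred_individual, pred_interaction, w)
    print(t, [round(x, 3) for x in w], [round(x, 3) for x in mixed])
```

Because each weight lies strictly between 0 and 1, every mixed feature stays between the two source predictions. A fixed, hand-chosen weight would correspond to the static merging strategies the abstract contrasts against; here the weights change with the timestep and the inputs.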
