MotionLab: Unified Human Motion Generation and Editing via the Motion-Condition-Motion Paradigm
February 4, 2025
Authors: Ziyan Guo, Zeyu Hu, Na Zhao, De Wen Soh
cs.AI
Abstract
Human motion generation and editing are key components of computer graphics
and vision. However, current approaches in this field tend to offer isolated
solutions tailored to specific tasks, which can be inefficient and impractical
for real-world applications. While some efforts have aimed to unify
motion-related tasks, these methods simply use different modalities as
conditions to guide motion generation. Consequently, they lack editing
capabilities and fine-grained control, and they fail to facilitate knowledge
sharing across tasks. To address these limitations and provide a versatile, unified
framework capable of handling both human motion generation and editing, we
introduce a novel paradigm: Motion-Condition-Motion, which enables the unified
formulation of diverse tasks with three concepts: source motion, condition, and
target motion. Based on this paradigm, we propose a unified framework,
MotionLab, which incorporates rectified flows to learn the mapping from source
motion to target motion, guided by the specified conditions. In MotionLab, we
introduce 1) the MotionFlow Transformer to enhance conditional generation and
editing without task-specific modules; 2) Aligned Rotational Position Encoding
to guarantee time synchronization between source motion and target motion;
3) Task Specified Instruction Modulation; and 4) Motion Curriculum Learning for
effective multi-task learning and knowledge sharing across tasks. Notably, our
MotionLab demonstrates promising generalization capabilities and inference
efficiency across multiple benchmarks for human motion. Our code and additional
video results are available at: https://diouo.github.io/motionlab.github.io/.
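
To make the abstract's core mechanism concrete, the sketch below illustrates a rectified flow that maps a source motion to a target motion under a given condition. It is a minimal, hypothetical illustration rather than the paper's implementation: MotionFlowNet, D_MOTION, and D_COND are assumed stand-ins (the paper's backbone is the MotionFlow Transformer), and for pure generation tasks the source motion could plausibly be replaced by Gaussian noise.

    # Minimal PyTorch sketch of the Motion-Condition-Motion idea with a
    # rectified flow. All names and the MLP backbone are illustrative
    # assumptions, not the paper's architecture.
    import torch
    import torch.nn as nn

    D_MOTION, D_COND = 263, 512  # assumed motion and condition feature sizes

    class MotionFlowNet(nn.Module):
        """Predicts the velocity field v(x_t, t | source motion, condition)."""

        def __init__(self):
            super().__init__()
            self.net = nn.Sequential(
                nn.Linear(2 * D_MOTION + D_COND + 1, 1024),
                nn.SiLU(),
                nn.Linear(1024, D_MOTION),
            )

        def forward(self, x_t, t, source, cond):
            # x_t, source: (B, D_MOTION); cond: (B, D_COND); t: (B, 1)
            return self.net(torch.cat([x_t, source, cond, t], dim=-1))

    def rectified_flow_loss(model, source, target, cond):
        """Regress the constant velocity along the straight source-to-target path."""
        t = torch.rand(source.shape[0], 1)
        x_t = (1 - t) * source + t * target  # linear interpolation at time t
        v_pred = model(x_t, t, source, cond)
        return ((v_pred - (target - source)) ** 2).mean()

    @torch.no_grad()
    def sample(model, source, cond, steps=10):
        """Euler-integrate the learned ODE, starting from the source motion at t=0."""
        x = source.clone()
        for i in range(steps):
            t = torch.full((x.shape[0], 1), i / steps)
            x = x + model(x, t, source, cond) / steps
        return x

Because rectified flows are trained toward approximately straight probability paths, sampling can use few Euler steps, which is consistent with the inference efficiency the abstract reports.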