Move-in-2D: 2D-Conditioned Human Motion Generation

Abstract

Generating realistic human videos remains a challenging task, with the most effective methods currently relying on a human motion sequence as a control signal. Existing approaches often reuse motion extracted from other videos, which restricts applications to specific motion types and global scene matching. We propose Move-in-2D, a novel approach to generate human motion sequences conditioned on a scene image, allowing for diverse motion that adapts to different scenes. Our approach utilizes a diffusion model that accepts both a scene image and a text prompt as inputs, producing a motion sequence tailored to the scene. To train this model, we collect a large-scale video dataset featuring single-human activities, annotating each video with the corresponding human motion as the target output. Experiments demonstrate that our method effectively predicts human motion that aligns with the scene image after projection. Furthermore, we show that the generated motion sequence improves human motion quality in video synthesis tasks.
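The abstract judges generated motion by how well it aligns with the scene image after projection. As a rough illustration only, the sketch below projects a 3D joint sequence into an image with a standard pinhole camera. The joint layout (camera-space `(x, y, z)` tuples), the intrinsics `fx`, `fy`, `cx`, `cy`, and the helpers `project_joints`, `project_motion` and `fraction_inside` are hypothetical assumptions; the abstract does not specify the body model, camera parameterization, or evaluation metric that Move-in-2D actually uses.

```python
from typing import List, Tuple

# Hypothetical representation (assumption, not from the paper): a motion
# sequence is a list of frames, each frame a list of 3D joints (x, y, z)
# in camera coordinates, with z > 0 meaning "in front of the camera".
Joint3D = Tuple[float, float, float]
Joint2D = Tuple[float, float]


def project_joints(
    joints: List[Joint3D], fx: float, fy: float, cx: float, cy: float
) -> List[Joint2D]:
    """Project camera-space 3D joints to pixel coordinates (pinhole model)."""
    pixels = []
    for x, y, z in joints:
        if z <= 0:
            raise ValueError("joint lies behind the camera")
        # Perspective divide, scale by focal length, shift by principal point.
        pixels.append((fx * x / z + cx, fy * y / z + cy))
    return pixels


def project_motion(
    motion: List[List[Joint3D]], fx: float, fy: float, cx: float, cy: float
) -> List[List[Joint2D]]:
    """Project every frame of a motion sequence into the scene image."""
    return [project_joints(frame, fx, fy, cx, cy) for frame in motion]


def fraction_inside(motion_2d: List[List[Joint2D]], width: int, height: int) -> float:
    """Crude illustrative check (hypothetical): share of projected joints
    that fall inside the image bounds."""
    total = inside = 0
    for frame in motion_2d:
        for u, v in frame:
            total += 1
            if 0 <= u < width and 0 <= v < height:
                inside += 1
    return inside / total if total else 0.0


if __name__ == "__main__":
    # Two frames, two joints each, a person about 4 m in front of the camera.
    motion = [
        [(0.0, 0.0, 4.0), (0.0, 0.5, 4.0)],
        [(0.2, 0.0, 4.0), (0.2, 0.5, 4.0)],
    ]
    proj = project_motion(motion, fx=500.0, fy=500.0, cx=320.0, cy=240.0)
    print(proj[0])  # [(320.0, 240.0), (320.0, 302.5)]
    print(fraction_inside(proj, width=640, height=480))  # 1.0
```

The in-bounds ratio is only a sanity check that a projected body lands in the frame; checking genuine scene alignment (for example, feet resting on a walkable surface) would need scene information that this sketch does not model.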
