Agent-to-Sim: Learning Interactive Behavior Models from Casual Longitudinal Videos

Abstract

We present Agent-to-Sim (ATS), a framework for learning interactive behavior models of 3D agents from casual longitudinal video collections. Unlike prior works that rely on marker-based tracking and multiview cameras, ATS learns natural behaviors of animal and human agents non-invasively through video observations recorded over a long time span (e.g., a month) in a single environment. Modeling the 3D behavior of an agent requires persistent 3D tracking (e.g., knowing which point corresponds to which) over a long time period. To obtain such data, we develop a coarse-to-fine registration method that tracks the agent and the camera over time through a canonical 3D space, resulting in a complete and persistent spacetime 4D representation. We then train a generative model of agent behaviors using paired data of perception and motion of an agent queried from the 4D reconstruction. ATS enables real-to-sim transfer from video recordings of an agent to an interactive behavior simulator. We demonstrate results on pets (e.g., cat, dog, bunny) and humans, given monocular RGBD videos captured by a smartphone.
