
One-Minute Video Generation with Test-Time Training

April 7, 2025
Authors: Karan Dalal, Daniel Koceja, Gashon Hussein, Jiarui Xu, Yue Zhao, Youjin Song, Shihao Han, Ka Chun Cheung, Jan Kautz, Carlos Guestrin, Tatsunori Hashimoto, Sanmi Koyejo, Yejin Choi, Yu Sun, Xiaolong Wang
cs.AI

Abstract

Transformers today still struggle to generate one-minute videos because self-attention layers are inefficient for long context. Alternatives such as Mamba layers struggle with complex multi-scene stories because their hidden states are less expressive. We experiment with Test-Time Training (TTT) layers, whose hidden states can themselves be neural networks and are therefore more expressive. Adding TTT layers to a pre-trained Transformer enables it to generate one-minute videos from text storyboards. As a proof of concept, we curate a dataset based on Tom and Jerry cartoons. Compared to baselines such as Mamba 2, Gated DeltaNet, and sliding-window attention layers, TTT layers generate much more coherent videos that tell complex stories, leading by 34 Elo points in a human evaluation of 100 videos per method. Although promising, the results still contain artifacts, likely due to the limited capability of the pre-trained 5B model. The efficiency of our implementation can also be improved. We have only experimented with one-minute videos due to resource constraints, but the approach can be extended to longer videos and more complex stories. Sample videos, code, and annotations are available at: https://test-time-training.github.io/video-dit
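
To make the idea of a hidden state that is itself a model more concrete, the following is a minimal sketch, not the authors' implementation. It assumes the simplest possible case: the hidden state is a single linear map `W`, and each incoming token triggers one gradient step on a squared reconstruction loss. The function names (`matvec`, `ttt_linear_sketch`), the `(k, v, q)` token layout, the loss, and the learning rate are all illustrative assumptions; the abstract only states that the hidden state can be a neural network.

```python
def matvec(W, x):
    """Multiply a matrix (list of rows) by a vector (list of floats)."""
    return [sum(w * xj for w, xj in zip(row, x)) for row in W]


def ttt_linear_sketch(tokens, lr=0.25):
    """Toy test-time-training layer (hypothetical, for illustration only).

    tokens: list of (k, v, q) triples, each a list of d floats.
      k is the training input, v the training target, q the query
      (these three "views" of a token are an assumption of this sketch).
    Returns (outputs, W, losses), where W is the final hidden state.
    """
    d = len(tokens[0][0])
    W = [[0.0] * d for _ in range(d)]  # hidden state = weights of a linear model
    outputs, losses = [], []
    for k, v, q in tokens:
        # Inner-loop loss before the update: ||W k - v||^2
        err = [p - t for p, t in zip(matvec(W, k), v)]
        losses.append(sum(e * e for e in err))
        # One gradient step on that loss: dL/dW = 2 * err * k^T
        for i in range(d):
            for j in range(d):
                W[i][j] -= lr * 2.0 * err[i] * k[j]
        # The layer output is the updated model applied to the query
        outputs.append(matvec(W, q))
    return outputs, W, losses


if __name__ == "__main__":
    # Feed the same token repeatedly; the hidden state learns to map k to v.
    token = ([1.0, 0.0], [0.0, 2.0], [1.0, 0.0])
    outs, W, losses = ttt_linear_sketch([token] * 20)
    print(outs[0], outs[-1])
```

In this toy setting the hidden state is updated by learning rather than by a fixed recurrence, which is the property the abstract contrasts with Mamba-style layers. Replacing the linear map with a small neural network would follow the same pattern, with the gradient step applied to that network's weights.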

