Goku: Flow Based Video Generative Foundation Models

February 7, 2025
Authors: Shoufa Chen, Chongjian Ge, Yuqi Zhang, Yida Zhang, Fengda Zhu, Hao Yang, Hongxiang Hao, Hui Wu, Zhichao Lai, Yifei Hu, Ting-Che Lin, Shilong Zhang, Fu Li, Chuan Li, Xing Wang, Yanghua Peng, Peize Sun, Ping Luo, Yi Jiang, Zehuan Yuan, Bingyue Peng, Xiaobing Liu
cs.AI

Abstract

This paper introduces Goku, a state-of-the-art family of joint image-and-video generation models leveraging rectified flow Transformers to achieve industry-leading performance. We detail the foundational elements enabling high-quality visual generation, including the data curation pipeline, model architecture design, flow formulation, and advanced infrastructure for efficient and robust large-scale training. The Goku models demonstrate superior performance in both qualitative and quantitative evaluations, setting new benchmarks across major tasks. Specifically, Goku achieves 0.76 on GenEval and 83.65 on DPG-Bench for text-to-image generation, and 84.85 on VBench for text-to-video tasks. We believe that this work provides valuable insights and practical advancements for the research community in developing joint image-and-video generation models.
