VideoAuteur: Towards Long Narrative Video Generation
January 10, 2025
Authors: Junfei Xiao, Feng Cheng, Lu Qi, Liangke Gui, Jiepeng Cen, Zhibei Ma, Alan Yuille, Lu Jiang
cs.AI
Abstract
Recent video generation models have shown promising results in producing
high-quality video clips lasting several seconds. However, these models face
challenges in generating long sequences that convey clear and informative
events, limiting their ability to support coherent narrations. In this paper,
we present a large-scale cooking video dataset designed to advance long-form
narrative generation in the cooking domain. We validate the quality of our
proposed dataset in terms of visual fidelity and textual caption accuracy using
state-of-the-art Vision-Language Models (VLMs) and video generation models,
respectively. We further introduce a Long Narrative Video Director to enhance
both visual and semantic coherence in generated videos and emphasize the role
of aligning visual embeddings to achieve improved overall video quality. Our
method demonstrates substantial improvements in generating visually detailed
and semantically aligned keyframes, supported by finetuning techniques that
integrate text and image embeddings within the video generation process.
Project page: https://videoauteur.github.io/
AI-Generated Summary
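Paper Overview
The paper introduces VideoAuteur, a long narrative video generation model that addresses the difficulty existing video generation models have with long sequences. The team builds the CookGen dataset to advance long-form narrative generation in the cooking domain. A Long Narrative Video Director that integrates text and image embeddings generates visually detailed, semantically aligned keyframes, which the paper reports as a substantial improvement.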
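Key Contributions
- Proposes VideoAuteur, a long narrative video generation model.
- Builds the CookGen dataset to support research on long narrative video generation.
- Introduces a Long Narrative Video Director that improves the visual and semantic coherence of generated videos.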
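Background
The work targets the difficulty existing video generation models have in producing long sequences. CookGen provides detailed annotations for cooking videos to advance research on long narrative video generation. The technical challenges include aligning captions with actions, visual quality, and coherence.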
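Method
The approach combines a Long Narrative Video Director with a visual-conditioned video generation model, integrating text and image embeddings to improve the visual and semantic coherence of generated videos. Experiments validate the method: a regression objective and visual-conditioned video generation improve the quality and robustness of the generated videos.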
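The two-stage flow described above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the names `Step`, `direct_narrative`, `regression_loss`, and `generate_video` are hypothetical, the models are passed in as plain callables, and mean squared error stands in for the regression objective, whose exact form the summary does not specify.

```python
from dataclasses import dataclass
from typing import Callable, List

Embedding = List[float]


@dataclass
class Step:
    """One narrative step: a caption and the visual embedding of its keyframe."""
    caption: str
    visual_embedding: Embedding


def direct_narrative(
    prompt: str,
    num_steps: int,
    next_caption: Callable[[str, List[Step]], str],
    next_embedding: Callable[[str, List[Step], str], Embedding],
) -> List[Step]:
    """Hypothetical director loop: produce a caption, then a visual embedding,
    each conditioned on the prompt and on all previously generated steps."""
    history: List[Step] = []
    for _ in range(num_steps):
        caption = next_caption(prompt, history)
        embedding = next_embedding(prompt, history, caption)
        history.append(Step(caption, embedding))
    return history


def regression_loss(predicted: Embedding, target: Embedding) -> float:
    """Mean squared error between predicted and target embeddings
    (an assumed stand-in for the regression objective)."""
    if len(predicted) != len(target):
        raise ValueError("embeddings must have the same length")
    return sum((p - t) ** 2 for p, t in zip(predicted, target)) / len(predicted)


def generate_video(
    steps: List[Step],
    render_clip: Callable[[str, Embedding], str],
) -> List[str]:
    """Hypothetical second stage: render one clip per step, conditioned on
    both the caption and the visual embedding."""
    return [render_clip(s.caption, s.visual_embedding) for s in steps]


# Toy stand-ins for the real models, for illustration only.
def toy_caption(prompt: str, history: List[Step]) -> str:
    return f"{prompt}: step {len(history) + 1}"


def toy_embedding(prompt: str, history: List[Step], caption: str) -> Embedding:
    return [float(len(caption)), float(len(history))]


def toy_render(caption: str, embedding: Embedding) -> str:
    return f"<clip for '{caption}'>"


steps = direct_narrative("make pancakes", 3, toy_caption, toy_embedding)
clips = generate_video(steps, toy_render)
```

Passing the models in as callables keeps the sketch independent of any particular framework; in a real system each callable would wrap a trained network.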
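Experimental Validation
- IoU is used as the metric to check alignment between captions and actions.
- Human evaluation and FVD evaluation confirm that the dataset contains high-quality captions.
- The VideoAuteur pipeline has two main components: the Long Narrative Video Director and visual-conditioned video generation.
- Results show that the interleaved approach performs better on visual consistency and realism.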
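As an illustration of the IoU check mentioned in the list above, here is a minimal sketch of temporal IoU between a caption's time span and an action's time span. It assumes both are given as `(start, end)` pairs in seconds; the function name `temporal_iou` and the sample values are hypothetical and do not come from the paper.

```python
from typing import Tuple

Interval = Tuple[float, float]


def temporal_iou(a: Interval, b: Interval) -> float:
    """Intersection over union of two time intervals given as (start, end)."""
    a_start, a_end = a
    b_start, b_end = b
    # Overlap length is zero when the intervals do not intersect.
    intersection = max(0.0, min(a_end, b_end) - max(a_start, b_start))
    union = (a_end - a_start) + (b_end - b_start) - intersection
    return intersection / union if union > 0 else 0.0


# Hypothetical spans: the caption covers 10-20 s, the annotated action 15-25 s.
caption_span = (10.0, 20.0)
action_span = (15.0, 25.0)
score = temporal_iou(caption_span, action_span)  # 5 s overlap over a 15 s union
```

A score of 1.0 means the two spans coincide exactly, and 0.0 means they do not overlap at all.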
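Implications
- In long narrative video generation, visual consistency matters more than aesthetic score.
- Conditioning the video generation model on visual latents improves semantic consistency and video quality.
- The gains in spatial and temporal consistency are promising for further research on long narrative video generation.
The paper proposes a novel long narrative video generation model and shows experimentally that it substantially improves visual and semantic coherence, offering new research directions for long video generation.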