Seedream 3.0 Technical Report

April 15, 2025
Authors: Yu Gao, Lixue Gong, Qiushan Guo, Xiaoxia Hou, Zhichao Lai, Fanshi Li, Liang Li, Xiaochen Lian, Chao Liao, Liyang Liu, Wei Liu, Yichun Shi, Shiqi Sun, Yu Tian, Zhi Tian, Peng Wang, Rui Wang, Xuanda Wang, Xun Wang, Ye Wang, Guofeng Wu, Jie Wu, Xin Xia, Xuefeng Xiao, Zhonghua Zhai, Xinyu Zhang, Qi Zhang, Yuwei Zhang, Shijia Zhao, Jianchao Yang, Weilin Huang
cs.AI

Abstract

We present Seedream 3.0, a high-performance Chinese-English bilingual image generation foundation model. We develop several technical improvements to address the remaining challenges in Seedream 2.0: alignment with complicated prompts, fine-grained typography generation, suboptimal visual aesthetics and fidelity, and limited image resolution. Specifically, the advances in Seedream 3.0 stem from improvements across the entire pipeline, from data construction to model deployment. At the data level, we double the dataset size using a defect-aware training paradigm and a dual-axis collaborative data-sampling framework. In the pre-training phase, we adopt several effective techniques, such as mixed-resolution training, cross-modality RoPE, a representation alignment loss, and resolution-aware timestep sampling. In the post-training stage, we use diversified aesthetic captions in SFT and a VLM-based reward model with scaling, thereby achieving outputs that align well with human preferences. Furthermore, Seedream 3.0 pioneers a novel acceleration paradigm: by employing consistent noise expectation and importance-aware timestep sampling, we achieve a 4 to 8 times speedup while maintaining image quality. Seedream 3.0 demonstrates significant improvements over Seedream 2.0: it enhances overall capabilities, in particular the rendering of text with complicated Chinese characters, which is important for professional typography generation. In addition, it provides native high-resolution output (up to 2K), allowing it to generate images with high visual quality.
