Seedream 3.0 Technical Report
April 15, 2025
Authors: Yu Gao, Lixue Gong, Qiushan Guo, Xiaoxia Hou, Zhichao Lai, Fanshi Li, Liang Li, Xiaochen Lian, Chao Liao, Liyang Liu, Wei Liu, Yichun Shi, Shiqi Sun, Yu Tian, Zhi Tian, Peng Wang, Rui Wang, Xuanda Wang, Xun Wang, Ye Wang, Guofeng Wu, Jie Wu, Xin Xia, Xuefeng Xiao, Zhonghua Zhai, Xinyu Zhang, Qi Zhang, Yuwei Zhang, Shijia Zhao, Jianchao Yang, Weilin Huang
cs.AI
Abstract
We present Seedream 3.0, a high-performance Chinese-English bilingual image
generation foundation model. We develop several technical improvements to
address existing challenges in Seedream 2.0, including alignment with
complicated prompts, fine-grained typography generation, suboptimal visual
aesthetics and fidelity, and limited image resolutions. Specifically, the
advancements of Seedream 3.0 stem from improvements across the entire pipeline,
from data construction to model deployment. At the data level, we double the
dataset size using a defect-aware training paradigm and a dual-axis collaborative
data-sampling framework. Furthermore, we adopt several effective techniques
such as mixed-resolution training, cross-modality RoPE, representation
alignment loss, and resolution-aware timestep sampling in the pre-training
phase. During the post-training stage, we use diversified aesthetic
captions in SFT and a VLM-based reward model with scaling, thereby achieving
outputs that align well with human preferences. Furthermore, Seedream 3.0
pioneers a novel acceleration paradigm. By employing consistent noise
expectation and importance-aware timestep sampling, we achieve a 4x to 8x
speedup while maintaining image quality. Seedream 3.0 demonstrates significant
improvements over Seedream 2.0: it enhances overall capabilities, in particular
text rendering of complicated Chinese characters, which is important for
professional typography generation. In addition, it provides native
high-resolution output (up to 2K), allowing it to generate images with high
visual quality.