DiffSensei: Bridging Multi-Modal LLMs and Diffusion Models for Customized Manga Generation
December 10, 2024
Authors: Jianzong Wu, Chao Tang, Jingbo Wang, Yanhong Zeng, Xiangtai Li, Yunhai Tong
cs.AI
Abstract
Story visualization, the task of creating visual narratives from textual
descriptions, has seen progress with text-to-image generation models. However,
these models often lack effective control over character appearances and
interactions, particularly in multi-character scenes. To address these
limitations, we propose a new task, customized manga generation, and
introduce DiffSensei, an innovative framework specifically designed
for generating manga with dynamic multi-character control. DiffSensei
integrates a diffusion-based image generator with a multimodal large language
model (MLLM) that acts as a text-compatible identity adapter. Our approach
employs masked cross-attention to seamlessly incorporate character features,
enabling precise layout control without direct pixel transfer. Additionally,
the MLLM-based adapter adjusts character features to align with panel-specific
text cues, allowing flexible adjustments in character expressions, poses, and
actions. We also introduce MangaZero, a large-scale dataset tailored
to this task, containing 43,264 manga pages and 427,147 annotated panels,
supporting the visualization of varied character interactions and movements
across sequential frames. Extensive experiments demonstrate that DiffSensei
outperforms existing models, marking a significant advancement in manga
generation by enabling text-adaptable character customization. The project page
is https://jianzongwu.github.io/projects/diffsensei/.
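The abstract says character features are injected through masked cross-attention so that each character only influences its own region of the panel. The toy sketch below illustrates that general idea. It is not the authors' implementation, and every name in it (`masked_cross_attention`, `inside`) is hypothetical. It rests on several assumptions: character boxes are given directly in latent-grid coordinates, character tokens act as both keys and values with no learned projections, and grid cells covered by no box receive a zero output from this branch. Masking is done the usual way, by setting disallowed scores to negative infinity before the softmax.

```python
import math
from typing import List, Sequence, Tuple

Vector = List[float]
Box = Tuple[int, int, int, int]  # (x0, y0, x1, y1), half-open, in latent-grid cells (assumption)
NEG_INF = float("-inf")


def inside(x: int, y: int, box: Box) -> bool:
    """Hypothetical helper: True if cell (x, y) lies in [x0, x1) x [y0, y1)."""
    x0, y0, x1, y1 = box
    return x0 <= x < x1 and y0 <= y < y1


def masked_cross_attention(
    queries: List[List[Vector]],          # queries[y][x]: d-dim query for one latent cell
    char_tokens: Sequence[List[Vector]],  # char_tokens[k]: feature tokens of character k
    boxes: Sequence[Box],                 # boxes[k]: layout box of character k
) -> List[List[Vector]]:
    """Toy masked cross-attention: a cell may only attend to characters whose box covers it.

    Simplification (assumption): tokens are used directly as keys and values,
    without the learned projections a real attention layer would apply.
    """
    d = len(queries[0][0])
    # Flatten all character tokens and remember which character owns each one.
    keys = [t for toks in char_tokens for t in toks]
    owner = [k for k, toks in enumerate(char_tokens) for _ in toks]

    out: List[List[Vector]] = []
    for y, row in enumerate(queries):
        out_row: List[Vector] = []
        for x, q in enumerate(row):
            # Scaled dot-product scores; tokens of characters outside this cell are masked to -inf.
            scores = [
                sum(qi * ti for qi, ti in zip(q, t)) / math.sqrt(d)
                if inside(x, y, boxes[owner[j]])
                else NEG_INF
                for j, t in enumerate(keys)
            ]
            if all(s == NEG_INF for s in scores):
                # No character covers this cell: this branch contributes nothing (assumed choice).
                out_row.append([0.0] * d)
                continue
            # Numerically stable softmax; exp(-inf) == 0.0 removes masked tokens.
            m = max(scores)
            weights = [math.exp(s - m) for s in scores]
            z = sum(weights)
            out_row.append([sum(w * t[i] for w, t in zip(weights, keys)) / z for i in range(d)])
        out.append(out_row)
    return out


if __name__ == "__main__":
    # A 2 x 5 latent grid; character 0 owns columns 0-1, character 1 owns columns 2-3,
    # and column 4 is background.
    grid = [[[1.0, 1.0] for _ in range(5)] for _ in range(2)]
    result = masked_cross_attention(grid, [[[1.0, 0.0]], [[0.0, 1.0]]], [(0, 0, 2, 2), (2, 0, 4, 2)])
    print(result[0][0], result[1][3], result[0][4])
```

In this example, each covered cell can see exactly one character, so it copies that character's token, and the uncovered column stays at zero. Where two boxes overlap, a cell attends to both characters and blends their features according to the softmax weights. This is how a layout mask can steer where identity features land without copying pixels from a reference image.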