InstantCharacter: Personalize Any Characters with a Scalable Diffusion Transformer Framework
April 16, 2025
Authors: Jiale Tao, Yanbing Zhang, Qixun Wang, Yiji Cheng, Haofan Wang, Xu Bai, Zhengguang Zhou, Ruihuang Li, Linqing Wang, Chunyu Wang, Qin Lin, Qinglin Lu
cs.AI
Abstract
Current learning-based subject customization approaches, predominantly
relying on U-Net architectures, suffer from limited generalization ability and
compromised image quality. Meanwhile, optimization-based methods require
subject-specific fine-tuning, which inevitably degrades textual
controllability. To address these challenges, we propose InstantCharacter, a
scalable framework for character customization built upon a foundation
diffusion transformer. InstantCharacter demonstrates three fundamental
advantages: first, it achieves open-domain personalization across diverse
character appearances, poses, and styles while maintaining high-fidelity
results. Second, the framework introduces a scalable adapter with stacked
transformer encoders, which effectively processes open-domain character
features and seamlessly interacts with the latent space of modern diffusion
transformers. Third, to effectively train the framework, we construct a
large-scale character dataset on the order of 10 million samples. The dataset
is systematically organized into paired (multi-view character) and unpaired
(text-image combinations) subsets. This dual-data structure enables
simultaneous optimization of identity consistency and textual editability
through distinct learning pathways. Qualitative experiments demonstrate the
advanced capabilities of InstantCharacter in generating high-fidelity,
text-controllable, and character-consistent images, setting a new benchmark for
character-driven image generation. Our source code is available at
https://github.com/Tencent/InstantCharacter.
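
The paired/unpaired organization described in the abstract can be illustrated with a small sketch. This is not the authors' data pipeline: the `Sample` fields, the `split_subsets` and `training_examples` helpers, and the file names are all hypothetical. The mapping of unpaired samples to an identity pathway and paired samples to an editability pathway is an assumption made for illustration, since the abstract only states that the two subsets feed distinct learning pathways.

```python
from collections import defaultdict
from dataclasses import dataclass
from itertools import permutations


@dataclass(frozen=True)
class Sample:
    character_id: str  # hypothetical identifier for the depicted character
    image: str         # hypothetical image path
    caption: str       # text description of this image


def split_subsets(samples):
    """Group samples by character; characters with 2+ images form the paired subset."""
    by_character = defaultdict(list)
    for s in samples:
        by_character[s.character_id].append(s)
    paired = {cid: views for cid, views in by_character.items() if len(views) >= 2}
    unpaired = [views[0] for views in by_character.values() if len(views) == 1]
    return paired, unpaired


def training_examples(paired, unpaired):
    """Yield (pathway, reference_image, target_image, target_caption) tuples."""
    # Unpaired: reference and target are the same image
    # (assumed here to drive identity consistency).
    for s in unpaired:
        yield ("identity", s.image, s.image, s.caption)
    # Paired: reference is one view, target is a different view of the same
    # character (assumed here to drive textual editability).
    for views in paired.values():
        for ref, tgt in permutations(views, 2):
            yield ("editability", ref.image, tgt.image, tgt.caption)


samples = [
    Sample("knight", "knight_front.png", "a knight standing in a courtyard"),
    Sample("knight", "knight_side.png", "a knight riding a horse"),
    Sample("robot", "robot.png", "a small robot on a desk"),
]
paired, unpaired = split_subsets(samples)
examples = list(training_examples(paired, unpaired))
```

In this toy input, the two knight images form one paired character and yield two cross-view examples, while the single robot image yields one self-reconstruction example.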