InstantCharacter: Personalize Any Characters with a Scalable Diffusion Transformer Framework
April 16, 2025
Authors: Jiale Tao, Yanbing Zhang, Qixun Wang, Yiji Cheng, Haofan Wang, Xu Bai, Zhengguang Zhou, Ruihuang Li, Linqing Wang, Chunyu Wang, Qin Lin, Qinglin Lu
cs.AI
Abstract
Current learning-based subject customization approaches, predominantly
relying on U-Net architectures, suffer from limited generalization ability and
compromised image quality. Meanwhile, optimization-based methods require
subject-specific fine-tuning, which inevitably degrades textual
controllability. To address these challenges, we propose InstantCharacter, a
scalable framework for character customization built upon a foundation
diffusion transformer. InstantCharacter demonstrates three fundamental
advantages: first, it achieves open-domain personalization across diverse
character appearances, poses, and styles while maintaining high-fidelity
results. Second, the framework introduces a scalable adapter with stacked
transformer encoders, which effectively processes open-domain character
features and seamlessly interacts with the latent space of modern diffusion
transformers. Third, to train the framework effectively, we construct a
large-scale character dataset on the order of ten million samples. The dataset
is systematically organized into paired (multi-view character) and unpaired
(text-image combinations) subsets. This dual-data structure enables
simultaneous optimization of identity consistency and textual editability
through distinct learning pathways. Qualitative experiments demonstrate the
advanced capabilities of InstantCharacter in generating high-fidelity,
text-controllable, and character-consistent images, setting a new benchmark for
character-driven image generation. Our source code is available at
https://github.com/Tencent/InstantCharacter.
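
To make the paired/unpaired organization concrete, the following is a minimal sketch of how such a dataset could be split and mixed into training batches. It is not taken from the InstantCharacter code base: the `CharacterSample` record, its field names, and the `paired_fraction` mixing ratio are all hypothetical, and the abstract does not state how the two subsets are actually scheduled during training.

```python
import random
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class CharacterSample:
    """One training record. All field names are hypothetical."""
    image_path: str                       # target image
    caption: str                          # text describing the target image
    reference_path: Optional[str] = None  # another view of the same character, if any


def split_subsets(
    samples: List[CharacterSample],
) -> Tuple[List[CharacterSample], List[CharacterSample]]:
    """Separate records with a second view (paired) from text-image-only records (unpaired)."""
    paired = [s for s in samples if s.reference_path is not None]
    unpaired = [s for s in samples if s.reference_path is None]
    return paired, unpaired


def sample_batch(
    paired: List[CharacterSample],
    unpaired: List[CharacterSample],
    batch_size: int,
    paired_fraction: float,
    rng: random.Random,
) -> List[CharacterSample]:
    """Draw a mixed batch. The fixed mixing ratio is an assumption, not the paper's schedule."""
    n_paired = round(batch_size * paired_fraction)
    n_unpaired = batch_size - n_paired
    # Sample with replacement so small subsets can still fill their share of the batch.
    batch = rng.choices(paired, k=n_paired) + rng.choices(unpaired, k=n_unpaired)
    rng.shuffle(batch)
    return batch
```

A training loop built on this sketch could route each record to a different objective depending on whether `reference_path` is set, which is one way to realize the "distinct learning pathways" the abstract mentions.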