Cobra: Efficient Line Art COlorization with BRoAder References
April 16, 2025
Authors: Junhao Zhuang, Lingen Li, Xuan Ju, Zhaoyang Zhang, Chun Yuan, Ying Shan
cs.AI
Abstract
The comic production industry requires reference-based line art colorization
with high accuracy, efficiency, contextual consistency, and flexible control. A
comic page often involves diverse characters, objects, and backgrounds, which
complicates the coloring process. Despite advancements in diffusion models for
image generation, their application in line art colorization remains limited,
facing challenges related to handling extensive reference images,
time-consuming inference, and flexible control. We investigate the necessity of
extensive contextual image guidance for the quality of line art colorization. To
address these challenges, we introduce Cobra, an efficient and versatile method
that supports color hints and utilizes over 200 reference images while
maintaining low latency. Central to Cobra is a Causal Sparse DiT architecture,
which leverages specially designed positional encodings, causal sparse
attention, and Key-Value Cache to effectively manage long-context references
and ensure color identity consistency. Results demonstrate that Cobra achieves
accurate line art colorization through extensive contextual reference,
significantly enhancing inference speed and interactivity, thereby meeting
critical industrial demands. We release our code and models on our project
page: https://zhuang2002.github.io/Cobra/.
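
The abstract names causal sparse attention and a Key-Value Cache but does not spell out the attention pattern. The sketch below shows one plausible reading, not the authors' implementation: each reference image attends only to its own tokens (the sparse part), the line-art target attends to every reference and to itself, and references never attend to the target (the causal part). The function names, the token layout, and the use of unprojected tokens as queries, keys, and values are all assumptions made for illustration.

```python
import math


def causal_sparse_mask(num_refs, ref_len, target_len):
    """Return mask[q][k] == True when query token q may attend to key token k.

    Assumed layout (hypothetical): all reference tokens first, target tokens last.
    """
    total = num_refs * ref_len + target_len
    mask = [[False] * total for _ in range(total)]
    # Sparse part: a reference image only sees its own tokens.
    for ref in range(num_refs):
        start = ref * ref_len
        for q in range(start, start + ref_len):
            for k in range(start, start + ref_len):
                mask[q][k] = True
    # Causal part: target tokens see every reference and themselves,
    # while reference rows keep no access to target columns.
    for q in range(num_refs * ref_len, total):
        for k in range(total):
            mask[q][k] = True
    return mask


def attend(tokens, mask):
    """Masked single-head self-attention with q = k = v = tokens (no projections)."""
    dim = len(tokens[0])
    outputs = []
    for q_idx, query in enumerate(tokens):
        keys = [k for k in range(len(tokens)) if mask[q_idx][k]]
        scores = [
            sum(a * b for a, b in zip(query, tokens[k])) / math.sqrt(dim)
            for k in keys
        ]
        top = max(scores)  # subtract the max for a numerically stable softmax
        weights = [math.exp(s - top) for s in scores]
        norm = sum(weights)
        outputs.append(
            [sum(w * tokens[k][d] for w, k in zip(weights, keys)) / norm
             for d in range(dim)]
        )
    return outputs


def allowed_pairs(mask):
    """Count the query-key pairs the mask permits."""
    return sum(sum(row) for row in mask)
```

Under this mask, with 3 references of 4 tokens each and a 5-token target, 133 of the 289 query-key pairs are allowed. Because reference rows never look at target columns, the reference outputs do not change when the target changes, which is the property that would let reference keys and values be computed once and reused across denoising steps.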