ROICtrl: Boosting Instance Control for Visual Generation

November 27, 2024
Authors: Yuchao Gu, Yipin Zhou, Yunfan Ye, Yixin Nie, Licheng Yu, Pingchuan Ma, Kevin Qinghong Lin, Mike Zheng Shou
cs.AI

Abstract

Natural language often struggles to accurately associate positional and attribute information with multiple instances, which limits current text-based visual generation models to simpler compositions featuring only a few dominant instances. To address this limitation, this work enhances diffusion models by introducing regional instance control, where each instance is governed by a bounding box paired with a free-form caption. Previous methods in this area typically rely on implicit position encoding or explicit attention masks to separate regions of interest (ROIs), resulting in either inaccurate coordinate injection or large computational overhead. Inspired by ROI-Align in object detection, we introduce a complementary operation called ROI-Unpool. Together, ROI-Align and ROI-Unpool enable explicit, efficient, and accurate ROI manipulation on high-resolution feature maps for visual generation. Building on ROI-Unpool, we propose ROICtrl, an adapter for pretrained diffusion models that enables precise regional instance control. ROICtrl is compatible with community-finetuned diffusion models, as well as with existing spatial-based add-ons (e.g., ControlNet, T2I-Adapter) and embedding-based add-ons (e.g., IP-Adapter, ED-LoRA), extending their applications to multi-instance generation. Experiments show that ROICtrl achieves superior performance in regional instance control while significantly reducing computational costs.
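The abstract does not say how ROI-Align and ROI-Unpool are implemented, so the following is only a rough illustration of the idea, not the authors' code. It assumes a single-channel feature map stored as nested lists, one bilinear sample per output bin for the align step, and an unpool step that resamples the ROI feature back into its bounding box on a zero-filled map. The helper names `bilinear`, `roi_align`, and `roi_unpool` are hypothetical.

```python
from typing import List, Tuple

Grid = List[List[float]]
# Box is (x0, y0, x1, y1) in feature-map pixel units (assumed convention).
Box = Tuple[float, float, float, float]


def bilinear(fmap: Grid, x: float, y: float) -> float:
    """Sample fmap at continuous (x, y); pixel (i, j) is centred at (j + 0.5, i + 0.5)."""
    h, w = len(fmap), len(fmap[0])
    # Shift to index space and clamp to the valid range.
    x = min(max(x - 0.5, 0.0), w - 1.0)
    y = min(max(y - 0.5, 0.0), h - 1.0)
    x0, y0 = int(x), int(y)
    x1, y1 = min(x0 + 1, w - 1), min(y0 + 1, h - 1)
    fx, fy = x - x0, y - y0
    top = fmap[y0][x0] * (1 - fx) + fmap[y0][x1] * fx
    bottom = fmap[y1][x0] * (1 - fx) + fmap[y1][x1] * fx
    return top * (1 - fy) + bottom * fy


def roi_align(fmap: Grid, box: Box, out_size: int) -> Grid:
    """Resample the box region to an out_size x out_size grid (one sample per bin centre)."""
    x0, y0, x1, y1 = box
    bin_w, bin_h = (x1 - x0) / out_size, (y1 - y0) / out_size
    return [
        [bilinear(fmap, x0 + (j + 0.5) * bin_w, y0 + (i + 0.5) * bin_h)
         for j in range(out_size)]
        for i in range(out_size)
    ]


def roi_unpool(roi: Grid, box: Box, height: int, width: int) -> Grid:
    """Place a square ROI feature back into its box on a zero-filled height x width map."""
    x0, y0, x1, y1 = box
    n = len(roi)
    out = [[0.0] * width for _ in range(height)]
    for i in range(height):
        for j in range(width):
            cx, cy = j + 0.5, i + 0.5  # pixel centre
            if x0 <= cx < x1 and y0 <= cy < y1:
                # Map the pixel centre into ROI coordinates, then interpolate.
                u = (cx - x0) / (x1 - x0) * n
                v = (cy - y0) / (y1 - y0) * n
                out[i][j] = bilinear(roi, u, v)
    return out


if __name__ == "__main__":
    feature_map = [[float(i * 4 + j) for j in range(4)] for i in range(4)]
    box = (1.0, 1.0, 3.0, 3.0)
    roi = roi_align(feature_map, box, out_size=2)
    restored = roi_unpool(roi, box, height=4, width=4)
    for row in restored:
        print(row)
```

In this sketch, cropping a 2x2 box to a 2x2 ROI and unpooling it reproduces the original values inside the box and leaves zeros elsewhere. A practical version would operate on batched, multi-channel tensors and would likely use a library routine such as `torchvision.ops.roi_align` for the align step.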
