ROICtrl: Boosting Instance Control for Visual Generation

November 27, 2024
Authors: Yuchao Gu, Yipin Zhou, Yunfan Ye, Yixin Nie, Licheng Yu, Pingchuan Ma, Kevin Qinghong Lin, Mike Zheng Shou
cs.AI

Abstract

Natural language often struggles to accurately associate positional and attribute information with multiple instances, which limits current text-based visual generation models to simpler compositions featuring only a few dominant instances. To address this limitation, this work enhances diffusion models by introducing regional instance control, where each instance is governed by a bounding box paired with a free-form caption. Previous methods in this area typically rely on implicit position encoding or explicit attention masks to separate regions of interest (ROIs), resulting in either inaccurate coordinate injection or large computational overhead. Inspired by ROI-Align in object detection, we introduce a complementary operation called ROI-Unpool. Together, ROI-Align and ROI-Unpool enable explicit, efficient, and accurate ROI manipulation on high-resolution feature maps for visual generation. Building on ROI-Unpool, we propose ROICtrl, an adapter for pretrained diffusion models that enables precise regional instance control. ROICtrl is compatible with community-finetuned diffusion models, as well as with existing spatial-based add-ons (e.g., ControlNet, T2I-Adapter) and embedding-based add-ons (e.g., IP-Adapter, ED-LoRA), extending their applications to multi-instance generation. Experiments show that ROICtrl achieves superior performance in regional instance control while significantly reducing computational costs.
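The abstract does not specify how ROI-Unpool is implemented. As a rough mental model only, the sketch below assumes ROI-Unpool acts as the inverse of ROI-Align. ROI-Align crops a box into a fixed-size grid by bilinear sampling. ROI-Unpool then resamples that grid back into the same box on the full-resolution map and leaves pixels outside the box unchanged. The names `Box`, `bilinear`, `roi_align`, and `roi_unpool` are hypothetical. The sketch also makes several simplifications that do not come from the paper: it uses a single channel, takes one sample per bin, and overwrites pixels instead of blending them.

```python
from dataclasses import dataclass
from typing import List

Grid = List[List[float]]  # single-channel feature map, indexed [row][col]


@dataclass
class Box:
    """Hypothetical ROI in continuous coordinates; pixel (r, c) covers [c, c+1) x [r, r+1)."""
    x0: float
    y0: float
    x1: float
    y1: float


def bilinear(fm: Grid, y: float, x: float) -> float:
    """Bilinearly sample fm at continuous index (y, x), clamped to the map border."""
    h, w = len(fm), len(fm[0])
    y = min(max(y, 0.0), h - 1.0)
    x = min(max(x, 0.0), w - 1.0)
    r0, c0 = int(y), int(x)
    r1, c1 = min(r0 + 1, h - 1), min(c0 + 1, w - 1)
    dy, dx = y - r0, x - c0
    top = fm[r0][c0] * (1 - dx) + fm[r0][c1] * dx
    bot = fm[r1][c0] * (1 - dx) + fm[r1][c1] * dx
    return top * (1 - dy) + bot * dy


def roi_align(fm: Grid, box: Box, out_h: int, out_w: int) -> Grid:
    """Crop box into a fixed out_h x out_w grid, one bilinear sample per bin center."""
    bh = (box.y1 - box.y0) / out_h
    bw = (box.x1 - box.x0) / out_w
    # Bin centers in continuous coords, shifted by -0.5 to become pixel indices.
    return [
        [bilinear(fm, box.y0 + (i + 0.5) * bh - 0.5, box.x0 + (j + 0.5) * bw - 0.5)
         for j in range(out_w)]
        for i in range(out_h)
    ]


def roi_unpool(canvas: Grid, roi: Grid, box: Box) -> Grid:
    """Resample roi onto canvas pixels whose centers lie inside box; keep all other pixels."""
    out = [row[:] for row in canvas]  # copy so the input map is not mutated
    out_h, out_w = len(roi), len(roi[0])
    for r in range(len(canvas)):
        for c in range(len(canvas[0])):
            cy, cx = r + 0.5, c + 0.5  # continuous pixel center
            if box.y0 <= cy < box.y1 and box.x0 <= cx < box.x1:
                # Map the pixel center into the ROI grid's index space.
                gy = (cy - box.y0) / (box.y1 - box.y0) * out_h - 0.5
                gx = (cx - box.x0) / (box.x1 - box.x0) * out_w - 0.5
                out[r][c] = bilinear(roi, gy, gx)
    return out


# Usage: crop one instance, process it on its small grid, and paste it back.
fm = [[float(r * 8 + c) for c in range(8)] for r in range(8)]
box = Box(2, 2, 6, 6)
roi = roi_align(fm, box, 4, 4)
edited = [[2.0 * v for v in row] for row in roi]  # stand-in for per-instance processing
out = roi_unpool(fm, edited, box)
```

In this example the round trip `roi_unpool(fm, roi, box) == fm` is exact only because the box is pixel-aligned and the grid size matches the box size. With a coarser grid, interpolation near the box edges makes the round trip approximate. The per-instance work happens on a small fixed-size grid, and only pixels inside the box are written back. This hints at why explicit ROI operations can be cheaper than masked attention over the whole map. The paper's actual ROI-Unpool and its cost analysis may differ from this sketch.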

November 28, 2024