ChatPaper.ai


UniF^2ace: Fine-grained Face Understanding and Generation with Unified Multimodal Models

March 11, 2025
Authors: Junzhe Li, Xuerui Qiu, Linrui Xu, Liya Guo, Delin Qu, Tingting Long, Chun Fan, Ming Li
cs.AI

Abstract

Unified multimodal models (UMMs) have emerged as a powerful paradigm in foundational computer vision research, demonstrating significant potential in both image understanding and generation. However, existing research in the face domain primarily focuses on coarse facial attribute understanding, with limited capacity to handle fine-grained facial attributes and without addressing generation capabilities. To overcome these limitations, we propose UniF^2ace, the first UMM tailored specifically for fine-grained face understanding and generation. At a high level, we train UniF^2ace on a self-constructed, specialized dataset using two mutually beneficial diffusion techniques and a two-level mixture-of-experts architecture. Specifically, we first build a large-scale facial dataset, UniF^2ace-130K, which contains 130K image-text pairs and one million question-answer pairs spanning a wide range of facial attributes. Second, we establish a theoretical connection between discrete diffusion score matching and masked generative models, optimizing both evidence lower bounds simultaneously, which significantly improves the model's ability to synthesize facial details. Finally, we introduce both token-level and sequence-level mixture-of-experts, enabling efficient fine-grained representation learning for both understanding and generation tasks. Extensive experiments on UniF^2ace-130K demonstrate that UniF^2ace outperforms existing UMMs and generative models, achieving superior performance across both understanding and generation tasks.
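The abstract names token-level mixture-of-experts as one of its two routing levels. The paper's actual architecture is not described here, so the following is only a minimal illustrative sketch of generic token-level top-k MoE routing in numpy: a learned gate scores each token, the top-k experts per token are selected, and their outputs are mixed with renormalized gate weights. All shapes, the tanh "expert", and variable names are invented for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def moe_forward(tokens, gate_w, expert_ws, top_k=2):
    """Token-level top-k MoE: route each token to its top-k experts
    and mix the expert outputs with renormalized gate probabilities."""
    logits = tokens @ gate_w                    # (n_tokens, n_experts)
    probs = softmax(logits)
    topk = np.argsort(-probs, axis=-1)[:, :top_k]
    out = np.zeros_like(tokens)
    for i, tok in enumerate(tokens):
        sel = topk[i]
        weights = probs[i, sel] / probs[i, sel].sum()   # renormalize over selected
        for w, e in zip(weights, sel):
            out[i] += w * np.tanh(tok @ expert_ws[e])   # toy expert: one tanh layer
    return out, topk

d, n_experts, n_tokens = 8, 4, 5
tokens = rng.normal(size=(n_tokens, d))
gate_w = rng.normal(size=(d, n_experts))
expert_ws = rng.normal(size=(n_experts, d, d))
y, routes = moe_forward(tokens, gate_w, expert_ws)
```

Because only the selected experts are evaluated per token, compute stays roughly constant as the expert count grows, which is the usual motivation for token-level MoE in unified understanding/generation models.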

