FiVA: Fine-grained Visual Attribute Dataset for Text-to-Image Diffusion Models
December 10, 2024
Authors: Tong Wu, Yinghao Xu, Ryan Po, Mengchen Zhang, Guandao Yang, Jiaqi Wang, Ziwei Liu, Dahua Lin, Gordon Wetzstein
cs.AI
Abstract
Recent advances in text-to-image generation have enabled the creation of
high-quality images with diverse applications. However, accurately describing
desired visual attributes can be challenging, especially for non-experts in art
and photography. An intuitive solution involves adopting favorable attributes
from the source images. Current methods attempt to distill identity and style
from source images. However, "style" is a broad concept that includes texture,
color, and artistic elements, but does not cover other important attributes
such as lighting and dynamics. Additionally, a simplified "style" adaptation
prevents combining multiple attributes from different sources into one
generated image. In this work, we formulate a more effective approach that
decomposes the aesthetics of a picture into specific visual attributes, allowing
users to apply characteristics such as lighting, texture, and dynamics from
different images. To achieve this goal, we construct what is, to the best of our
knowledge, the first fine-grained visual attribute dataset (FiVA). The FiVA
dataset features a well-organized taxonomy of visual attributes and includes
around 1M high-quality generated images with visual attribute annotations.
Leveraging this dataset, we propose a fine-grained visual attribute adaptation
framework (FiVA-Adapter), which decouples and adapts visual attributes from one
or more source images into a generated one. This approach enhances
user-friendly customization, allowing users to selectively apply desired
attributes to create images that meet their unique preferences and specific
content requirements.
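The central idea above, splitting an image's aesthetics into separate attributes that can each be borrowed from a different source image, can be illustrated with a small data structure. The Python sketch below is hypothetical: the attribute names are only those mentioned in this abstract (not FiVA's actual taxonomy), the class and method names are invented for illustration, and the rule that each attribute comes from exactly one source image is an assumption. It models only the user-facing request, not how FiVA-Adapter conditions the diffusion model.

```python
from dataclasses import dataclass, field

# Hypothetical attribute set: only attributes named in the abstract.
# This is NOT the FiVA dataset's actual taxonomy.
ATTRIBUTES = {"lighting", "texture", "color", "dynamics"}


@dataclass
class AttributeReference:
    """One (source image, attribute) pair the user wants to transfer."""
    source_image: str  # hypothetical identifier, e.g. a file path
    attribute: str


@dataclass
class GenerationRequest:
    """A text prompt plus attributes drawn from one or more source images."""
    prompt: str
    references: list[AttributeReference] = field(default_factory=list)

    def add(self, source_image: str, attribute: str) -> None:
        if attribute not in ATTRIBUTES:
            raise ValueError(f"unknown attribute: {attribute!r}")
        # Assumption for this sketch: each attribute is taken from a single
        # source, so two sources never compete for the same attribute.
        if any(r.attribute == attribute for r in self.references):
            raise ValueError(f"attribute {attribute!r} already assigned")
        self.references.append(AttributeReference(source_image, attribute))

    def sources_by_attribute(self) -> dict[str, str]:
        """Map each requested attribute to the image it is borrowed from."""
        return {r.attribute: r.source_image for r in self.references}


# Usage: lighting from one (hypothetical) image, texture from another.
req = GenerationRequest("a lighthouse on a cliff at dusk")
req.add("photo_a.png", "lighting")
req.add("painting_b.png", "texture")
print(req.sources_by_attribute())
# {'lighting': 'photo_a.png', 'texture': 'painting_b.png'}
```

Keeping attributes as separate, individually assigned entries mirrors the abstract's contrast with a single monolithic "style" reference, which cannot mix attributes from several images.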