FiVA: Fine-grained Visual Attribute Dataset for Text-to-Image Diffusion Models

Abstract

Recent advances in text-to-image generation have enabled the creation of high-quality images for diverse applications. However, accurately describing desired visual attributes can be challenging, especially for non-experts in art and photography. An intuitive solution is to adopt favorable attributes from source images. Current methods attempt to distill identity and style from source images. However, "style" is a broad concept that encompasses texture, color, and artistic elements, yet it does not cover other important attributes such as lighting and dynamics. Moreover, a simplified "style" adaptation prevents combining multiple attributes from different sources into a single generated image. In this work, we formulate a more effective approach that decomposes the aesthetics of a picture into specific visual attributes, allowing users to apply characteristics such as lighting, texture, and dynamics from different images. To this end, we construct, to the best of our knowledge, the first fine-grained visual attribute dataset (FiVA). FiVA features a well-organized taxonomy of visual attributes and includes around 1M high-quality generated images with visual attribute annotations. Leveraging this dataset, we propose a fine-grained visual attribute adaptation framework (FiVA-Adapter), which decouples visual attributes from one or more source images and adapts them into a generated image. This approach enables user-friendly customization, allowing users to selectively apply desired attributes to create images that meet their unique preferences and specific content requirements.
