Stylecodes: Encoding Stylistic Information For Image Generation
November 19, 2024
Author: Ciara Rowles
cs.AI
Abstract
Diffusion models excel in image generation, but controlling them remains a
challenge. We focus on the problem of style-conditioned image generation.
Although example images work, they are cumbersome: srefs (style-reference
codes) from MidJourney solve this issue by expressing a specific image style in
a short numeric code. These have seen widespread adoption throughout social
media due to both their ease of sharing and the fact that they allow using an image
for style control, without having to post the source images themselves.
However, users are not able to generate srefs from their own images, nor is the
underlying training procedure public. We propose StyleCodes: an open-source and
open-research style encoder architecture and training procedure to express
image style as a 20-symbol base64 code. Our experiments show that our encoding
results in minimal loss in quality compared to traditional image-to-style
techniques.
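The abstract states that a style is expressed as a 20-symbol base64 code but does not say how the encoder output becomes symbols. It does fix the capacity, though. Each base64 symbol is one of 64 characters, so it carries 6 bits, and a 20-symbol code therefore holds 20 × 6 = 120 bits. The sketch below only illustrates that arithmetic. It assumes a hypothetical 20-dimensional style vector with values in [-1, 1] and quantizes each value uniformly to 64 levels, one symbol per dimension. The helper names `encode_style` and `decode_style` are hypothetical, and this is not the StyleCodes training or encoding procedure.

```python
import string

# Standard base64 alphabet (RFC 4648): 64 symbols, so each symbol carries 6 bits.
B64_ALPHABET = string.ascii_uppercase + string.ascii_lowercase + string.digits + "+/"
CODE_LENGTH = 20             # symbols per style code, as stated in the abstract
LEVELS = len(B64_ALPHABET)   # 64 quantization levels per symbol


def encode_style(vector: list[float]) -> str:
    """Hypothetical scheme: quantize a 20-dim style vector in [-1, 1] to 20 base64 symbols."""
    if len(vector) != CODE_LENGTH:
        raise ValueError(f"expected {CODE_LENGTH} values, got {len(vector)}")
    symbols = []
    for x in vector:
        x = min(max(x, -1.0), 1.0)                      # clip to the assumed range
        level = round((x + 1.0) / 2.0 * (LEVELS - 1))   # map [-1, 1] onto {0, ..., 63}
        symbols.append(B64_ALPHABET[level])
    return "".join(symbols)


def decode_style(code: str) -> list[float]:
    """Inverse of encode_style: map each symbol back to its quantization grid point."""
    if len(code) != CODE_LENGTH:
        raise ValueError(f"expected {CODE_LENGTH} symbols, got {len(code)}")
    # str.index raises ValueError for characters outside the alphabet
    return [B64_ALPHABET.index(c) / (LEVELS - 1) * 2.0 - 1.0 for c in code]


if __name__ == "__main__":
    style = [(i - 9.5) / 10 for i in range(CODE_LENGTH)]   # toy stand-in for an encoder output
    code = encode_style(style)
    restored = decode_style(code)
    print(code, len(code), "symbols =", len(code) * 6, "bits")
    # Rounding to the nearest of 64 grid points bounds the per-value error by half a step, 1/63.
    print("max abs error:", max(abs(a - b) for a, b in zip(style, restored)))
```

Under this toy scheme, the worst-case reconstruction error per value is half a quantization step (1/63). A learned encoder could use the same 120 bits far more efficiently than independent per-dimension rounding.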