Stylecodes: Encoding Stylistic Information For Image Generation

Abstract

Diffusion models excel at image generation, but controlling them remains a challenge. We focus on the problem of style-conditioned image generation. Although example images work, they are cumbersome: srefs (style-reference codes) from MidJourney solve this issue by expressing a specific image style as a short numeric code. These codes have seen widespread adoption on social media, both because they are easy to share and because they allow an image to be used for style control without posting the source image itself. However, users cannot generate srefs from their own images, and the underlying training procedure is not public. We propose StyleCodes: an open-source and open-research style encoder architecture and training procedure that expresses image style as a 20-symbol base64 code. Our experiments show that our encoding results in minimal loss in quality compared to traditional image-to-style techniques.
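
To make the size of such a code concrete: each base64 symbol carries 6 bits, so a 20-symbol code holds 120 bits, or 15 bytes. The sketch below only illustrates that arithmetic with Python's standard `base64` module. The abstract does not say how the style encoder's output is quantized into those bits, so the 15-byte payload, the helper names `bytes_to_code` and `code_to_bytes`, and the choice of the URL-safe alphabet are all assumptions made for illustration, not the method itself.

```python
import base64

CODE_SYMBOLS = 20        # code length stated in the abstract
BITS_PER_SYMBOL = 6      # one base64 symbol carries 6 bits
PAYLOAD_BYTES = CODE_SYMBOLS * BITS_PER_SYMBOL // 8  # 120 bits = 15 bytes


def bytes_to_code(payload: bytes) -> str:
    """Pack a 15-byte payload into a 20-symbol base64 string (hypothetical helper)."""
    if len(payload) != PAYLOAD_BYTES:
        raise ValueError(f"expected {PAYLOAD_BYTES} bytes, got {len(payload)}")
    # 15 is a multiple of 3, so the encoder emits no '=' padding.
    # The URL-safe alphabet is an assumption chosen for easy sharing.
    return base64.urlsafe_b64encode(payload).decode("ascii")


def code_to_bytes(code: str) -> bytes:
    """Recover the 15-byte payload from a 20-symbol code (hypothetical helper)."""
    if len(code) != CODE_SYMBOLS:
        raise ValueError(f"expected {CODE_SYMBOLS} symbols, got {len(code)}")
    return base64.urlsafe_b64decode(code)


if __name__ == "__main__":
    # Placeholder payload standing in for a quantized style representation.
    payload = bytes(range(PAYLOAD_BYTES))
    code = bytes_to_code(payload)
    print(code, len(code))
```

Under these assumptions the round trip is lossless at the byte level. Any loss in style fidelity would come from compressing the style into 120 bits in the first place, not from the base64 step.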
