StyleCodes: Encoding Stylistic Information for Image Generation
November 19, 2024
Author: Ciara Rowles
cs.AI
Abstract
Diffusion models excel in image generation, but controlling them remains a
challenge. We focus on the problem of style-conditioned image generation.
Although example images work, they are cumbersome: srefs (style-reference
codes) from MidJourney solve this issue by expressing a specific image style in
a short numeric code. These codes have seen widespread adoption across social
media because they are easy to share and because they allow an image to be used
for style control without posting the source image itself.
However, users are not able to generate srefs from their own images, nor is the
underlying training procedure public. We propose StyleCodes: an open-source and
open-research style encoder architecture and training procedure to express
image style as a 20-symbol base64 code. Our experiments show that our encoding
results in minimal loss in quality compared to traditional image-to-style
techniques.
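To make the size of a 20-symbol base64 code concrete: each base64 symbol carries 6 bits, so 20 symbols hold 120 bits, or exactly 15 bytes. The sketch below only illustrates that arithmetic with Python's standard library. The helper names `bytes_to_code` and `code_to_bytes` are hypothetical, and the URL-safe alphabet and byte packing are assumptions; the abstract does not say how StyleCodes maps its style representation onto the symbols.

```python
import base64

SYMBOLS = 20                 # code length stated in the abstract
BITS_PER_SYMBOL = 6          # log2(64) bits per base64 symbol
PAYLOAD_BYTES = SYMBOLS * BITS_PER_SYMBOL // 8  # 120 bits = 15 bytes


def bytes_to_code(payload: bytes) -> str:
    """Hypothetical helper: pack a 15-byte payload into a 20-symbol code.

    Assumption: the URL-safe base64 alphabet is used so the code can be
    pasted into links; the paper's actual alphabet is not given here.
    """
    if len(payload) != PAYLOAD_BYTES:
        raise ValueError(f"expected {PAYLOAD_BYTES} bytes, got {len(payload)}")
    # 15 is a multiple of 3, so the encoding has no '=' padding.
    return base64.urlsafe_b64encode(payload).decode("ascii")


def code_to_bytes(code: str) -> bytes:
    """Hypothetical helper: recover the 15-byte payload from a code."""
    if len(code) != SYMBOLS:
        raise ValueError(f"expected {SYMBOLS} symbols, got {len(code)}")
    return base64.urlsafe_b64decode(code)


# Stand-in payload; a real system would derive these bytes from an image.
example_payload = bytes(range(PAYLOAD_BYTES))
example_code = bytes_to_code(example_payload)
```

A code produced this way round-trips losslessly, so any loss in style fidelity would come from compressing the style into 120 bits, not from the text encoding itself.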