AMO Sampler: Enhancing Text Rendering with Overshooting

November 28, 2024
Authors: Xixi Hu, Keyang Xu, Bo Liu, Qiang Liu, Hongliang Fei
cs.AI

Abstract

Achieving precise alignment between textual instructions and generated images is a significant challenge in text-to-image generation, particularly when rendering written text within images. State-of-the-art models such as Stable Diffusion 3 (SD3), Flux, and AuraFlow still struggle with accurate text depiction, producing misspelled or inconsistent text. We introduce a training-free method with minimal computational overhead that significantly enhances text rendering quality. Specifically, we introduce an overshooting sampler for pretrained rectified flow (RF) models that alternates between over-simulating the learned ordinary differential equation (ODE) and reintroducing noise. Compared to the Euler sampler, the overshooting sampler effectively introduces an extra Langevin dynamics term that can help correct the compounding error from successive Euler steps and therefore improve text rendering. However, when the overshooting strength is high, we observe over-smoothing artifacts in the generated images. To address this issue, we propose an Attention Modulated Overshooting sampler (AMO), which adaptively controls the overshooting strength for each image patch according to its attention score with the text content. AMO demonstrates 32.3% and 35.9% improvements in text rendering accuracy on SD3 and Flux, respectively, without compromising overall image quality or increasing inference cost.
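
The alternation between over-simulating the ODE and reintroducing noise can be illustrated with a small sketch. This is not the authors' implementation. It assumes the rectified-flow interpolation X_t = t·X_1 + (1 − t)·X_0 with noise at t = 0 and data at t = 1, and it picks the rescaling and noise coefficients so that the noise level at the target time is preserved under that assumption; the abstract itself gives no formulas. The names `overshoot_sample`, `toy_velocity`, and the parameter `c` (overshooting strength) are hypothetical, and the toy velocity field stands in for a pretrained model.

```python
import numpy as np

def overshoot_sample(velocity, z0, n_steps=20, c=1.0, seed=0):
    """Sketch of an overshooting sampler (assumed form, not the paper's code).

    velocity: callable (z, t) -> dz/dt, standing in for a pretrained RF model.
    c: overshooting strength; a scalar, or an array broadcastable to z.
       c = 0 reduces each step to a plain Euler step.
    """
    rng = np.random.default_rng(seed)
    ts = np.linspace(0.0, 1.0, n_steps + 1)
    z = np.asarray(z0, dtype=float)
    for t, s in zip(ts[:-1], ts[1:]):
        # Over-simulate: step past the target time s to o >= s (capped at 1).
        o = np.minimum(1.0, s + c * (s - t))
        z_over = z + (o - t) * velocity(z, t)
        # Return to time s: rescale, then add noise so the total noise
        # scale matches (1 - s) under the assumed interpolation.
        a = s / o
        b = np.sqrt(np.maximum((1.0 - s) ** 2 - (a * (1.0 - o)) ** 2, 0.0))
        z = a * z_over + b * rng.standard_normal(z.shape)
    return z

def toy_velocity(z, t, target=np.array([2.0, -1.0])):
    # Exact RF velocity when the data distribution is a single point `target`.
    return (target - z) / (1.0 - t)

sample = overshoot_sample(toy_velocity, np.array([0.3, -0.7]), n_steps=20, c=2.0)
```

Because `c` may be an array, a different strength can be assigned to each element of `z`, which loosely mirrors the per-patch control that AMO describes. How the per-patch strengths are derived from attention scores is not stated in the abstract, so that part is left out here.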

