HelloMeme: Integrating Spatial Knitting Attentions to Embed High-Level and Fidelity-Rich Conditions in Diffusion Models

Abstract

We propose an effective method for inserting adapters into text-to-image foundation models, enabling complex downstream tasks while preserving the generalization ability of the base model. The core idea is to optimize the attention mechanisms that operate on 2D feature maps, which improves the adapter's performance. We validated the approach on meme video generation, where it achieved significant results. We hope this work offers insights for post-training tasks on large text-to-image models. Because the method is also compatible with SD1.5-derived models, it should be of value to the open-source community, and we will release the related code (https://songkey.github.io/hellomeme).
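The abstract does not spell out how attention over a 2D feature map is organized, so the following is only a rough sketch under a stated assumption: that attention is applied along each row of the feature map and then along each column, rather than over a single flattened token sequence. The function names (`axis_attention`, `row_then_column_attention`), the residual connections, and the random weights are all hypothetical illustrations and are not taken from the released HelloMeme code.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def axis_attention(feat, w_q, w_k, w_v):
    """Single-head self-attention over the second-to-last axis of feat (..., L, C)."""
    q, k, v = feat @ w_q, feat @ w_k, feat @ w_v
    scores = q @ np.swapaxes(k, -1, -2) / np.sqrt(q.shape[-1])
    return softmax(scores, axis=-1) @ v

def row_then_column_attention(feat, params_row, params_col):
    """feat: (H, W, C). ASSUMPTION: attend within rows, then within columns."""
    # Row pass: each of the H rows is treated as a sequence of W tokens.
    out = feat + axis_attention(feat, *params_row)
    # Column pass: swap axes so each of the W columns is a sequence of H tokens.
    out_t = np.swapaxes(out, 0, 1)            # (W, H, C)
    out_t = out_t + axis_attention(out_t, *params_col)
    return np.swapaxes(out_t, 0, 1)           # back to (H, W, C)

rng = np.random.default_rng(0)
H, W, C = 4, 6, 8
feat = rng.standard_normal((H, W, C))
# Hypothetical projection weights (query, key, value) for each pass.
params_row = [rng.standard_normal((C, C)) / np.sqrt(C) for _ in range(3)]
params_col = [rng.standard_normal((C, C)) / np.sqrt(C) for _ in range(3)]

out = row_then_column_attention(feat, params_row, params_col)
print(out.shape)  # (4, 6, 8)
```

In this sketch the spatial layout is kept throughout: each token only ever attends to tokens in its own row or column, and the output has the same (H, W, C) shape as the input, so such a block could sit beside an existing layer as an adapter without changing tensor shapes.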
