HelloMeme: Integrating Spatial Knitting Attentions to Embed High-Level and Fidelity-Rich Conditions in Diffusion Models

Abstract

We propose an effective method for inserting adapters into text-to-image foundation models that enables complex downstream tasks while preserving the generalization ability of the base model. The core idea is to optimize the attention mechanism that operates on 2D feature maps, which improves the adapter's performance. We validated the approach on meme video generation, where it achieved significant results. We hope this work offers insights into post-training of large text-to-image models. Because the method shows good compatibility with SD1.5-derived models, it may also be of value to the open-source community, and we will release the code (https://songkey.github.io/hellomeme).
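The abstract only states that the method optimizes attention over 2D feature maps; it does not say how. As a minimal sketch under stated assumptions, the code below applies attention first along the rows and then along the columns of an H×W feature map (the "axial attention" pattern), so each token's row and column neighbors stay explicit instead of being lost when the map is flattened into one long sequence. The functions `softmax`, `self_attention`, and `knitted_attention`, the (H, W, C) layout, and the identity Q/K/V projections are all hypothetical illustrations, not the authors' implementation.

```python
import numpy as np


def softmax(x: np.ndarray, axis: int = -1) -> np.ndarray:
    """Numerically stable softmax along `axis`."""
    shifted = x - x.max(axis=axis, keepdims=True)
    e = np.exp(shifted)
    return e / e.sum(axis=axis, keepdims=True)


def self_attention(x: np.ndarray) -> np.ndarray:
    """Scaled dot-product self-attention over axis -2.

    x has shape (..., n, d); leading axes are independent batches.
    ASSUMPTION: identity Q/K/V projections for brevity; real attention
    layers learn these projections.
    """
    d = x.shape[-1]
    scores = x @ np.swapaxes(x, -1, -2) / np.sqrt(d)  # (..., n, n)
    return softmax(scores, axis=-1) @ x                # (..., n, d)


def knitted_attention(feat: np.ndarray) -> np.ndarray:
    """HYPOTHETICAL row-then-column attention over an (H, W, C) feature map."""
    if feat.ndim != 3:
        raise ValueError("expected a feature map of shape (H, W, C)")
    # Row pass: each of the H rows is a sequence of W tokens.
    rows = self_attention(feat)                     # (H, W, C)
    # Column pass: swap H and W so each column becomes a sequence of H tokens.
    cols = self_attention(rows.transpose(1, 0, 2))  # (W, H, C)
    return cols.transpose(1, 0, 2)                  # back to (H, W, C)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    feat = rng.standard_normal((8, 6, 16))  # toy 8x6 map with 16 channels
    out = knitted_attention(feat)
    print(out.shape)  # (8, 6, 16)
```

Flattening the map into one sequence scores (HW)² token pairs, while the two passes score H·W·(W+H) pairs and keep row and column adjacency explicit. Whether the paper's attention follows this exact row-then-column order is an assumption here.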
