

HarmoniCa: Harmonizing Training and Inference for Better Feature Cache in Diffusion Transformer Acceleration

October 2, 2024
作者: Yushi Huang, Zining Wang, Ruihao Gong, Jing Liu, Xinjie Zhang, Jinyang Guo, Xianglong Liu, Jun Zhang
cs.AI

Abstract

Diffusion Transformers (DiTs) have gained prominence for outstanding scalability and extraordinary performance in generative tasks. However, their considerable inference costs impede practical deployment. The feature cache mechanism, which involves storing and retrieving redundant computations across timesteps, holds promise for reducing per-step inference time in diffusion models. Most existing caching methods for DiT are manually designed. Although learning-based approaches attempt to optimize caching strategies adaptively, they suffer from discrepancies between training and inference, which hamper both performance and the acceleration ratio. Upon detailed analysis, we pinpoint that these discrepancies primarily stem from two aspects: (1) Prior Timestep Disregard, where training ignores the effect of cache usage at earlier timesteps, and (2) Objective Mismatch, where the training target (aligning the predicted noise at each timestep) deviates from the goal of inference (generating a high-quality image). To alleviate these discrepancies, we propose HarmoniCa, a novel method that Harmonizes training and inference with a novel learning-based Caching framework built upon Step-Wise Denoising Training (SDT) and an Image Error Proxy-Guided Objective (IEPO). Compared to the traditional training paradigm, the newly proposed SDT maintains the continuity of the denoising process, enabling the model to leverage information from prior timesteps during training, similar to the way it operates during inference. Furthermore, we design IEPO, which integrates an efficient proxy mechanism to approximate the final image error caused by reusing cached features. IEPO therefore helps balance final image quality against cache utilization, resolving the issue of a training objective that only considers the impact of cache usage on the predicted output at each individual timestep.
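The feature cache mechanism the abstract describes can be illustrated with a minimal sketch: a per-timestep schedule decides whether to reuse the previous timestep's stored block output or to recompute it. All names here (`run_block`, `denoise_with_cache`, `cache_schedule`) and the toy arithmetic are illustrative assumptions, not the HarmoniCa implementation.

```python
def run_block(x, t):
    # Stand-in for an expensive DiT block forward pass.
    return [v * 0.9 + t * 0.01 for v in x]

def denoise_with_cache(x, timesteps, cache_schedule):
    """Reuse the previous timestep's block output when the
    schedule marks that timestep's computation as redundant."""
    cache = None
    for t in timesteps:
        if cache_schedule.get(t, False) and cache is not None:
            out = cache            # retrieve: skip the block entirely
        else:
            out = run_block(x, t)  # recompute and store for later reuse
            cache = out
        x = out
    return x

# Example: recompute at t=3 and t=1, reuse the cache at t=2 and t=0,
# halving the number of block evaluations over four timesteps.
schedule = {3: False, 2: True, 1: False, 0: True}
result = denoise_with_cache([1.0, 2.0], [3, 2, 1, 0], schedule)
```

A learning-based framework such as the one proposed would optimize a schedule like `cache_schedule` rather than fixing it by hand; the paper's point is that this optimization must account for cache effects at earlier timesteps and for the final image error, not just per-step prediction error.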
