EQ-VAE: Equivariance Regularized Latent Space for Improved Generative Image Modeling

Abstract

Latent generative models have emerged as a leading approach for high-quality image synthesis. These models rely on an autoencoder to compress images into a latent space, followed by a generative model that learns the latent distribution. We identify that existing autoencoders lack equivariance to semantic-preserving transformations such as scaling and rotation, resulting in complex latent spaces that hinder generative performance. To address this, we propose EQ-VAE, a simple regularization approach that enforces equivariance in the latent space, reducing its complexity without degrading reconstruction quality. By fine-tuning pre-trained autoencoders with EQ-VAE, we enhance the performance of several state-of-the-art generative models, including DiT, SiT, REPA and MaskGIT, achieving a 7× speedup on DiT-XL/2 with only five epochs of SD-VAE fine-tuning. EQ-VAE is compatible with both continuous and discrete autoencoders, thus offering a versatile enhancement for a wide range of latent generative models. Project page and code: https://eq-vae.github.io/.
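To make the notion of equivariance concrete, the following is a minimal sketch, not the authors' code, of how one might measure whether an encoder commutes with a 90-degree rotation. Both toy encoders, the function names, and the mean-squared error metric are assumptions introduced here for illustration; a real study would substitute a trained autoencoder such as SD-VAE.

```python
import numpy as np


def avg_pool_encoder(image, factor=2):
    """Hypothetical toy encoder: non-overlapping average pooling.

    Stands in for a real autoencoder; it happens to commute with
    90-degree rotations of a square image.
    """
    h, w = image.shape
    return image.reshape(h // factor, factor, w // factor, factor).mean(axis=(1, 3))


def row_biased_encoder(image, factor=2):
    """Hypothetical toy encoder that weights rows by vertical position.

    The position-dependent weighting breaks rotation equivariance.
    """
    h, _ = image.shape
    weights = np.linspace(0.5, 1.5, h)[:, None]
    return avg_pool_encoder(image * weights, factor)


def equivariance_error(encoder, image, transform):
    """Mean squared gap between encode(transform(x)) and transform(encode(x))."""
    gap = encoder(transform(image)) - transform(encoder(image))
    return float(np.mean(gap ** 2))


rng = np.random.default_rng(0)
image = rng.random((8, 8))

# np.rot90 rotates the array by 90 degrees counterclockwise.
err_pool = equivariance_error(avg_pool_encoder, image, np.rot90)
err_biased = equivariance_error(row_biased_encoder, image, np.rot90)

print(f"average-pool encoder error: {err_pool:.3e}")
print(f"row-biased encoder error:   {err_biased:.3e}")
```

In this toy setting the pooling encoder yields an error at the level of floating-point rounding, while the row-biased encoder yields a clearly nonzero error. The abstract's claim is that existing autoencoders behave more like the second case for transformations such as scaling and rotation.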
