SphereDiff: Tuning-free Omnidirectional Panoramic Image and Video Generation via Spherical Latent Representation
April 19, 2025
Authors: Minho Park, Taewoong Kang, Jooyeol Yun, Sungwon Hwang, Jaegul Choo
cs.AI
Abstract
The increasing demand for AR/VR applications has highlighted the need for
high-quality 360-degree panoramic content. However, generating high-quality
360-degree panoramic images and videos remains a challenging task due to the
severe distortions introduced by equirectangular projection (ERP). Existing
approaches either fine-tune pretrained diffusion models on limited ERP datasets
or attempt tuning-free methods that still rely on ERP latent representations,
leading to discontinuities near the poles. In this paper, we introduce
SphereDiff, a novel approach for seamless 360-degree panoramic image and video
generation using state-of-the-art diffusion models without additional tuning.
We define a spherical latent representation that ensures uniform distribution
across all perspectives, mitigating the distortions inherent in ERP. We extend
MultiDiffusion to spherical latent space and propose a spherical latent
sampling method to enable direct use of pretrained diffusion models. Moreover,
we introduce distortion-aware weighted averaging to further improve the
generation quality in the projection process. Our method outperforms existing
approaches in generating 360-degree panoramic content while maintaining high
fidelity, making it a robust solution for immersive AR/VR applications. The
code is available at https://github.com/pmh9960/SphereDiff.
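
To make the ERP distortion argument concrete, the following minimal sketch compares how an equirectangular grid and a uniformly distributed spherical point set allocate samples near the poles. It is an illustration only, not the authors' implementation: the abstract does not say how the spherical latent is constructed, so the Fibonacci lattice used here is an assumption, and the function names `erp_points`, `fibonacci_lattice`, and `polar_cap_fraction` are hypothetical.

```python
import math


def erp_points(height, width):
    """Unit-sphere points at the pixel centers of an equirectangular grid."""
    pts = []
    for i in range(height):
        # Latitude runs from near +pi/2 (north pole) down to near -pi/2.
        lat = math.pi / 2 - (i + 0.5) * math.pi / height
        for j in range(width):
            lon = -math.pi + (j + 0.5) * 2 * math.pi / width
            pts.append((math.cos(lat) * math.cos(lon),
                        math.cos(lat) * math.sin(lon),
                        math.sin(lat)))
    return pts


def fibonacci_lattice(n):
    """Approximately area-uniform points on the unit sphere.

    Assumption: used here only as a stand-in for a uniform spherical
    latent layout; the source does not specify the construction.
    """
    golden_angle = math.pi * (3 - math.sqrt(5))
    pts = []
    for k in range(n):
        z = 1 - (2 * k + 1) / n          # evenly spaced in z => even in area
        r = math.sqrt(1 - z * z)
        theta = golden_angle * k
        pts.append((r * math.cos(theta), r * math.sin(theta), z))
    return pts


def polar_cap_fraction(points, min_lat_deg=60.0):
    """Fraction of points whose absolute latitude exceeds min_lat_deg."""
    z_min = math.sin(math.radians(min_lat_deg))
    return sum(1 for p in points if abs(p[2]) > z_min) / len(points)


if __name__ == "__main__":
    erp = erp_points(64, 128)
    fib = fibonacci_lattice(len(erp))
    print("ERP share above 60 deg latitude:      ", polar_cap_fraction(erp))
    print("Spherical share above 60 deg latitude:", polar_cap_fraction(fib))
    print("True area share of the two caps:      ", 1 - math.sin(math.radians(60)))
```

The two polar caps above 60 degrees latitude cover about 13.4% of the sphere's area, yet an ERP grid spends roughly a third of its samples there, while the uniform point set stays close to the true area share. This oversampling near the poles is the kind of distortion a uniform spherical representation is meant to avoid.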