GPS as a Control Signal for Image Generation

Abstract

We show that the GPS tags contained in photo metadata provide a useful control signal for image generation. We train GPS-to-image models and use them for tasks that require a fine-grained understanding of how images vary within a city. In particular, we train a diffusion model to generate images conditioned on both GPS and text. The learned model generates images that capture the distinctive appearance of different neighborhoods, parks, and landmarks. We also extract 3D models from 2D GPS-to-image models through score distillation sampling, using GPS conditioning to constrain the appearance of the reconstruction from each viewpoint. Our evaluations suggest that our GPS-conditioned models successfully learn to generate images that vary based on location, and that GPS conditioning improves estimated 3D structure.
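
For reference, score distillation sampling (SDS), as introduced in DreamFusion, optimizes the parameters θ of a 3D representation by rendering it from a sampled camera π, noising the rendering, and using the diffusion model's denoising residual as a gradient. The following is a minimal sketch of that standard gradient with the conditioning extended from a text prompt y to the pair (y, g_π). It assumes that each camera π is assigned a GPS coordinate g_π. That mapping is a hypothetical placeholder, since the abstract does not say how the GPS condition for each viewpoint is chosen. R denotes a differentiable renderer and w(t) a timestep weighting. As in the original SDS formulation, the Jacobian of the noise predictor is omitted.

$$
x = R(\theta, \pi), \qquad x_t = \alpha_t x + \sigma_t \epsilon, \qquad \epsilon \sim \mathcal{N}(0, I)
$$

$$
\nabla_\theta \mathcal{L}_{\mathrm{SDS}} = \mathbb{E}_{t,\epsilon,\pi}\left[ w(t)\,\bigl(\hat{\epsilon}_\phi(x_t;\, y,\, g_\pi,\, t) - \epsilon\bigr)\,\frac{\partial x}{\partial \theta} \right]
$$

For a latent diffusion model, x would be the latent encoding of the rendering rather than the image itself. Replacing text-only conditioning with (y, g_π) means each rendered view is pushed toward the appearance the model associates with its assigned location, which is one way to read "using GPS conditioning to constrain the appearance of the reconstruction from each viewpoint."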
