OmniSVG: A Unified Scalable Vector Graphics Generation Model
April 8, 2025
Authors: Yiying Yang, Wei Cheng, Sijin Chen, Xianfang Zeng, Jiaxu Zhang, Liao Wang, Gang Yu, Xingjun Ma, Yu-Gang Jiang
cs.AI
Abstract
Scalable Vector Graphics (SVG) is an important image format widely adopted in
graphic design because of its resolution independence and editability.
Generating high-quality SVG has continuously drawn attention from both
designers and researchers in the AIGC community. However, existing methods
either produce unstructured outputs at huge computational cost or are limited
to generating monochrome icons with over-simplified structures. To produce
high-quality and complex SVG, we propose OmniSVG, a unified framework that
leverages pre-trained Vision-Language Models (VLMs) for end-to-end multimodal
SVG generation. By parameterizing SVG commands and coordinates into discrete
tokens, OmniSVG decouples structural logic from low-level geometry for
efficient training while maintaining the expressiveness of complex SVG
structures. To further advance the development of SVG synthesis, we introduce
MMSVG-2M, a multimodal dataset with two million richly annotated SVG assets,
along with a standardized evaluation protocol for conditional SVG generation
tasks. Extensive experiments show that OmniSVG outperforms existing methods and
demonstrates its potential for integration into professional SVG design
workflows.
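
To make the idea of parameterizing SVG commands and coordinates into discrete tokens more concrete, the following is a minimal sketch and not the paper's actual tokenizer. The command set, the 200 x 200 grid, the token-id layout, and the function names `tokenize_path` and `detokenize` are all assumptions chosen for illustration. Each path command becomes one token, and each quantized (x, y) coordinate pair is merged into a single token.

```python
import re

# All values below are illustrative assumptions, not taken from the paper.
COMMANDS = ["M", "L", "C", "Z"]   # hypothetical subset of SVG path commands
GRID = 200                        # assumed canvas size: GRID x GRID integer cells
COORD_OFFSET = len(COMMANDS)      # coordinate tokens start after command tokens


def _quantize(value: float) -> int:
    """Round a coordinate to the nearest grid cell and clamp it to the canvas."""
    return min(max(round(value), 0), GRID - 1)


def tokenize_path(d: str) -> list[int]:
    """Turn an absolute-coordinate path string into a flat list of token ids."""
    tokens: list[int] = []
    for cmd, args in re.findall(r"([MLCZ])([^MLCZ]*)", d):
        tokens.append(COMMANDS.index(cmd))  # one token per command
        nums = [float(n) for n in re.findall(r"-?\d+(?:\.\d+)?", args)]
        if len(nums) % 2 != 0:
            raise ValueError(f"odd number of coordinates after {cmd!r}")
        for x, y in zip(nums[0::2], nums[1::2]):
            # Merge the (x, y) pair into a single discrete token.
            tokens.append(COORD_OFFSET + _quantize(x) * GRID + _quantize(y))
    return tokens


def detokenize(tokens: list[int]) -> str:
    """Invert tokenize_path, up to quantization and whitespace."""
    parts: list[str] = []
    for t in tokens:
        if t < COORD_OFFSET:
            parts.append(COMMANDS[t])
        else:
            x, y = divmod(t - COORD_OFFSET, GRID)
            parts.append(f"{x} {y}")
    return " ".join(parts)


if __name__ == "__main__":
    ids = tokenize_path("M 10 20 L 30 40 Z")
    print(ids)
    print(detokenize(ids))
```

In a sketch like this, the command tokens carry the structure of the path while the coordinate tokens carry its geometry, so a sequence model sees both as entries in one shared vocabulary.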