OmniSVG: A Unified Scalable Vector Graphics Generation Model
April 8, 2025
Authors: Yiying Yang, Wei Cheng, Sijin Chen, Xianfang Zeng, Jiaxu Zhang, Liao Wang, Gang Yu, Xingjun Ma, Yu-Gang Jiang
cs.AI
Abstract
Scalable Vector Graphics (SVG) is an important image format widely adopted in
graphic design because of its resolution independence and editability. The
generation of high-quality SVG has drawn sustained attention from both
designers and researchers in the AIGC community. However, existing methods
either produce unstructured outputs at high computational cost or are limited
to generating monochrome icons with over-simplified structures. To produce
high-quality and complex SVG, we propose OmniSVG, a unified framework that
leverages pre-trained Vision-Language Models (VLMs) for end-to-end multimodal
SVG generation. By parameterizing SVG commands and coordinates into discrete
tokens, OmniSVG decouples structural logic from low-level geometry, which
allows efficient training while preserving the expressiveness of complex SVG
structures. To further advance the development of SVG synthesis, we introduce
MMSVG-2M, a multimodal dataset with two million richly annotated SVG assets,
along with a standardized evaluation protocol for conditional SVG generation
tasks. Extensive experiments show that OmniSVG outperforms existing methods and
demonstrates its potential for integration into professional SVG design
workflows.