Step1X-Edit: A Practical Framework for General Image Editing

April 24, 2025
Authors: Shiyu Liu, Yucheng Han, Peng Xing, Fukun Yin, Rui Wang, Wei Cheng, Jiaqi Liao, Yingming Wang, Honghao Fu, Chunrui Han, Guopeng Li, Yuang Peng, Quan Sun, Jingwei Wu, Yan Cai, Zheng Ge, Ranchen Ming, Lei Xia, Xianfang Zeng, Yibo Zhu, Binxing Jiao, Xiangyu Zhang, Gang Yu, Daxin Jiang
cs.AI

Abstract

In recent years, image editing models have developed rapidly. Recently unveiled multimodal models such as GPT-4o and Gemini2 Flash have introduced highly promising image editing capabilities. These models can fulfill the vast majority of user-driven editing requirements, marking a significant advance in image manipulation. However, a large gap remains between open-source algorithms and these closed-source models. In this paper, we therefore aim to release a state-of-the-art image editing model, Step1X-Edit, that delivers performance comparable to closed-source models such as GPT-4o and Gemini2 Flash. Specifically, we adopt a multimodal LLM to process the reference image and the user's editing instruction. A latent embedding is extracted and integrated with a diffusion image decoder to obtain the target image. To train the model, we build a data generation pipeline that produces a high-quality dataset. For evaluation, we develop GEdit-Bench, a novel benchmark rooted in real-world user instructions. Experimental results on GEdit-Bench show that Step1X-Edit outperforms existing open-source baselines by a substantial margin and approaches the performance of leading proprietary models, thereby making a significant contribution to the field of image editing.
