Envisioning Beyond the Pixels: Benchmarking Reasoning-Informed Visual Editing
April 3, 2025
Authors: Xiangyu Zhao, Peiyuan Zhang, Kexian Tang, Hao Li, Zicheng Zhang, Guangtao Zhai, Junchi Yan, Hua Yang, Xue Yang, Haodong Duan
cs.AI
Abstract
Large Multi-modality Models (LMMs) have made significant progress in visual
understanding and generation, but they still face challenges in General Visual
Editing, particularly in following complex instructions, preserving appearance
consistency, and supporting flexible input formats. To address this gap, we
introduce RISEBench, the first benchmark for evaluating Reasoning-Informed
viSual Editing (RISE). RISEBench focuses on four key reasoning types: Temporal,
Causal, Spatial, and Logical Reasoning. We curate high-quality test cases for
each category and propose an evaluation framework that assesses Instruction
Reasoning, Appearance Consistency, and Visual Plausibility with both human
judges and an LMM-as-a-judge approach. Our experiments reveal that while
GPT-4o-Native significantly outperforms other open-source and proprietary
models, even this state-of-the-art system struggles with logical reasoning
tasks, highlighting an area that remains underexplored. As an initial effort,
RISEBench aims to provide foundational insights into reasoning-aware visual
editing and to catalyze future research. Although the benchmark is still in its
early stages, we are committed to continuously expanding and refining it to support
more comprehensive, reliable, and scalable evaluations of next-generation
multimodal systems. Our code and data will be released at
https://github.com/PhoenixZ810/RISEBench.
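The abstract names four reasoning categories and three judging dimensions but does not say how per-sample judgments are turned into a score. The Python sketch below shows one possible way to organize judge outputs and compute per-category accuracy. The `JudgedSample` schema, the 1-5 rating scale, and the strict rule that a sample counts as solved only when all three dimensions receive the top score are assumptions made for illustration. This is not the benchmark's released scoring code.

```python
from collections import defaultdict
from dataclasses import dataclass

# The four reasoning categories named in the abstract.
CATEGORIES = ("Temporal", "Causal", "Spatial", "Logical")

# Assumed top of the rating scale (hypothetical; not stated in the abstract).
MAX_SCORE = 5


@dataclass
class JudgedSample:
    """One edited image as scored by a judge (human or LMM).

    Hypothetical schema: field names and the 1-5 scale are assumptions.
    """
    category: str
    instruction_reasoning: int
    appearance_consistency: int
    visual_plausibility: int


def is_success(s: JudgedSample) -> bool:
    # Assumed strict rule: every dimension must receive the maximum score.
    return (
        s.instruction_reasoning == MAX_SCORE
        and s.appearance_consistency == MAX_SCORE
        and s.visual_plausibility == MAX_SCORE
    )


def accuracy_by_category(samples: list[JudgedSample]) -> dict[str, float]:
    """Fraction of fully successful samples per category (empty categories omitted)."""
    totals: dict[str, int] = defaultdict(int)
    hits: dict[str, int] = defaultdict(int)
    for s in samples:
        if s.category not in CATEGORIES:
            raise ValueError(f"unknown category: {s.category}")
        totals[s.category] += 1
        hits[s.category] += int(is_success(s))
    return {c: hits[c] / totals[c] for c in CATEGORIES if totals[c]}


if __name__ == "__main__":
    demo = [
        JudgedSample("Temporal", 5, 5, 5),
        JudgedSample("Temporal", 5, 4, 5),  # one dimension short of full marks
        JudgedSample("Logical", 3, 5, 5),
    ]
    print(accuracy_by_category(demo))
```

Under this strict rule, a single imperfect dimension makes the whole sample count as a failure. A more lenient variant could average the three scores instead, at the cost of rewarding edits that look plausible but ignore the instruction's reasoning.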