MMKE-Bench: A Multimodal Editing Benchmark for Diverse Visual Knowledge

Abstract

Knowledge editing techniques have emerged as essential tools for updating the factual knowledge of large language models (LLMs) and multimodal models (LMMs), allowing them to correct outdated or inaccurate information without retraining from scratch. However, existing benchmarks for multimodal knowledge editing primarily focus on entity-level knowledge represented as simple triplets, which fail to capture the complexity of real-world multimodal information. To address this issue, we introduce MMKE-Bench, a comprehensive MultiModal Knowledge Editing Benchmark, designed to evaluate the ability of LMMs to edit diverse visual knowledge in real-world scenarios. MMKE-Bench addresses these limitations by incorporating three types of editing tasks: visual entity editing, visual semantic editing, and user-specific editing. In addition, MMKE-Bench uses free-form natural language to represent and edit knowledge, offering a more flexible and effective format. The benchmark consists of 2,940 pieces of knowledge and 8,363 images across 33 broad categories, with evaluation questions automatically generated and human-verified. We assess five state-of-the-art knowledge editing methods on three prominent LMMs, revealing that no method excels across all criteria, and that visual and user-specific edits are particularly challenging. MMKE-Bench sets a new standard for evaluating the robustness of multimodal knowledge editing techniques, driving progress in this rapidly evolving field.
