
OmniAlign-V: Towards Enhanced Alignment of MLLMs with Human Preference

February 25, 2025
Authors: Xiangyu Zhao, Shengyuan Ding, Zicheng Zhang, Haian Huang, Maosong Cao, Weiyun Wang, Jiaqi Wang, Xinyu Fang, Wenhai Wang, Guangtao Zhai, Haodong Duan, Hua Yang, Kai Chen
cs.AI

Abstract

Recent advancements in open-source multi-modal large language models (MLLMs) have primarily focused on enhancing foundational capabilities, leaving a significant gap in human preference alignment. This paper introduces OmniAlign-V, a comprehensive dataset of 200K high-quality training samples featuring diverse images, complex questions, and varied response formats to improve MLLMs' alignment with human preferences. We also present MM-AlignBench, a human-annotated benchmark specifically designed to evaluate MLLMs' alignment with human values. Experimental results show that finetuning MLLMs with OmniAlign-V, using Supervised Fine-Tuning (SFT) or Direct Preference Optimization (DPO), significantly enhances human preference alignment while maintaining or enhancing performance on standard VQA benchmarks, preserving their fundamental capabilities. Our datasets, benchmark, code and checkpoints have been released at https://github.com/PhoenixZ810/OmniAlign-V.
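The abstract reports gains from finetuning MLLMs on OmniAlign-V with either SFT or DPO. For readers unfamiliar with the latter, the sketch below shows the standard DPO preference loss (Rafailov et al., 2023) as it is commonly applied to chosen/rejected response pairs; this is a minimal illustration, not the authors' released training code, and the function name and tensor arguments are assumptions for the example.

```python
import torch
import torch.nn.functional as F

def dpo_loss(policy_chosen_logps, policy_rejected_logps,
             ref_chosen_logps, ref_rejected_logps, beta=0.1):
    """Standard DPO objective over a batch of preference pairs.

    Each argument is a 1-D tensor of summed log-probabilities of the chosen
    or rejected response under the trainable policy or the frozen reference
    model. `beta` controls how far the policy may drift from the reference.
    """
    # Implicit rewards: log-ratio of policy vs. reference for each response.
    chosen_rewards = beta * (policy_chosen_logps - ref_chosen_logps)
    rejected_rewards = beta * (policy_rejected_logps - ref_rejected_logps)
    # Logistic loss on the reward margin: prefer chosen over rejected.
    return -F.logsigmoid(chosen_rewards - rejected_rewards).mean()
```

In practice the per-response log-probabilities would come from scoring OmniAlign-V preference pairs with the policy and a frozen copy of the base MLLM; the SFT alternative simply maximizes the likelihood of the high-quality responses directly.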

