
LLaVA-Critic: Learning to Evaluate Multimodal Models

October 3, 2024
作者: Tianyi Xiong, Xiyao Wang, Dong Guo, Qinghao Ye, Haoqi Fan, Quanquan Gu, Heng Huang, Chunyuan Li
cs.AI

Abstract

We introduce LLaVA-Critic, the first open-source large multimodal model (LMM) designed as a generalist evaluator to assess performance across a wide range of multimodal tasks. LLaVA-Critic is trained using a high-quality critic instruction-following dataset that incorporates diverse evaluation criteria and scenarios. Our experiments demonstrate the model's effectiveness in two key areas: (1) LMM-as-a-Judge, where LLaVA-Critic provides reliable evaluation scores, performing on par with or surpassing GPT models on multiple evaluation benchmarks; and (2) Preference Learning, where it generates reward signals for preference learning, enhancing model alignment capabilities. This work underscores the potential of open-source LMMs in self-critique and evaluation, setting the stage for future research into scalable, superhuman alignment feedback mechanisms for LMMs.
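The LMM-as-a-Judge setting described above typically works by prompting the critic model with the task, the candidate response, and the evaluation criteria, then parsing a numeric score from its free-form reply. The sketch below illustrates that pattern; the prompt template and `Score:` output format are hypothetical illustrations, not the paper's actual instruction format.

```python
import re

def build_judge_prompt(question: str, response: str, criteria: str) -> str:
    """Assemble a pointwise judging prompt for a critic LMM.

    Hypothetical template for illustration; LLaVA-Critic's real
    instruction-following data uses its own formats.
    """
    return (
        "You are an impartial evaluator of multimodal model outputs.\n"
        f"Evaluation criteria: {criteria}\n"
        f"Question: {question}\n"
        f"Candidate response: {response}\n"
        "Rate the response on a scale of 1 to 10 and justify your rating.\n"
        "End your reply with a line of the form 'Score: <n>'."
    )

def parse_score(judge_output: str):
    """Extract the integer score from the judge's reply, or None if missing."""
    match = re.search(r"Score:\s*(\d+)", judge_output)
    return int(match.group(1)) if match else None
```

In preference learning, the same parsed scores can serve as reward signals: two candidate responses are judged, and the higher-scored one is taken as the preferred example for alignment training.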
