Personalized Multimodal Large Language Models: A Survey
December 3, 2024
作者: Junda Wu, Hanjia Lyu, Yu Xia, Zhehao Zhang, Joe Barrow, Ishita Kumar, Mehrnoosh Mirtaheri, Hongjie Chen, Ryan A. Rossi, Franck Dernoncourt, Tong Yu, Ruiyi Zhang, Jiuxiang Gu, Nesreen K. Ahmed, Yu Wang, Xiang Chen, Hanieh Deilamsalehy, Namyong Park, Sungchul Kim, Huanrui Yang, Subrata Mitra, Zhengmian Hu, Nedim Lipka, Dang Nguyen, Yue Zhao, Jiebo Luo, Julian McAuley
cs.AI
Abstract
Multimodal Large Language Models (MLLMs) have become increasingly important
due to their state-of-the-art performance and ability to integrate multiple
data modalities, such as text, images, and audio, to perform complex tasks with
high accuracy. This paper presents a comprehensive survey on personalized
multimodal large language models, focusing on their architecture, training
methods, and applications. We propose an intuitive taxonomy for categorizing
the techniques used to personalize MLLMs to individual users, and discuss the
techniques accordingly. Furthermore, we discuss how such techniques can be
combined or adapted when appropriate, highlighting their advantages and
underlying rationale. We also provide a succinct summary of personalization
tasks investigated in existing research, along with the evaluation metrics
commonly used. Additionally, we summarize the datasets that are useful for
benchmarking personalized MLLMs. Finally, we outline critical open challenges.
This survey aims to serve as a valuable resource for researchers and
practitioners seeking to understand and advance the development of personalized
multimodal large language models.