Preference Tuning with Human Feedback on Language, Speech, and Vision Tasks: A Survey

September 17, 2024
Authors: Genta Indra Winata, Hanyang Zhao, Anirban Das, Wenpin Tang, David D. Yao, Shi-Xiong Zhang, Sambit Sahu
cs.AI

Abstract

Preference tuning is a crucial process for aligning deep generative models with human preferences. This survey offers a thorough overview of recent advancements in preference tuning and the integration of human feedback. The paper is organized into three main sections: 1) introduction and preliminaries, covering reinforcement learning frameworks, preference tuning tasks, models, and datasets across the language, speech, and vision modalities, as well as different policy approaches; 2) an in-depth examination of each preference tuning approach, with a detailed analysis of the methods used; and 3) applications, discussion, and future directions, exploring the applications of preference tuning in downstream tasks, evaluation methods for different modalities, and an outlook on future research directions. Our objective is to present the latest methodologies in preference tuning and model alignment, enhancing the understanding of this field for researchers and practitioners. We hope to encourage further engagement and innovation in this area.
