DPO Kernels: A Semantically-Aware, Kernel-Enhanced, and Divergence-Rich Paradigm for Direct Preference Optimization

January 5, 2025
Authors: Amitava Das, Suranjana Trivedy, Danush Khanna, Rajarshi Roy, Gurpreet Singh, Basab Ghosh, Yaswanth Narsupalli, Vinija Jain, Vasu Sharma, Aishwarya Naresh Reganti, Aman Chadha
cs.AI

Abstract

The rapid rise of large language models (LLMs) has unlocked many applications but also underscores the challenge of aligning them with diverse values and preferences. Direct Preference Optimization (DPO) is central to alignment but constrained by fixed divergences and limited feature transformations. We propose DPO-Kernels, which integrates kernel methods to address these issues through four key contributions: (i) Kernelized Representations with polynomial, RBF, Mahalanobis, and spectral kernels for richer transformations, plus a hybrid loss combining embedding-based and probability-based objectives; (ii) Divergence Alternatives (Jensen-Shannon, Hellinger, Renyi, Bhattacharyya, Wasserstein, and f-divergences) for greater stability; (iii) Data-Driven Selection metrics that automatically choose the best kernel-divergence pair; and (iv) a Hierarchical Mixture of Kernels for both local precision and global modeling. Evaluations on 12 datasets demonstrate state-of-the-art performance in factuality, safety, reasoning, and instruction following. Grounded in Heavy-Tailed Self-Regularization, DPO-Kernels maintains robust generalization for LLMs, offering a comprehensive resource for further alignment research.
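To make the abstract's components concrete, here is a minimal, hedged sketch of two of the kernel families it names (polynomial and RBF) and one of the divergence alternatives (Jensen-Shannon). These are the standard textbook definitions; the exact formulations, hyperparameters, and hybrid loss used in DPO-Kernels are not given in this abstract, so this is illustrative only.

```python
import math

def dot(u, v):
    """Inner product of two embedding vectors (plain lists of floats)."""
    return sum(a * b for a, b in zip(u, v))

def polynomial_kernel(u, v, degree=2, c=1.0):
    """Standard polynomial kernel: k(u, v) = (u.v + c)^degree."""
    return (dot(u, v) + c) ** degree

def rbf_kernel(u, v, gamma=1.0):
    """Standard RBF (Gaussian) kernel: k(u, v) = exp(-gamma * ||u - v||^2)."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(u, v))
    return math.exp(-gamma * sq_dist)

def js_divergence(p, q):
    """Jensen-Shannon divergence between two discrete distributions,
    one of the divergence alternatives the abstract lists. Assumes p, q
    are probability vectors of equal length."""
    def kl(a, b):
        return sum(x * math.log(x / y) for x, y in zip(a, b) if x > 0)
    m = [(x + y) / 2 for x, y in zip(p, q)]
    return 0.5 * kl(p, m) + 0.5 * kl(q, m)
```

In the paper's framing, such kernels would transform the representations entering the preference objective, while divergences like the one above replace the fixed KL term of standard DPO; a data-driven metric would then select the best kernel-divergence pairing.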
