
DPO Kernels: A Semantically-Aware, Kernel-Enhanced, and Divergence-Rich Paradigm for Direct Preference Optimization

January 5, 2025
Authors: Amitava Das, Suranjana Trivedy, Danush Khanna, Rajarshi Roy, Gurpreet Singh, Basab Ghosh, Yaswanth Narsupalli, Vinija Jain, Vasu Sharma, Aishwarya Naresh Reganti, Aman Chadha
cs.AI

Abstract

The rapid rise of large language models (LLMs) has unlocked many applications but also underscores the challenge of aligning them with diverse values and preferences. Direct Preference Optimization (DPO) is central to alignment but constrained by fixed divergences and limited feature transformations. We propose DPO-Kernels, which integrates kernel methods to address these issues through four key contributions: (i) Kernelized Representations with polynomial, RBF, Mahalanobis, and spectral kernels for richer transformations, plus a hybrid loss combining embedding-based and probability-based objectives; (ii) Divergence Alternatives (Jensen-Shannon, Hellinger, Renyi, Bhattacharyya, Wasserstein, and f-divergences) for greater stability; (iii) Data-Driven Selection metrics that automatically choose the best kernel-divergence pair; and (iv) a Hierarchical Mixture of Kernels for both local precision and global modeling. Evaluations on 12 datasets demonstrate state-of-the-art performance in factuality, safety, reasoning, and instruction following. Grounded in Heavy-Tailed Self-Regularization, DPO-Kernels maintains robust generalization for LLMs, offering a comprehensive resource for further alignment research.
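The abstract's first contribution pairs a probability-based DPO objective with an embedding-based, kernelized term. The exact formulation is not given in the abstract, so the following is only a minimal sketch of that idea: a standard DPO margin combined with an RBF-kernel similarity margin over embeddings. All function names, the weighting scheme (`alpha`), and the choice of comparing prompt-response embeddings are assumptions for illustration, not the paper's definition.

```python
import math

def sq_dist(x, y):
    # Squared Euclidean distance between two embedding vectors
    return sum((a - b) ** 2 for a, b in zip(x, y))

def rbf_kernel(x, y, gamma=1.0):
    # RBF kernel: k(x, y) = exp(-gamma * ||x - y||^2)
    return math.exp(-gamma * sq_dist(x, y))

def hybrid_dpo_loss(logp_chosen, logp_rejected,
                    logp_chosen_ref, logp_rejected_ref,
                    emb_prompt, emb_chosen, emb_rejected,
                    beta=0.1, alpha=0.5, gamma=1.0):
    """Illustrative hybrid loss (an assumption, not the paper's loss):
    a probability-based DPO term plus a kernelized embedding term."""
    # Probability-based DPO margin relative to the reference policy
    margin = beta * ((logp_chosen - logp_chosen_ref)
                     - (logp_rejected - logp_rejected_ref))
    prob_loss = math.log(1.0 + math.exp(-margin))  # -log sigmoid(margin)

    # Embedding-based term: reward the chosen response being closer
    # (in RBF-kernel similarity) to the prompt than the rejected one
    k_margin = (rbf_kernel(emb_prompt, emb_chosen, gamma)
                - rbf_kernel(emb_prompt, emb_rejected, gamma))
    emb_loss = math.log(1.0 + math.exp(-k_margin))

    return alpha * prob_loss + (1 - alpha) * emb_loss
```

Swapping `rbf_kernel` for a polynomial, Mahalanobis, or spectral kernel (and the sigmoid-based margin for another divergence) would correspond to the kernel and divergence alternatives listed in the abstract.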

