

Accelerating Direct Preference Optimization with Prefix Sharing

October 27, 2024
Authors: Franklin Wang, Sumanth Hegde
cs.AI

Abstract

Offline paired preference optimization algorithms have become a popular approach for fine-tuning on preference data, outperforming traditional supervised fine-tuning in various tasks. However, traditional implementations often involve redundant computations, especially for tasks with long shared prompts. We introduce prefix sharing for preference tuning, a novel technique that processes chosen and rejected responses as one sequence with a shared prefix. To prevent cross-response contamination, we use a custom block-sparse attention mask. Our method achieves 1.1-1.5× improvement in training throughput on popular DPO datasets, without any effect on convergence. When combined with sequence packing, we observe consistent 1.3-1.6× speedups, benefiting even datasets with smaller sequence lengths. While we focus on Direct Preference Optimization (DPO), our approach is applicable to other paired preference tuning methods. By enhancing computational efficiency, our work contributes to making preference-based fine-tuning more accessible for a wider range of applications and model sizes. We open-source our code at https://github.com/frankxwang/dpo-prefix-sharing.
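To make the masking idea concrete, the sketch below builds the kind of attention mask the abstract describes for a single packed example laid out as [shared prefix | chosen response | rejected response]: causal attention everywhere, except that rejected-response tokens cannot attend to chosen-response tokens. This is a minimal illustration in plain PyTorch with a dense boolean mask; the function name `build_prefix_sharing_mask` is a placeholder, and the paper's open-sourced implementation relies on an efficient block-sparse kernel rather than a materialized dense mask.

```python
import torch

def build_prefix_sharing_mask(prefix_len: int, chosen_len: int, rejected_len: int) -> torch.Tensor:
    """Boolean attention mask (True = may attend) for one example packed as
    [shared prefix | chosen response | rejected response].

    The mask is causal everywhere, but rejected-response queries are blocked
    from attending to chosen-response keys, so each response only sees the
    shared prefix and itself.
    """
    total = prefix_len + chosen_len + rejected_len
    # Start from a standard causal (lower-triangular) mask.
    mask = torch.tril(torch.ones(total, total, dtype=torch.bool))
    # Zero out the block where rejected queries look at chosen keys.
    rejected_start = prefix_len + chosen_len
    mask[rejected_start:, prefix_len:rejected_start] = False
    return mask

# Example: a 4-token prompt shared by a 3-token chosen and 2-token rejected response.
# The resulting mask can be passed as `attn_mask` to
# torch.nn.functional.scaled_dot_product_attention.
mask = build_prefix_sharing_mask(prefix_len=4, chosen_len=3, rejected_len=2)
```

In this dense form the mask simply demonstrates the sharing pattern; the throughput gains reported in the paper come from computing the shared prefix once and exploiting the block structure of this mask sparsely.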
