Demystifying Domain-adaptive Post-training for Financial LLMs
January 9, 2025
Authors: Zixuan Ke, Yifei Ming, Xuan-Phi Nguyen, Caiming Xiong, Shafiq Joty
cs.AI
Abstract
Domain-adaptive post-training of large language models (LLMs) has emerged as
a promising approach for specialized domains such as medicine and finance.
However, significant challenges remain in identifying optimal adaptation
criteria and training strategies across varying data and model configurations.
To address these challenges, we introduce FINDAP, a systematic and fine-grained
investigation into domain-adaptive post-training of LLMs for the finance
domain. Our approach begins by identifying the core capabilities required for
the target domain and designing a comprehensive evaluation suite aligned with
these needs. We then analyze the effectiveness of key post-training stages,
including continual pretraining, instruction tuning, and preference alignment.
Building on these insights, we propose an effective training recipe centered on
a novel preference data distillation method, which leverages process signals
from a generative reward model. The resulting model, Llama-Fin, achieves
state-of-the-art performance across a wide range of financial tasks. Our
analysis also highlights how each post-training stage contributes to distinct
capabilities, uncovering specific challenges and effective solutions, and
providing valuable insights for domain adaptation of LLMs. Project page:
https://github.com/SalesforceAIResearch/FinDap
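The abstract does not spell out the preference data distillation procedure, so the following is a minimal sketch of what distilling preferences from step-level ("process") signals could look like. Everything here is an assumption for illustration: the helpers `sample_responses` (a policy-model sampler) and `score_steps` (a generative reward model returning one score per reasoning step) are hypothetical, as is the min-score aggregation.

```python
from dataclasses import dataclass
from typing import Callable, List

@dataclass
class PreferencePair:
    prompt: str
    chosen: str    # candidate whose reasoning steps the reward model rates highest
    rejected: str  # candidate with the weakest step-level signal

def distill_preferences(
    prompts: List[str],
    sample_responses: Callable[[str, int], List[str]],  # hypothetical policy sampler
    score_steps: Callable[[str, str], List[float]],     # hypothetical generative reward
                                                        # model: one score per step
    n_candidates: int = 4,
) -> List[PreferencePair]:
    """Build preference pairs from process (step-level) reward signals.

    For each prompt, sample several candidate responses, let the generative
    reward model grade each intermediate reasoning step, aggregate the step
    scores, and keep the best/worst candidates as chosen/rejected.
    """
    pairs = []
    for prompt in prompts:
        candidates = sample_responses(prompt, n_candidates)
        # Aggregate process signals with min(): a single flawed step
        # sinks the whole response.
        scored = [(min(score_steps(prompt, c)), c) for c in candidates]
        scored.sort(key=lambda x: x[0])
        (_, worst), (_, best) = scored[0], scored[-1]
        if best != worst:
            pairs.append(PreferencePair(prompt, chosen=best, rejected=worst))
    return pairs
```

Min-score aggregation is one plausible choice here: it penalizes responses whose reasoning breaks down at any single step, which is exactly the kind of signal an outcome-only reward model would miss. The paper's actual recipe may aggregate or pair candidates differently.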