Demystifying Domain-adaptive Post-training for Financial LLMs
January 9, 2025
Authors: Zixuan Ke, Yifei Ming, Xuan-Phi Nguyen, Caiming Xiong, Shafiq Joty
cs.AI
Abstract
Domain-adaptive post-training of large language models (LLMs) has emerged as
a promising approach for specialized domains such as medicine and finance.
However, significant challenges remain in identifying optimal adaptation
criteria and training strategies across varying data and model configurations.
To address these challenges, we introduce FINDAP, a systematic and fine-grained
investigation into domain-adaptive post-training of LLMs for the finance
domain. Our approach begins by identifying the core capabilities required for
the target domain and designing a comprehensive evaluation suite aligned with
these needs. We then analyze the effectiveness of key post-training stages,
including continual pretraining, instruction tuning, and preference alignment.
Building on these insights, we propose an effective training recipe centered on
a novel preference data distillation method, which leverages process signals
from a generative reward model. The resulting model, Llama-Fin, achieves
state-of-the-art performance across a wide range of financial tasks. Our
analysis also highlights how each post-training stage contributes to distinct
capabilities, uncovering specific challenges and effective solutions, providing
valuable insights for domain adaptation of LLMs. Project page:
https://github.com/SalesforceAIResearch/FinDap
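The abstract does not spell out how process signals from a generative reward model turn into preference data, so the following is only a hypothetical sketch of the general idea: score each reasoning step of several candidate responses, then keep the best- and worst-scoring responses as a chosen/rejected pair. The stub scorer, function names, and scoring heuristic are all assumptions for illustration, not the paper's actual method (where a generative reward model, not a keyword check, would produce the per-step scores).

```python
def step_scores(steps):
    """Stand-in for a generative reward model scoring each reasoning step.
    Assumption: real per-step (process) rewards come from an LLM judge;
    this keyword heuristic only makes the selection logic runnable."""
    return [0.0 if "error" in s else 1.0 for s in steps]

def build_preference_pair(prompt, candidates):
    """Rank candidate responses by mean per-step reward and return the
    top-ranked response as 'chosen' and the bottom-ranked as 'rejected'."""
    scored = []
    for resp in candidates:
        steps = [s for s in resp.split("\n") if s.strip()]
        scores = step_scores(steps)
        scored.append((sum(scores) / len(scores), resp))
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return {"prompt": prompt, "chosen": scored[0][1], "rejected": scored[-1][1]}

# Toy usage: the second candidate contains a flawed step, so it is rejected.
pair = build_preference_pair(
    "What is the yield if price is 95 and coupon is 5?",
    [
        "Step 1: yield = coupon / price.\nStep 2: 5 / 95 is about 5.26%.",
        "Step 1: error, add coupon to price.\nStep 2: the answer is 100.",
    ],
)
```

Such prompt/chosen/rejected triples are the standard input format for preference-alignment methods like DPO, which is one way distilled process signals could feed the final training stage.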