ChatPaper.ai

Beyond Fine-tuning: Unleashing the Potential of Continuous Pretraining for Clinical LLMs

September 23, 2024
Authors: Clément Christophe, Tathagata Raha, Svetlana Maslenkova, Muhammad Umar Salman, Praveen K Kanithi, Marco AF Pimentel, Shadab Khan
cs.AI

Abstract

Large Language Models (LLMs) have demonstrated significant potential in transforming clinical applications. In this study, we investigate the efficacy of four techniques in adapting LLMs for clinical use-cases: continuous pretraining, instruct fine-tuning, NEFTune, and prompt engineering. We employ these methods on Mistral 7B and Mixtral 8x7B models, leveraging a large-scale clinical pretraining dataset of 50 billion tokens and an instruct fine-tuning dataset of 500 million tokens. Our evaluation across various clinical tasks reveals the impact of each technique. While continuous pretraining beyond 250 billion tokens yields marginal improvements on its own, it establishes a strong foundation for instruct fine-tuning. Notably, NEFTune, designed primarily to enhance generation quality, surprisingly demonstrates additional gains on our benchmark. Complex prompt engineering methods further enhance performance. These findings show the importance of tailoring fine-tuning strategies and exploring innovative techniques to optimize LLM performance in the clinical domain.

