Zebra-Llama: A Context-Aware Large Language Model for Democratizing Rare Disease Knowledge
November 4, 2024
Authors: Karthik Soman, Andrew Langdon, Catalina Villouta, Chinmay Agrawal, Lashaw Salta, Braian Peetoom, Gianmarco Bellucci, Orion J Buske
cs.AI
Abstract
Rare diseases present unique challenges in healthcare, often suffering from
delayed diagnosis and fragmented information landscapes. The scarcity of
reliable knowledge in these conditions poses a distinct challenge for Large
Language Models (LLMs) in supporting clinical management and delivering precise
patient information, underscoring the need for focused training on these 'zebra'
cases. We present Zebra-Llama, a specialized context-aware language model with
high-precision Retrieval-Augmented Generation (RAG) capability, focusing on
Ehlers-Danlos Syndrome (EDS) as our case study. EDS, affecting 1 in 5,000
individuals, exemplifies the complexities of rare diseases with its diverse
symptoms, multiple subtypes, and evolving diagnostic criteria. By implementing
a novel context-aware fine-tuning methodology trained on questions derived from
medical literature, patient experiences, and clinical resources, along with
expertly curated responses, Zebra-Llama demonstrates unprecedented capabilities
in handling EDS-related queries. On a test set of real-world questions
collected from EDS patients and clinicians, medical experts evaluated the
responses generated by both models, revealing Zebra-Llama's substantial
improvements over the base model (Llama 3.1-8B-Instruct) in thoroughness (77.5% vs.
70.1%), accuracy (83.0% vs. 78.8%), clarity (74.7% vs. 72.0%), and citation
reliability (70.6% vs. 52.3%). Released as an open-source resource, Zebra-Llama
not only provides more accessible and reliable EDS information but also
establishes a framework for developing specialized AI solutions for other rare
conditions. This work represents a crucial step towards democratizing
expert-level knowledge in rare disease management, potentially transforming how
healthcare providers and patients navigate the complex landscape of rare
diseases.