
HealthGPT: A Medical Large Vision-Language Model for Unifying Comprehension and Generation via Heterogeneous Knowledge Adaptation

February 14, 2025
作者: Tianwei Lin, Wenqiao Zhang, Sijing Li, Yuqian Yuan, Binhe Yu, Haoyuan Li, Wanggui He, Hao Jiang, Mengze Li, Xiaohui Song, Siliang Tang, Jun Xiao, Hui Lin, Yueting Zhuang, Beng Chin Ooi
cs.AI

Abstract

We present HealthGPT, a powerful Medical Large Vision-Language Model (Med-LVLM) that integrates medical visual comprehension and generation capabilities within a unified autoregressive paradigm. Our bootstrapping philosophy is to progressively adapt heterogeneous comprehension and generation knowledge to pre-trained large language models (LLMs). This is achieved through a novel heterogeneous low-rank adaptation (H-LoRA) technique, complemented by a tailored hierarchical visual perception approach and a three-stage learning strategy. To train HealthGPT effectively, we devise a comprehensive medical domain-specific comprehension and generation dataset called VL-Health. Experimental results demonstrate that HealthGPT achieves exceptional performance and scalability in unified medical visual tasks. Our project can be accessed at https://github.com/DCDmllm/HealthGPT.
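The abstract describes adapting a frozen pre-trained LLM with separate low-rank adapters for heterogeneous comprehension and generation knowledge. The following is a minimal, illustrative sketch of that general idea in the style of standard LoRA; the variable names, dimensions, and per-task routing here are assumptions for illustration only, not the paper's actual H-LoRA implementation.

```python
import numpy as np

# Illustrative sketch (not the paper's implementation): a frozen weight
# matrix W is augmented with separate low-rank adapter pairs (A, B),
# one per task type, selected at the forward pass.

rng = np.random.default_rng(0)
d, r = 16, 4                      # hidden size and low-rank dimension (assumed)

W = rng.standard_normal((d, d))   # frozen pre-trained weight

adapters = {
    task: (rng.standard_normal((d, r)) * 0.01,  # A: down-projection
           np.zeros((r, d)))                    # B: zero-initialized up-projection
    for task in ("comprehension", "generation")
}

def forward(x, task):
    """Apply the frozen weight plus the task-specific low-rank update A @ B."""
    A, B = adapters[task]
    return x @ (W + A @ B)

x = rng.standard_normal((2, d))
# With B zero-initialized, both adapted paths start equal to the frozen base.
assert np.allclose(forward(x, "comprehension"), x @ W)
assert np.allclose(forward(x, "generation"), x @ W)
```

During training only the small A and B matrices would be updated, so comprehension and generation knowledge can be injected into the same frozen backbone without interfering with each other's parameters.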
