Mol-LLaMA: Towards General Understanding of Molecules in Large Molecular Language Model
February 19, 2025
Authors: Dongki Kim, Wonbin Lee, Sung Ju Hwang
cs.AI
Abstract
Understanding molecules is key to understanding organisms and driving
advances in drug discovery, requiring interdisciplinary knowledge across
chemistry and biology. Although large molecular language models have achieved
notable success in interpreting molecular structures, their instruction
datasets are limited to the specific knowledge from task-oriented datasets and
do not fully cover the fundamental characteristics of molecules, hindering
their abilities as general-purpose molecular assistants. To address this issue,
we propose Mol-LLaMA, a large molecular language model that grasps the general
knowledge centered on molecules via multi-modal instruction tuning. To this
end, we design key data types that encompass the fundamental features of
molecules, incorporating essential knowledge from molecular structures. In
addition, to improve understanding of molecular features, we introduce a module
that integrates complementary information from different molecular encoders,
leveraging the distinct advantages of different molecular representations. Our
experimental results demonstrate that Mol-LLaMA is capable of comprehending the
general features of molecules and generating relevant responses to users'
queries with detailed explanations, implying its potential as a general-purpose
assistant for molecular analysis.