A Multi-Modal AI Copilot for Single-Cell Analysis with Instruction Following

January 14, 2025
Authors: Yin Fang, Xinle Deng, Kangwei Liu, Ningyu Zhang, Jingyang Qian, Penghui Yang, Xiaohui Fan, Huajun Chen
cs.AI

Abstract

Large language models excel at interpreting complex natural language instructions, enabling them to perform a wide range of tasks. In the life sciences, single-cell RNA sequencing (scRNA-seq) data serves as the "language of cellular biology", capturing intricate gene expression patterns at the single-cell level. However, interacting with this "language" through conventional tools is often inefficient and unintuitive, posing challenges for researchers. To address these limitations, we present InstructCell, a multi-modal AI copilot that leverages natural language as a medium for more direct and flexible single-cell analysis. We construct a comprehensive multi-modal instruction dataset that pairs text-based instructions with scRNA-seq profiles from diverse tissues and species. Building on this, we develop a multi-modal cell language architecture capable of simultaneously interpreting and processing both modalities. InstructCell empowers researchers to accomplish critical tasks, such as cell type annotation, conditional pseudo-cell generation, and drug sensitivity prediction, using straightforward natural language commands. Extensive evaluations demonstrate that InstructCell consistently meets or exceeds the performance of existing single-cell foundation models, while adapting to diverse experimental conditions. More importantly, InstructCell provides an accessible and intuitive tool for exploring complex single-cell data, lowering technical barriers and enabling deeper biological insights.
