TOMG-Bench: Evaluating LLMs on Text-based Open Molecule Generation
December 19, 2024
Authors: Jiatong Li, Junxian Li, Yunqing Liu, Dongzhan Zhou, Qing Li
cs.AI
Abstract
In this paper, we propose Text-based Open Molecule Generation Benchmark
(TOMG-Bench), the first benchmark to evaluate the open-domain molecule
generation capability of LLMs. TOMG-Bench encompasses a dataset of three major
tasks: molecule editing (MolEdit), molecule optimization (MolOpt), and
customized molecule generation (MolCustom). Each task further contains three
subtasks, with each subtask comprising 5,000 test samples. Given the inherent
complexity of open molecule generation, we have also developed an automated
evaluation system that helps measure both the quality and the accuracy of the
generated molecules. Our comprehensive benchmarking of 25 LLMs reveals the
current limitations and potential areas for improvement in text-guided molecule
discovery. Furthermore, with the assistance of OpenMolIns, a specialized
instruction tuning dataset proposed for solving challenges raised by
TOMG-Bench, Llama3.1-8B could outperform all the open-source general LLMs, even
surpassing GPT-3.5-turbo by 46.5% on TOMG-Bench. Our code and datasets are
available through https://github.com/phenixace/TOMG-Bench.
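
For illustration, below is a minimal sketch of the kind of automated check the abstract describes for measuring the quality and accuracy of generated molecules. It is not the official TOMG-Bench evaluation system; it simply uses RDKit to test whether a generated SMILES string is chemically valid and to compute a drug-likeness (QED) score as a quality proxy. The `check_generated_molecule` helper and the example SMILES strings are hypothetical.

```python
# Hypothetical sketch (not the TOMG-Bench evaluation code): validate
# LLM-generated SMILES and score a simple quality proxy with RDKit.
from rdkit import Chem
from rdkit.Chem import QED

def check_generated_molecule(smiles: str):
    """Return (is_valid, qed_score) for a generated SMILES string."""
    mol = Chem.MolFromSmiles(smiles)  # returns None if the SMILES cannot be parsed
    if mol is None:
        return False, None
    return True, QED.qed(mol)  # drug-likeness score in [0, 1]

# Hypothetical model outputs
for smi in ["CCO", "c1ccccc1O", "not-a-smiles"]:
    valid, qed = check_generated_molecule(smi)
    print(f"{smi}: valid={valid}, QED={qed}")
```

In the actual benchmark, such checks would be applied per task (MolEdit, MolOpt, MolCustom) together with task-specific accuracy criteria; the snippet above only demonstrates the validity/quality dimension.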