TOMG-Bench: Evaluating LLMs on Text-based Open Molecule Generation
December 19, 2024
Authors: Jiatong Li, Junxian Li, Yunqing Liu, Dongzhan Zhou, Qing Li
cs.AI
Abstract
In this paper, we propose Text-based Open Molecule Generation Benchmark
(TOMG-Bench), the first benchmark to evaluate the open-domain molecule
generation capability of LLMs. TOMG-Bench encompasses a dataset of three major
tasks: molecule editing (MolEdit), molecule optimization (MolOpt), and
customized molecule generation (MolCustom). Each task further contains three
subtasks, with each subtask comprising 5,000 test samples. Given the inherent
complexity of open molecule generation, we have also developed an automated
evaluation system that helps measure both the quality and the accuracy of the
generated molecules. Our comprehensive benchmarking of 25 LLMs reveals the
current limitations and potential areas for improvement in text-guided molecule
discovery. Furthermore, with the assistance of OpenMolIns, a specialized
instruction tuning dataset proposed for solving challenges raised by
TOMG-Bench, Llama3.1-8B could outperform all the open-source general LLMs, even
surpassing GPT-3.5-turbo by 46.5% on TOMG-Bench. Our code and datasets are
available through https://github.com/phenixace/TOMG-Bench.
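
For illustration, below is a minimal sketch of the kind of automated check the abstract describes for measuring the quality and accuracy of generated molecules. It is not the official TOMG-Bench evaluation system; it simply uses RDKit to test whether a generated SMILES string is chemically valid and to compute a drug-likeness (QED) score as a quality proxy. The `check_generated_molecule` helper and the example SMILES strings are hypothetical.

```python
# Hypothetical sketch (not the TOMG-Bench evaluation code): validate
# LLM-generated SMILES and score a simple quality proxy with RDKit.
from rdkit import Chem
from rdkit.Chem import QED

def check_generated_molecule(smiles: str):
    """Return (is_valid, qed_score) for a generated SMILES string."""
    mol = Chem.MolFromSmiles(smiles)  # returns None if the SMILES cannot be parsed
    if mol is None:
        return False, None
    return True, QED.qed(mol)  # drug-likeness score in [0, 1]

# Hypothetical model outputs
for smi in ["CCO", "c1ccccc1O", "not-a-smiles"]:
    valid, qed = check_generated_molecule(smi)
    print(f"{smi}: valid={valid}, QED={qed}")
```

In the actual benchmark, such checks would be applied per task (MolEdit, MolOpt, MolCustom) together with task-specific accuracy criteria; the snippet above only demonstrates the validity/quality dimension.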