
TOMG-Bench: Evaluating LLMs on Text-based Open Molecule Generation

December 19, 2024
Authors: Jiatong Li, Junxian Li, Yunqing Liu, Dongzhan Zhou, Qing Li
cs.AI

Abstract

In this paper, we propose the Text-based Open Molecule Generation Benchmark (TOMG-Bench), the first benchmark to evaluate the open-domain molecule generation capability of LLMs. TOMG-Bench encompasses a dataset of three major tasks: molecule editing (MolEdit), molecule optimization (MolOpt), and customized molecule generation (MolCustom). Each task further contains three subtasks, with each subtask comprising 5,000 test samples. Given the inherent complexity of open molecule generation, we have also developed an automated evaluation system that measures both the quality and the accuracy of the generated molecules. Our comprehensive benchmarking of 25 LLMs reveals the current limitations of, and potential areas for improvement in, text-guided molecule discovery. Furthermore, with the assistance of OpenMolIns, a specialized instruction-tuning dataset proposed for solving the challenges raised by TOMG-Bench, Llama3.1-8B outperforms all open-source general LLMs, even surpassing GPT-3.5-turbo by 46.5% on TOMG-Bench. Our code and datasets are available at https://github.com/phenixace/TOMG-Bench.

