TOMG-Bench: Evaluating LLMs on Text-based Open Molecule Generation
December 19, 2024
Authors: Jiatong Li, Junxian Li, Yunqing Liu, Dongzhan Zhou, Qing Li
cs.AI
Abstract
In this paper, we propose Text-based Open Molecule Generation Benchmark
(TOMG-Bench), the first benchmark to evaluate the open-domain molecule
generation capability of LLMs. TOMG-Bench encompasses a dataset of three major
tasks: molecule editing (MolEdit), molecule optimization (MolOpt), and
customized molecule generation (MolCustom). Each task further contains three
subtasks, with each subtask comprising 5,000 test samples. Given the inherent
complexity of open molecule generation, we have also developed an automated
evaluation system that helps measure both the quality and the accuracy of the
generated molecules. Our comprehensive benchmarking of 25 LLMs reveals the
current limitations and potential areas for improvement in text-guided molecule
discovery. Furthermore, after instruction tuning on OpenMolIns, a specialized
dataset designed to address the challenges posed by TOMG-Bench, Llama3.1-8B
outperforms all open-source general-purpose LLMs and even surpasses
GPT-3.5-turbo by 46.5% on TOMG-Bench. Our code and datasets are available at
https://github.com/phenixace/TOMG-Bench.
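The abstract does not spell out how the automated evaluation system scores a model's outputs. As a rough illustration only, the Python sketch below assumes that each subtask reports an accuracy (the output is a valid molecule *and* satisfies the instruction) and a validity rate, with the actual chemistry checks (for example SMILES parsing or property calculation) injected as callables. The names `evaluate_subtask`, `SubtaskResult`, `macro_average`, and `total_test_samples`, as well as the unweighted averaging rule, are hypothetical and are not taken from the TOMG-Bench code. The sketch also checks the test-set size implied by the abstract (3 tasks × 3 subtasks × 5,000 samples = 45,000).

```python
from dataclasses import dataclass
from statistics import mean
from typing import Callable, Dict, List

# Layout stated in the abstract: 3 tasks, 3 subtasks each, 5,000 samples per subtask.
TASKS = ("MolEdit", "MolOpt", "MolCustom")
SUBTASKS_PER_TASK = 3
SAMPLES_PER_SUBTASK = 5_000


def total_test_samples() -> int:
    """Total number of test samples implied by the abstract's layout."""
    return len(TASKS) * SUBTASKS_PER_TASK * SAMPLES_PER_SUBTASK


@dataclass
class SubtaskResult:
    accuracy: float  # fraction of outputs that are valid AND satisfy the instruction
    validity: float  # fraction of outputs that are valid molecules


def evaluate_subtask(
    outputs: List[str],
    is_valid: Callable[[str], bool],
    meets_instruction: Callable[[int, str], bool],
) -> SubtaskResult:
    """Score one subtask with pluggable (hypothetical) checkers."""
    if not outputs:
        raise ValueError("no outputs to evaluate")
    valid_flags = [is_valid(s) for s in outputs]
    # Short-circuit: the instruction check only runs on valid outputs.
    hits = [
        valid and meets_instruction(i, smiles)
        for i, (smiles, valid) in enumerate(zip(outputs, valid_flags))
    ]
    n = len(outputs)
    return SubtaskResult(accuracy=sum(hits) / n, validity=sum(valid_flags) / n)


def macro_average(results: Dict[str, SubtaskResult]) -> float:
    """Unweighted mean accuracy across subtasks (an assumed aggregation rule)."""
    return mean(r.accuracy for r in results.values())


if __name__ == "__main__":
    # Toy stand-in checkers that do no real chemistry.
    outputs = ["CCO", "", "c1ccccc1", "CCN"]
    targets = ["CCO", "CC", "c1ccccc1", "CCC"]
    res = evaluate_subtask(
        outputs,
        is_valid=lambda s: len(s) > 0,
        meets_instruction=lambda i, s: s == targets[i],
    )
    print(res)  # SubtaskResult(accuracy=0.5, validity=0.75)
    print(total_test_samples())  # 45000
```

In a real harness, `is_valid` and `meets_instruction` would be backed by a cheminformatics toolkit such as RDKit (for instance, parsing SMILES and computing the property a MolOpt instruction targets); the lambdas above are placeholders that only exercise the scoring logic.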