TOMG-Bench: Evaluating LLMs on Text-based Open Molecule Generation

Abstract

We propose the Text-based Open Molecule Generation Benchmark (TOMG-Bench), the first benchmark to evaluate the open-domain molecule generation capability of large language models (LLMs). TOMG-Bench comprises datasets for three major tasks: molecule editing (MolEdit), molecule optimization (MolOpt), and customized molecule generation (MolCustom). Each task contains three subtasks, and each subtask comprises 5,000 test samples. Given the inherent complexity of open molecule generation, we have also developed an automated evaluation system that helps measure both the quality and the accuracy of the generated molecules. Our comprehensive benchmarking of 25 LLMs reveals the current limitations of, and potential areas for improvement in, text-guided molecule discovery. Furthermore, with the assistance of OpenMolIns, a specialized instruction-tuning dataset proposed to address the challenges raised by TOMG-Bench, Llama3.1-8B outperforms all open-source general LLMs and even surpasses GPT-3.5-turbo by 46.5% on TOMG-Bench. Our code and datasets are available at https://github.com/phenixace/TOMG-Bench.
