Towards Fully-Automated Materials Discovery via Large-Scale Synthesis Dataset and Expert-Level LLM-as-a-Judge
February 23, 2025
Authors: Heegyu Kim, Taeyang Jeon, Seungtaek Choi, Jihoon Hong, Dongwon Jeon, Sungbum Cho, Ga-Yeon Baek, Kyung-Won Kwak, Dong-Hee Lee, Sun-Jin Choi, Jisu Bae, Chihoon Lee, Yunseo Kim, Jinsung Park, Hyunsouk Cho
cs.AI
Abstract
Materials synthesis is vital for innovations such as energy storage,
catalysis, electronics, and biomedical devices. Yet, the process relies heavily
on empirical, trial-and-error methods guided by expert intuition. Our work aims
to support the materials science community by providing a practical,
data-driven resource. We have curated a comprehensive dataset of 17K
expert-verified synthesis recipes from open-access literature, which forms the
basis of our newly developed benchmark, AlchemyBench. AlchemyBench offers an
end-to-end framework that supports research in large language models applied to
synthesis prediction. It encompasses key tasks, including raw materials and
equipment prediction, synthesis procedure generation, and characterization
outcome forecasting. We propose an LLM-as-a-Judge framework that leverages
large language models for automated evaluation, demonstrating strong
statistical agreement with expert assessments. Overall, our contributions offer
a supportive foundation for exploring the capabilities of LLMs in predicting
and guiding materials synthesis, ultimately paving the way for more efficient
experimental design and accelerated innovation in materials science.
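The abstract reports statistical agreement between the LLM judge and expert assessments without naming the statistic used. As a minimal sketch of how such agreement could be checked, the snippet below computes a Spearman rank correlation between two score lists. The choice of Spearman correlation, the function names, and the example scores are all illustrative assumptions, not the paper's method or data.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation of two equal-length numeric sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equal-length sequences with at least 2 items")
    mx = sum(xs) / n
    my = sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    if vx == 0 or vy == 0:
        raise ValueError("correlation is undefined for constant scores")
    return cov / sqrt(vx * vy)


def average_ranks(values):
    """Rank values from 1 to n; tied values share their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to cover the full run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def spearman(xs, ys):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    return pearson(average_ranks(xs), average_ranks(ys))


# Hypothetical scores for five generated recipes (made up for illustration).
expert_scores = [4.5, 3.0, 2.0, 5.0, 3.5]
judge_scores = [4.0, 3.5, 2.5, 4.5, 3.0]

if __name__ == "__main__":
    print(f"Spearman agreement: {spearman(expert_scores, judge_scores):.3f}")
```

A rank-based measure is used here because it only asks whether the judge orders recipes the same way the experts do, which is less sensitive to differences in how each rater uses the scoring scale.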