LLM-SRBench: A New Benchmark for Scientific Equation Discovery with Large Language Models

April 14, 2025
Authors: Parshin Shojaee, Ngoc-Hieu Nguyen, Kazem Meidani, Amir Barati Farimani, Khoa D Doan, Chandan K Reddy
cs.AI

Abstract

Scientific equation discovery is a fundamental task in the history of scientific progress, enabling the derivation of laws governing natural phenomena. Recently, Large Language Models (LLMs) have gained interest for this task due to their potential to leverage embedded scientific knowledge for hypothesis generation. However, evaluating the true discovery capabilities of these methods remains challenging, as existing benchmarks often rely on common equations that are susceptible to memorization by LLMs, leading to inflated performance metrics that do not reflect discovery. In this paper, we introduce LLM-SRBench, a comprehensive benchmark with 239 challenging problems across four scientific domains specifically designed to evaluate LLM-based scientific equation discovery methods while preventing trivial memorization. Our benchmark comprises two main categories: LSR-Transform, which transforms common physical models into less common mathematical representations to test reasoning beyond memorized forms, and LSR-Synth, which introduces synthetic, discovery-driven problems requiring data-driven reasoning. Through extensive evaluation of several state-of-the-art methods, using both open and closed LLMs, we find that the best-performing system so far achieves only 31.5% symbolic accuracy. These findings highlight the challenges of scientific equation discovery, positioning LLM-SRBench as a valuable resource for future research.
