LLM-SRBench: A New Benchmark for Scientific Equation Discovery with Large Language Models
April 14, 2025
Authors: Parshin Shojaee, Ngoc-Hieu Nguyen, Kazem Meidani, Amir Barati Farimani, Khoa D Doan, Chandan K Reddy
cs.AI
Abstract
Scientific equation discovery is a fundamental task in the history of
scientific progress, enabling the derivation of laws governing natural
phenomena. Recently, Large Language Models (LLMs) have gained interest for this
task due to their potential to leverage embedded scientific knowledge for
hypothesis generation. However, evaluating the true discovery capabilities of
these methods remains challenging, as existing benchmarks often rely on common
equations that are susceptible to memorization by LLMs, leading to inflated
performance metrics that do not reflect discovery. In this paper, we introduce
LLM-SRBench, a comprehensive benchmark with 239 challenging problems across
four scientific domains specifically designed to evaluate LLM-based scientific
equation discovery methods while preventing trivial memorization. Our benchmark
comprises two main categories: LSR-Transform, which transforms common physical
models into less common mathematical representations to test reasoning beyond
memorized forms, and LSR-Synth, which introduces synthetic, discovery-driven
problems requiring data-driven reasoning. Through extensive evaluation of
several state-of-the-art methods, using both open and closed LLMs, we find that
the best-performing system so far achieves only 31.5% symbolic accuracy. These
findings highlight the challenges of scientific equation discovery, positioning
LLM-SRBench as a valuable resource for future research.