The Well: a Large-Scale Collection of Diverse Physics Simulations for Machine Learning
November 30, 2024
Authors: Ruben Ohana, Michael McCabe, Lucas Meyer, Rudy Morel, Fruzsina J. Agocs, Miguel Beneitez, Marsha Berger, Blakesley Burkhart, Stuart B. Dalziel, Drummond B. Fielding, Daniel Fortunato, Jared A. Goldberg, Keiya Hirashima, Yan-Fei Jiang, Rich R. Kerswell, Suryanarayana Maddu, Jonah Miller, Payel Mukhopadhyay, Stefan S. Nixon, Jeff Shen, Romain Watteaux, Bruno Régaldo-Saint Blancard, François Rozet, Liam H. Parker, Miles Cranmer, Shirley Ho
cs.AI
Abstract
Machine learning based surrogate models offer researchers powerful tools for
accelerating simulation-based workflows. However, as standard datasets in this
space often cover small classes of physical behavior, it can be difficult to
evaluate the efficacy of new approaches. To address this gap, we introduce the
Well: a large-scale collection of datasets containing numerical simulations of
a wide variety of spatiotemporal physical systems. The Well draws from domain
experts and numerical software developers to provide 15TB of data across 16
datasets covering diverse domains such as biological systems, fluid dynamics,
acoustic scattering, as well as magneto-hydrodynamic simulations of
extra-galactic fluids or supernova explosions. These datasets can be used
individually or as part of a broader benchmark suite. To facilitate usage of
the Well, we provide a unified PyTorch interface for training and evaluating
models. We demonstrate the functionality of this library by introducing example
baselines that highlight the new challenges posed by the complex dynamics of
the Well. The code and data are available at
https://github.com/PolymathicAI/the_well.