The Well: a Large-Scale Collection of Diverse Physics Simulations for Machine Learning
November 30, 2024
Authors: Ruben Ohana, Michael McCabe, Lucas Meyer, Rudy Morel, Fruzsina J. Agocs, Miguel Beneitez, Marsha Berger, Blakesley Burkhart, Stuart B. Dalziel, Drummond B. Fielding, Daniel Fortunato, Jared A. Goldberg, Keiya Hirashima, Yan-Fei Jiang, Rich R. Kerswell, Suryanarayana Maddu, Jonah Miller, Payel Mukhopadhyay, Stefan S. Nixon, Jeff Shen, Romain Watteaux, Bruno Régaldo-Saint Blancard, François Rozet, Liam H. Parker, Miles Cranmer, Shirley Ho
cs.AI
Abstract
Machine learning based surrogate models offer researchers powerful tools for
accelerating simulation-based workflows. However, as standard datasets in this
space often cover small classes of physical behavior, it can be difficult to
evaluate the efficacy of new approaches. To address this gap, we introduce the
Well: a large-scale collection of datasets containing numerical simulations of
a wide variety of spatiotemporal physical systems. The Well draws from domain
experts and numerical software developers to provide 15TB of data across 16
datasets covering diverse domains such as biological systems, fluid dynamics,
acoustic scattering, as well as magneto-hydrodynamic simulations of
extra-galactic fluids or supernova explosions. These datasets can be used
individually or as part of a broader benchmark suite. To facilitate usage of
the Well, we provide a unified PyTorch interface for training and evaluating
models. We demonstrate the function of this library by introducing example
baselines that highlight the new challenges posed by the complex dynamics of
the Well. The code and data are available at
https://github.com/PolymathicAI/the_well.