Applying the representational power of machine learning to the prediction of complex fluid dynamics has been an active subject of study for years. However, the amount of available fluid simulation data does not match the notoriously high data requirements of machine learning methods. Researchers have typically addressed this issue by generating their own datasets, which prevents a consistent evaluation of their proposed approaches. Our work introduces a generation procedure for synthetic multi-modal fluid simulation datasets. By leveraging a GPU implementation, our procedure is efficient enough that no data needs to be exchanged between users, except for the configuration files required to reproduce the dataset. Furthermore, our procedure supports multiple modalities (generating both geometry and photorealistic renderings) and is general enough to be applied to various tasks in data-driven fluid simulation. We then employ our framework to generate a set of thoughtfully designed benchmark datasets, which attempt to span specific fluid simulation scenarios in a meaningful way. We demonstrate the properties of our contributions by evaluating recently published algorithms for the neural fluid simulation and fluid inverse rendering tasks on our benchmark datasets. Our contribution aims to fulfill the community's need for standardized benchmarks, fostering research that is more reproducible and robust than previous endeavors.