WAKESET: A Large-Scale, High-Reynolds Number Flow Dataset for Machine Learning of Turbulent Wake Dynamics

Machine learning (ML) offers transformative potential for computational fluid dynamics (CFD). It promises to accelerate simulations, improve turbulence modelling, and enable real-time flow prediction and control, capabilities that could fundamentally change how engineers approach fluid dynamics problems. However, the exploration of ML in fluid dynamics is critically hampered by the scarcity of large, diverse, high-fidelity datasets suitable for training robust models. This limitation is particularly acute for highly turbulent flows, which dominate practical engineering applications yet remain computationally prohibitive to simulate at scale. High-Reynolds-number turbulent datasets are essential for ML models to learn the complex, multi-scale physics characteristic of real-world flows. Such data enables generalisation beyond the simplified, low-Reynolds-number regimes often represented in existing datasets. This paper introduces WAKESET, a novel, large-scale CFD dataset of highly turbulent flows designed to address this gap. The dataset captures the complex hydrodynamic interactions during the underwater recovery of an autonomous underwater vehicle by an extra-large uncrewed underwater vehicle. It comprises 1,091 high-fidelity Reynolds-Averaged Navier-Stokes (RANS) simulations, augmented fourfold to 4,364 instances. These cover a wide operational envelope of speeds (up to a Reynolds number of 1.09 × 10^8) and turning angles. This work motivates the dataset by reviewing existing resources, outlines the hydrodynamic modelling and validation underpinning its creation, and describes its structure. The dataset's focus on a practical engineering problem, its scale, and its highly turbulent character make it a valuable resource for developing and benchmarking ML models for flow field prediction, surrogate modelling, and autonomous navigation in complex underwater environments.
