Distributed computing for physics-based data-driven reduced modeling at scale: Application to a rotating detonation rocket engine

High-performance computing (HPC) has revolutionized our ability to perform detailed simulations of complex real-world processes. A prominent contemporary example is from aerospace propulsion, where HPC is used for rotating detonation rocket engine (RDRE) simulations in support of the design of next-generation rocket engines; however, these simulations take millions of core hours even on powerful supercomputers, which makes them impractical for engineering tasks like design exploration and risk assessment. Reduced-order models (ROMs) address this limitation by constructing computationally cheap yet sufficiently accurate approximations that serve as surrogates for the high-fidelity model. This paper contributes a new distributed algorithm that achieves fast and scalable construction of predictive physics-based ROMs trained from sparse datasets of extremely large state dimension. The algorithm learns structured physics-based ROMs that approximate the dynamical systems underlying those datasets. This enables model reduction for problems at a scale and complexity that exceeds the capabilities of existing approaches. We demonstrate our algorithm's scalability using up to $2,048$ cores on the Frontera supercomputer at the Texas Advanced Computing Center. We focus on a real-world three-dimensional RDRE for which one millisecond of simulated physical time requires one million core hours on a supercomputer. Using a training dataset of $2,536$ snapshots each of state dimension $76$ million, our distributed algorithm enables the construction of a predictive data-driven reduced model in just $13$ seconds on $2,048$ cores on Frontera.

翻译：高性能计算（HPC）彻底改变了我们对复杂现实世界过程进行详细仿真的能力。一个当代的突出例子来自航空航天推进领域，其中HPC被用于旋转爆震火箭发动机（RDRE）仿真，以支持下一代火箭发动机的设计；然而，即使是在强大的超级计算机上，这些仿真也需要耗费数百万核心小时，这使得它们对于设计探索和风险评估等工程任务而言不切实际。降阶模型（ROMs）通过构建计算成本低廉且足够精确的近似模型作为高保真模型的代理，从而解决了这一局限。本文提出了一种新的分布式算法，能够快速且可扩展地构建基于物理的预测性ROMs，这些模型由状态维度极大的稀疏数据集训练而成。该算法学习结构化的、基于物理的ROMs，以近似这些数据集背后的动力系统。这使得针对规模和复杂性超出已有方法能力范围的问题进行模型降阶成为可能。我们在德克萨斯高级计算中心的Frontera超级计算机上使用多达$2,048$个核心，展示了我们算法的可扩展性。我们聚焦于一个真实世界的三维RDRE案例，其每毫秒的物理时间仿真在超级计算机上需要一百万核心小时。利用一个包含$2,536$个快照（每个快照的状态维度为$7,600$万）的训练数据集，我们的分布式算法能够在Frontera的$2,048$个核心上，仅用$13$秒就构建出一个预测性的数据驱动降阶模型。