We propose a parallel (distributed) version of the spectral proper orthogonal decomposition (SPOD) technique. The parallel SPOD algorithm distributes the dataset along its spatial dimension while keeping the time dimension intact on each process. This choice keeps the fast Fourier transform of the data in time non-distributed, thereby avoiding the bottlenecks associated with a distributed transform. The parallel SPOD algorithm is implemented in the PySPOD (https://github.com/MathEXLab/PySPOD) library and uses the Message Passing Interface (MPI) standard through its Python bindings, mpi4py (https://mpi4py.readthedocs.io/en/stable/). We provide an extensive performance evaluation of the parallel package, including strong and weak scalability analyses. The open-source library enables the analysis of large datasets of interest across the scientific community. Here, we present applications in fluid dynamics and geophysics that are extremely difficult (if not impossible) to carry out without a parallel algorithm. This work opens the path toward modal analyses of big quasi-stationary data, helping to uncover unexplored spatio-temporal patterns.
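The decomposition strategy described in the abstract can be illustrated with a minimal single-process sketch. It is not the PySPOD API: the function names `time_fft` and `gram`, the toy array sizes, and the use of unit spatial weights without windowing or block overlap are all assumptions made for illustration. A Python loop over spatial chunks stands in for MPI ranks, and a plain sum stands in for the reduction (for example an MPI Allreduce) that a distributed run would need. The sketch checks two things: the FFT in time of each spatial chunk matches the corresponding slice of the serial FFT, so the transform needs no communication, and the per-frequency inner-product matrix over space is recovered by summing the chunk contributions.

```python
import numpy as np

rng = np.random.default_rng(0)
n_blocks, n_fft, n_space, n_ranks = 4, 16, 10, 3  # toy sizes (assumed)

# Toy data, already segmented into time blocks: (block, time, space).
data = rng.standard_normal((n_blocks, n_fft, n_space))


def time_fft(x):
    """FFT along the time axis; uses only the spatial points in x."""
    return np.fft.fft(x, axis=1)


def gram(q_hat, k):
    """Inner-product matrix Q^H Q at frequency index k (unit weights)."""
    q = q_hat[:, k, :].T       # shape (space, block)
    return q.conj().T @ q      # shape (block, block)


# Serial reference: transform and inner product on the full dataset.
q_hat_full = time_fft(data)
m_full = gram(q_hat_full, k=2)

# "Distributed" version: split space only, so every chunk keeps all of time.
chunks = np.array_split(data, n_ranks, axis=2)
q_hat_local = [time_fft(c) for c in chunks]       # no communication needed
m_sum = sum(gram(q, k=2) for q in q_hat_local)    # stand-in for a reduction

fft_matches = np.allclose(np.concatenate(q_hat_local, axis=2), q_hat_full)
gram_matches = np.allclose(m_sum, m_full)
print(fft_matches, gram_matches)
```

Both checks hold because the Fourier transform acts independently on each spatial point and the inner product over space is a sum over spatial points. Under these assumptions, the only data exchanged between ranks per frequency would be a small matrix whose size is set by the number of blocks, not by the spatial resolution.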