We present and analyze a parallel implementation of a parallel-in-time collocation method based on $\alpha$-circulant preconditioned Richardson iterations. While many papers explore this family of single-level, time-parallel "all-at-once" integrators from various perspectives, performance results from actual parallel runs are still scarce. This leaves a critical gap, because the efficiency and applicability of any parallel method depend heavily on its actual parallel performance, for which theoretical considerations offer only limited guidance. Further, challenges like selecting good parameters, finding suitable communication strategies, and performing a fair comparison to sequential time-stepping methods are easily overlooked. In this paper, we first extend the original idea of these fixed-point iterative approaches based on $\alpha$-circulant preconditioners to high-order collocation methods, adding yet another level of parallelization in time "across the method". We derive an adaptive strategy that selects a new $\alpha$-circulant preconditioner for each iteration at runtime to balance convergence rates, round-off errors, and the inexactness of inner system solves for the individual time-steps. After addressing these more theoretical challenges, we present an open-source space- and time-parallel implementation and evaluate its performance for two different test problems.
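To make the basic building block concrete, the following is a minimal sketch of an $\alpha$-circulant preconditioned Richardson iteration. It is not the implementation described in the paper: it assumes implicit Euler instead of a high-order collocation method, the scalar test equation $u' = \lambda u$, and a fixed $\alpha$ rather than the adaptive choice per iteration. The function names `solve_alpha_circulant` and `richardson_alpha_circulant` are hypothetical and chosen only for illustration.

```python
import numpy as np


def solve_alpha_circulant(first_col, alpha, rhs):
    """Solve C_alpha x = rhs for an alpha-circulant matrix given by its first column."""
    L = len(first_col)
    # The diagonal scaling Gamma = diag(alpha^(j/L)) maps C_alpha to a circulant matrix:
    # C_alpha = Gamma^{-1} * circ(Gamma * first_col) * Gamma.
    gamma = alpha ** (np.arange(L) / L)
    eigenvalues = np.fft.fft(gamma * first_col)
    # In Fourier space the L time steps decouple into independent scalar solves.
    y = np.fft.ifft(np.fft.fft(gamma * rhs) / eigenvalues)
    return y / gamma


def richardson_alpha_circulant(lam, dt, L, u0, alpha, iters):
    """Preconditioned Richardson iteration for L >= 2 implicit Euler steps of u' = lam * u."""
    # All-at-once system C u = b: (1 - lam*dt) * u_l - u_{l-1} = 0,
    # with the initial value u_0 moved to the right-hand side.
    C = (1 - lam * dt) * np.eye(L) - np.eye(L, k=-1)
    b = np.zeros(L, dtype=complex)
    b[0] = u0
    first_col = C[:, 0]
    u = np.zeros(L, dtype=complex)
    for _ in range(iters):
        residual = b - C @ u
        # Preconditioner: C with an extra entry -alpha in the top-right corner.
        u = u + solve_alpha_circulant(first_col, alpha, residual)
    return u


# Compare against sequential implicit Euler, u_l = u_0 / (1 - lam*dt)^l.
u = richardson_alpha_circulant(lam=-1.0, dt=1 / 16, L=16, u0=1.0, alpha=0.05, iters=10)
reference = (1 / (1 + 1 / 16)) ** np.arange(1, 17)
print(np.max(np.abs(u - reference)))
```

In this sketch a smaller $\alpha$ speeds up convergence, but the scaling by $\alpha^{-j/L}$ amplifies round-off errors, which is the trade-off that motivates choosing $\alpha$ adaptively. For a system of ODEs, each scalar division in Fourier space would become an independent linear solve that could be assigned to a separate process.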