In classical parallel iterations, convergence is detected by performing a reduction operation at each iteration to compute a residual error relative to a candidate solution vector. To run asynchronous iterations efficiently, blocking communication requests are avoided, which makes it hard to isolate and handle any global vector. Although several termination protocols have been proposed for asynchronous iterations, very few of them are based on global residual computation and guarantee effective convergence. Moreover, the most effective and efficient existing solutions require two reduction operations, which is a significant source of termination delay. In this paper, we present new non-intrusive protocols that compute a residual error under asynchronous iterations using only one reduction operation. Analysis under various communication models shows that some heuristics can also be introduced and formally evaluated. Extensive experiments on up to 5600 processor cores confirm the practical effectiveness and efficiency of our approach.
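As a minimal illustration of the classical scheme mentioned in the opening sentence (one blocking reduction per iteration to obtain a global residual), the following sketch simulates a block-partitioned Jacobi iteration in a single process. It is not the protocol proposed in the paper. All names (`local_sq_residual`, `allreduce_sum`, `jacobi_with_reduction`) are hypothetical, the linear system is made up for the example, and `allreduce_sum` is only a stand-in for a real collective such as `MPI_Allreduce`.

```python
import math


def local_sq_residual(A, b, x, rows):
    """Squared 2-norm of (b - A x) restricted to the rows owned by one process."""
    total = 0.0
    for i in rows:
        r_i = b[i] - sum(A[i][j] * x[j] for j in range(len(x)))
        total += r_i * r_i
    return total


def allreduce_sum(values):
    """Assumed stand-in for a blocking global sum reduction across processes."""
    return sum(values)


def jacobi_with_reduction(A, b, parts, tol=1e-10, max_iter=1000):
    """Synchronous Jacobi; `parts` lists the row indices owned by each simulated process."""
    n = len(b)
    x = [0.0] * n
    res = float("inf")
    for k in range(1, max_iter + 1):
        # Synchronous sweep: every block updates from the same previous iterate.
        x_new = x[:]
        for rows in parts:
            for i in rows:
                s = sum(A[i][j] * x[j] for j in range(n) if j != i)
                x_new[i] = (b[i] - s) / A[i][i]
        x = x_new
        # One reduction per iteration combines the local contributions
        # into the global residual norm used for the stopping test.
        contributions = [local_sq_residual(A, b, x, rows) for rows in parts]
        res = math.sqrt(allreduce_sum(contributions))
        if res < tol:
            return x, k, res
    return x, max_iter, res


# Illustrative diagonally dominant system whose exact solution is all ones.
A = [[4.0, 1.0, 0.0, 0.0],
     [1.0, 4.0, 1.0, 0.0],
     [0.0, 1.0, 4.0, 1.0],
     [0.0, 0.0, 1.0, 4.0]]
b = [5.0, 6.0, 6.0, 5.0]
parts = [[0, 1], [2, 3]]  # two simulated processes, two rows each

x, iterations, residual = jacobi_with_reduction(A, b, parts)
print(iterations, residual)
```

In this synchronous setting the residual is well defined because every block contributes values computed from the same iterate. Under asynchronous iterations no such consistent global vector is readily available, which is the difficulty the abstract describes.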