The block classical Gram--Schmidt (BCGS) algorithm and its reorthogonalized variant are widely used methods for computing the economic QR factorization of block columns $X$, owing to their lower communication cost compared with other approaches such as modified Gram--Schmidt and Householder QR. To further reduce communication, i.e., synchronization, there has been a long-running search for a reorthogonalized BCGS variant that achieves $O(u)$ loss of orthogonality while requiring only \emph{one} synchronization point per block column, where $u$ denotes the unit roundoff. Using Pythagorean inner products and delayed normalization techniques, we propose the first provably stable one-synchronization reorthogonalized BCGS variant and show that it has $O(u)$ loss of orthogonality under the condition $O(u) \kappa^2(X) \leq 1/2$, where $\kappa(\cdot)$ denotes the condition number. By incorporating one additional synchronization point, we develop a two-synchronization reorthogonalized BCGS variant that maintains $O(u)$ loss of orthogonality under the improved condition $O(u) \kappa(X) \leq 1/2$. We then propose an adaptive strategy that combines these two variants, ensuring $O(u)$ loss of orthogonality under the less restrictive condition $O(u) \kappa(X) \leq 1/2$ while using as few synchronization points as possible. As an example of where this adaptive approach is beneficial, we show, both theoretically and numerically, that $s$-step GMRES with the adaptive orthogonalization variant achieves a backward error comparable to that of $s$-step GMRES with BCGSI+ (also known as BCGS2) while requiring fewer synchronization points.
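For orientation, the following is a minimal NumPy sketch of the classical reorthogonalized baseline, BCGSI+ (BCGS2), that the abstract compares against. It is not the one-synchronization or two-synchronization variant proposed here. The function names `bcgs2`, `loss_of_orthogonality`, and `make_test_matrix`, the block size `s`, and the use of `np.linalg.qr` as the intrablock factorization are all assumptions made for illustration. The comments mark the steps that would each need a global reduction in a distributed setting, which is the cost the proposed variants aim to reduce.

```python
import numpy as np


def bcgs2(X, s):
    """Illustrative BCGSI+ (BCGS2): block classical Gram-Schmidt with one
    full reorthogonalization pass per block column of width s."""
    m, n = X.shape
    Q = np.zeros((m, n))
    R = np.zeros((n, n))
    for k in range(0, n, s):
        cols = slice(k, min(k + s, n))
        W = X[:, cols].copy()
        if k > 0:
            # First projection against previous blocks (a global reduction).
            S1 = Q[:, :k].T @ W
            W = W - Q[:, :k] @ S1
        # First intrablock QR (another reduction in a distributed setting).
        U, T1 = np.linalg.qr(W)
        if k > 0:
            # Reorthogonalization: second projection (a global reduction).
            S2 = Q[:, :k].T @ U
            U = U - Q[:, :k] @ S2
        # Second intrablock QR (a further reduction).
        Qk, T2 = np.linalg.qr(U)
        Q[:, cols] = Qk
        if k > 0:
            # X_k = Q_prev (S1 + S2 T1) + Q_k (T2 T1)
            R[:k, cols] = S1 + S2 @ T1
        R[cols, cols] = T2 @ T1
    return Q, R


def loss_of_orthogonality(Q):
    """Spectral norm of I - Q^T Q."""
    return np.linalg.norm(np.eye(Q.shape[1]) - Q.T @ Q, 2)


def make_test_matrix(m, n, cond, seed=0):
    """Hypothetical helper: random m-by-n matrix with 2-norm condition
    number approximately `cond` (log-spaced singular values)."""
    rng = np.random.default_rng(seed)
    U, _ = np.linalg.qr(rng.standard_normal((m, n)))
    V, _ = np.linalg.qr(rng.standard_normal((n, n)))
    sigma = np.logspace(0, -np.log10(cond), n)
    return (U * sigma) @ V.T
```

A possible usage is `Q, R = bcgs2(make_test_matrix(200, 20, 1e6), 5)` followed by `loss_of_orthogonality(Q)`, which lets the loss of orthogonality be observed as $\kappa(X)$ is varied.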