Speaker
Description
The block classical Gram--Schmidt (BCGS) algorithm and its reorthogonalized variant are widely used methods for computing the economic QR factorization of block columns X due to their lower communication cost compared to other approaches such as modified Gram--Schmidt and Householder QR. To further reduce communication, i.e., synchronization, there has been a long ongoing search for a variant of reorthogonalized BCGS that achieves O(u) loss of orthogonality while requiring only one synchronization point per block column, where u represents the unit roundoff. Utilizing Pythagorean inner products and delayed normalization techniques, we propose the first provably stable one-synchronization reorthogonalized BCGS variant, demonstrating that it has O(u) loss of orthogonality under the condition O(u)κ^2(X) ≤ 1/2, where κ represents the condition number.
By incorporating one additional synchronization point, we develop a two-synchronization reorthogonalized BCGS variant which maintains O(u) loss of orthogonality under the improved condition O(u)κ(X) ≤ 1/2. An adaptive strategy is then proposed to combine these two variants, ensuring O(u) loss of orthogonality while using as few synchronization points as possible under the less restrictive condition O(u)κ(X) ≤ 1/2. As an example of where this adaptive approach is beneficial, we show that using the adaptive orthogonalization variant, s-step GMRES achieves a backward error comparable to s-step GMRES with BCGSI+, also known as BCGS2, both theoretically and numerically, but requires fewer synchronization points.