We present an efficient algorithm for applying sequences of planar rotations to a matrix. Applying such sequences efficiently is important in many numerical linear algebra algorithms for computing eigenvalues. Our algorithm is novel in three main ways. First, we introduce a new kernel that is optimized for register reuse. Second, we introduce a blocking and packing scheme that improves the cache efficiency of the algorithm. Finally, we thoroughly analyze the memory operations of the algorithm, which leads to important theoretical insights and makes it easier to select good parameters. Numerical experiments show that our algorithm outperforms the state of the art and achieves a flop rate close to the theoretical peak on modern hardware.
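To make the operation concrete, the following is a minimal reference sketch of what applying one sequence of planar rotations to a matrix can look like. It is a naive baseline, not the optimized kernel or the blocking and packing scheme described above. It assumes that rotation j acts from the right on adjacent columns j and j+1; the function name `apply_rotation_sequence`, the sign convention, and the list-of-rows storage are illustrative assumptions.

```python
import math


def apply_rotation_sequence(a, cs, sn):
    """Apply rotations 0..len(cs)-1 from the right to `a`, in place.

    Assumptions (illustrative, not taken from the paper):
    - `a` is a list of rows, each row a list of floats;
    - rotation j has cosine cs[j] and sine sn[j] and mixes
      columns j and j+1 of every row.
    """
    for j in range(len(cs)):
        c, s = cs[j], sn[j]
        for row in a:
            x, y = row[j], row[j + 1]
            row[j] = c * x + s * y        # new column j
            row[j + 1] = -s * x + c * y   # new column j+1
    return a


def row_norms(a):
    """Euclidean norm of each row; rotations leave these unchanged."""
    return [math.sqrt(sum(v * v for v in row)) for row in a]
```

Because each rotation is orthogonal, the row norms of the matrix are the same before and after the call, up to rounding. In this naive loop order every rotation sweeps over all rows before the next one starts, so each matrix entry is loaded and stored again for every rotation that touches it. That memory traffic is the kind of cost that register reuse and cache blocking aim to reduce.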