Long-term beamforming substantially reduces the channel estimation and inversion overhead of conventional massive MU-MIMO receivers; yet, its construction still hinges on the inversion of a large Hermitian matrix, whose condition number deteriorates with the per-user SNR dynamic range. When this inversion is approximated in hardware via the conjugate gradient (CG) algorithm, the deterioration directly inflates the iteration count and, consequently, the energy and latency budget. We propose a hardware-friendly low-rank preconditioning framework that targets exactly this bottleneck. The preconditioner is constructed from the top eigenpairs of the long-term covariance matrix through a randomized complex eigenvalue decomposition (RC-EVD), whose inner QR factorizations are realized via a Cholesky-based scheme (QRC), confining the dominant cost to general matrix multiplication (GEMM) and small triangular solves that map naturally onto systolic arrays. We further show that performing the preconditioned CG inversion in the beamspace domain induces sparsification of the system matrix and provides additional convergence acceleration at negligible transformation cost. Ray-tracing simulations confirm that the joint scheme reduces the required CG iteration count by two to three while matching the post-equalization SINR of the exact inversion.
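As a rough illustration of the pipeline summarized above, the following NumPy sketch builds a low-rank preconditioner from a randomized eigendecomposition whose orthonormalization step uses Cholesky QR, then compares plain and preconditioned CG on a synthetic covariance-like matrix. The function names (`cholesky_qr`, `randomized_evd`, `pcg`, `demo`), the spectral preconditioner form $M^{-1} = I + U(\mu\Lambda^{-1} - I)U^H$, the choice of $\mu$ as the smallest retained eigenvalue, the two-pass Cholesky QR, and the i.i.d. Gaussian test channel are all assumptions made for illustration. The abstract does not specify these details, and the beamspace transformation is omitted here.

```python
import numpy as np

def cholesky_qr(Y, passes=2):
    """Orthonormalize the columns of Y using Cholesky QR.

    Each pass forms the small Gram matrix G = Q^H Q (a GEMM), factors
    G = L L^H, and applies Q <- Q L^{-H}. Running two passes is an
    assumption made here for numerical robustness.
    """
    Q = Y
    for _ in range(passes):
        G = Q.conj().T @ Q
        L = np.linalg.cholesky(G)
        # Solve L X = Q^H, so X^H = Q L^{-H}. A general solver is used for
        # brevity; hardware would use a small triangular solve instead.
        Q = np.linalg.solve(L, Q.conj().T).conj().T
    return Q

def randomized_evd(A, k, n_power=2, rng=None):
    """Approximate the top-k eigenpairs of a Hermitian PSD matrix A."""
    rng = np.random.default_rng(rng)
    n = A.shape[0]
    omega = rng.standard_normal((n, k)) + 1j * rng.standard_normal((n, k))
    Q = cholesky_qr(A @ omega)           # range sketch
    for _ in range(n_power):             # power iterations sharpen the basis
        Q = cholesky_qr(A @ Q)
    B = Q.conj().T @ A @ Q               # small k x k projected matrix
    w, V = np.linalg.eigh(B)             # eigenvalues in ascending order
    return w[::-1], (Q @ V)[:, ::-1]     # reorder to descending

def pcg(A, b, apply_minv=None, tol=1e-8, max_iter=200):
    """Preconditioned CG for Hermitian positive definite A.

    Returns the solution estimate and the number of iterations used.
    """
    precond = apply_minv if apply_minv is not None else (lambda r: r)
    x = np.zeros_like(b)
    r = b.copy()
    z = precond(r)
    p = z.copy()
    rz = np.vdot(r, z).real
    b_norm = np.linalg.norm(b)
    for it in range(1, max_iter + 1):
        Ap = A @ p
        alpha = rz / np.vdot(p, Ap).real
        x = x + alpha * p
        r = r - alpha * Ap
        if np.linalg.norm(r) <= tol * b_norm:
            return x, it
        z = precond(r)
        rz_new = np.vdot(r, z).real
        p = z + (rz_new / rz) * p
        rz = rz_new
    return x, max_iter

def demo(seed=0, n_ant=64, n_users=8, oversample=4):
    """Compare plain and preconditioned CG on a synthetic test matrix."""
    rng = np.random.default_rng(seed)
    H = (rng.standard_normal((n_ant, n_users))
         + 1j * rng.standard_normal((n_ant, n_users))) / np.sqrt(2)
    power = np.logspace(0, 3, n_users)            # 30 dB dynamic range (assumed)
    A = (H * power) @ H.conj().T + np.eye(n_ant)  # H diag(power) H^H + I
    b = rng.standard_normal(n_ant) + 1j * rng.standard_normal(n_ant)

    w, U = randomized_evd(A, k=n_users + oversample, rng=rng)
    mu = w[-1]                                    # smallest retained eigenvalue

    def apply_minv(r):
        # M^{-1} r = r + U (mu / w - 1) U^H r
        return r + U @ ((mu / w - 1.0) * (U.conj().T @ r))

    x_ref = np.linalg.solve(A, b)
    x_plain, it_plain = pcg(A, b)
    x_pre, it_pre = pcg(A, b, apply_minv)
    err_plain = np.linalg.norm(x_plain - x_ref) / np.linalg.norm(x_ref)
    err_pre = np.linalg.norm(x_pre - x_ref) / np.linalg.norm(x_ref)
    return it_plain, it_pre, err_plain, err_pre
```

Calling `demo()` returns the two iteration counts and the relative errors against `np.linalg.solve`. On this synthetic problem the preconditioned run should need fewer iterations, though the exact counts depend on the seed, the sketch size, and the assumed dynamic range.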