Low-Complexity Beamspace Channel Denoiser for mmWave Massive MIMO with Low-Resolution ADCs

In this paper, we propose a low-complexity beamspace channel denoising algorithm for millimeter-wave (mmWave) massive multiple-input multiple-output (MIMO) systems with low-resolution analog-to-digital converters (ADCs). The proposed method exploits the inherent sparsity of mmWave channels in the beamspace domain and formulates denoising as a Bayesian binary hypothesis test under a Bernoulli-complex Gaussian prior. To capture the distortion induced by low-resolution ADCs at low complexity, thermal noise and quantization noise are jointly modeled as a single composite noise term. Based on this model, a closed-form threshold and a hard-thresholding denoising rule are derived to distinguish signal-dominant components from noise-dominant ones. The resulting algorithm avoids computationally intensive operations such as matrix inversion, iterative optimization, and parameter search, and achieves near-linear computational complexity in the number of antennas. Furthermore, a hardware-efficient very-large-scale integration (VLSI) architecture is developed to enable practical deployment of the proposed algorithm and is implemented on an AMD-Xilinx Kintex UltraScale+ KCU116 FPGA platform. The design incorporates hardware-aware simplifications and an efficient processing structure, leading to significantly lower latency and hardware resource utilization than existing hardware implementations, along with sublinear scaling as the number of antennas increases. Extensive simulation results demonstrate that the proposed method achieves performance comparable to existing computationally intensive approaches while significantly reducing computational complexity.
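To make the thresholding idea concrete, the following is a minimal sketch of a hard-thresholding denoiser of the kind described above. It is not the paper's derivation: it assumes the textbook MAP test between "noise only" and "signal plus noise" for a Bernoulli-complex Gaussian prior, and it assumes the composite noise is zero-mean complex Gaussian with variance equal to the sum of the thermal and quantization noise variances. The function names and the parameters `p_active`, `sigma2_h`, `sigma2_thermal`, and `sigma2_quant` are hypothetical and are taken as known inputs here.

```python
import numpy as np


def map_threshold(p_active, sigma2_h, sigma2_thermal, sigma2_quant):
    """Closed-form threshold on |y|^2 for a textbook MAP binary test.

    Assumed model (illustrative, not taken from the paper):
      H0: y ~ CN(0, sigma2_n)             noise-dominant beam
      H1: y ~ CN(0, sigma2_h + sigma2_n)  signal-dominant beam
    with prior P(H1) = p_active.
    """
    # Composite noise: thermal and quantization noise are assumed
    # independent, so their variances add.
    sigma2_n = sigma2_thermal + sigma2_quant
    sigma2_1 = sigma2_h + sigma2_n
    # Decide H1 when p * f1(y) > (1 - p) * f0(y); solving for |y|^2 gives:
    log_term = np.log((1.0 - p_active) / p_active * sigma2_1 / sigma2_n)
    tau = sigma2_n * sigma2_1 / sigma2_h * log_term
    # A negative value means every beam is kept, so clamp at zero.
    return max(tau, 0.0)


def hard_threshold_denoise(y_beam, tau):
    """Zero every beamspace entry whose squared magnitude is not above tau."""
    y_beam = np.asarray(y_beam, dtype=complex)
    keep = np.abs(y_beam) ** 2 > tau
    return np.where(keep, y_beam, 0.0)
```

The threshold is computed once and the denoiser makes a single elementwise pass over the beamspace vector, with no matrix inversion, iteration, or parameter search. In practice the prior parameters and noise variances would have to be estimated or supplied by the system model, which this sketch does not attempt.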
