Millimeter-wave (mmWave) radar captures are band-limited and noisy, making it difficult to reconstruct intelligible full-bandwidth speech. In this work, we propose a two-stage speech reconstruction pipeline for mmWave captures built on a Radar-Aware Dual-conditioned Generative Adversarial Network (RAD-GAN), which performs bandwidth extension on signals with low signal-to-noise ratios (-5 dB to -1 dB) captured through glass walls. We introduce an mmWave-tailored Multi-Mel Discriminator (MMD) and a Residual Fusion Gate (RFG) that enriches the generator input by fusing multiple conditioning channels. The two-stage pipeline pretrains the model on synthetically clipped clean speech and fine-tunes it on fused mel spectrograms produced by the RFG. We show empirically that the proposed method, trained on a limited dataset with no pre-trained modules and no data augmentation, outperforms state-of-the-art approaches on this task. Audio examples of RAD-GAN are available online at https://rad-gan-demo-site.vercel.app/.
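As an illustration only, a gated residual fusion of two mel-spectrogram conditioning channels can be sketched as below. This is a minimal reading of what a "Residual Fusion Gate" might compute; the per-bin scalar gate, the weights `w`, and the bias `b` are assumptions for the sketch, not the paper's actual parameterization.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def residual_fusion_gate(radar_mel, aux_mel, w=(0.5, 0.5), b=0.0):
    """Illustrative gated residual fusion of two conditioning channels,
    each a mel spectrogram of shape [n_mels, n_frames].

    The radar channel passes through a residual path; a sigmoid gate
    (computed per time-frequency bin) controls how much of the auxiliary
    channel is admitted into the fused output.
    """
    gate = sigmoid(w[0] * radar_mel + w[1] * aux_mel + b)
    return radar_mel + gate * aux_mel

# Toy usage with random "spectrograms": 80 mel bins, 16 frames.
rng = np.random.default_rng(0)
radar = rng.standard_normal((80, 16))
aux = rng.standard_normal((80, 16))
fused = residual_fusion_gate(radar, aux)
print(fused.shape)
```

Because the gate lies in (0, 1), the fused output always retains the radar channel and interpolates in a bounded contribution from the auxiliary channel, which keeps the fusion stable when one conditioning signal is unreliable.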