This paper presents a configurable version of Extreme Bandwidth Extension Network (EBEN), a Generative Adversarial Network (GAN) designed to enhance audio captured with body-conduction microphones. We show that, although these microphones significantly reduce environmental noise, this insensitivity to ambient noise comes at the expense of the bandwidth of the voice signal acquired from the wearer. The captured signals therefore require signal enhancement techniques to recover full-bandwidth speech. EBEN leverages a configurable multiband decomposition of the raw captured signal. This decomposition reduces the time-domain dimension of the data and allows better control over the full-band signal. The multiband representation of the captured signal is processed by a U-Net-like model, trained with a combination of feature and adversarial losses, to generate an enhanced speech signal. We also exploit this multiband representation in the proposed configurable discriminator architecture. The configurable EBEN approach achieves state-of-the-art enhancement results on synthetic data with a lightweight generator that allows real-time processing.
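To make the dimension-reduction argument concrete, the sketch below splits a signal of T samples into M = 2**levels critically sampled subbands of T / M samples each, and inverts the split exactly. The abstract does not specify EBEN's filter bank, so this sketch swaps in a plain Haar two-channel filter bank applied as a full tree (a Haar wavelet-packet decomposition). All function names (`haar_split`, `haar_merge`, `multiband_analysis`, `multiband_synthesis`) are hypothetical and not part of EBEN. Haar filters have poor frequency selectivity, so neighbouring bands leak into one another, and past the first level the rows are not in strictly ascending-frequency order. A practical system would use longer filters, for example a cosine-modulated (pseudo-QMF) bank.

```python
import numpy as np


def haar_split(x: np.ndarray) -> tuple[np.ndarray, np.ndarray]:
    """One critically sampled two-channel Haar stage along the last axis."""
    even, odd = x[..., 0::2], x[..., 1::2]
    low = (even + odd) / np.sqrt(2.0)   # low-pass branch, decimated by 2
    high = (even - odd) / np.sqrt(2.0)  # high-pass branch, decimated by 2
    return low, high


def haar_merge(low: np.ndarray, high: np.ndarray) -> np.ndarray:
    """Exact inverse of haar_split: rebuild and interleave even/odd samples."""
    out = np.empty(low.shape[:-1] + (2 * low.shape[-1],), dtype=np.result_type(low, high))
    out[..., 0::2] = (low + high) / np.sqrt(2.0)
    out[..., 1::2] = (low - high) / np.sqrt(2.0)
    return out


def multiband_analysis(x: np.ndarray, levels: int) -> np.ndarray:
    """Split a 1-D signal of length T into M = 2**levels bands of length T // M."""
    m = 2 ** levels
    if x.ndim != 1 or x.shape[0] % m != 0:
        raise ValueError("expected a 1-D signal whose length is divisible by 2**levels")
    bands = x[np.newaxis, :]  # shape (1, T)
    for _ in range(levels):
        low, high = haar_split(bands)  # each of shape (k, L // 2)
        # Keep each parent's two children adjacent: shape (2k, L // 2)
        bands = np.stack([low, high], axis=1).reshape(-1, low.shape[-1])
    return bands  # shape (M, T // M): fewer time steps, more channels


def multiband_synthesis(bands: np.ndarray) -> np.ndarray:
    """Invert multiband_analysis and return the full-band 1-D signal."""
    m = bands.shape[0]
    if m < 1 or m & (m - 1) != 0:
        raise ValueError("number of bands must be a power of two")
    while bands.shape[0] > 1:
        pairs = bands.reshape(-1, 2, bands.shape[-1])  # sibling bands side by side
        bands = haar_merge(pairs[:, 0], pairs[:, 1])
    return bands[0]


# Usage: 16000 samples (e.g. 1 s at a hypothetical 16 kHz rate) split into 4 bands
rng = np.random.default_rng(0)
x = rng.standard_normal(16000)
bands = multiband_analysis(x, levels=2)
print(bands.shape)                                  # (4, 4000)
print(np.allclose(multiband_synthesis(bands), x))   # True
```

A model operating on `bands` sees 4 channels of 4000 steps instead of 1 channel of 16000 steps, which is the kind of time-domain reduction the abstract refers to. Because the split is invertible, the model's multiband output can be mapped back to a full-band waveform.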