We study deep learning (DL) based speech enhancement for virtual meetings on cellular devices, where transmitted speech is degraded by both background noise and transmission loss. Because the Deep Noise Suppression (DNS) Challenge dataset does not contain such real-world transmission disturbances, we collect a transmitted DNS (t-DNS) dataset using Zoom Meetings over the T-Mobile network. We select two baseline models, Demucs and FullSubNet. Demucs is an end-to-end model that takes time-domain input and outputs time-domain denoised speech, whereas FullSubNet takes time-frequency-domain input and outputs the energy ratio of the target speech in that input. The goal of this project is to enhance speech transmitted over cellular networks using deep learning models.
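To make the time-frequency-domain output described above concrete, the following is a minimal sketch of an energy-ratio mask applied to a single frame. It is not FullSubNet or Demucs: the mask here is an oracle computed from known clean and noise signals, whereas a trained model would have to estimate it from the noisy input alone. The function names, the toy signals, and the naive DFT are all assumptions chosen for illustration.

```python
import cmath
import math


def frame_dft(frame):
    """Naive O(n^2) DFT of one frame; adequate for a small illustration."""
    n = len(frame)
    return [
        sum(frame[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
        for k in range(n)
    ]


def frame_idft(spectrum):
    """Inverse of frame_dft, returning the real part of each sample."""
    n = len(spectrum)
    return [
        (sum(spectrum[k] * cmath.exp(2j * math.pi * k * t / n) for k in range(n)) / n).real
        for t in range(n)
    ]


def energy_ratio_mask(clean_spec, noise_spec, eps=1e-12):
    """Per-bin ratio of target-speech energy to total energy, in [0, 1]."""
    return [
        abs(s) ** 2 / (abs(s) ** 2 + abs(v) ** 2 + eps)
        for s, v in zip(clean_spec, noise_spec)
    ]


def apply_mask(noisy_spec, mask):
    """Scale each noisy time-frequency bin by its mask value."""
    return [m * x for m, x in zip(mask, noisy_spec)]


# Toy frame (hypothetical data): a tone in bin 2 stands in for speech,
# a weaker tone in bin 5 stands in for noise.
n = 16
clean = [math.sin(2 * math.pi * 2 * t / n) for t in range(n)]
noise = [0.5 * math.sin(2 * math.pi * 5 * t / n) for t in range(n)]
noisy = [c + v for c, v in zip(clean, noise)]

# Oracle mask: uses the clean and noise signals, which a real model never sees.
mask = energy_ratio_mask(frame_dft(clean), frame_dft(noise))
enhanced = frame_idft(apply_mask(frame_dft(noisy), mask))
```

Because the two tones occupy different frequency bins, the oracle mask keeps the speech bins and suppresses the noise bins, so `enhanced` closely matches `clean`. A time-domain model such as Demucs skips the transform and mask entirely and maps the noisy waveform directly to a denoised waveform.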