We propose SE-Bridge, a novel method for speech enhancement (SE). Following the recent application of diffusion models to SE, speech enhancement can be performed by solving a stochastic differential equation (SDE). Each SDE corresponds to a probability flow ordinary differential equation (PF-ODE), and the trajectory of a PF-ODE solution consists of the speech states at different times. Our approach is based on consistency models, which ensure that any speech states on the same PF-ODE trajectory correspond to the same initial state. By integrating the Brownian Bridge process, the model can generate highly intelligible speech samples without adversarial training. This is the first attempt to apply consistency models to the SE task, and SE-Bridge achieves state-of-the-art results on several metrics while reducing sampling time by a factor of 15 compared with the diffusion-based baseline. Our experiments on multiple datasets demonstrate the effectiveness of SE-Bridge for SE. Furthermore, extensive experiments on downstream tasks, including Automatic Speech Recognition (ASR) and Speaker Verification (SV), show that SE-Bridge can effectively support multiple downstream tasks.
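The abstract combines two ideas: a Brownian Bridge that links clean speech to noisy speech, and a consistency function that maps every state on a trajectory back to the same starting point. The following minimal sketch illustrates both under stated assumptions and is not the SE-Bridge implementation. It assumes a bridge marginal of the form x_t = (1 - t) x0 + t y + sigma * sqrt(t (1 - t)) z on t in [0, 1], and it borrows the skip-connection parameterization c_skip(t), c_out(t) from the original consistency-models formulation so that f(x, eps) = x holds exactly at the boundary. The names `bridge_sample`, `consistency_fn`, `one_step_enhance`, `toy_net`, and the values of `sigma`, `eps`, and `sigma_data` are hypothetical.

```python
import math
import random
from typing import Callable, List

Signal = List[float]
# Hypothetical network signature F(x_t, t, y) -> raw prediction of the same length.
Net = Callable[[Signal, float, Signal], Signal]


def bridge_sample(x0: Signal, y: Signal, t: float, sigma: float, rng: random.Random) -> Signal:
    """Sample x_t from an ASSUMED Brownian bridge between clean x0 (t=0) and noisy y (t=1).

    x_t = (1 - t) * x0 + t * y + sigma * sqrt(t * (1 - t)) * z, with z ~ N(0, 1) per sample.
    The variance vanishes at both endpoints, so x_0 = x0 and x_1 = y exactly.
    """
    if not 0.0 <= t <= 1.0:
        raise ValueError("t must lie in [0, 1]")
    std = sigma * math.sqrt(t * (1.0 - t))
    return [(1.0 - t) * a + t * b + std * rng.gauss(0.0, 1.0) for a, b in zip(x0, y)]


def c_skip(t: float, eps: float, sigma_data: float) -> float:
    # Equals 1 at t = eps, so the skip path passes the input through unchanged there.
    return sigma_data ** 2 / ((t - eps) ** 2 + sigma_data ** 2)


def c_out(t: float, eps: float, sigma_data: float) -> float:
    # Equals 0 at t = eps, so the network output is ignored at the boundary.
    return sigma_data * (t - eps) / math.sqrt(sigma_data ** 2 + t ** 2)


def consistency_fn(net: Net, x_t: Signal, t: float, y: Signal,
                   eps: float = 1e-3, sigma_data: float = 0.5) -> Signal:
    """f(x_t, t) = c_skip(t) * x_t + c_out(t) * F(x_t, t, y); satisfies f(x, eps) = x."""
    raw = net(x_t, t, y)
    s, o = c_skip(t, eps, sigma_data), c_out(t, eps, sigma_data)
    return [s * a + o * b for a, b in zip(x_t, raw)]


def one_step_enhance(net: Net, y: Signal, eps: float = 1e-3, sigma_data: float = 0.5) -> Signal:
    # Under the assumed bridge, x_1 = y with zero variance, so one evaluation of
    # the consistency function at t = 1 yields a clean-speech estimate.
    return consistency_fn(net, list(y), 1.0, y, eps, sigma_data)


def toy_net(x_t: Signal, t: float, y: Signal) -> Signal:
    """Hypothetical placeholder for a trained network; it simply returns zeros."""
    return [0.0] * len(x_t)
```

With a trained network in place of `toy_net`, `one_step_enhance` needs a single network evaluation, whereas a diffusion sampler integrates the reverse SDE over many steps; this is the general mechanism by which consistency models reduce sampling cost.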