Previous speech restoration (SR) research primarily focuses on single-task speech restoration (SSR), which cannot address general speech restoration problems: training a separate SSR model for each distortion type is time-consuming and lacks generality. In addition, most studies ignore model generalization to unseen domains. To overcome these limitations, we propose DisSR, a general speech restoration model based on Disentangled Speech Representations with two key properties: 1) degradation-prior guidance, which extracts a speaker-invariant degradation representation to guide the diffusion-based speech restoration model; and 2) domain adaptation, where we design cross-domain alignment training to enhance the model's adaptability and generalization on cross-domain data. Experimental results demonstrate that our method produces high-quality restored speech under various distortion conditions. Audio samples can be found at https://itspsp.github.io/DisSR.
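To make the degradation-prior guidance idea concrete, here is a minimal, illustrative sketch (not the paper's implementation): a distorted spectrogram is pooled into an utterance-level degradation embedding, which then conditions a denoising step via FiLM-style scale/shift modulation. All dimensions, the pooling encoder, and the FiLM conditioning are assumptions chosen only to show the data flow.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical dimensions (not from the paper): 80-bin mel features,
# 100 frames, 64-dim degradation embedding.
N_MELS, T, D_EMB = 80, 100, 64

# Toy degradation encoder: temporal mean-pooling discards most content
# and speaker detail, leaving a compact utterance-level descriptor.
W_enc = rng.standard_normal((N_MELS, D_EMB)) * 0.01

def degradation_embedding(spec):
    pooled = spec.mean(axis=1)           # (N_MELS,) average over time
    return np.tanh(pooled @ W_enc)       # (D_EMB,) degradation code

# Conditioned denoising step (stand-in for one diffusion reverse step):
# the embedding modulates the noisy input with FiLM-style scale/shift.
W_scale = rng.standard_normal((D_EMB, N_MELS)) * 0.01
W_shift = rng.standard_normal((D_EMB, N_MELS)) * 0.01

def denoise_step(noisy_spec, emb):
    scale = 1.0 + emb @ W_scale          # (N_MELS,) per-bin scale
    shift = emb @ W_shift                # (N_MELS,) per-bin shift
    return noisy_spec * scale[:, None] + shift[:, None]

distorted = rng.standard_normal((N_MELS, T))
emb = degradation_embedding(distorted)
restored = denoise_step(distorted, emb)
print(emb.shape, restored.shape)         # (64,) (80, 100)
```

Because the embedding summarizes how the signal was corrupted rather than what was said, the same conditioning pathway can, in principle, steer restoration for different distortion types without retraining a separate model per task.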