Self-supervised learning (SSL) techniques have achieved remarkable results in various speech processing tasks. Nonetheless, a significant challenge remains in reducing the reliance on vast amounts of speech data for pre-training. This paper proposes to address this challenge by leveraging synthetic speech to augment a low-resource pre-training corpus. We construct a high-quality text-to-speech (TTS) system with limited resources using SSL features and generate a large synthetic corpus for pre-training. Experimental results demonstrate that our proposed approach effectively reduces the demand for speech data by 90% with only slight performance degradation. To the best of our knowledge, this is the first work aiming to enhance low-resource self-supervised learning in speech processing.