Self-supervised learning (SSL) techniques have achieved remarkable results in a range of speech processing tasks. Nonetheless, reducing their reliance on vast amounts of speech data for pre-training remains a significant challenge. This paper addresses this challenge by using synthetic speech to augment a low-resource pre-training corpus. We build a high-quality text-to-speech (TTS) system with limited resources using SSL features, and use it to generate a large synthetic corpus for pre-training. Experimental results demonstrate that the proposed approach reduces the demand for speech data by 90% with only slight performance degradation. To the best of our knowledge, this is the first work that aims to improve low-resource self-supervised learning in speech processing.