Phrase break prediction is a crucial task for improving the prosodic naturalness of a text-to-speech (TTS) system. However, most proposed phrase break prediction models are monolingual and are trained exclusively on large amounts of labeled data. In this paper, we address this issue for low-resource languages with limited labeled data using cross-lingual transfer. We investigate the effectiveness of zero-shot and few-shot cross-lingual transfer for phrase break prediction using a pre-trained multilingual language model. We use manually collected datasets in four Indo-European languages: one high-resource language and three with limited resources. Our findings demonstrate that cross-lingual transfer learning can be an effective approach for improving performance in low-resource languages, especially in the few-shot setting. This suggests that cross-lingual transfer can be an inexpensive and effective way to develop a TTS front-end for resource-poor languages.