In this paper, we propose a word-wise intonation model for the Russian language and show how it can be generalized to other languages. The proposed model is suitable for automatic data markup and, by extension, for use in text-to-speech systems. It can also be used to model intonation contours, either with rule-based algorithms or by predicting contours with language models. The key idea is to partially eliminate the variability caused by the different positions of the stressed syllable within a word. This is achieved by combining pitch simplification with dynamic time warping (DTW) clustering. The proposed model can serve as a tool for intonation research or as a backbone for prosody description in text-to-speech systems. As advantages of the model, we show its relation to existing intonation systems and the possibility of using language models for prosody prediction. Finally, we present practical evidence of the system's robustness to parameter variations.
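The abstract names two ingredients, pitch simplification and DTW clustering, without fixing their details. The following is a minimal sketch under stated assumptions, not the authors' implementation. It converts each word's F0 track to semitones relative to the word's geometric-mean pitch. It then simplifies the contour to a fixed number of averaged segments (the hypothetical `n_segments` parameter). Finally, it clusters the simplified contours with k-medoids over plain DTW distances. All function names are hypothetical. DTW lets a peak that falls early in one word and late in another align cheaply, which is the kind of stress-position variability the model aims to absorb.

```python
import math


def simplify_pitch(f0, n_segments=4):
    """Hypothetical simplification: drop unvoiced frames (0 or None), convert to
    semitones relative to the word's geometric-mean F0, then average the contour
    over n_segments consecutive chunks (a piecewise-constant approximation)."""
    voiced = [v for v in f0 if v]
    if len(voiced) < n_segments:
        raise ValueError("not enough voiced frames for the requested segments")
    ref = math.exp(sum(math.log(v) for v in voiced) / len(voiced))
    st = [12 * math.log2(v / ref) for v in voiced]
    # Integer bounds guarantee every chunk is non-empty when len(st) >= n_segments.
    bounds = [i * len(st) // n_segments for i in range(n_segments + 1)]
    return [sum(st[a:b]) / (b - a) for a, b in zip(bounds, bounds[1:])]


def dtw_distance(a, b):
    """Classic DTW with absolute-difference cost and steps (1,0), (0,1), (1,1)."""
    n, m = len(a), len(b)
    inf = float("inf")
    D = [[inf] * (m + 1) for _ in range(n + 1)]
    D[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = abs(a[i - 1] - b[j - 1])
            D[i][j] = cost + min(D[i - 1][j], D[i][j - 1], D[i - 1][j - 1])
    return D[n][m]


def kmedoids_dtw(series, k, n_iter=20):
    """Assumed clustering step: k-medoids on a precomputed DTW distance matrix.
    Initialization naively takes the first k series as medoids."""
    n = len(series)
    dist = [[dtw_distance(s, t) for t in series] for s in series]

    def assign(medoids):
        # Each series joins the cluster whose medoid is nearest under DTW.
        return [min(range(k), key=lambda c: dist[i][medoids[c]]) for i in range(n)]

    medoids = list(range(k))
    for _ in range(n_iter):
        labels = assign(medoids)
        new_medoids = []
        for c in range(k):
            members = [i for i in range(n) if labels[i] == c]
            if not members:
                new_medoids.append(medoids[c])  # keep an empty cluster's medoid
                continue
            # The new medoid minimizes the summed DTW distance to its cluster.
            new_medoids.append(min(members, key=lambda i: sum(dist[i][j] for j in members)))
        if new_medoids == medoids:
            break
        medoids = new_medoids
    return assign(medoids), medoids


# Toy F0 tracks in Hz (0 marks an unvoiced frame): two rising words, two falling.
words = [
    [100, 110, 120, 130, 0, 140, 150, 160, 170],
    [170, 160, 150, 140, 130, 120, 110, 100],
    [100, 120, 140, 160, 170],
    [170, 140, 110, 100],
]
simplified = [simplify_pitch(w, n_segments=4) for w in words]
labels, medoids = kmedoids_dtw(simplified, k=2)
print(labels)
```

On this toy input the two rising contours should share one cluster and the two falling contours the other, even though the words have different numbers of frames. A real system would need a better initialization (e.g. k-medoids++ style seeding) and a choice of `n_segments` and `k`; the abstract reports robustness to parameter variations but does not state the values used.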