This paper presents an overview of a rule-based system for automatic accentuation and phonemic transcription of Russian texts, intended for speech-related tasks such as Automatic Speech Recognition (ASR). The two parts of the system, accentuation and transcription, use different approaches to obtain correct phonemic representations of input phrases. Accentuation is based on A.A. Zaliznyak's "Grammatical Dictionary of the Russian Language" and the Wiktionary corpus. To disambiguate homographs, the accentuation component also uses morphological information about the sentence, obtained with Recurrent Neural Networks (RNN). The transcription algorithms apply the rules presented in the monograph "Computer Synthesis and Voice Cloning" by B.M. Lobanov and L.I. Tsirulnik. The rules described in this paper are implemented in an open-source module that can be used in any research related to ASR or Speech-to-Text (STT) tasks. Automatically marked-up text annotations of the Russian Voxforge database were used as training data for an acoustic model in CMU Sphinx. The resulting acoustic model was evaluated with cross-validation, achieving a mean Word Accuracy of 71.2%. The toolkit is written in Python and is publicly available on GitHub.
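The dictionary-plus-morphology approach to stress placement can be illustrated with a minimal Python sketch. It is not the toolkit's code: the toy lexicon, the tag strings, and the `+` stress mark are hypothetical, and the morphological tag that the RNN component would predict is simply passed in as an argument. Each word form lists its possible stress positions with a tag, and homographs such as руки (ру́ки, nominative plural, vs. руки́, genitive singular) are resolved by matching that tag.

```python
from typing import Dict, List, Optional, Tuple

VOWELS = set("аеёиоуыэюя")

# Hypothetical toy lexicon: word form -> list of (index of the stressed
# vowel letter, morphological tag). In the described system this
# information would come from Zaliznyak's dictionary and Wiktionary.
LEXICON: Dict[str, List[Tuple[int, str]]] = {
    "рука": [(3, "NOUN,Sg,Nom")],
    "руки": [(1, "NOUN,Pl,Nom"), (3, "NOUN,Sg,Gen")],
    "воды": [(1, "NOUN,Pl,Nom"), (3, "NOUN,Sg,Gen")],
    "мама": [(1, "NOUN,Sg,Nom")],
}


def stressed_vowel_index(word: str, tag: Optional[str] = None) -> Optional[int]:
    """Return the index of the stressed letter, or None if unknown."""
    word = word.lower()
    if "ё" in word:
        return word.index("ё")  # ё is always stressed
    vowel_positions = [i for i, ch in enumerate(word) if ch in VOWELS]
    if len(vowel_positions) == 1:
        return vowel_positions[0]  # a single vowel needs no lookup
    candidates = LEXICON.get(word, [])
    if not candidates:
        return None  # out-of-vocabulary word
    if len(candidates) == 1 or tag is None:
        return candidates[0][0]
    # Homograph: pick the reading whose tag matches the predicted tag.
    for index, candidate_tag in candidates:
        if candidate_tag == tag:
            return index
    return candidates[0][0]  # fall back to the first listed reading


def accentuate(word: str, tag: Optional[str] = None, mark: str = "+") -> str:
    """Insert the stress mark right after the stressed vowel."""
    index = stressed_vowel_index(word, tag)
    if index is None:
        return word
    return word[: index + 1] + mark + word[index + 1 :]


print(accentuate("руки", "NOUN,Pl,Nom"))  # ру+ки
print(accentuate("руки", "NOUN,Sg,Gen"))  # руки+
```

Words missing from the lexicon are returned unchanged here; a real system would need a fallback strategy for them.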
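Once stress is known, letter-to-phoneme conversion proceeds by context-dependent rules. The sketch below applies only a few well-known, simplified Russian rules: consonant palatalisation before softening letters, /j/ insertion for iotated vowels at word start or after a vowel or ь/ъ, reduction of unstressed о to a and е to i, and word-final devoicing. It is an illustrative assumption, not the rule set of Lobanov and Tsirulnik, and the phone labels (a `1` suffix for stress, `'` for softness) are hypothetical.

```python
from typing import List

VOWELS = "аеёиоуыэюя"
# Iotated vowel letters and the vowel phoneme each denotes.
IOTATED = {"е": "e", "ё": "o", "ю": "u", "я": "a"}
PLAIN = {"а": "a", "о": "o", "у": "u", "ы": "y", "э": "e", "и": "i"}
# Letters that palatalise (soften) a preceding consonant.
SOFTENING = set("еёиюяь")
CONSONANTS = {"б": "b", "в": "v", "г": "g", "д": "d", "ж": "zh", "з": "z",
              "к": "k", "л": "l", "м": "m", "н": "n", "п": "p", "р": "r",
              "с": "s", "т": "t", "ф": "f", "х": "h", "ц": "c", "ч": "ch",
              "ш": "sh", "щ": "sch", "й": "j"}
# Voiced obstruent -> voiceless counterpart (final devoicing).
DEVOICE = {"b": "p", "v": "f", "g": "k", "d": "t", "zh": "sh", "z": "s"}
ALWAYS_HARD = {"zh", "sh", "c"}  # never palatalised in this sketch


def transcribe(word: str, stress: int) -> List[str]:
    """Simplified phonemic transcription; `stress` is the stressed letter index."""
    word = word.lower()
    phones: List[str] = []
    for i, ch in enumerate(word):
        nxt = word[i + 1] if i + 1 < len(word) else ""
        prev = word[i - 1] if i > 0 else ""
        if ch in CONSONANTS:
            phone = CONSONANTS[ch]
            if nxt in SOFTENING and phone not in ALWAYS_HARD and phone != "j":
                phone += "'"  # palatalised consonant
            phones.append(phone)
        elif ch in VOWELS:
            stressed = i == stress
            if ch in IOTATED:
                # /j/ is pronounced at word start or after a vowel, ь or ъ.
                if prev == "" or prev in VOWELS or prev in "ьъ":
                    phones.append("j")
                vowel = IOTATED[ch]
            else:
                vowel = PLAIN[ch]
            if not stressed:
                # Simplified vowel reduction (akanye / ikanye).
                if vowel == "o":
                    vowel = "a"
                elif vowel == "e":
                    vowel = "i"
            phones.append(vowel + ("1" if stressed else ""))
        # ь and ъ produce no phone of their own here.
    # Word-final devoicing, keeping any softness mark.
    if phones:
        base = phones[-1].rstrip("'")
        if base in DEVOICE:
            phones[-1] = DEVOICE[base] + phones[-1][len(base):]
    return phones


print(transcribe("молоко", 5))  # ['m', 'a', 'l', 'a', 'k', 'o1']
print(transcribe("лёд", 1))     # ["l'", 'o1', 't']
```

Phenomena such as voicing assimilation in clusters, unpronounced consonants, and reduction after hard sibilants are deliberately left out; the described system's rules cover far more context than this.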