For our contribution to the Blizzard Challenge 2023, we improved on the system we submitted to the Blizzard Challenge 2021. Our approach uses a rule-based text-to-phoneme processing system that includes rule-based disambiguation of French homographs. The resulting phonemes are then transformed into spectrograms as intermediate representations using a fast and efficient non-autoregressive synthesis architecture based on Conformer and Glow. A GAN-based neural vocoder that combines recent state-of-the-art approaches converts the spectrogram to the final waveform. We carefully designed the data processing, training, and inference procedures for the challenge data. Our system identifier is G. Open-source code and a demo are available.
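To illustrate what rule-based homograph disambiguation can look like, the following is a minimal sketch and not the system's actual rule set. The lexicon `HOMOGRAPHS`, the word lists, the function `disambiguate`, and the fallback to the noun reading are all hypothetical choices made for this example; a real front end would need far richer context rules.

```python
# Toy rule-based homograph disambiguation for French.
# All rules, word lists, and names here are illustrative assumptions.

DETERMINERS = {"le", "la", "les", "un", "une", "des", "du", "ce", "ces"}
PLURAL_SUBJECTS = {"ils", "elles"}

# Each homograph maps a part-of-speech reading to an IPA transcription.
HOMOGRAPHS = {
    "président": {"noun": "pʁezidɑ̃", "verb": "pʁezid"},
    "couvent": {"noun": "kuvɑ̃", "verb": "kuv"},
}


def disambiguate(tokens, index):
    """Return the phoneme string for a homograph at tokens[index].

    Returns None if the word is not in the homograph lexicon.
    """
    readings = HOMOGRAPHS.get(tokens[index].lower())
    if readings is None:
        return None

    previous = tokens[index - 1].lower() if index > 0 else ""

    # Rule 1: a third-person plural subject pronoun signals the verb form.
    if previous in PLURAL_SUBJECTS:
        return readings["verb"]

    # Rule 2: a preceding determiner signals the noun form.
    if previous in DETERMINERS:
        return readings["noun"]

    # Fallback (assumption): prefer the noun reading when no rule fires.
    return readings["noun"]


if __name__ == "__main__":
    print(disambiguate(["Ils", "président", "la", "séance"], 1))
    print(disambiguate(["Le", "président", "parle"], 1))
```

The sketch only inspects the immediately preceding word, so a sentence such as "Les poules couvent" would fall through to the noun fallback and be mispronounced; handling such cases is exactly where a fuller rule set is needed.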