Utterances by L2 speakers can be unintelligible because of mispronunciation and improper prosody. Computer-aided language learning systems often provide textual feedback through a speech recognition engine. Ideally, however, feedback for L2 speakers should be fine-grained enough for them to detect and diagnose the unintelligible parts of their own utterances. Language teachers often correct students' pronunciation through a voice-to-voice process. Inspired by this practice, this pilot study uses a unique semi-parallel dataset with three components: non-native (L2) speakers' read-aloud utterances, native (L1) speakers' shadowing of those utterances, and the L1 speakers' script-shadowing utterances. We explore whether Voice Conversion (VC) techniques can replicate the process of an L1 speaker shadowing L2 speech, and use them to build a virtual shadower system. Experimental results demonstrate that the VC system can feasibly simulate L1 speakers' shadowing behavior. The output of the virtual shadower system is reasonably similar to real L1 shadowing utterances in both linguistic and acoustic aspects.