Speech dysfluency modeling is the core bottleneck in both speech therapy and language learning. However, no existing AI solution tackles this problem systematically. We first define the concepts of dysfluent speech and dysfluent speech modeling. We then present the Hierarchical Unconstrained Dysfluency Modeling (H-UDM) approach, which addresses both dysfluency transcription and dysfluency detection and thereby eliminates the need for extensive manual annotation. Furthermore, we introduce VCTK++, a simulated dysfluent speech dataset, to improve H-UDM's phonetic transcription. Our experimental results demonstrate the effectiveness and robustness of the proposed methods on both the transcription and detection tasks.