Research on pronunciation assessment has focused on the phonetic and phonological aspects of non-native (L2) speech, often neglecting the rich information carried by non-verbal cues. In this study, we propose IntraVerbalPA, a novel pronunciation assessment framework that incorporates both fine-grained frame-level and abstract utterance-level non-verbal cues alongside the conventional speech and phoneme representations. Additionally, we introduce a ``Goodness of phonemic-duration'' metric to model the phoneme duration distribution within the framework. Our results validate the effectiveness of IntraVerbalPA and of each of its components, with performance that matches or outperforms prior work.