Automatic pronunciation assessment (APA) aims to evaluate the pronunciation proficiency of a second-language (L2) learner in a target language. Existing approaches typically rely on regression models for proficiency score prediction; these models are trained to estimate target values without explicitly accounting for phoneme awareness in the feature space. In this paper, we propose a contrastive phonemic ordinal regularizer (ConPCO), tailored to regression-based APA models, that yields more phoneme-discriminative features while respecting the ordinal relationships among the regression targets. ConPCO first aligns the phoneme representations of an APA model with textual embeddings of phonetic transcriptions via contrastive learning. It then preserves phoneme characteristics by regulating the inter- and intra-phoneme-category distances in the feature space, while accounting for the ordinal relationships among the output targets. We further design a hierarchical APA model to evaluate the effectiveness of our method. Extensive experiments on the speechocean762 benchmark dataset suggest the feasibility and efficacy of our approach compared with several cutting-edge baselines.
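To make the two ingredients of the regularizer more concrete, the following is a minimal sketch in plain Python. It is not the formulation used in the paper, which the abstract does not spell out. It assumes an InfoNCE-style loss for the audio-text alignment step, and a simple two-part distance term for the ordinal step: phoneme centers are pushed apart, and each feature is pulled toward its own center with a weight proportional to its score. All function names, the `temperature` and `max_score` parameters, and the score-proportional weighting are hypothetical choices made for illustration.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v + 1e-12)


def euclidean(u, v):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def contrastive_alignment_loss(audio_feats, text_embs, labels, temperature=0.1):
    """InfoNCE-style loss (an assumed stand-in for the alignment step).

    audio_feats: phoneme-level feature vectors from the APA model
    text_embs:   one text embedding per phoneme category, indexed by phoneme id
    labels:      phoneme id of each audio feature
    """
    total = 0.0
    for feat, label in zip(audio_feats, labels):
        logits = [cosine(feat, emb) / temperature for emb in text_embs]
        peak = max(logits)  # subtract the max for a stable log-sum-exp
        log_denom = peak + math.log(sum(math.exp(x - peak) for x in logits))
        total += log_denom - logits[label]
    return total / len(audio_feats)


def phoneme_centers(feats, labels):
    """Mean feature vector of each phoneme category."""
    sums, counts = {}, {}
    for feat, label in zip(feats, labels):
        acc = sums.setdefault(label, [0.0] * len(feat))
        for i, x in enumerate(feat):
            acc[i] += x
        counts[label] = counts.get(label, 0) + 1
    return {lab: [x / counts[lab] for x in acc] for lab, acc in sums.items()}


def ordinal_phoneme_regularizer(feats, labels, scores, max_score=2.0):
    """Illustrative inter-/intra-phoneme distance term with ordinal weights.

    diversity: negative mean distance between phoneme centers (pushes apart)
    tightness: score-weighted distance of each feature to its own center;
               the weighting (score / max_score) is an assumption, chosen so
               that higher-scored realizations are pulled in more strongly.
    """
    centers = phoneme_centers(feats, labels)
    ids = sorted(centers)
    pairs = [(a, b) for i, a in enumerate(ids) for b in ids[i + 1:]]
    diversity = 0.0
    if pairs:
        diversity = -sum(euclidean(centers[a], centers[b]) for a, b in pairs) / len(pairs)
    tightness = sum(
        (score / max_score) * euclidean(feat, centers[label])
        for feat, label, score in zip(feats, labels, scores)
    ) / len(feats)
    return diversity + tightness
```

In a training setup along these lines, both terms would presumably be added to the usual regression loss with small weighting coefficients, so that the score predictor is still driven mainly by the regression targets while the feature space is shaped by phoneme identity and score order.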