Automatic pronunciation assessment (APA) aims to evaluate the pronunciation proficiency of second-language (L2) learners in a target language. Existing efforts typically rely on regression models for proficiency score prediction, where the models are trained to estimate target values without explicitly accounting for phoneme awareness in the feature space. In this paper, we propose a contrastive phonemic ordinal regularizer (ConPCO) tailored for regression-based APA models to generate more phoneme-discriminative features while considering the ordinal relationships among the regression targets. ConPCO first aligns the phoneme representations of an APA model with textual embeddings of phonetic transcriptions via contrastive learning. It then preserves phoneme characteristics by regulating inter- and intra-phoneme-category distances in the feature space while accounting for the ordinal relationships among the output targets. We further design and develop a hierarchical APA model to evaluate the effectiveness of our method. Extensive experiments on the speechocean762 benchmark dataset suggest the feasibility and efficacy of our approach relative to several cutting-edge baselines.
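The abstract describes ConPCO only at a high level and gives no equations. The pure-Python sketch below is a rough illustration of the two ingredients it names, under stated assumptions. The first is a CLIP-style symmetric InfoNCE loss that aligns phoneme representations with textual embeddings, where row `i` of each input is assumed to be a matched pair. The second is a generic centroid-based regularizer that compacts features within a phoneme category and applies a hinge penalty to category centroids that lie closer than a margin. The function names, the temperature of 0.1, and the margin of 1.0 are hypothetical choices. The ordinal-aware treatment of the regression targets that ConPCO adds is not modeled here, so this is not the authors' formulation.

```python
import math
from typing import Dict, List, Sequence

# Hypothetical sketch of contrastive alignment + phoneme distance regularization.
# NOT the ConPCO formulation: the ordinal-aware weighting w.r.t. scores is omitted.
Vector = List[float]


def _normalize(v: Sequence[float]) -> Vector:
    """Scale a vector to unit L2 norm (zero vectors are left unchanged)."""
    norm = math.sqrt(sum(x * x for x in v)) or 1.0
    return [x / norm for x in v]


def _dot(a: Sequence[float], b: Sequence[float]) -> float:
    return sum(x * y for x, y in zip(a, b))


def _log_softmax_at(logits: Sequence[float], idx: int) -> float:
    """Numerically stable log-softmax of logits[idx]."""
    m = max(logits)
    log_z = m + math.log(sum(math.exp(z - m) for z in logits))
    return logits[idx] - log_z


def contrastive_alignment_loss(phone_feats: Sequence[Sequence[float]],
                               text_embs: Sequence[Sequence[float]],
                               temperature: float = 0.1) -> float:
    """Symmetric InfoNCE; row i of phone_feats is assumed to pair with row i of text_embs."""
    p = [_normalize(v) for v in phone_feats]
    t = [_normalize(v) for v in text_embs]
    n = len(p)
    # Cosine similarities scaled by the (hypothetical) temperature.
    sim = [[_dot(p[i], t[j]) / temperature for j in range(n)] for i in range(n)]
    # Speech-to-text: each phoneme feature should select its own text embedding.
    loss_p2t = -sum(_log_softmax_at(sim[i], i) for i in range(n)) / n
    # Text-to-speech: the same objective over the columns of the similarity matrix.
    loss_t2p = -sum(_log_softmax_at([sim[j][i] for j in range(n)], i) for i in range(n)) / n
    return 0.5 * (loss_p2t + loss_t2p)


def phoneme_distance_regularizer(feats: Sequence[Sequence[float]],
                                 phone_ids: Sequence[str],
                                 margin: float = 1.0) -> float:
    """Intra-phoneme compactness plus hinge separation between phoneme centroids."""
    groups: Dict[str, List[Sequence[float]]] = {}
    for f, pid in zip(feats, phone_ids):
        groups.setdefault(pid, []).append(f)
    dim = len(feats[0])
    centroids = {pid: [sum(f[k] for f in fs) / len(fs) for k in range(dim)]
                 for pid, fs in groups.items()}

    def sq_dist(a: Sequence[float], b: Sequence[float]) -> float:
        return sum((x - y) ** 2 for x, y in zip(a, b))

    # Intra-category term: mean squared distance of each feature to its phoneme centroid.
    intra = sum(sq_dist(f, centroids[pid]) for f, pid in zip(feats, phone_ids)) / len(feats)
    # Inter-category term: penalize centroid pairs closer than `margin`.
    pids = sorted(centroids)
    pairs = [(a, b) for i, a in enumerate(pids) for b in pids[i + 1:]]
    inter = 0.0
    if pairs:
        inter = sum(max(0.0, margin - math.sqrt(sq_dist(centroids[a], centroids[b])))
                    for a, b in pairs) / len(pairs)
    return intra + inter


if __name__ == "__main__":
    phones = [[1.0, 0.0], [0.0, 1.0]]
    matched = [[0.9, 0.1], [0.1, 0.9]]
    swapped = [[0.1, 0.9], [0.9, 0.1]]
    # Correctly paired embeddings yield a lower alignment loss than swapped ones.
    print(contrastive_alignment_loss(phones, matched) < contrastive_alignment_loss(phones, swapped))
    print(phoneme_distance_regularizer([[0.0, 0.0], [0.5, 0.0]], ["a", "b"]))
```

In a real batch, several segments may share the same phoneme label, and they would then act as false negatives for one another in the InfoNCE term. One common workaround is to contrast segment features against one embedding per phoneme category rather than per segment. The sketch keeps the simpler one-to-one pairing.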