With rapid technological progress, automatic pronunciation assessment has shifted toward systems that evaluate pronunciation along multiple aspects, such as fluency and stress. However, although the score labels within each aspect are highly imbalanced, existing studies have rarely addressed this data imbalance. In this paper, we propose a novel loss function, score-balanced loss, to address problems caused by imbalanced data, such as bias toward the majority scores. As a re-weighting approach, it assigns higher costs when the predicted score belongs to a minority class, thereby guiding the model to gain positive feedback for sparse score prediction. Specifically, we design two weighting factors by leveraging the concept of the effective number of samples and by using the ranks of scores. We evaluate our method on the speechocean762 dataset, which has noticeably imbalanced scores for several aspects. Improved results, particularly on these imbalanced aspects, demonstrate the effectiveness of our method.
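To make the re-weighting idea concrete, the following is a minimal sketch of a weighting factor based on the effective number of samples, which for a score observed n times is commonly defined as (1 - beta^n) / (1 - beta). It is not the exact score-balanced loss: the function names, the default `beta`, the normalisation, the squared-error base loss, and the choice to map each prediction to its nearest score label are all assumptions made for illustration. The rank-based factor is not specified in this abstract, so it is omitted.

```python
from collections import Counter


def effective_number_weights(scores, beta=0.99):
    """Hypothetical helper: one weight per distinct score value.

    Weights are inversely proportional to the effective number of
    samples, (1 - beta**n) / (1 - beta), so rare scores weigh more.
    """
    counts = Counter(scores)
    raw = {s: (1.0 - beta) / (1.0 - beta ** n) for s, n in counts.items()}
    # Assumed normalisation: weights average to 1 over distinct scores.
    scale = len(raw) / sum(raw.values())
    return {s: w * scale for s, w in raw.items()}


def weighted_squared_error(predictions, targets, weights):
    """Hypothetical loss: squared error scaled by the weight of the
    score label nearest to each prediction (an assumed way of deciding
    which class a continuous predicted score belongs to)."""
    labels = sorted(weights)
    total = 0.0
    for pred, target in zip(predictions, targets):
        nearest = min(labels, key=lambda s: abs(s - pred))
        total += weights[nearest] * (pred - target) ** 2
    return total / len(targets)


if __name__ == "__main__":
    # Toy, made-up label distribution: score 2 dominates, score 0 is rare.
    train_scores = [2] * 90 + [1] * 9 + [0] * 1
    weights = effective_number_weights(train_scores)
    for score in sorted(weights):
        print(score, round(weights[score], 4))
    print(weighted_squared_error([0.2, 1.9], [0, 2], weights))
```

With weights of this kind, an error of a given size costs more when the predicted score falls in a rare class than when it falls in the majority class, which is the behaviour the re-weighting is meant to encourage.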