Deep learning models for speech classification often achieve high accuracy yet are poorly calibrated: they tend to be overconfident in their predictions. Calibration is critical to the reliability of decision-making in deep learning systems. This study explores the effectiveness of energy-based models (EBMs) in calibrating confidence for speech classification tasks by training a joint EBM that integrates a discriminative and a generative model, thereby improving the classifier's calibration and mitigating overconfidence. Experimental evaluations are conducted on three speech classification tasks: age, emotion, and language recognition. Our findings show that EBMs perform competitively in calibrating speech classification models. The results demonstrate the potential of EBMs for speech classification tasks: they enhance calibration without sacrificing accuracy.
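As a rough illustration of the idea, the sketch below shows one common way a classifier's logits can be read as a joint energy-based model, together with a standard calibration metric. The abstract does not specify the formulation or the metric, so both are assumptions: the energy is taken to be the negative log-sum-exp of the logits (so the same network gives a discriminative posterior p(y|x) and an unnormalized generative score for x), and calibration is measured with expected calibration error (ECE). The function names `class_posteriors`, `energy`, and `expected_calibration_error` are hypothetical.

```python
import numpy as np


def logsumexp(logits):
    # Numerically stable log-sum-exp over the class axis.
    m = logits.max(axis=-1, keepdims=True)
    return (m + np.log(np.exp(logits - m).sum(axis=-1, keepdims=True))).squeeze(-1)


def class_posteriors(logits):
    # Discriminative part: p(y|x) is the usual softmax over the logits.
    return np.exp(logits - logsumexp(logits)[..., None])


def energy(logits):
    # Generative part (assumed formulation): E(x) = -logsumexp_y f(x)[y],
    # so lower energy corresponds to higher unnormalized density for x.
    return -logsumexp(logits)


def expected_calibration_error(probs, labels, n_bins=10):
    # ECE: weighted average gap between accuracy and mean confidence,
    # computed over equal-width confidence bins.
    confidences = probs.max(axis=1)
    predictions = probs.argmax(axis=1)
    correct = (predictions == labels).astype(float)
    edges = np.linspace(0.0, 1.0, n_bins + 1)
    ece = 0.0
    for lo, hi in zip(edges[:-1], edges[1:]):
        in_bin = (confidences > lo) & (confidences <= hi)
        if in_bin.any():
            gap = abs(correct[in_bin].mean() - confidences[in_bin].mean())
            ece += in_bin.mean() * gap
    return ece


# Usage on toy logits for a 3-class task (values are made up for illustration).
toy_logits = np.array([[4.0, 0.5, 0.1], [0.2, 3.0, 0.3], [1.0, 1.1, 0.9]])
toy_labels = np.array([0, 1, 2])
toy_probs = class_posteriors(toy_logits)
toy_ece = expected_calibration_error(toy_probs, toy_labels)
toy_energy = energy(toy_logits)
```

A well-calibrated classifier yields an ECE close to zero; an overconfident one has mean confidence above accuracy in its high-confidence bins, which raises the ECE.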