Recent studies have explored the use of pre-trained embeddings for speech emotion recognition (SER), achieving performance comparable to conventional methods that rely on low-level, knowledge-inspired acoustic features. These embeddings are typically generated by models trained on large-scale speech datasets with self-supervised or weakly supervised learning objectives. Despite the significant advances that pre-trained embeddings have brought to SER, the trustworthiness of these methods remains poorly understood. Concerns include privacy breaches, unfair performance, vulnerability to adversarial attacks, and computational cost, all of which may hinder the real-world deployment of these systems. In response, we introduce TrustSER, a general framework for evaluating the trustworthiness of deep-learning-based SER systems along four dimensions: privacy, safety, fairness, and sustainability. The framework offers unique insights to guide future research in SER. Our code is publicly available at: https://github.com/usc-sail/trust-ser.