Speech emotion recognition (SER) plays a crucial role in human-computer interaction. Edge devices in the Internet of Things (IoT) have limited memory and computational resources, which makes it difficult to construct complex deep learning models for them. Moreover, emotional speech data often contains private information, raising concerns about privacy leakage during the deployment of SER models. To address these challenges, we propose a data distillation framework that facilitates the efficient development of SER models in IoT applications using a smaller, synthesised, distilled dataset. Our experiments demonstrate that the distilled dataset can be used effectively to train SER models with fixed initialisation, achieving performance comparable to that of models trained on the original full emotional speech dataset.