Speech emotion recognition (SER) allows computers to understand people and engage with them in an emotionally intelligent way. However, SER performance in cross-corpus and real-world live data feed scenarios leaves considerable room for improvement. One shortcoming of current SER methods is their inability to adapt an existing model to a new domain. To address this challenge, researchers have developed domain adaptation techniques that transfer the knowledge learnt by a model across domains. Although these techniques have improved cross-domain performance, they can be further improved to handle a real-world live data feed setting in which a model self-tunes while deployed. In this paper, we present a deep reinforcement learning-based strategy (RL-DA) for adapting a pre-trained model to a real-world live data feed setting while it interacts with the environment and collects continual feedback. RL-DA is evaluated on SER tasks, including cross-corpus and cross-language domain adaptation schemes. Evaluation results show that in a live data feed setting, RL-DA outperforms a baseline strategy by 11% and 14% in cross-corpus and cross-language scenarios, respectively.