Multimodal emotion recognition plays a crucial role in enhancing user experience in human-computer interaction. Over the past few decades, researchers have proposed many algorithms and made impressive progress. However, although each method reports strong performance, fair comparison across methods is lacking because of inconsistencies in feature extractors, evaluation protocols, and experimental settings. These inconsistencies severely hinder the development of the field. We therefore build MERBench, a unified evaluation benchmark for multimodal emotion recognition. We aim to reveal the contribution of important techniques employed in previous work, including feature selection, multimodal fusion, robustness analysis, fine-tuning, and pre-training. We hope this benchmark can provide clear and comprehensive guidance for follow-up researchers. Based on the evaluation results of MERBench, we further point out some promising research directions. Additionally, we introduce a new emotion dataset, MER2023, focusing on the Chinese language environment. This dataset can serve as a benchmark for research on multi-label learning, noise robustness, and semi-supervised learning. We encourage follow-up researchers to evaluate their algorithms under the same experimental setup as MERBench for fair comparison. Our code is available at: https://github.com/zeroQiaoba/MERTools.