ERIT is a novel multimodal dataset designed to facilitate research in lightweight multimodal fusion. It contains text and image data collected from videos of elderly individuals reacting to various situations, along with seven emotion labels for each data sample. Because it provides labeled images of elderly users reacting emotionally, the dataset also supports research on emotion recognition in an age group underrepresented in machine learning visual emotion recognition. The dataset is validated through comprehensive experiments that demonstrate its value for neural multimodal fusion research.
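To make the pairing of text, image, and label concrete, the following is a minimal sketch of how one such sample might be represented. The class name `ERITSample`, the field names, and the specific emotion label set are assumptions for illustration; the actual ERIT label names and data layout may differ.

```python
from dataclasses import dataclass

# Assumed seven-class emotion set (a common basic-emotion inventory);
# the actual ERIT labels may differ.
EMOTIONS = ["anger", "disgust", "fear", "happiness", "neutral", "sadness", "surprise"]

@dataclass
class ERITSample:
    """Hypothetical structure of one ERIT sample: paired text and image
    with a single emotion label."""
    text: str        # transcribed utterance from the reaction video
    image_path: str  # path to a frame extracted from the video
    label: str       # one of the seven emotion labels

    def label_index(self) -> int:
        # Map the label name to a class index for model training.
        return EMOTIONS.index(self.label)

sample = ERITSample(
    text="Oh, that is wonderful!",
    image_path="frames/0001.jpg",
    label="happiness",
)
print(sample.label_index())  # prints 3
```

A structure like this makes it straightforward to feed the text and image through separate encoders and fuse their features against the shared label.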