In this paper, we propose the Emotionally paired Music and Image Dataset (EMID), a novel dataset designed for emotional matching of music and images, to facilitate auditory-visual cross-modal tasks such as generation and retrieval. Unlike existing approaches that focus primarily on semantic correlations or coarsely divided emotional relations, EMID emphasizes emotional consistency between music and images, using an advanced 13-dimensional emotion model. By incorporating emotional alignment into dataset construction, EMID aims to establish pairs that closely match human perceptual understanding, thereby improving the performance of auditory-visual cross-modal tasks. We also design a supplemental module, named EMI-Adapter, to optimize existing cross-modal alignment methods. To validate the effectiveness of EMID, we conduct a psychological experiment, which demonstrates that considering the emotional relationship between the two modalities effectively improves matching accuracy from an abstract perspective. This research lays a foundation for future cross-modal research in domains such as psychotherapy and contributes to advancing the understanding and use of emotions in cross-modal alignment. The EMID dataset is available at https://github.com/ecnu-aigc/EMID.
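To make the pairing idea concrete, the following is a minimal sketch of emotion-based matching, assuming each music clip and each image carries a 13-dimensional emotion annotation (as in the emotion model EMID adopts) and that pairs are selected by similarity between those vectors. The function names and the cosine-similarity criterion are illustrative assumptions, not the paper's actual pipeline or API.

```python
# Hypothetical sketch: pair music clips with images by the similarity of
# their 13-dimensional emotion vectors. Not the paper's actual method.
import numpy as np

def pair_by_emotion(music_emotions: np.ndarray, image_emotions: np.ndarray) -> np.ndarray:
    """For each music clip, return the index of the image whose
    13-dim emotion vector is most similar under cosine similarity.

    music_emotions: (M, 13) array, one emotion profile per clip.
    image_emotions: (N, 13) array, one emotion profile per image.
    """
    # L2-normalize so the dot product equals cosine similarity.
    m = music_emotions / np.linalg.norm(music_emotions, axis=1, keepdims=True)
    i = image_emotions / np.linalg.norm(image_emotions, axis=1, keepdims=True)
    similarity = m @ i.T               # (M, N) pairwise cosine similarities
    return similarity.argmax(axis=1)   # best-matching image per music clip

# Usage: 3 clips and 5 images with random emotion profiles.
rng = np.random.default_rng(0)
clips = rng.random((3, 13))
images = rng.random((5, 13))
print(pair_by_emotion(clips, images))  # indices of the matched images
```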