Infographics are widely used on social media to convey complex information, yet how they influence users' affects remains underexplored owing to the scarcity of relevant datasets. To address this gap, we introduce InfoAffect, a 3.5k-sample affect-annotated dataset that pairs textual content with real-world infographics. We first collected raw data from six fields and aligned it via preprocessing, an accompanying-text-priority method, and three strategies that guarantee quality and compliance. We then constructed an Affect Table to constrain annotation. Five state-of-the-art multimodal large language models (MLLMs) analyzed both modalities, and their outputs were fused with the Reciprocal Rank Fusion (RRF) algorithm to yield robust affects and confidences. Finally, we conducted a user study with two experiments to validate usability and to assess the InfoAffect dataset with the Composite Affect Consistency Index (CACI), obtaining an overall score of 0.608, which indicates high accuracy. The InfoAffect dataset is available in a public repository at https://github.com/bulichuchu/InfoAffect-dataset.
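The abstract does not show the fusion step itself, so the following is a minimal sketch of standard Reciprocal Rank Fusion as it could be applied to per-model affect rankings. The affect labels, the five example rankings, and the smoothing constant k = 60 (the conventional value from the original RRF formulation) are illustrative assumptions, not the paper's actual data or code.

```python
from collections import defaultdict

def reciprocal_rank_fusion(rankings, k=60):
    """Fuse several ranked lists of affect labels with Reciprocal Rank Fusion.

    rankings: list of lists; each inner list is one model's affect labels,
              ordered from most to least likely.
    k: smoothing constant (k = 60 is the conventional choice).
    Returns (label, fused_score) pairs sorted by score, highest first.
    """
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, label in enumerate(ranking, start=1):
            scores[label] += 1.0 / (k + rank)  # RRF: sum of 1/(k + rank)
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)

# Hypothetical affect rankings from five MLLMs for one infographic sample:
model_rankings = [
    ["joy", "trust", "surprise"],
    ["trust", "joy", "fear"],
    ["joy", "surprise", "trust"],
    ["joy", "trust", "anticipation"],
    ["trust", "joy", "surprise"],
]
print(reciprocal_rank_fusion(model_rankings))
```

Because RRF depends only on ranks rather than raw model scores, it needs no calibration across the five MLLMs; the normalized fused score of the top label can then serve as a confidence value.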