Social media users voice opinions on a wide range of subjects and share their experiences through posts that combine multiple modes of expression, which has led to a notable surge in multimodal content on social media platforms. Nonetheless, accurately forecasting the popularity of these posts remains a considerable challenge. Prevailing methods focus primarily on the content itself. They overlook the wealth of information carried by other signals, such as visual demographics and the sentiments conveyed through hashtags, and they do not adequately model the intricate relationships among hashtags, texts, and accompanying images. This oversight limits their ability to capture emotional connection and audience relevance, both of which significantly influence post popularity. To address these limitations, we propose a seNtiment and hAshtag-aware attentive deep neuRal netwoRk for multimodAl posT pOpularity pRediction, herein referred to as NARRATOR, which extracts visual demographics from faces appearing in images and discerns sentiment from hashtag usage, providing a more comprehensive understanding of the factors influencing post popularity. Moreover, we introduce a hashtag-guided attention mechanism that leverages hashtags as navigational cues, guiding the model's focus toward the most pertinent features of the textual and visual modalities and thus aligning it with target audience interests and the broader social media context. Experimental results demonstrate that NARRATOR outperforms existing methods by a significant margin on two real-world datasets. Furthermore, ablation studies underscore the efficacy of integrating visual demographics, sentiment analysis of hashtags, and hashtag-guided attention in enhancing post popularity prediction, thereby facilitating increased audience relevance, emotional engagement, and aesthetic appeal.
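The hashtag-guided attention idea can be illustrated with a minimal sketch. The abstract does not specify how attention scores are computed, so this example assumes generic scaled dot-product attention, with a hashtag embedding acting as the query over a set of text-token or image-region feature vectors. The function names, the vector shapes, and the scoring function are all hypothetical and are not taken from NARRATOR itself.

```python
import math
from typing import List, Sequence, Tuple


def softmax(scores: Sequence[float]) -> List[float]:
    """Numerically stable softmax over a list of scores."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def hashtag_guided_attention(
    hashtag_vec: Sequence[float],
    features: Sequence[Sequence[float]],
) -> Tuple[List[float], List[float]]:
    """Weight feature vectors by their similarity to a hashtag embedding.

    ASSUMPTION: scaled dot-product scoring; the source does not state
    which scoring function the model uses.

    hashtag_vec: hypothetical embedding of the post's hashtags (the query).
    features:    hypothetical text-token or image-region vectors.
    Returns the attended (weighted-sum) vector and the attention weights.
    """
    dim = len(hashtag_vec)
    if any(len(f) != dim for f in features):
        raise ValueError("all feature vectors must match the hashtag dimension")

    # Similarity of each feature to the hashtag query, scaled by sqrt(dim).
    scores = [
        sum(h * x for h, x in zip(hashtag_vec, feat)) / math.sqrt(dim)
        for feat in features
    ]
    weights = softmax(scores)

    # Weighted sum of the features, one output value per dimension.
    attended = [
        sum(w * feat[i] for w, feat in zip(weights, features))
        for i in range(dim)
    ]
    return attended, weights


# Toy usage: the first feature points the same way as the hashtag vector,
# so it receives the larger attention weight.
attended_text, text_weights = hashtag_guided_attention(
    [1.0, 0.0], [[2.0, 0.0], [0.0, 2.0]]
)
```

In a full model, the same query could be applied separately to the textual and visual features, and the two attended vectors combined before the popularity predictor. That fusion step is likewise an assumption here.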