Existing Image-Text Sentiment Analysis (ITSA) methods may suffer from inconsistent intra-modal and inter-modal sentiment relationships. To address this vision-language imbalance, we propose Semi-Push-Pull Supervised Contrastive Learning (SPP-SCL). SPP-SCL balances the intra-modal and inter-modal sentiment relationships before it fuses the two modalities. Specifically, it follows a novel two-step strategy. First, the proposed intra-modal supervised contrastive learning pulls intra-modal relationships together, and a well-designed conditional statement is then evaluated. If the statement evaluates to false, the method performs the second step: inter-modal supervised contrastive learning, which pushes inter-modal relationships apart. Together, the two steps balance the intra-modal and inter-modal relationships so that they become consistent. Cross-modal feature fusion is then performed for sentiment analysis and detection. Experiments on three public image-text sentiment and sarcasm detection datasets show that SPP-SCL outperforms state-of-the-art methods by a large margin and is more discriminative with respect to sentiment.
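The two-step strategy can be sketched in NumPy as shown below. This is a minimal illustration under stated assumptions, not the authors' implementation. The loss is the standard supervised contrastive (SupCon) loss of Khosla et al. (2020). The inter-modal step is assumed to be a cross-modal SupCon loss, in which image anchors are contrasted against text samples and text anchors against image samples. The abstract does not specify the conditional statement, so the balance test in `spp_step` is a hypothetical placeholder: it checks whether the two intra-modal losses lie within a tolerance `tol`. The names `supcon_loss`, `spp_step`, `tau` and `tol` are all illustrative.

```python
import numpy as np


def _l2_normalize(x, eps=1e-12):
    # Project each embedding onto the unit sphere so dot products are cosine similarities.
    return x / np.maximum(np.linalg.norm(x, axis=1, keepdims=True), eps)


def supcon_loss(anchors, candidates, labels_a, labels_c, tau=0.1, exclude_self=True):
    """Supervised contrastive loss (SupCon, Khosla et al., 2020).

    For each anchor, candidates that share its label are positives and all other
    candidates are negatives. With exclude_self=True, anchors and candidates are
    assumed to be the same batch, so each anchor's pairing with itself is masked out.
    """
    za, zc = _l2_normalize(anchors), _l2_normalize(candidates)
    logits = za @ zc.T / tau                      # (N_a, N_c) temperature-scaled similarities
    valid = np.ones_like(logits, dtype=bool)      # pairs that enter the softmax denominator
    if exclude_self:
        np.fill_diagonal(valid, False)
    pos = (labels_a[:, None] == labels_c[None, :]) & valid
    # Numerically stable log-softmax over each anchor's valid candidates.
    masked = np.where(valid, logits, -np.inf)
    row_max = masked.max(axis=1, keepdims=True)
    log_denom = row_max + np.log(np.exp(masked - row_max).sum(axis=1, keepdims=True))
    log_prob = logits - log_denom
    n_pos = pos.sum(axis=1)
    has_pos = n_pos > 0                           # anchors without any positive are skipped
    mean_log_prob_pos = np.where(pos, log_prob, 0.0).sum(axis=1)[has_pos] / n_pos[has_pos]
    return float(-mean_log_prob_pos.mean())


def spp_step(img, txt, labels, tau=0.1, tol=0.05):
    """Hypothetical two-step loss: intra-modal pull, then a conditional inter-modal step."""
    # Step 1: intra-modal SupCon in each modality pulls same-sentiment samples together.
    l_img = supcon_loss(img, img, labels, labels, tau)
    l_txt = supcon_loss(txt, txt, labels, labels, tau)
    loss = l_img + l_txt
    # Placeholder condition (an assumption): the modalities count as balanced
    # when their intra-modal losses are within tol of each other.
    balanced = abs(l_img - l_txt) <= tol
    if not balanced:
        # Step 2 (assumed form): cross-modal SupCon in both directions. Image-text
        # pairs with different sentiment labels act as negatives and are pushed apart,
        # while pairs that share a label act as positives.
        loss += supcon_loss(img, txt, labels, labels, tau, exclude_self=False)
        loss += supcon_loss(txt, img, labels, labels, tau, exclude_self=False)
    return loss, balanced
```

In this sketch, embeddings whose clusters match the labels give a SupCon loss near zero, and mismatched clusters give a large loss. When one modality is well clustered and the other is not, the placeholder condition fails and the inter-modal terms are added. Cross-modal feature fusion would follow this step. It is not sketched here because the abstract does not describe it.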