Multi-modal aspect-based sentiment analysis (MABSA) has recently attracted increasing attention. Span-based extraction methods, such as FSUIE, demonstrate strong performance in sentiment analysis due to their joint modeling of input sequences and target labels. However, previous methods still have certain limitations: (i) they ignore that different analysis targets (aspect or sentiment) focus on different visual information; (ii) directly combining features from uni-modal encoders may not be sufficient to close the modality gap and can make it difficult to capture image-text pairwise relevance; (iii) existing span-based methods for MABSA ignore the pairwise relevance of target span boundaries. To tackle these limitations, we propose a novel framework called DQPSA for multi-modal sentiment analysis. Specifically, our model contains a Prompt as Dual Query (PDQ) module that uses the prompt as both a visual query and a language query to extract prompt-aware visual information and strengthen the pairwise relevance between visual information and the analysis target. Additionally, we introduce an Energy-based Pairwise Expert (EPE) module that models the boundary pairing of the analysis target from the perspective of an Energy-based Model. This expert predicts aspect or sentiment spans based on pairwise stability. Experiments on three widely used benchmarks demonstrate that DQPSA outperforms previous approaches and achieves new state-of-the-art performance.
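To make the idea of predicting a span from the pairwise stability of its boundaries more concrete, the following is a minimal sketch and not the EPE module itself. It assumes, purely for illustration, that each token has a start representation and an end representation, and that the energy of a (start, end) pair is the negative dot product of the two, so that a lower energy means a more stable pairing. The names `pair_energy`, `most_stable_span`, and `max_len`, as well as the toy vectors, are hypothetical.

```python
from typing import List, Tuple

Vector = List[float]


def dot(a: Vector, b: Vector) -> float:
    """Plain dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def pair_energy(start_vec: Vector, end_vec: Vector) -> float:
    """Hypothetical energy function: lower energy = more stable boundary pair."""
    return -dot(start_vec, end_vec)


def most_stable_span(
    start_vecs: List[Vector],
    end_vecs: List[Vector],
    max_len: int = 4,
) -> Tuple[int, int, float]:
    """Return (start, end, energy) of the lowest-energy span.

    Only pairs with start <= end and at most max_len tokens are considered.
    """
    best = None
    for i, s in enumerate(start_vecs):
        for j in range(i, min(i + max_len, len(end_vecs))):
            energy = pair_energy(s, end_vecs[j])
            if best is None or energy < best[2]:
                best = (i, j, energy)
    if best is None:
        raise ValueError("no candidate spans")
    return best


# Toy, hand-written boundary representations (assumed, not learned).
tokens = ["the", "pasta", "dish", "was", "great"]
start_vecs = [[0.1, 0.0], [0.9, 0.2], [0.2, 0.1], [0.0, 0.1], [0.1, 0.8]]
end_vecs = [[0.0, 0.1], [0.3, 0.1], [1.0, 0.1], [0.1, 0.0], [0.2, 0.9]]

start, end, energy = most_stable_span(start_vecs, end_vecs)
print(tokens[start:end + 1], round(energy, 2))
```

With these toy vectors the lowest-energy pair is the start of "pasta" with the end of "dish". In a trained model the boundary representations and the energy function would be learned, and the scoring would be conditioned on the prompt and the prompt-aware visual features rather than on fixed vectors.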