Multi-modal aspect-based sentiment analysis (MABSA) has recently attracted increasing attention. Span-based extraction methods, such as FSUIE, demonstrate strong performance in sentiment analysis because they jointly model input sequences and target labels. However, previous methods still have certain limitations: (i) They ignore that different analysis targets (aspect or sentiment) focus on different visual information. (ii) Directly combining features from uni-modal encoders may not be sufficient to eliminate the modal gap and can make it difficult to capture image-text pairwise relevance. (iii) Existing span-based methods for MABSA ignore the pairwise relevance of target span boundaries. To tackle these limitations, we propose a novel framework called DQPSA for multi-modal sentiment analysis. Specifically, our model contains a Prompt as Dual Query (PDQ) module that uses the prompt as both a visual query and a language query to extract prompt-aware visual information and strengthen the pairwise relevance between visual information and the analysis target. Additionally, we introduce an Energy-based Pairwise Expert (EPE) module that models the boundary pairing of the analysis target from the perspective of an Energy-based Model. This expert predicts aspect or sentiment spans based on pairwise stability. Experiments on three widely used benchmarks demonstrate that DQPSA outperforms previous approaches and achieves new state-of-the-art performance.
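The idea of predicting a span from the pairwise stability of its boundaries can be illustrated with a small sketch. The abstract does not give the EPE energy function, so everything below is an assumption made for illustration: the additive energy, the per-token `start_scores` and `end_scores`, the `pair_scores` matrix, and the names `span_energy` and `best_span` are all hypothetical. The sketch only shows why scoring (start, end) pairs jointly can differ from picking the best start and the best end independently.

```python
from typing import List, Tuple


def span_energy(start_scores: List[float], end_scores: List[float],
                pair_scores: List[List[float]], i: int, j: int) -> float:
    """Hypothetical energy of the boundary pair (i, j); lower means more stable.

    Assumption: energy is the negated sum of a start score, an end score,
    and a pairwise compatibility term. This is not the paper's formulation.
    """
    return -(start_scores[i] + end_scores[j] + pair_scores[i][j])


def best_span(start_scores: List[float], end_scores: List[float],
              pair_scores: List[List[float]],
              max_len: int = 8) -> Tuple[int, int, float]:
    """Return (start, end, energy) of the lowest-energy span with start <= end."""
    n = len(start_scores)
    if n == 0:
        raise ValueError("empty sequence")
    best = None
    for i in range(n):
        # Only consider spans of at most max_len tokens.
        for j in range(i, min(n, i + max_len)):
            e = span_energy(start_scores, end_scores, pair_scores, i, j)
            if best is None or e < best[2]:
                best = (i, j, e)
    return best


# Toy example with made-up scores.
tokens = ["The", "pizza", "crust", "was", "great"]
start = [0.1, 2.0, 0.5, 0.0, 0.2]
end = [0.0, 0.3, 1.5, 0.1, 1.6]
pair = [[0.0] * 5 for _ in range(5)]
pair[1][2] = 1.0   # "pizza" .. "crust" are compatible boundaries
pair[1][4] = -1.0  # "pizza" .. "great" are not

i, j, energy = best_span(start, end, pair)
print(tokens[i:j + 1], energy)
```

In this toy input, choosing the highest-scoring end token independently would select index 4, whereas the joint energy prefers the pair (1, 2) because of the pairwise term.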