We have witnessed the rapid proliferation of multimodal data on numerous social media platforms. Conventional studies typically require large amounts of labeled data to train models for Multimodal Aspect-Based Sentiment Analysis (MABSA). However, collecting and annotating fine-grained multimodal data for MABSA is difficult. To alleviate this issue, we perform three MABSA-related tasks using only a small number of labeled multimodal samples. We first build diverse and comprehensive multimodal few-shot datasets according to the data distribution. To capture the specific prompt for each aspect term in the few-shot scenario, we propose a novel Generative Multimodal Prompt (GMP) model for MABSA, which includes a Multimodal Encoder module and an N-Stream Decoders module. We further introduce a subtask that predicts the number of aspect terms in each instance, which is used to construct the multimodal prompt. Extensive experiments on two datasets demonstrate that our approach outperforms strong baselines on two MABSA-related tasks in the few-shot setting.