Multimodal sentiment analysis has gained significant attention due to the proliferation of multimodal content on social media. However, existing studies in this area rely heavily on large-scale supervised data, which is time-consuming and labor-intensive to collect. This motivates the study of few-shot multimodal sentiment analysis. To tackle this problem, we propose a novel method called Multimodal Probabilistic Fusion Prompts (MultiPoint), which leverages diverse cues from different modalities for multimodal sentiment detection in the few-shot scenario. Specifically, we first introduce Consistently Distributed Sampling (CDS), a sampling approach that ensures the few-shot dataset has the same category distribution as the full dataset. Unlike previous approaches, which primarily use prompts based on the text modality, we design unified multimodal prompts to reduce discrepancies between modalities, and we dynamically incorporate multimodal demonstrations into the context of each multimodal instance. To enhance the model's robustness, we introduce a probabilistic fusion method that fuses the output predictions from multiple diverse prompts for each input. Extensive experiments on six datasets demonstrate the effectiveness of our approach. First, our method outperforms strong baselines in the multimodal few-shot setting. Furthermore, with the same amount of data (1% of the full dataset), results on our CDS-based datasets significantly outperform those on previously used few-shot datasets, which were built by sampling the same number of instances from each class.
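To make the two data-level ideas above concrete, the following is a minimal Python sketch under stated assumptions; it is not the implementation used in this work. The abstract states only that CDS preserves the category distribution of the full dataset and that predictions from multiple prompts are fused probabilistically. The sketch therefore assumes that CDS can be approximated by per-class proportional (stratified) sampling, and that fusion can be approximated by a normalized product of per-prompt class probabilities, treating prompts as independent under a uniform class prior. The names `cds_sample` and `fuse_predictions` and all parameters are hypothetical.

```python
import math
import random
from collections import defaultdict


def cds_sample(labels, fraction, seed=0):
    """Return indices of a subset whose class proportions mirror the full set.

    Assumption: CDS is approximated here by stratified sampling, i.e. each
    class contributes roughly `fraction` of its own instances.
    """
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)

    chosen = []
    for label in sorted(by_class):
        indices = by_class[label]
        # Assumption: keep at least one instance so that no class disappears.
        k = max(1, round(len(indices) * fraction))
        chosen.extend(rng.sample(indices, k))
    return sorted(chosen)


def fuse_predictions(prob_rows):
    """Fuse class distributions from several prompts into one distribution.

    Assumption: the fusion rule is a normalized product of probabilities
    (prompts treated as independent, uniform class prior). The actual
    fusion rule may differ.
    """
    num_classes = len(prob_rows[0])
    log_scores = [0.0] * num_classes
    for row in prob_rows:
        for c, p in enumerate(row):
            # Clamp to avoid log(0) when a prompt assigns zero probability.
            log_scores[c] += math.log(max(p, 1e-12))
    # Subtract the maximum before exponentiating for numerical stability.
    top = max(log_scores)
    weights = [math.exp(s - top) for s in log_scores]
    total = sum(weights)
    return [w / total for w in weights]


# Hypothetical usage: a 1% few-shot split of an imbalanced label list,
# then fusing the outputs of two prompts for a single input.
labels = ["positive"] * 600 + ["neutral"] * 300 + ["negative"] * 100
few_shot_indices = cds_sample(labels, fraction=0.01)
fused = fuse_predictions([[0.6, 0.4], [0.7, 0.3]])
```

In contrast to drawing the same number of instances from each class, the proportional draw keeps majority and minority classes in the ratio seen in the full dataset, which is the property the abstract attributes to CDS.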