We present LLaVAC, a method for constructing a classifier for multimodal sentiment analysis. This method leverages fine-tuning of the Large Language and Vision Assistant (LLaVA) to predict sentiment labels across both image and text modalities. Our approach involves designing a structured prompt that incorporates both unimodal and multimodal labels to fine-tune LLaVA, enabling it to perform sentiment classification effectively. Experiments on the MVSA-Single dataset demonstrate that LLaVAC outperforms existing methods in multimodal sentiment analysis across three data processing procedures. The implementation of LLaVAC is publicly available at https://github.com/tchayintr/llavac.
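To make the "structured prompt" idea concrete, here is a minimal sketch of how a fine-tuning instruction and its target might combine unimodal (image-only, text-only) and multimodal sentiment labels, as the abstract describes. The field names, label set, and exact wording are illustrative assumptions, not the authors' actual prompt format.

```python
# Hypothetical sketch of a structured prompt for fine-tuning LLaVA on
# sentiment classification. Field names and phrasing are assumptions;
# the image itself would be passed to the model separately.

LABELS = ("negative", "neutral", "positive")

def build_prompt(text: str) -> str:
    """Instruction asking the model to classify sentiment for the image,
    the text, and the image-text pair jointly."""
    return (
        "Classify the sentiment of the following post.\n"
        f"Text: {text}\n"
        f"Answer with one of {LABELS} for each of: image, text, multimodal."
    )

def build_target(image_label: str, text_label: str, multimodal_label: str) -> str:
    """Target string containing both unimodal and multimodal labels,
    mirroring the abstract's description of the fine-tuning signal."""
    return (
        f"image: {image_label}; text: {text_label}; "
        f"multimodal: {multimodal_label}"
    )

# Example training pair (hypothetical data):
prompt = build_prompt("What a beautiful sunset over the bay!")
target = build_target("positive", "positive", "positive")
```

At inference time, the fine-tuned model would emit a string in the same structured format, from which the multimodal label can be parsed.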