Because multimodal data carries richer information than any single modality, multimodal sentiment analysis has recently become a research hotspot. However, redundant information is easily introduced when features are fused after extraction, which can affect the quality of the fused feature representation. In this paper, we therefore propose a new multimodal sentiment analysis model. The model uses BERT + BiLSTM as the text feature extractor to capture long-distance dependencies in sentences and to account for the position information of the input sequence, yielding richer text features. To remove redundant information and make the network attend more closely to the correlation between image and text features, a CNN and CBAM attention are applied after the text and image features are concatenated, which improves the feature representation. On the MVSA-single and HFM datasets, our model improves accuracy (ACC) over the baseline model by 1.78% and 1.91%, and F1 by 3.09% and 2.0%, respectively. The experimental results show that our model performs well, achieving results comparable to those of advanced models.
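To make the fusion-refinement step concrete, the following is a minimal, dependency-free sketch of CBAM-style gating applied to concatenated text and image features. It is not the paper's implementation. The feature sizes, the random weights, the one-dimensional (channels x positions) layout, and the choice to concatenate along the position axis are all assumptions made for illustration, and the CNN that precedes CBAM in the model is omitted. The sketch only shows the two gates that CBAM defines: a channel gate computed from average- and max-pooled descriptors passed through a shared MLP, followed by a spatial gate computed by a convolution over the channel-wise average and maximum.

```python
import math
import random


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def shared_mlp(vec, w1, w2):
    """Two-layer MLP with ReLU: vec (C) -> hidden (C // r) -> out (C)."""
    hidden = [max(0.0, sum(w * v for w, v in zip(row, vec))) for row in w1]
    return [sum(w * h for w, h in zip(row, hidden)) for row in w2]


def channel_attention(feat, w1, w2):
    """Channel gate: sigmoid(MLP(avg-pool) + MLP(max-pool)), one value per channel."""
    avg = [sum(ch) / len(ch) for ch in feat]
    mx = [max(ch) for ch in feat]
    a, m = shared_mlp(avg, w1, w2), shared_mlp(mx, w1, w2)
    return [sigmoid(x + y) for x, y in zip(a, m)]


def spatial_attention(feat, kernel):
    """Spatial gate: sigmoid(conv([avg over channels; max over channels]))."""
    length = len(feat[0])
    avg = [sum(ch[i] for ch in feat) / len(feat) for i in range(length)]
    mx = [max(ch[i] for ch in feat) for i in range(length)]
    pad = len(kernel[0]) // 2
    gate = []
    for i in range(length):
        total = 0.0
        for row, pooled in zip(kernel, (avg, mx)):
            for k, w in enumerate(row):
                j = i + k - pad
                if 0 <= j < length:  # zero padding at the borders
                    total += w * pooled[j]
        gate.append(sigmoid(total))
    return gate


def cbam(feat, w1, w2, kernel):
    """Apply the channel gate, then the spatial gate, to a C x L feature map."""
    ch_gate = channel_attention(feat, w1, w2)
    feat = [[g * v for v in ch] for g, ch in zip(ch_gate, feat)]
    sp_gate = spatial_attention(feat, kernel)
    return [[g * v for g, v in zip(sp_gate, ch)] for ch in feat]


def random_matrix(rows, cols, rng):
    return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]


rng = random.Random(0)
channels, reduction = 8, 4                    # hypothetical sizes
text_feat = random_matrix(channels, 6, rng)   # stand-in for BERT + BiLSTM output
image_feat = random_matrix(channels, 4, rng)  # stand-in for image features
# Assumption: the two modalities are concatenated along the position axis.
fused = [t + v for t, v in zip(text_feat, image_feat)]

w1 = random_matrix(channels // reduction, channels, rng)  # C -> C / r
w2 = random_matrix(channels, channels // reduction, rng)  # C / r -> C
kernel = random_matrix(2, 7, rng)  # rows for (avg, max), kernel size 7

refined = cbam(fused, w1, w2, kernel)
print(len(refined), len(refined[0]))  # 8 10
```

Because both gates lie strictly between 0 and 1, the refined map keeps the shape of the fused input while scaling down each entry. This is the sense in which the attention can suppress redundant components. In a trained model the weights would be learned rather than random.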