The prevalence of memes on social media has created a need to analyze the sentiment behind them in order to censor harmful content. Machine-learning-based meme censoring systems call for a semi-supervised learning solution that takes advantage of the large number of unlabeled memes available on the internet and makes the annotation process less demanding. Moreover, the approach needs to use multimodal data, since a meme's meaning usually comes from both its image and its text. This research proposes a multimodal semi-supervised learning approach that outperforms other state-of-the-art multimodal semi-supervised and supervised learning models on two datasets: Multimedia Automatic Misogyny Identification (MAMI) and Hateful Memes. Building on insights from Contrastive Language-Image Pre-training (CLIP), an effective multimodal learning technique, this research introduces SemiMemes, a novel training method that combines an auto-encoder with a classification task to make use of the abundant unlabeled data.
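The abstract does not say how the auto-encoder and classification objectives are combined. One common pattern in semi-supervised learning, given here only as an illustrative sketch and not as the SemiMemes objective, is to compute a reconstruction loss on both labeled and unlabeled samples and add a classification loss on the labeled samples alone. The function names, the choice of mean squared error and binary cross-entropy, and the `recon_weight` parameter are all assumptions made for this example.

```python
import math


def mse(x, x_hat):
    """Mean squared reconstruction error between two equal-length vectors."""
    return sum((a - b) ** 2 for a, b in zip(x, x_hat)) / len(x)


def bce(y, p, eps=1e-12):
    """Binary cross-entropy for a label y in {0, 1} and a predicted probability p."""
    p = min(max(p, eps), 1.0 - eps)  # clamp to avoid log(0)
    return -(y * math.log(p) + (1 - y) * math.log(1.0 - p))


def semi_supervised_loss(labeled, unlabeled, recon_weight=1.0):
    """Hypothetical combined objective (an assumption, not the paper's formula).

    labeled:   list of (x, x_hat, y, p) tuples, where x is an input feature
               vector, x_hat its auto-encoder reconstruction, y the true
               label, and p the predicted probability of the positive class.
    unlabeled: list of (x, x_hat) tuples; these contribute only to the
               reconstruction term, which is how unlabeled data is used.
    """
    # Reconstruction is averaged over every sample, labeled or not.
    recon_terms = [mse(x, x_hat) for x, x_hat, _, _ in labeled]
    recon_terms += [mse(x, x_hat) for x, x_hat in unlabeled]
    # Classification is averaged over labeled samples only.
    cls_terms = [bce(y, p) for _, _, y, p in labeled]

    recon = sum(recon_terms) / len(recon_terms) if recon_terms else 0.0
    cls = sum(cls_terms) / len(cls_terms) if cls_terms else 0.0
    return cls + recon_weight * recon
```

In this sketch, setting `recon_weight` to zero reduces the objective to plain supervised classification, which makes it easy to see how much the unlabeled reconstruction term contributes.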