The prevalence of memes on social media has created the need to analyze the sentiment of their underlying meanings in order to censor harmful content. Machine-learning-based meme censoring systems call for a semi-supervised learning solution that takes advantage of the large number of unlabeled memes available on the internet and makes the annotation process less demanding. Moreover, the approach needs to use multimodal data, as a meme's meaning usually comes from both its image and its text. This research proposes a multimodal semi-supervised learning approach that outperforms other state-of-the-art multimodal semi-supervised and supervised learning models on two datasets: the Multimedia Automatic Misogyny Identification dataset and the Hateful Memes dataset. Building on insights gained from Contrastive Language-Image Pre-training, an effective multimodal learning technique, this research introduces SemiMemes, a novel training method that combines an auto-encoder with a classification task to make use of the abundant unlabeled data.