The pairing of real photographs with conflicting captions in news items is an example of out-of-context (OOC) misuse of media. To detect OOC media, one must determine the accuracy of each statement and evaluate whether the triplet (i.e., the image and two captions) relates to the same event. This paper presents a novel learnable approach for detecting OOC media in the ICME'23 Grand Challenge on Detecting Cheapfakes. The proposed method is based on the COSMOS structure, which assesses the coherence between an image and its captions, as well as between the two captions. We enhance the baseline algorithm by incorporating a Large Language Model (LLM), GPT3.5, as a feature extractor. Specifically, we use prompt engineering to build a robust and reliable feature extractor on top of the GPT3.5 model. The resulting module captures the correlation between the two captions and is integrated into the COSMOS baseline model, which allows for a deeper understanding of the relationship between captions. By incorporating this module, we demonstrate the potential for significant improvements in cheapfake detection performance. The proposed methodology holds promising implications for various applications such as natural language processing, image captioning, and text-to-image synthesis. A Docker image for submission is available at https://hub.docker.com/repository/docker/mulns/acmmmcheapfakes.
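The abstract does not give the prompts or say how GPT3.5 responses are turned into features, so the following is only a minimal sketch of the general idea of prompt-based caption-pair feature extraction. The questions, the 0-to-1 answer format, the neutral fallback value of 0.5, and the names `QUESTIONS`, `build_prompt`, `parse_features`, and `caption_features` are all hypothetical, and the `llm` argument is a stand-in callable rather than a real GPT3.5 client.

```python
from typing import Callable, List

# Hypothetical questions; the prompts actually used are not stated in the abstract.
QUESTIONS = [
    "Do the two captions describe the same event?",
    "Do the two captions contradict each other?",
    "Do the two captions mention the same people or places?",
]


def build_prompt(caption1: str, caption2: str) -> str:
    """Assemble one prompt asking the model to score a caption pair."""
    lines = [
        "Answer each question with a number between 0 and 1, one per line.",
        f"Caption 1: {caption1}",
        f"Caption 2: {caption2}",
    ]
    lines += [f"Q{i}: {q}" for i, q in enumerate(QUESTIONS, start=1)]
    return "\n".join(lines)


def parse_features(response: str, n: int = len(QUESTIONS)) -> List[float]:
    """Read one score per line, clamp to [0, 1], and use 0.5 for
    lines that are missing or cannot be parsed as a number."""
    feats: List[float] = []
    for line in response.strip().splitlines()[:n]:
        try:
            value = float(line.split(":")[-1].strip())
        except ValueError:
            value = 0.5  # assumed neutral fallback
        feats.append(min(1.0, max(0.0, value)))
    feats += [0.5] * (n - len(feats))
    return feats


def caption_features(
    caption1: str, caption2: str, llm: Callable[[str], str]
) -> List[float]:
    """Return a fixed-length feature vector for a caption pair.

    `llm` is any function mapping a prompt string to a response string;
    it stands in for a GPT3.5 call, which is not made here.
    """
    return parse_features(llm(build_prompt(caption1, caption2)))
```

Under these assumptions, the resulting vector could be passed, together with the COSMOS image-caption scores, to a small learned classifier. Fixing the vector length and falling back to a neutral value keeps the downstream input well-formed even when the model's reply is malformed.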