This paper presents a computational case study that evaluates how well specialized machine learning models and emerging multimodal large language models support Visual Political Communication (VPC) analysis. Focusing on concentrated visibility in Instagram stories and posts during the 2021 German federal election campaign, we compare traditional computer vision models (FaceNet512, RetinaFace, Google Cloud Vision) with a multimodal large language model (GPT-4o) on two tasks: identifying front-runner politicians and counting the individuals shown in images. GPT-4o outperformed the other models. On stories, it achieved macro F1-scores of 0.89 for face recognition and 0.86 for person counting. These findings demonstrate the potential of advanced AI systems to scale and refine visual content analysis in political communication, while also highlighting methodological considerations for future research.