Cultural context profoundly shapes how people interpret online content, yet vision-language models (VLMs) remain predominantly trained through Western or English-centric lenses. This limits their fairness and cross-cultural robustness in tasks such as hateful meme detection. We introduce a systematic evaluation framework designed to diagnose and quantify the cross-cultural robustness of state-of-the-art VLMs on multilingual meme datasets, analyzing three axes: (i) learning strategy (zero-shot vs. one-shot), (ii) prompting language (native vs. English), and (iii) translation effects on meaning and detection. Results show that the common ``translate-then-detect'' approach deteriorates performance, while culturally aligned interventions (native-language prompting and one-shot learning) significantly enhance detection. Our findings reveal systematic convergence toward Western safety norms and provide actionable strategies to mitigate this bias, guiding the design of globally robust multimodal moderation systems.
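The three evaluation axes can be enumerated as a condition grid. The following is a minimal, hypothetical sketch (the names `LEARNING`, `PROMPT_LANG`, and `PIPELINE` are illustrative, not taken from the paper), showing how each model/dataset pair would be run under every combination of learning strategy, prompting language, and translation pipeline:

```python
# Hypothetical sketch of the evaluation grid implied by the three axes.
# Axis values are assumptions for illustration, not the paper's exact labels.
from itertools import product

LEARNING = ["zero-shot", "one-shot"]          # axis (i): learning strategy
PROMPT_LANG = ["native", "english"]           # axis (ii): prompting language
PIPELINE = ["direct", "translate-then-detect"]  # axis (iii): translation effect

def evaluation_conditions():
    """Enumerate every (learning, prompt language, pipeline) condition."""
    return [
        {"learning": lrn, "prompt_lang": lang, "pipeline": pipe}
        for lrn, lang, pipe in product(LEARNING, PROMPT_LANG, PIPELINE)
    ]

# Each condition would be applied to every model/dataset pair,
# yielding 2 x 2 x 2 = 8 runs per pair.
conditions = evaluation_conditions()
```

Enumerating the full cross-product makes the comparison systematic: the reported effects (e.g. translate-then-detect degrading performance relative to native-language prompting) are read off by contrasting conditions that differ on a single axis.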