The prevalence of harmful content on social media platforms poses significant risks to users and society, necessitating more effective and scalable content moderation strategies. Current approaches rely on human moderators, supervised classifiers, and large volumes of training data, and often struggle with scalability, subjectivity, and the dynamic nature of harmful content (e.g., violent content and dangerous challenge trends). To bridge these gaps, we utilize Large Language Models (LLMs) to perform few-shot dynamic content moderation via in-context learning. Through extensive experiments on multiple LLMs, we demonstrate that our few-shot approaches can outperform existing proprietary baselines (Perspective and OpenAI Moderation) as well as prior state-of-the-art few-shot learning methods in identifying harmful content. We also incorporate visual information (video thumbnails) and assess whether different multimodal techniques improve model performance. Our results underscore the significant benefits of employing LLM-based methods for scalable and dynamic harmful content moderation online.
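To make the few-shot in-context learning setup concrete, the sketch below assembles a moderation prompt from a handful of labeled exemplars followed by the post to be classified, and sends it to a generic chat-completion backend. The exemplar texts, the label set, and the `call_llm` hook are illustrative assumptions for exposition only, not the paper's actual prompts, data, or models.

```python
# Minimal sketch of few-shot in-context moderation (illustrative assumptions only;
# exemplars, labels, and the call_llm hook are not the paper's actual prompts or data).
from typing import Callable, List, Tuple

# Hypothetical labeled exemplars shown to the model in-context.
FEW_SHOT_EXEMPLARS: List[Tuple[str, str]] = [
    ("Try this dangerous blackout challenge with your friends!", "harmful"),
    ("Here is my recipe for a simple weeknight pasta.", "not harmful"),
    ("Watch this graphic footage of a street fight.", "harmful"),
]


def build_moderation_prompt(post: str) -> List[dict]:
    """Assemble a chat-style prompt: instruction, labeled exemplars, then the new post."""
    messages = [{
        "role": "system",
        "content": (
            "You are a content moderator. Label each post as 'harmful' or "
            "'not harmful'. Answer with the label only."
        ),
    }]
    for text, label in FEW_SHOT_EXEMPLARS:
        messages.append({"role": "user", "content": f"Post: {text}"})
        messages.append({"role": "assistant", "content": label})
    messages.append({"role": "user", "content": f"Post: {post}"})
    return messages


def moderate(post: str, call_llm: Callable[[List[dict]], str]) -> str:
    """Send the few-shot prompt to any chat-completion backend and return its label."""
    return call_llm(build_moderation_prompt(post)).strip().lower()


if __name__ == "__main__":
    # Stub backend so the sketch runs without an API key; swap in a real LLM client.
    stub_backend = lambda messages: "harmful"
    print(moderate("Join the new choking challenge trending this week", stub_backend))
```

Because the exemplars are passed in the prompt rather than used for fine-tuning, the label set and exemplar pool can be updated on the fly, which is what makes this setup suited to dynamic or emerging categories of harm.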