Short video platforms such as YouTube, Instagram, and TikTok are used by billions of users globally. These platforms expose users to harmful content, ranging from clickbait and physical harm to misinformation and online hate. Yet detecting harmful videos remains challenging, due to an inconsistent understanding of what constitutes harm and to the limited resources and mental toll involved in human annotation. This study therefore advances measures and methods to detect harm in video content. First, we develop a comprehensive taxonomy of online harm on video platforms, comprising six categories: Information, Hate and harassment, Addictive, Clickbait, Sexual, and Physical harms. Next, we establish multimodal large language models as reliable annotators of harmful videos. We analyze 19,422 YouTube videos, each represented by 14 image frames, 1 thumbnail, and text metadata, comparing the accuracy of crowdworkers (MTurk) and GPT-4-Turbo against domain expert annotations as the gold standard. Our results demonstrate that GPT-4-Turbo outperforms crowdworkers in both binary classification (harmful vs. harmless) and multi-label harm categorization. Methodologically, this study extends the application of LLMs beyond text annotation and binary classification to multi-label and multimodal contexts. Practically, our study contributes to online harm mitigation by guiding the definition and identification of harmful content on video platforms.
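To make the annotation setup concrete, the sketch below shows one way such a multimodal prompt could be assembled, assuming the OpenAI Python SDK (v1.x). Only the six taxonomy labels and the frames-plus-thumbnail-plus-metadata input design come from the abstract; the prompt wording, the `encode_frame` helper, the JSON output format, and the `annotate_video` function are illustrative assumptions, not the authors' released pipeline.

```python
"""A minimal sketch (not the authors' code) of prompting a multimodal
LLM to annotate one video for harm, assuming the OpenAI Python SDK."""
import base64
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

# Six-category harm taxonomy, as named in the abstract.
HARM_CATEGORIES = [
    "Information", "Hate and harassment", "Addictive",
    "Clickbait", "Sexual", "Physical",
]

def encode_frame(path: str) -> str:
    """Base64-encode a sampled frame or thumbnail for an image_url part."""
    with open(path, "rb") as f:
        return base64.b64encode(f.read()).decode()

def annotate_video(frame_paths, thumbnail_path, title, description):
    """Ask GPT-4-Turbo for a binary harm label plus multi-label categories."""
    images = [
        {"type": "image_url",
         "image_url": {"url": f"data:image/jpeg;base64,{encode_frame(p)}"}}
        for p in [thumbnail_path, *frame_paths]
    ]
    prompt = (
        "You are annotating a YouTube video for online harm.\n"
        f"Title: {title}\nDescription: {description}\n"
        "The first image is the thumbnail; the rest are sampled frames.\n"
        'Answer in JSON as {"harmful": true/false, "categories": [...]}, '
        f"where categories is a subset of {HARM_CATEGORIES}."
    )
    response = client.chat.completions.create(
        model="gpt-4-turbo",
        messages=[{"role": "user",
                   "content": [{"type": "text", "text": prompt}, *images]}],
        response_format={"type": "json_object"},
    )
    return response.choices[0].message.content

# Example call: 14 sampled frames plus the thumbnail, mirroring the study's
# per-video inputs (file names here are hypothetical).
# print(annotate_video([f"frames/{i:02d}.jpg" for i in range(14)],
#                      "thumb.jpg", "Video title", "Video description"))
```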