Social media platforms face escalating challenges in detecting harmful content that promotes muscle dysmorphic behaviors and cognitions (bigorexia). Such content evades moderation by masquerading as legitimate fitness advice and disproportionately affects adolescent males. We address this challenge with BigTokDetect, a clinically informed framework for identifying pro-bigorexia content on TikTok. We introduce BigTok, the first expert-annotated multimodal benchmark dataset, comprising over 2,200 TikTok videos labeled by clinical psychiatrists across five primary categories and eighteen fine-grained subcategories. A comprehensive evaluation of state-of-the-art vision-language models reveals that while commercial zero-shot models achieve the highest accuracy on the broad primary categories, supervised fine-tuning enables smaller open-source models to outperform them on fine-grained subcategory detection. Ablation studies show that multimodal fusion improves performance by 5 to 15 percent, with video features providing the most discriminative signals. These findings support a grounded moderation approach that automates detection of explicit harms while flagging ambiguous content for human review, and they establish a scalable framework for harm mitigation in emerging mental health domains.