Content moderation remains a critical yet challenging task for large-scale user-generated video platforms, especially in livestreaming environments where moderation must be timely, multimodal, and robust to evolving forms of unwanted content. We present a hybrid moderation framework, deployed at production scale, that combines supervised classification for known violations with reference-based similarity matching for novel or subtle cases. This hybrid design enables robust detection of both explicit violations and novel edge cases that evade traditional classifiers. Multimodal inputs (text, audio, visual) are processed through both pipelines, and a multimodal large language model (MLLM) distills knowledge into each pipeline to boost accuracy while keeping inference lightweight. In production, the classification pipeline achieves 67% recall at 80% precision, and the similarity pipeline achieves 76% recall at 80% precision. Large-scale A/B tests show a 6-8% reduction in user views of unwanted livestreams. These results demonstrate a scalable and adaptable approach to multimodal content governance, capable of addressing both explicit violations and emerging adversarial behaviors.
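The abstract does not specify how the two pipelines' outputs are combined at decision time. The following is a minimal illustrative sketch of one plausible fusion scheme: a supervised classifier score and a nearest-reference cosine similarity, each gated by its own threshold, with the content flagged if either pipeline fires. All names (`moderate`, `cosine_sim`), thresholds, and dimensions here are hypothetical and not taken from the paper.

```python
import numpy as np

def cosine_sim(query: np.ndarray, bank: np.ndarray) -> np.ndarray:
    """Cosine similarity between one query embedding and a bank of
    reference embeddings of known violations (shape: [n_refs, dim])."""
    q = query / np.linalg.norm(query)
    b = bank / np.linalg.norm(bank, axis=1, keepdims=True)
    return b @ q

def moderate(embedding: np.ndarray,
             classifier_score: float,
             reference_bank: np.ndarray,
             cls_threshold: float = 0.8,   # hypothetical operating point
             sim_threshold: float = 0.9) -> dict:
    """Fuse the two pipelines: flag the livestream if either the
    supervised classifier or the reference-similarity match fires."""
    best_sim = float(np.max(cosine_sim(embedding, reference_bank)))
    return {
        "flag": classifier_score >= cls_threshold or best_sim >= sim_threshold,
        "classifier_score": classifier_score,
        "best_reference_similarity": best_sim,
    }

# Hypothetical usage: a 512-d multimodal embedding scored against
# a bank of 1000 known-violation reference embeddings.
rng = np.random.default_rng(0)
query = rng.normal(size=512)
bank = rng.normal(size=(1000, 512))
print(moderate(query, classifier_score=0.85, reference_bank=bank))
```

An OR-style fusion like this matches the stated division of labor: the classifier covers known violation patterns, while the reference match catches novel or subtle cases that resemble previously confirmed examples.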