The rapid proliferation of AI-powered video generation systems has introduced significant challenges in content moderation, particularly with respect to adult and sexually explicit material. Existing detection methods operate on either the input prompts or the decoded pixel-space outputs, so both are blind to the rich internal representations formed during generation. In this paper, we propose a novel latent space probing framework that intercepts the denoised latent representations produced by the CogVideoX video diffusion model during inference and attaches lightweight classifiers to perform real-time adult content detection. To support this work, we construct a large-scale binary dataset of 11,039 ten-second video clips (5,086 violating, 5,953 non-violating) sourced from adult websites and YouTube, respectively. We introduce two lightweight probing classifier architectures and train and evaluate them on this dataset. Our work demonstrates that latent-space signals encode strong discriminative features for harmful content detection, achieving 97.29% F1 on our held-out test set with an overhead of 4-6 ms. Our results suggest that probing the latent space improves both detection performance and cost.
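To make the idea of a latent-space probe concrete, the following is a minimal sketch and not the architecture used in this work: it mean-pools a latent of assumed shape [frames][channels][height][width] into one feature per channel and fits a logistic-regression probe on top. The names `mean_pool`, `LinearProbe`, and `make_latent`, the latent shape, and the synthetic data (violating clips are simulated by shifting channel 0) are all illustrative assumptions; real CogVideoX latents would be captured from the denoising loop instead.

```python
import math
import random


def mean_pool(latent):
    """Average a latent of shape [frames][channels][height][width] per channel."""
    n_channels = len(latent[0])
    sums = [0.0] * n_channels
    count = 0  # number of values pooled per channel
    for frame in latent:
        for c, plane in enumerate(frame):
            for row in plane:
                sums[c] += sum(row)
        count += len(frame[0]) * len(frame[0][0])
    return [s / count for s in sums]


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


class LinearProbe:
    """Hypothetical logistic-regression probe over mean-pooled latents."""

    def __init__(self, n_channels):
        self.w = [0.0] * n_channels
        self.b = 0.0

    def _logit(self, feats):
        return sum(w * x for w, x in zip(self.w, feats)) + self.b

    def score(self, latent):
        """Return the estimated probability that the latent is violating."""
        return sigmoid(self._logit(mean_pool(latent)))

    def fit(self, latents, labels, lr=0.5, epochs=100):
        """Plain stochastic gradient descent on the log loss."""
        feats = [mean_pool(z) for z in latents]
        for _ in range(epochs):
            for x, y in zip(feats, labels):
                grad = sigmoid(self._logit(x)) - y
                self.w = [w - lr * grad * xi for w, xi in zip(self.w, x)]
                self.b -= lr * grad


def make_latent(label, rng, frames=2, channels=4, height=3, width=3):
    """Synthetic stand-in for a denoised latent (assumption, not real data)."""
    return [
        [
            [
                [rng.gauss(0.0, 1.0) + (2.0 if label == 1 and c == 0 else 0.0)
                 for _ in range(width)]
                for _ in range(height)
            ]
            for c in range(channels)
        ]
        for _ in range(frames)
    ]


rng = random.Random(0)
train_labels = [i % 2 for i in range(40)]
train_latents = [make_latent(y, rng) for y in train_labels]
test_labels = [i % 2 for i in range(40)]
test_latents = [make_latent(y, rng) for y in test_labels]

probe = LinearProbe(n_channels=4)
probe.fit(train_latents, train_labels)

predictions = [1 if probe.score(z) >= 0.5 else 0 for z in test_latents]
accuracy = sum(p == y for p, y in zip(predictions, test_labels)) / len(test_labels)
print(f"toy held-out accuracy: {accuracy:.2f}")
```

Because the probe reads the latent directly, a check of this kind would run before the latent is decoded to pixels, which is where the low overhead reported above would come from; the toy accuracy printed here says nothing about performance on real latents.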