Illegal live-streaming identification aims to help live-streaming platforms promptly recognize illegal behaviors in live streams, such as the sale of rare and endangered animals, and plays a crucial role in purifying the network environment. Traditionally, platforms employ professionals to manually identify potentially illegal live streams. Specifically, a professional must search a large-scale knowledge database for evidence relevant to judging whether a given live-streaming clip contains illegal behavior, which is time-consuming and laborious. To address this issue, we propose a multimodal evidence retrieval system, named OFAR, to facilitate illegal live-streaming identification. OFAR consists of three modules: \textit{Query Encoder}, \textit{Document Encoder}, and \textit{MaxSim-based Contrastive Late Intersection}. Both the query encoder and the document encoder are implemented with the \mbox{OFA} encoder, which is pretrained on a large-scale multimodal dataset. In the last module, we introduce contrastive learning on top of the MaxSim-based late intersection to enhance the model's query-document matching ability. The proposed framework achieves significant improvement on our industrial dataset TaoLive, demonstrating the effectiveness of our scheme.
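To make the third module more concrete, the following is a minimal sketch of MaxSim scoring combined with a contrastive objective. It is not the authors' implementation: the abstract does not specify the loss, so an in-batch softmax cross-entropy is assumed here, and the MaxSim operator follows the common token-level formulation (for each query token, take the maximum cosine similarity over document tokens, then sum over query tokens). Random arrays stand in for the token embeddings that the OFA-based query and document encoders would produce, and the function names, shapes, and `temperature` parameter are all hypothetical.

```python
import numpy as np

def l2_normalize(x, axis=-1, eps=1e-12):
    # Scale vectors to unit length so dot products become cosine similarities.
    return x / (np.linalg.norm(x, axis=axis, keepdims=True) + eps)

def maxsim_scores(queries, docs):
    """Score every query against every document.

    queries: (B, Lq, D) token embeddings; docs: (N, Ld, D) token embeddings.
    Returns a (B, N) score matrix.
    """
    q = l2_normalize(queries)
    d = l2_normalize(docs)
    # sim[b, n, i, j] = cosine similarity of query token i and document token j.
    sim = np.einsum("bid,njd->bnij", q, d)
    # MaxSim: best-matching document token per query token, summed over query tokens.
    return sim.max(axis=-1).sum(axis=-1)

def in_batch_contrastive_loss(scores, temperature=1.0):
    """Assumed objective: document b is the positive for query b,
    and all other documents in the batch act as negatives."""
    logits = scores / temperature
    logits = logits - logits.max(axis=1, keepdims=True)  # numerical stability
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return -np.mean(np.diag(log_probs))

# Placeholder embeddings standing in for encoder outputs (hypothetical shapes).
rng = np.random.default_rng(0)
queries = rng.normal(size=(4, 6, 16))   # 4 queries, 6 tokens each, dim 16
docs = rng.normal(size=(4, 10, 16))     # 4 documents, 10 tokens each, dim 16

scores = maxsim_scores(queries, docs)
loss = in_batch_contrastive_loss(scores)
```

Because scoring only needs the token embeddings, document embeddings can be computed once and cached, leaving the MaxSim step as the only per-query cost at retrieval time; this is the usual motivation for late-matching designs of this kind.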