MatchLM2Lite: A Scalable MLLM-to-Lite Framework for Reproduced Content Identification

Content moderation is critical for online video platforms to ensure content safety, protect creators, and sustain positive user experiences. Beyond filtering harmful content, platforms must guarantee content authenticity at scale so that users are exposed to diverse, original videos rather than low-value reproductions. We present MatchLM2Lite, a real-time, production-grade reproduced content identification (RCI) system that distills the understanding of a multimodal large language model (MLLM) into a small, fast-inference model. The system jointly models video, audio, and text signals and operates on pairs of videos to produce fine-grained reproduction scores. It comprises two modules, MatchLM and MatchLite, trained with a two-stage recipe. First, our high-capacity MLLM, MatchLM, serves as a teacher model that defines the upper bound of RCI performance. Its capabilities are then distilled into a compact student model, MatchLite. This design allows MatchLite to deliver low-latency, high-throughput inference on video pairs while preserving much of MatchLM's accuracy, making it suitable for integration into real-time recommendation systems. MatchLM improves F1-score by 8.57 over our previous production model. After knowledge distillation, MatchLite retains a 6.55 F1-score gain while reducing computational cost by 35x. Deployed at scale, MatchLM2Lite enables efficient, pairwise multimodal RCI, stably serving online traffic at high queries per second (QPS) with an end-to-end latency below 30 seconds. The system has reduced the reproduced video view rate on our platform by 2.5% without degrading user engagement, demonstrating its effectiveness in a large-scale production environment.
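The teacher-to-student step can be illustrated with a minimal sketch of a distillation objective for pairwise reproduction scores. The abstract does not state the loss MatchLite is trained with, so everything here is an assumption made for illustration: the function names `bce` and `distillation_loss`, the weight `alpha`, and the choice of blending a soft-target term (the teacher's score) with a hard-label term.

```python
import math


def bce(p: float, target: float, eps: float = 1e-7) -> float:
    """Binary cross-entropy between a predicted probability and a target in [0, 1]."""
    p = min(max(p, eps), 1.0 - eps)  # clamp to avoid log(0)
    return -(target * math.log(p) + (1.0 - target) * math.log(1.0 - p))


def distillation_loss(
    student_score: float,
    teacher_score: float,
    label: float,
    alpha: float = 0.5,
) -> float:
    """Hypothetical per-pair loss for a student reproduction scorer.

    student_score: student's predicted probability that the pair is a reproduction.
    teacher_score: teacher's probability for the same pair (soft target).
    label:         human annotation, 1.0 for reproduced and 0.0 otherwise.
    alpha:         assumed weight on the soft-target term.
    """
    soft_term = bce(student_score, teacher_score)  # match the teacher
    hard_term = bce(student_score, label)          # match the annotation
    return alpha * soft_term + (1.0 - alpha) * hard_term


if __name__ == "__main__":
    # A student that agrees with the teacher is penalized less than one that does not.
    print(distillation_loss(0.9, teacher_score=0.9, label=1.0))
    print(distillation_loss(0.2, teacher_score=0.9, label=1.0))
```

In a sketch like this, `alpha` trades off fidelity to the teacher against fidelity to the labels; a real system would apply the loss over batches of video pairs with model outputs in place of the scalar scores used here.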
