Production systems often struggle to maintain data quality in large-scale, dynamic data streams. We introduce Drifter, an efficient and lightweight system for online feature monitoring and verification in recommendation use cases. Drifter addresses limitations of existing methods by delivering agile, responsive, and adaptable data quality monitoring, enabling real-time root cause analysis, drift detection, and insight into problematic production events. By combining state-of-the-art online feature ranking for sparse data with ideas from anomaly detection, Drifter is highly scalable and resource-efficient: it requires only two threads and less than a gigabyte of RAM per production deployment handling millions of instances per minute. Evaluation on real-world data sets demonstrates Drifter's effectiveness in alerting on and mitigating data quality issues, substantially improving the reliability and performance of real-time recommender systems.