This paper demonstrates the successful application of Off-Policy Evaluation (OPE) to accelerate recommender system development and optimization at Adyen, a global leader in financial payment processing. Traditional A/B testing proved slow, costly, and often inconclusive, so we integrated OPE to evaluate new recommender system variants rapidly using historical data. Our analysis, conducted on a billion-scale dataset of transactions, reveals a strong correlation between OPE estimates and online A/B test results, and projects an incremental 9 to 54 million transactions over a six-month period. We explore the practical challenges and trade-offs of deploying OPE in a high-volume production environment, including leveraging exploration traffic for data collection, mitigating variance in importance sampling, and ensuring scalability through Apache Spark. By benchmarking various OPE estimators, we provide guidance on their effectiveness and on integrating them into the decision-making processes of large-scale industrial payment systems.
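The abstract does not name the estimators that were benchmarked. As a hypothetical illustration of the importance-sampling variance problem it mentions, the sketch below implements three standard estimators in plain Python: inverse propensity scoring (IPS), clipped IPS, and self-normalized IPS (SNIPS). The function names, the clipping threshold, and the toy log are assumptions for illustration, not the paper's implementation; the logging propensities stand in for the probabilities recorded on exploration traffic.

```python
from typing import List, Optional, Sequence


def _importance_weights(
    target_probs: Sequence[float], logging_probs: Sequence[float]
) -> List[float]:
    """Per-event ratio pi_target(a|x) / pi_logging(a|x) for the logged action."""
    if len(target_probs) != len(logging_probs):
        raise ValueError("inputs must have equal length")
    if any(p <= 0.0 for p in logging_probs):
        raise ValueError("logging propensities must be positive")
    return [pt / pl for pt, pl in zip(target_probs, logging_probs)]


def ips_estimate(
    rewards: Sequence[float],
    target_probs: Sequence[float],
    logging_probs: Sequence[float],
    clip: Optional[float] = None,
) -> float:
    """(Clipped) inverse propensity scoring: mean of weight * reward over n events."""
    weights = _importance_weights(target_probs, logging_probs)
    if not rewards or len(rewards) != len(weights):
        raise ValueError("rewards must be non-empty and match the propensities")
    if clip is not None:
        # Capping large weights trades a little bias for lower variance.
        weights = [min(w, clip) for w in weights]
    return sum(w * r for w, r in zip(weights, rewards)) / len(rewards)


def snips_estimate(
    rewards: Sequence[float],
    target_probs: Sequence[float],
    logging_probs: Sequence[float],
) -> float:
    """Self-normalized IPS: divide by the sum of weights instead of n."""
    weights = _importance_weights(target_probs, logging_probs)
    if len(rewards) != len(weights):
        raise ValueError("rewards must match the propensities")
    total_weight = sum(weights)
    if total_weight == 0.0:
        raise ValueError("target policy has no overlap with the logged actions")
    return sum(w * r for w, r in zip(weights, rewards)) / total_weight


# Hypothetical toy log: the logging policy picks action A with probability 0.9
# and B with 0.1; the reward is 1 if the transaction succeeded.
rewards = [1.0, 0.0, 1.0]        # logged events: A, A, B
logging_probs = [0.9, 0.9, 0.1]
target_probs = [0.0, 0.0, 1.0]   # hypothetical target policy always picks B

print(ips_estimate(rewards, target_probs, logging_probs))            # about 3.33
print(ips_estimate(rewards, target_probs, logging_probs, clip=5.0))  # about 1.67
print(snips_estimate(rewards, target_probs, logging_probs))          # 1.0
```

A binary success reward cannot exceed 1, so the plain IPS value of about 3.33 shows how a single large weight from a rarely explored action inflates the estimate; clipping and self-normalization both pull it back at the cost of some bias.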