The development of open benchmarking platforms could greatly accelerate the adoption of AI agents in retail. This paper presents comprehensive simulations of customer shopping behaviors for benchmarking reinforcement learning (RL) agents that optimize coupon targeting. The difficulty of this learning problem is largely driven by the sparsity of customer purchase events. To help mitigate this effect, we trained agents on offline batch data comprising summarized customer purchase histories. Our experiments revealed that contextual bandit and deep RL methods that are less prone to overfitting the sparse reward distributions significantly outperform static policies. This study offers a practical framework for simulating AI agents that optimize the entire retail customer journey, and it aims to inspire further development of simulation tools for retail AI systems.