In partnership with a large online retailer, we consider the problem of sending daily personalized promotions to a userbase of over 20 million customers. We propose an efficient policy that determines, every day, which promotion each customer should receive (10%, 12%, 15%, 17%, or 20% off) while respecting global allocation constraints. Deployed in an A/B test, this policy increased revenue by 4.5% by better targeting promotion-sensitive customers and by learning intertemporal patterns across customers. We also model the customer's intertemporal state theoretically. The data suggests a simple new combinatorial model of pricing with reference effects, in which the customer remembers the best promotion they saw over the past $\ell$ days as the "reference value" and is more likely to purchase if this value is poor. We tightly characterize the structure of optimal policies for maximizing long-run average revenue under this model: they cycle between offering poor promotion values $\ell$ times and offering good values once.
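The reference-effect model described above can be illustrated with a small sketch. The abstract specifies only that the reference value is the best promotion seen over the past $\ell$ days and that a poor reference makes purchase more likely. Everything else here is an assumption made for illustration: the function names, the convention that "past $\ell$ days" excludes today, the purchase-probability formula, and all numeric parameters are hypothetical and are not taken from the paper or the retailer's data.

```python
def reference_values(promos, ell):
    """Reference value on each day of a repeating promotion cycle.

    promos: discounts in percent, repeated forever (e.g. [10, 10, 10, 20]).
    The reference on day t is the largest discount shown on the previous
    ell days, wrapping around the cycle. Excluding today is an assumption.
    """
    n = len(promos)
    return [max(promos[(t - k) % n] for k in range(1, ell + 1)) for t in range(n)]


def purchase_prob(discount, ref, base=0.05, lift=0.01, penalty=0.5):
    """Hypothetical purchase probability (not the paper's model).

    The probability rises with today's discount and is scaled down by
    `penalty` unless today's discount strictly beats the reference.
    """
    p = base + lift * discount
    return p if discount > ref else p * penalty


def average_revenue(promos, ell, price=100.0):
    """Expected revenue per day, averaged over one full cycle."""
    refs = reference_values(promos, ell)
    total = sum(
        price * (1 - d / 100) * purchase_prob(d, r)
        for d, r in zip(promos, refs)
    )
    return total / len(promos)


ell = 3
cycle = [10] * ell + [20]  # poor value ell times, then a good value once
print(reference_values(cycle, ell))
print(average_revenue(cycle, ell), average_revenue([20], ell))
```

In this toy setup the good promotion is offered exactly on the day when the remembered reference has dropped back to the poor value, which is the intuition behind the cyclic structure. Whether a cycle actually beats a constant policy depends entirely on the assumed probability function and parameters, so the numbers printed here say nothing about the paper's results.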