Devising a dynamic pricing policy with an always-valid online statistical learning procedure is an important and as yet unresolved problem. Most existing dynamic pricing policies, which focus on the faithfulness of the adopted customer choice models, have limited capability to adapt to the online uncertainty of the learned statistical model during the pricing process. In this paper, we propose a novel approach for designing dynamic pricing policies based on regularized online statistical learning with theoretical guarantees. The new approach overcomes the challenge of continuously monitoring the online Lasso procedure and possesses several appealing properties. In particular, we make the decisive observation that the always-validity of pricing decisions builds and thrives on the online regularization scheme. Our online regularization scheme equips the proposed optimistic online regularized maximum likelihood pricing (OORMLP) policy with three major advantages: it encodes market noise knowledge into the optimism of the pricing process; it empowers online statistical learning with always-validity over all decision points; and it envelops the prediction error process with time-uniform non-asymptotic oracle inequalities. Non-asymptotic inference results of this type allow us to design more sample-efficient and robust dynamic pricing algorithms in practice. In theory, the proposed OORMLP algorithm exploits the sparsity structure of high-dimensional models and secures a regret that is logarithmic in the decision horizon. These theoretical advances are made possible by an optimistic online Lasso procedure that resolves dynamic pricing problems at the process level, based on a novel use of non-asymptotic martingale concentration. In experiments, we evaluate OORMLP in different synthetic and real pricing problem settings and demonstrate that it improves on state-of-the-art methods.
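To make the ingredients named in the abstract concrete, the following is a minimal sketch of an L1-regularized maximum likelihood step with a regularization level that shrinks over time. It is not the OORMLP algorithm. The choice model `P(buy) = sigmoid(x . theta - price)`, the schedule in `regularization_level` (a plain union bound over decision points), and all function names are assumptions made for illustration. The pricing rule is a greedy plug-in over a price grid and omits the optimism component described above.

```python
import math


def soft_threshold(z, tau):
    """Proximal operator of tau * |z|, i.e. the Lasso shrinkage step."""
    if z > tau:
        return z - tau
    if z < -tau:
        return z + tau
    return 0.0


def regularization_level(t, d, sigma=1.0, delta=0.05):
    """Hypothetical time-varying penalty (an assumption, not the paper's).

    Spends failure probability delta / (t * (t + 1)) at decision point t,
    which sums to delta over all t >= 1.
    """
    return sigma * math.sqrt(2.0 * math.log(2.0 * d * t * (t + 1) / delta) / t)


def sigmoid(u):
    """Numerically stable logistic function."""
    if u >= 0:
        return 1.0 / (1.0 + math.exp(-u))
    e = math.exp(u)
    return e / (1.0 + e)


def dot(a, b):
    return sum(ai * bi for ai, bi in zip(a, b))


def fit_l1_logistic(X, prices, y, lam, steps=200, lr=0.5):
    """Proximal gradient descent on the average negative log-likelihood of
    the assumed model P(buy) = sigmoid(x . theta - price), plus
    lam * ||theta||_1."""
    n, d = len(X), len(X[0])
    theta = [0.0] * d
    for _ in range(steps):
        grad = [0.0] * d
        for x, p, label in zip(X, prices, y):
            resid = sigmoid(dot(x, theta) - p) - label
            for j in range(d):
                grad[j] += resid * x[j] / n
        # Gradient step on the smooth part, then shrink towards zero.
        theta = [soft_threshold(theta[j] - lr * grad[j], lr * lam)
                 for j in range(d)]
    return theta


def greedy_price(x, theta, grid):
    """Plug-in price: maximize estimated revenue p * P(buy | x, p) on a grid."""
    u = dot(x, theta)
    return max(grid, key=lambda p: p * sigmoid(u - p))
```

In an online loop one would, under these assumptions, refit at each decision point `t` with `lam = regularization_level(t, d)` on the data seen so far, then post `greedy_price(x_t, theta, grid)`. The shrinking penalty is what lets the estimate stay sparse early on and sharpen as observations accumulate.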