High-Dimensional Dynamic Pricing under Non-Stationarity: Learning and Earning with Change-Point Detection

We consider a high-dimensional dynamic pricing problem under non-stationarity, where a firm sells products to $T$ sequentially arriving consumers whose behavior follows an unknown demand model that may change at unknown times. The demand model is assumed to be a high-dimensional generalized linear model (GLM), allowing for a feature vector in $\mathbb R^d$ that encodes product and consumer information. To maximize revenue (equivalently, to minimize regret), the firm needs to learn and exploit the unknown GLMs while monitoring for potential change-points. To tackle this problem, we first design a novel penalized likelihood-based online change-point detection algorithm for high-dimensional GLMs, which is the first algorithm in the change-point literature to achieve the minimax-optimal localization error rate for high-dimensional GLMs. We further propose a change-point detection assisted dynamic pricing (CPDP) policy, which achieves a near-optimal regret of order $O(s\sqrt{\Upsilon_T T}\log(Td))$, where $s$ is the sparsity level and $\Upsilon_T$ is the number of change-points. This regret bound is accompanied by a minimax lower bound, demonstrating the optimality of CPDP (up to logarithmic factors). In particular, the optimality with respect to $\Upsilon_T$ is established for the first time in the dynamic pricing literature, and is achieved via a novel accelerated exploration mechanism. Extensive simulation experiments and a real-data application to online lending illustrate the efficiency of the proposed policy and the importance and practical value of handling non-stationarity in dynamic pricing.
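To make the setting concrete, the following is a minimal sketch of why an undetected change-point is costly. It is not the CPDP policy or the detection algorithm from the paper. It assumes a logistic GLM for purchase probability, a sparse parameter vector whose nonzero entries flip sign at a single change-point, and a grid search over prices. The names `purchase_prob`, `best_price`, and `stale_policy_regret`, along with all numerical values, are hypothetical choices made for illustration only.

```python
import numpy as np


def purchase_prob(price, x, theta, beta=1.0):
    """Assumed logistic GLM demand: P(buy) = sigmoid(x @ theta - beta * price)."""
    utility = x @ theta - beta * price
    return 1.0 / (1.0 + np.exp(-utility))


def best_price(x, theta, grid, beta=1.0):
    """Price on the grid that maximizes expected revenue price * P(buy)."""
    revenue = grid * purchase_prob(grid, x, theta, beta)
    return grid[np.argmax(revenue)]


def stale_policy_regret(T=400, d=50, s=3, change_at=200, seed=0):
    """Per-round expected regret of a policy that never notices the change.

    The 'stale' policy knows the pre-change parameter exactly but keeps
    using it after the change-point; the oracle always uses the true one.
    """
    rng = np.random.default_rng(seed)
    grid = np.linspace(0.1, 10.0, 200)

    # s-sparse parameters in R^d; the nonzero entries flip sign at the change.
    theta_pre = np.zeros(d)
    theta_pre[:s] = 1.0
    theta_post = np.zeros(d)
    theta_post[:s] = -1.0

    regret = np.zeros(T)
    for t in range(T):
        x = rng.normal(size=d)  # feature vector of the arriving consumer
        theta_true = theta_pre if t < change_at else theta_post
        p_oracle = best_price(x, theta_true, grid)
        p_stale = best_price(x, theta_pre, grid)
        rev_oracle = p_oracle * purchase_prob(p_oracle, x, theta_true)
        rev_stale = p_stale * purchase_prob(p_stale, x, theta_true)
        regret[t] = rev_oracle - rev_stale
    return regret


change_at = 200
regret = stale_policy_regret(change_at=change_at)
pre = regret[:change_at].sum()
post = regret[change_at:].sum()
print(f"cumulative regret before change: {pre:.4f}, after change: {post:.4f}")
```

In this toy setup the stale policy incurs no regret before the change and accumulates regret in every round afterwards, so its regret grows linearly in the number of post-change rounds. This is the behavior that detecting the change and re-learning the model is meant to avoid.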
