This paper proposes a practically efficient algorithm with optimal theoretical regret for the classical network revenue management (NRM) problem with unknown, nonparametric demand. Over a time horizon of length $T$, in each time period the retailer must set the prices of $N$ types of products, which are produced from $M$ types of resources with unreplenishable initial inventory. For nonparametric demand under some mild assumptions, Miao and Wang (2021) were the first to propose an algorithm with $O(\text{poly}(N,M,\ln(T))\sqrt{T})$-type regret (in particular, $\tilde O(N^{3.5}\sqrt{T})$ plus additional high-order terms that are $o(\sqrt{T})$ for sufficiently large $T\gg N$). In this paper, we improve on this result by proposing a primal-dual optimization algorithm that is not only more practical but also achieves an improved regret of $\tilde O(N^{3.25}\sqrt{T})$, free from additional high-order terms. A key technical contribution of the proposed algorithm is the so-called demand balancing, which pairs the primal solution (i.e., the price) in each time period with another price to offset the violation of complementary slackness on resource inventory constraints. Numerical experiments comparing our algorithm with several benchmark algorithms further illustrate its effectiveness.
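To make the problem setting concrete, the following is a minimal sketch of a toy NRM environment, not the paper's algorithm. It simulates a fixed price vector over $T$ periods with $N$ products and $M$ unreplenishable resources. The function names, the Bernoulli demand model, the linear demand curve, and all numeric values are illustrative assumptions that do not come from the paper.

```python
import random


def simulate_fixed_price(prices, demand_fn, A, inventory, T, seed=0):
    """Run one fixed price vector for T periods in a toy NRM environment.

    prices:    list of N prices, one per product
    demand_fn: hypothetical map from prices to N purchase probabilities in [0, 1]
    A:         M x N consumption matrix; A[i][j] is the amount of resource i
               used by one unit of product j
    inventory: list of M initial resource levels (never replenished)
    """
    rng = random.Random(seed)
    remaining = list(inventory)
    revenue = 0.0
    for _ in range(T):
        probs = demand_fn(prices)
        for j, price in enumerate(prices):
            # Assumed Bernoulli demand: at most one request per product per period.
            if rng.random() >= probs[j]:
                continue
            # A sale happens only if every required resource is still available.
            if all(remaining[i] >= A[i][j] for i in range(len(remaining))):
                for i in range(len(remaining)):
                    remaining[i] -= A[i][j]
                revenue += price
    return revenue, remaining


def linear_demand(prices):
    # Assumed toy demand curve; in the paper's setting it is unknown to the retailer.
    return [max(0.0, min(1.0, 0.9 - 0.1 * p)) for p in prices]


# Two products, two resources: product 0 uses resource 0,
# product 1 uses one unit of resource 0 and two units of resource 1.
A = [[1, 1], [0, 2]]
revenue, remaining = simulate_fixed_price(
    [3.0, 5.0], linear_demand, A, [50, 40], T=200
)
print(revenue, remaining)
```

A learning algorithm of the kind described in the abstract would replace the fixed price vector with prices chosen adaptively from observed sales, and its regret would be measured against a benchmark that knows the demand function.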