We study dynamic pricing and replenishment problems under inconsistent decision frequencies. Unlike the traditional demand assumption, demand here is discrete and follows a Poisson distribution whose rate parameter is a function of price, which complicates the structural analysis of the problem. We prove that the single-period profit function is concave in product price and in inventory level within their respective domains. The demand model is further enhanced with a decision-tree-based machine learning approach trained on comprehensive market data. Employing a two-timescale stochastic approximation scheme, we resolve the discrepancy in decision frequencies between pricing and replenishment and guarantee convergence to a local optimum. We then refine our methodology with deep reinforcement learning (DRL) and propose a fast-slow dual-agent DRL algorithm, in which two agents handle pricing and inventory decisions, respectively, and are updated on different timescales. Numerical results for both single-product and multi-product scenarios validate the effectiveness of our methods.
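The fast-slow update idea behind the two-timescale scheme can be illustrated with a minimal sketch. The objective, step-size schedules, and variable names below are illustrative assumptions, not the paper's actual profit model: the "fast" variable (think: price) uses a larger step size than the "slow" variable (think: replenishment), so the fast iterate effectively tracks its optimum for the current slow iterate while the slow one drifts.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy two-timescale stochastic approximation on an illustrative
# concave objective f(p, q) = -(p - 3)^2 - (q - 5)^2 (an assumption,
# standing in for the single-period profit function).
p, q = 0.0, 0.0  # fast ("price") and slow ("replenishment") iterates
for n in range(1, 20001):
    alpha = 0.5 / n ** 0.6  # fast step size
    beta = 0.5 / n ** 0.9   # slow step size; beta/alpha -> 0
    # Noisy gradient observations, mimicking stochastic demand feedback.
    gp = -2.0 * (p - 3.0) + rng.normal(0.0, 0.1)
    gq = -2.0 * (q - 5.0) + rng.normal(0.0, 0.1)
    p += alpha * gp
    q += beta * gq

print(round(p, 2), round(q, 2))  # both iterates end near the optimum (3, 5)
```

The separation of step sizes is what lets the coupled recursion be analyzed as a fast subsystem converging against a quasi-static slow one; the dual-agent DRL algorithm applies the same fast-slow update structure to the two agents.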