This paper provides a systematic comparison between Fitted Dynamic Programming (DP), where demand is estimated from data, and Reinforcement Learning (RL) methods in finite-horizon dynamic pricing problems. We analyze their performance across environments of increasing structural complexity, ranging from a single-typology benchmark to multi-typology settings with heterogeneous demand and inter-temporal revenue constraints. Unlike simplified comparisons that restrict DP to low-dimensional settings, we apply DP in richer, multi-dimensional environments with multiple product types and constraints. We evaluate revenue performance, stability, constraint-satisfaction behavior, and computational scaling, highlighting the trade-offs between explicit expectation-based optimization and trajectory-based learning.