This study introduces an optimal mechanism for a dynamic stochastic knapsack environment. The model features a single seller endowed with a fixed quantity of a perfectly divisible item. Impatient buyers with piecewise-linear utility functions arrive randomly and report two-dimensional private information: marginal value and demanded quantity. We derive a revenue-maximizing dynamic mechanism in a finite, discrete-time framework that satisfies incentive compatibility, individual rationality, and feasibility. The derivation proceeds by characterizing buyers' utility and deriving the Bellman equation. We also propose a penalty scheme that is essential for incentive compatibility, together with the allocation and payment policies. Finally, we propose algorithms that approximate the optimal policy using Monte Carlo simulation-based regression and reinforcement learning.
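To give a sense of the kind of Bellman recursion involved, the following is a minimal sketch of backward induction for a simplified complete-information benchmark. It is not the mechanism derived in this study: it assumes the seller observes each buyer's type and extracts the full surplus, so incentive compatibility and the penalty scheme play no role. The discretization of the item into integer units, the uniform distribution over (value, demand) pairs, and the names `solve_benchmark`, `arrival_prob`, `values`, and `demands` are all assumptions made for illustration.

```python
from itertools import product


def solve_benchmark(T, capacity, arrival_prob, values, demands):
    """Backward induction for a complete-information benchmark (illustrative only).

    Assumptions, none of which come from the abstract: the item is
    discretized into integer units, at most one buyer arrives per period
    with probability `arrival_prob`, types (v, d) are uniform over
    `values` x `demands`, and the seller observes the type and charges
    v per unit allocated.
    """
    # V[t][c]: expected revenue from period t onward with c units left.
    # V[T][c] = 0 because unsold units are worthless after the horizon.
    V = [[0.0] * (capacity + 1) for _ in range(T + 1)]
    types = list(product(values, demands))
    for t in range(T - 1, -1, -1):
        for c in range(capacity + 1):
            with_arrival = 0.0
            for v, d in types:
                # Allocate x units, 0 <= x <= min(d, c): immediate revenue
                # v * x plus the continuation value of the remaining stock.
                best = max(v * x + V[t + 1][c - x] for x in range(min(d, c) + 1))
                with_arrival += best / len(types)
            V[t][c] = (1 - arrival_prob) * V[t + 1][c] + arrival_prob * with_arrival
    return V
```

For example, `solve_benchmark(2, 1, 1.0, [1.0, 3.0], [1])` returns a table whose entry `V[0][1]` is the expected revenue of a seller holding one unit over two periods. With private information, the revenue attainable by an incentive-compatible mechanism would generally be lower than this benchmark.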