Revenue-optimal auction design is a challenging problem with significant theoretical and practical implications. Sequential auction mechanisms, known for their simplicity and strong strategyproofness guarantees, are limited by theoretical results that are largely existential, except in certain restrictive settings. Although traditional reinforcement learning methods such as Proximal Policy Optimization (PPO) and Soft Actor-Critic (SAC) are applicable in this domain, they struggle with computational demands and convergence issues in large, continuous action spaces. In light of this, and recognizing that the transitions in our setting can be modeled as differentiable, we propose a new reinforcement learning framework for sequential combinatorial auctions that leverages first-order gradients. Extensive evaluations show that our approach achieves significant revenue improvements over both analytical baselines and standard reinforcement learning algorithms. Furthermore, we scale our approach to settings with up to 50 agents and 50 items, demonstrating its applicability to complex, real-world auctions. This work thus advances the computational tools available for auction design and helps bridge the gap between theoretical results and practical implementation in sequential auction design.
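To make the core idea concrete, the following is a minimal, hypothetical sketch (not the authors' implementation) of optimizing posted prices in a toy sequential sale via first-order gradients, assuming a sigmoid-relaxed buyer acceptance rule so that the state transitions are differentiable; all names and constants (expected_revenue, SMOOTHING, the learning rate) are illustrative.

```python
import jax
import jax.numpy as jnp

N_AGENTS, N_ITEMS, SMOOTHING = 5, 3, 10.0
key = jax.random.PRNGKey(0)

def expected_revenue(prices, values):
    """Revenue of a sequential posted-price sale with a relaxed accept rule."""
    revenue = 0.0
    sold = jnp.zeros(N_ITEMS)  # soft "fraction sold" state for each item
    for i in range(N_AGENTS):
        # Probability agent i buys each remaining item: sigmoid of (value - price).
        buy_prob = jax.nn.sigmoid(SMOOTHING * (values[i] - prices[i])) * (1.0 - sold)
        revenue += jnp.sum(buy_prob * prices[i])
        sold = sold + buy_prob * (1.0 - sold)  # differentiable state transition
    return revenue

# Sampled valuations stand in for the true value distribution.
values = jax.random.uniform(key, (N_AGENTS, N_ITEMS))
prices = jnp.full((N_AGENTS, N_ITEMS), 0.5)

# First-order gradient ascent on revenue, differentiating through the transitions.
grad_fn = jax.grad(lambda p: -expected_revenue(p, values))
for _ in range(200):
    prices = prices - 0.05 * grad_fn(prices)

print("learned prices:", prices)
print("revenue:", expected_revenue(prices, values))
```

In the actual framework, the prices would be produced by a learned policy conditioned on the auction state and the gradients averaged over sampled valuation profiles; the sketch only illustrates why differentiable transitions admit first-order optimization where score-based methods such as PPO or SAC rely on noisier zeroth-order signals.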