Dynamic mechanism design is a challenging extension of ordinary mechanism design in which the mechanism designer must make a sequence of decisions over time while facing possibly untruthful reports from the participating agents. Optimizing dynamic mechanisms for welfare is relatively well understood. However, there has been less work on optimizing for other goals (e.g., revenue), and without restrictive assumptions on valuations it is remarkably difficult to characterize good mechanisms. The situation is similar even in static mechanism design, yet there, automated mechanism design techniques based on optimization and machine learning have succeeded in finding high-revenue mechanisms in cases beyond the reach of analytical results. We therefore turn to automated mechanism design to find mechanisms that perform well in specific problem instances. We extend the class of affine maximizer mechanisms to Markov decision processes (MDPs) in which agents may untruthfully report their rewards. This extension results in a challenging bilevel optimization problem: the upper problem involves choosing optimal mechanism parameters, and the lower problem involves solving the resulting MDP. Our approach can find truthful dynamic mechanisms that achieve strong performance on goals other than welfare, and it can be applied to essentially any problem setting, without restrictions on valuations, for which reinforcement learning (RL) can learn optimal policies.
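For readers unfamiliar with affine maximizers, the following is a minimal sketch of the standard static version only, not the dynamic mechanism proposed here. The function name `affine_maximizer`, its argument layout, and the toy numbers are illustrative assumptions. The mechanism picks the outcome that maximizes a weighted sum of reported values plus a per-outcome boost, and charges each agent a weighted VCG-style payment. In the dynamic extension described above, the maximization over a finite outcome list would presumably be replaced by solving an MDP; that step is not shown.

```python
from typing import List, Optional, Sequence, Tuple


def affine_maximizer(
    reports: Sequence[Sequence[float]],  # reports[i][a]: agent i's reported value for outcome a
    weights: Sequence[float],            # positive per-agent weights (mechanism parameters)
    boosts: Sequence[float],             # per-outcome boosts (mechanism parameters)
) -> Tuple[int, List[float]]:
    """Static affine maximizer (illustrative sketch, hypothetical interface)."""
    n_agents = len(reports)
    n_outcomes = len(boosts)

    def affine_welfare(a: int, skip: Optional[int] = None) -> float:
        # Boost of outcome a plus the weighted reported values,
        # optionally leaving out agent `skip`.
        return boosts[a] + sum(
            weights[j] * reports[j][a] for j in range(n_agents) if j != skip
        )

    # Choose the outcome that maximizes affine welfare under the reports.
    chosen = max(range(n_outcomes), key=affine_welfare)

    # Weighted VCG-style payment: the affine welfare the others could have
    # obtained without agent i, minus what they obtain at the chosen outcome,
    # rescaled by agent i's weight.
    payments = []
    for i in range(n_agents):
        best_without_i = max(affine_welfare(a, skip=i) for a in range(n_outcomes))
        payments.append((best_without_i - affine_welfare(chosen, skip=i)) / weights[i])
    return chosen, payments


# Toy single-item example: outcome 0 gives the item to agent 0,
# outcome 1 gives it to agent 1, outcome 2 keeps the item unsold.
reports = [[3.0, 0.0, 0.0], [0.0, 2.0, 0.0]]
vcg = affine_maximizer(reports, [1.0, 1.0], [0.0, 0.0, 0.0])
boosted = affine_maximizer(reports, [1.0, 1.0], [0.0, 0.0, 2.5])
```

With unit weights and zero boosts this reduces to VCG. In the toy example, boosting the "unsold" outcome acts like a reserve price and raises the winner's payment, which illustrates why tuning weights and boosts can improve revenue while the payment rule keeps truthful reporting optimal.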