In many multi-player interactions, players incur strictly positive costs each time they execute actions, e.g., 'menu costs' or transaction costs in financial systems. Since acting at every available opportunity would accumulate prohibitively large costs, the resulting decision problem is one in which players must make strategic decisions about when to execute actions in addition to their choice of action. This paper analyses a discrete-time stochastic game (SG) in which players face minimally bounded positive costs for each action and influence the system using impulse controls. We prove that SGs of two-sided impulse control have a unique value and characterise the saddle point equilibrium, in which the players execute actions at strategically chosen times in accordance with Markovian strategies. We prove that the game respects a dynamic programming principle and that the Markov perfect equilibrium can be computed as a limit point of a sequence of Bellman operations. We then introduce a new Q-learning variant which we show converges almost surely to the value of the game, enabling solutions to be extracted in unknown settings. Lastly, we extend our results to settings with budgetary constraints.
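To make the idea of iterating a minimax Bellman operator concrete, the following is a minimal, purely illustrative sketch (not the paper's algorithm): a toy zero-sum game on five states, where each player either waits or pays a fixed impulse cost `K` (an assumed constant) to push the state up or down by one. We iterate the Bellman operator on a tabular Q-function, using the pure-strategy lower value `max_a1 min_a2 Q` as a simplification; in general, the stage matrix game must be solved over mixed strategies.

```python
# Illustrative sketch only: a deterministic toy game with two-sided impulse
# control, solved by repeated application of a (simplified) minimax Bellman
# operator. All names (STATES, ACTS, K, GAMMA) are assumptions, not from
# the paper.

STATES = range(5)
ACTS = (0, 1)      # 0 = wait, 1 = intervene (pay strictly positive cost K)
K = 0.5            # per-action impulse cost
GAMMA = 0.9        # discount factor

def step(s, a1, a2):
    """Transition and stage reward to the maximising player."""
    s_next = min(max(s + a1 - a2, 0), 4)   # P1 pushes up, P2 pushes down
    # P1 pays K when acting; P2's cost accrues to P1 in this zero-sum setup.
    reward = s - K * a1 + K * a2
    return s_next, reward

def lower_value(Q, s):
    """Pure-strategy lower value of the stage game at state s."""
    return max(min(Q[(s, a1, a2)] for a2 in ACTS) for a1 in ACTS)

# Iterate the Bellman operator; the contraction (modulus GAMMA) drives
# Q toward a fixed point, mirroring the limit-point characterisation.
Q = {(s, a1, a2): 0.0 for s in STATES for a1 in ACTS for a2 in ACTS}
for _ in range(500):
    Q = {
        (s, a1, a2): (lambda s2, r: r + GAMMA * lower_value(Q, s2))(*step(s, a1, a2))
        for s in STATES for a1 in ACTS for a2 in ACTS
    }

values = {s: lower_value(Q, s) for s in STATES}
```

Since rewards are bounded by 4.5 in absolute value, each value is bounded by 4.5 / (1 - GAMMA) = 45, and the value is non-decreasing in the state because the stage reward is.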