Repeated Contracting with Multiple Non-Myopic Agents: Policy Regret and Limited Liability

We study a repeated contracting setting in which a Principal adaptively chooses amongst $k$ Agents at each of $T$ rounds. The Agents are non-myopic, and so a mechanism for the Principal induces a $T$-round extensive form game amongst the Agents. We give several results aimed at understanding an under-explored aspect of contract theory -- the game induced when choosing an Agent to contract with. First, we show that this game admits a pure-strategy \emph{non-responsive} equilibrium amongst the Agents -- informally, an equilibrium in which the Agents' actions depend on the history of realized states of nature, but not on the history of each other's actions, and which therefore avoids the complexities of collusion and threats. Next, we show that if the Principal selects Agents using a \emph{monotone} bandit algorithm, then for any concave contract, in any such equilibrium, the Principal obtains no regret relative to contracting with the best Agent in hindsight -- not just given that Agent's realized actions, but also relative to the counterfactual world in which the Principal had offered a guaranteed $T$-round contract to the best Agent in hindsight, which would have induced a different sequence of actions. Finally, we show that if the Principal selects Agents using a monotone bandit algorithm which guarantees no swap-regret, then the Principal can additionally offer only limited liability contracts (in which the Agent never needs to pay the Principal) while obtaining no regret relative to the counterfactual world in which she offered a linear contract to the best Agent in hindsight -- despite the fact that linear contracts are not limited liability. We instantiate this theorem by demonstrating the existence of a monotone no swap-regret bandit algorithm, which to our knowledge has not previously appeared in the literature.
