We study principal-agent problems in which a principal commits to an outcome-dependent payment scheme (i.e., a contract) in order to induce an agent to take a costly action leading to a favorable outcome. We consider the online extension of the classical (one-shot) principal-agent problem, in which the principal repeatedly interacts with agents by proposing contracts over multiple rounds. The principal has no information about the agents and, crucially, does not observe their actions. As a result, the principal must learn an optimal contract using only the realized outcome observed at each round. We focus on the setting with binary actions and single-dimensional agent types, where the agent's private type represents their cost per unit of effort. For adversarial sequences of types, we provide tight $\Theta(T^{2/3})$ regret guarantees. Remarkably, this rate is independent of the number of outcomes $m$. The upper bound is based on two key components: (1) a reduction to a one-dimensional threshold optimization problem and (2) a non-uniform discretization that handles the non-Lipschitz nature of the problem. Moreover, in the case of a single (fixed) hidden type, we show that the rate can be improved, and we provide a tight $\widetilde{\Theta}(\sqrt{T})$ regret bound. Our algorithm is based on an explore-then-commit strategy: we first approximately learn the hidden type via a stochastic binary search, and then commit to a ``robustified'' near-optimal contract.
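To make the explore-then-commit idea concrete, the following is a minimal sketch of a stochastic binary search over a one-dimensional incentive threshold. It is not the paper's algorithm. It assumes a toy instance with only two outcomes (success or failure), success probabilities `q0` (no effort) and `q1` (effort) that are known to the principal, a hidden type `theta` in $[0, 1]$, and a fixed number of samples per search step. All function and parameter names (`agent_works`, `play_round`, `estimate_type`, `robust_payment`, `margin`) are hypothetical.

```python
import random

def agent_works(payment, theta, q0, q1):
    """Toy agent: exert effort iff the expected payment gain covers the cost theta."""
    return payment * (q1 - q0) >= theta

def play_round(payment, theta, q0, q1, rng):
    """Return only the realized binary outcome; the action stays hidden."""
    p_success = q1 if agent_works(payment, theta, q0, q1) else q0
    return rng.random() < p_success

def estimate_type(theta, q0, q1, rng, iters=10, samples=200):
    """Noisy binary search for an interval [lo, hi] containing the hidden type.

    Assumption: theta lies in [0, 1] and q0, q1 are known to the principal.
    """
    lo, hi = 0.0, 1.0
    for _ in range(iters):
        mid = (lo + hi) / 2
        # Contract paying `payment` on success has incentive level `mid`.
        payment = mid / (q1 - q0)
        wins = sum(play_round(payment, theta, q0, q1, rng) for _ in range(samples))
        # Compare the empirical success rate with the midpoint of q0 and q1.
        if wins / samples > (q0 + q1) / 2:
            hi = mid  # effort detected, so theta <= mid (up to sampling noise)
        else:
            lo = mid  # no effort detected, so theta > mid (up to sampling noise)
    return lo, hi

def robust_payment(hi, q0, q1, margin=0.01):
    """Commit phase: slightly overpay so that the agent still exerts effort."""
    return (hi + margin) / (q1 - q0)
```

For example, calling `estimate_type(0.37, 0.2, 0.8, random.Random(0))` should return a narrow interval around `0.37`, and `robust_payment` applied to its upper end gives a payment that induces effort in this toy model. The `margin` stands in for the "robustification" step: it trades a small extra payment for protection against estimation error.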