The Sample Complexity of Online Contract Design

We study the hidden-action principal-agent problem in an online setting. In each round, the principal posts a contract that specifies the payment to the agent for each outcome. The agent then strategically chooses the action that maximizes her own utility, but this action is not directly observable by the principal. The principal observes the outcome and receives a utility that depends on the agent's choice of action. Based on past observations, the principal dynamically adjusts the contract with the goal of maximizing her utility. We introduce an online learning algorithm and provide an upper bound on its Stackelberg regret. We show that when the contract space is $[0,1]^m$, the Stackelberg regret is upper bounded by $\widetilde O(\sqrt{m} \cdot T^{1-1/(2m+1)})$ and lower bounded by $\Omega(T^{1-1/(m+2)})$, where $\widetilde O$ omits logarithmic factors. This result shows that exponential-in-$m$ samples are both sufficient and necessary to learn a near-optimal contract, resolving an open problem on the hardness of online contract design. Moreover, when contracts are restricted to some subset $\mathcal{F} \subset [0,1]^m$, we define an intrinsic dimension of $\mathcal{F}$ that depends on the covering number of the spherical code in the space, and we bound the regret in terms of this intrinsic dimension. When $\mathcal{F}$ is the family of linear contracts, we show that the Stackelberg regret grows exactly as $\Theta(T^{2/3})$. The contract design problem is challenging because the principal's utility function is discontinuous, and bounding the discretization error in this setting has been an open problem. In this paper, we identify a limited set of directions in which the utility function is continuous, which allows us to design a new discretization method and bound its error. This approach yields the first upper bound with no restrictions on the contract and action spaces.
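The interaction protocol and the discontinuity described above can be illustrated on a toy instance. The sketch below is not the algorithm from this paper: it uses a hypothetical instance (three actions, two outcomes; the matrix `P`, the costs `COST`, and the rewards `REWARD` are invented for illustration), a principal-favoring tie-breaking rule (a common convention, assumed here), and a generic baseline that discretizes linear contracts $f = \alpha \cdot r$ into a uniform grid of about $T^{1/3}$ points and runs UCB1 over them. In this instance the principal's expected utility jumps from about 0.25 just below $\alpha = 0.5$ to 0.45 at $\alpha = 0.5$, where the agent switches actions. For linear contracts, rounding $\alpha$ up to the next grid point loses at most $1/K$, because the agent's chosen action's expected reward is non-decreasing in $\alpha$. Balancing this $T/K$ discretization loss against the roughly $\sqrt{KT\log T}$ bandit regret gives $K \approx T^{1/3}$ and regret of order $T^{2/3}$ up to logarithmic factors.

```python
import math
import random

# Hypothetical toy instance (not from the paper): 3 actions, 2 outcomes.
P = [[0.9, 0.1], [0.5, 0.5], [0.1, 0.9]]  # P[i][j] = Pr(outcome j | action i)
COST = [0.0, 0.1, 0.3]                     # agent's cost of each action
REWARD = [0.0, 1.0]                        # principal's reward for each outcome


def agent_utility(i, f):
    """Agent's expected payment under contract f minus the cost of action i."""
    return sum(p * fj for p, fj in zip(P[i], f)) - COST[i]


def principal_utility(i, f):
    """Principal's expected reward minus expected payment when the agent plays i."""
    return sum(p * (r - fj) for p, r, fj in zip(P[i], REWARD, f))


def best_response(f):
    """Agent's utility-maximizing action; ties go to the principal (assumed convention)."""
    best = max(agent_utility(i, f) for i in range(len(P)))
    ties = [i for i in range(len(P)) if agent_utility(i, f) >= best - 1e-12]
    return max(ties, key=lambda i: principal_utility(i, f))


def expected_principal_utility(f):
    """Principal's utility when the agent best-responds; discontinuous in f."""
    return principal_utility(best_response(f), f)


def ucb_linear_contracts(T, seed=0):
    """Generic baseline (not the paper's method): UCB1 over a grid of linear
    contracts f = alpha * REWARD with about T^(1/3) grid steps.
    Returns the principal's total realized utility over T rounds."""
    rng = random.Random(seed)
    K = max(2, round(T ** (1 / 3)))
    arms = [k / K for k in range(K + 1)]
    counts = [0] * len(arms)
    sums = [0.0] * len(arms)
    total = 0.0
    for t in range(1, T + 1):
        if t <= len(arms):
            a = t - 1  # initialization: play every grid point once
        else:
            a = max(range(len(arms)),
                    key=lambda k: sums[k] / counts[k]
                    + math.sqrt(2 * math.log(t) / counts[k]))
        f = [arms[a] * r for r in REWARD]
        i = best_response(f)                      # hidden from the principal
        j = 0 if rng.random() < P[i][0] else 1    # outcome observed by the principal
        u = REWARD[j] - f[j]                      # realized principal utility
        counts[a] += 1
        sums[a] += u
        total += u
    return total
```

As a usage example, `ucb_linear_contracts(8000) / 8000` gives the average realized utility per round, which can be compared with the best linear contract's value of 0.45 (at $\alpha = 0.5$) in this toy instance. The gap between the two is the empirical per-round regret.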
