We study a repeated delegated choice problem, and are the first to consider an online learning variant of the model of Kleinberg and Kleinberg (EC'18). In this model, a principal repeatedly interacts with an agent, who possesses an exogenous set of solutions, to search for efficient ones. Each solution can yield different utility for the principal and for the agent, and the agent may selfishly propose the solution that maximizes its own utility. To mitigate this behavior, the principal announces an eligible set, which screens out some of the solutions. The principal, however, has no prior information about the distribution of solutions, and therefore dynamically announces different eligible sets to learn the distribution efficiently. The principal's objective is to minimize cumulative regret relative to the optimal eligible set in hindsight. We explore two dimensions of the problem setup: whether the agent behaves myopically or strategizes across rounds, and whether the solutions yield deterministic or stochastic utility. Our analysis characterizes regimes in which the principal can achieve sublinear regret, thereby shedding light on when repeated delegation succeeds and when it fails.
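As a rough illustration of the interaction described above, the following minimal sketch simulates a myopic agent and computes cumulative regret against the best fixed eligible set in hindsight. Everything specific here is an assumption made for the example and not taken from the paper: solutions are represented as (principal utility, agent utility) pairs, the eligible set is modeled as a threshold on the principal's utility, an empty eligible set yields zero utility, and the function names are hypothetical.

```python
def agent_choice(solutions, threshold):
    """Myopic agent: among solutions whose principal utility meets the
    threshold (assumed form of the eligible set), propose the one with
    the highest agent utility. Returns None if nothing is eligible."""
    eligible = [s for s in solutions if s[0] >= threshold]
    return max(eligible, key=lambda s: s[1]) if eligible else None


def principal_utility(solutions, threshold):
    """Principal's utility in one round; assumed to be 0 if the agent
    has no eligible solution to propose."""
    choice = agent_choice(solutions, threshold)
    return choice[0] if choice is not None else 0.0


def cumulative_regret(rounds, announced, candidates):
    """Regret of the announced thresholds relative to the best single
    candidate threshold in hindsight.

    rounds:     list of per-round solution lists [(x, y), ...]
    announced:  threshold announced in each round
    candidates: finite set of thresholds used as the benchmark
    """
    earned = sum(principal_utility(sols, t)
                 for sols, t in zip(rounds, announced))
    best_fixed = max(sum(principal_utility(sols, c) for sols in rounds)
                     for c in candidates)
    return best_fixed - earned


# Deterministic toy instance: the agent prefers the solution that is
# worse for the principal unless it is screened out.
solutions = [(0.2, 0.9), (0.8, 0.3)]
rounds = [solutions] * 3
announced = [0.0, 0.5, 0.9]  # too lenient, well chosen, too strict
regret = cumulative_regret(rounds, announced, candidates=[0.0, 0.5, 0.9])
```

In this toy instance, a threshold of 0.0 lets the agent propose its favorite solution, 0.5 screens it out and forces the solution the principal prefers, and 0.9 excludes everything. A learning principal would have to discover the middle threshold from the proposals it observes.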