We study a repeated delegated choice problem; this is the first work to consider an online learning variant of the model of Kleinberg and Kleinberg (EC'18). In this model, a principal repeatedly interacts with an agent who searches an exogenous set of solutions for efficient ones. Each solution may yield different utilities to the principal and the agent, and the agent may selfishly propose the solution that maximizes its own utility. To mitigate this behavior, the principal announces an eligible set that screens out certain solutions. However, the principal has no prior information about the distribution of solutions, so it dynamically announces different eligible sets to learn this distribution efficiently. The principal's objective is to minimize cumulative regret relative to the best eligible set in hindsight. We explore two dimensions of the problem setup: whether the agent behaves myopically or strategizes across rounds, and whether the solutions yield deterministic or stochastic utility. Our analysis characterizes several regimes under which the principal can achieve sublinear regret, shedding light on the rise and fall of the repeated delegation procedure across these regimes.
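To make the principal's objective concrete, the following is a minimal sketch of the myopic-agent, deterministic-utility case. It rests on several hypothetical simplifications that are not taken from the abstract. Eligible sets are restricted to thresholds on the principal's utility. A myopic agent proposes the eligible solution that maximizes its own utility. The principal receives zero when nothing eligible is proposed. All function names are illustrative. Regret is measured against the best fixed threshold from a given candidate list.

```python
from typing import List, Optional, Sequence, Tuple

# A solution is a pair (principal_utility, agent_utility).
Solution = Tuple[float, float]


def agent_proposal(solutions: Sequence[Solution],
                   threshold: float) -> Optional[Solution]:
    """Myopic agent (assumption): among solutions whose principal utility
    meets the hypothetical threshold eligible set, propose the one that
    maximizes the agent's own utility; return None if nothing is eligible."""
    eligible = [s for s in solutions if s[0] >= threshold]
    if not eligible:
        return None
    return max(eligible, key=lambda s: s[1])


def principal_payoff(solutions: Sequence[Solution], threshold: float) -> float:
    """Principal's utility in one round under a given threshold."""
    proposal = agent_proposal(solutions, threshold)
    # Assumption: the principal gets 0 when no eligible solution is proposed.
    return 0.0 if proposal is None else proposal[0]


def cumulative_regret(rounds: List[Sequence[Solution]],
                      chosen: List[float],
                      candidates: Sequence[float]) -> float:
    """Regret of the announced thresholds against the best fixed
    candidate threshold in hindsight."""
    realized = sum(principal_payoff(sols, th)
                   for sols, th in zip(rounds, chosen))
    best_fixed = max(sum(principal_payoff(sols, th) for sols in rounds)
                     for th in candidates)
    return best_fixed - realized


if __name__ == "__main__":
    rounds = [[(0.25, 0.75), (0.75, 0.25)], [(0.5, 1.0), (1.0, 0.5)]]
    # Announcing the permissive threshold 0.0 every round lets the agent
    # pick its favorite solutions; threshold 0.75 would have screened them out.
    print(cumulative_regret(rounds, [0.0, 0.0], [0.0, 0.75]))  # 1.0
```

In this toy instance, the permissive eligible set earns the principal 0.25 + 0.5 = 0.75, while the stricter threshold 0.75 would have earned 0.75 + 1.0 = 1.75, so the regret is 1.0. A strategic agent, or stochastic utilities, would require a richer model than this sketch captures.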