We study a repeated delegated choice problem; this is the first work to consider an online learning variant of the model of Kleinberg and Kleinberg (EC'18). In this model, a principal interacts repeatedly with an agent who holds an exogenous set of solutions, in order to search for efficient ones. Each solution can yield different utilities for the principal and the agent, and the agent may selfishly propose a solution that maximizes its own utility. To mitigate this behavior, the principal announces an eligible set, which screens out some of the solutions. The principal, however, has no information about the distribution of solutions in advance, and therefore varies the announced eligible set across rounds to learn the distribution efficiently. The principal's objective is to minimize cumulative regret relative to the optimal eligible set in hindsight. We explore two dimensions of the problem setup: whether the agent behaves myopically or strategizes across rounds, and whether the solutions yield deterministic or stochastic utility. Our analysis mainly characterizes regimes in which the principal can achieve sublinear regret, thereby shedding light on the rise and fall of the repeated delegation procedure across these regimes.
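As an illustration of the interaction described above, the following minimal sketch simulates the simplest regime: a myopic agent and deterministic utilities. Several modeling choices here are assumptions made for illustration and are not taken from the paper: the eligible set is represented as a threshold on the principal's utility, the principal receives utility 0 when no solution is eligible, the hindsight benchmark is restricted to a finite list of candidate thresholds, and all function names are hypothetical.

```python
from typing import List, Optional, Tuple

# A solution is a pair (principal_utility, agent_utility).
Solution = Tuple[float, float]


def agent_best_response(solutions: List[Solution], threshold: float) -> Optional[Solution]:
    """Myopic agent: propose the eligible solution with the highest agent utility.

    Assumption: the eligible set is {s : principal_utility(s) >= threshold}.
    Ties are broken in favor of the earliest solution in the list.
    """
    eligible = [s for s in solutions if s[0] >= threshold]
    if not eligible:
        return None
    return max(eligible, key=lambda s: s[1])


def principal_utility(solutions: List[Solution], threshold: float) -> float:
    """Principal's utility in one round (assumed to be 0 if nothing is eligible)."""
    choice = agent_best_response(solutions, threshold)
    return 0.0 if choice is None else choice[0]


def cumulative_regret(
    rounds: List[List[Solution]],
    announced: List[float],
    candidates: List[float],
) -> float:
    """Regret against the best fixed candidate threshold in hindsight."""
    realized = sum(principal_utility(sols, t) for sols, t in zip(rounds, announced))
    best_fixed = max(
        sum(principal_utility(sols, c) for sols in rounds) for c in candidates
    )
    return best_fixed - realized


# Two rounds of agent-held solutions, and the thresholds the principal announced.
rounds = [[(0.9, 0.1), (0.2, 0.8)], [(0.5, 0.5), (0.7, 0.3)]]
announced = [0.0, 0.8]
candidates = [0.0, 0.5, 0.8]
regret = cumulative_regret(rounds, announced, candidates)
```

In this toy instance, announcing no restriction in the first round lets the agent pick the solution that is worst for the principal, while an overly strict threshold in the second round leaves nothing eligible; the fixed threshold 0.5 would have done better in both rounds, which is the kind of gap that regret measures.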