We investigate the mechanism design problem faced by a principal who hires \emph{multiple} agents to gather and report costly information, which the principal then uses to make an informed decision. We model this problem as a game in which the principal announces a mechanism consisting of action recommendations and a payment function, also known as a scoring rule. Each agent then chooses an effort level and, depending on that effort, receives partial information about an underlying state of nature. Finally, the agents report their information (possibly non-truthfully), the principal makes a decision based on the reports, and the agents are paid according to the scoring rule. While previous work focuses on single-agent problems, we consider multi-agent settings, which pose the challenge of coordinating the agents' efforts and aggregating correlated information. Indeed, we show that optimal mechanisms must correlate agents' efforts, which introduces externalities among the agents and hence complex incentive compatibility constraints and equilibrium selection problems. First, we design a polynomial-time algorithm to find an optimal incentive compatible mechanism. Then, we study an online problem in which the principal repeatedly interacts with a group of unknown agents. We design a no-regret algorithm that achieves $\widetilde{\mathcal{O}}(T^{2/3})$ regret with respect to an optimal mechanism, matching the state-of-the-art bound for single-agent settings.
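To make the interaction protocol concrete, the following is a minimal illustrative sketch of a single round, not the mechanism studied in this work. Everything specific in it is an assumption made for illustration: a binary state with a uniform prior, two effort levels with hypothetical signal accuracies and costs, conditionally independent signals, truthful reporting of posteriors, and a quadratic scoring rule that pays each agent based only on its own report. In particular, it does not capture correlated efforts or payments that depend on other agents' reports.

```python
import random


def posterior_high(signal: int, accuracy: float, prior: float = 0.5) -> float:
    """Bayes posterior P(state = 1 | signal) for a binary signal of given accuracy."""
    like1 = accuracy if signal == 1 else 1.0 - accuracy
    like0 = 1.0 - accuracy if signal == 1 else accuracy
    return prior * like1 / (prior * like1 + (1.0 - prior) * like0)


def quadratic_score(report: float, state: int) -> float:
    """Quadratic (Brier-style) scoring rule; the payment lies in [0, 1]."""
    return 1.0 - (state - report) ** 2


def expected_score(report: float, belief: float) -> float:
    """Expected payment for `report` when the agent believes P(state = 1) = belief."""
    return belief * quadratic_score(report, 1) + (1.0 - belief) * quadratic_score(report, 0)


def play_round(efforts, rng, accuracy_by_effort=(0.5, 0.8), cost_by_effort=(0.0, 0.1)):
    """Simulate one round: efforts -> signals -> reports -> decision -> payments.

    `accuracy_by_effort` and `cost_by_effort` are hypothetical parameters,
    indexed by effort level (0 = no effort, 1 = effort).
    """
    state = rng.randint(0, 1)  # state of nature, uniform prior (assumption)
    reports, utilities = [], []
    for effort in efforts:
        accuracy = accuracy_by_effort[effort]
        signal = state if rng.random() < accuracy else 1 - state
        report = posterior_high(signal, accuracy)  # truthful report (assumption)
        reports.append(report)
        utilities.append(quadratic_score(report, state) - cost_by_effort[effort])
    # The principal combines reports as posterior odds, which is valid here
    # only because signals are conditionally independent and the prior is uniform.
    odds = 1.0
    for report in reports:
        odds *= report / (1.0 - report)
    decision = 1 if odds > 1.0 else 0
    return state, reports, decision, utilities


if __name__ == "__main__":
    state, reports, decision, utilities = play_round([1, 1, 0], random.Random(0))
    print(state, reports, decision, utilities)
```

Because the quadratic rule is proper, an agent with belief 0.8 expects a higher payment from reporting 0.8 than from any other report, which is the single-agent intuition behind truthful reporting in this toy setting.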