We study the problem of designing mechanisms for \emph{information acquisition} scenarios. This setting models strategic interactions between an uninformed \emph{receiver} and a set of informed \emph{senders}. In our model, the senders receive information about the underlying state of nature and communicate their observations (either truthfully or not) to the receiver, who, based on this information, selects an action. Our goal is to design mechanisms that maximize the receiver's utility while incentivizing the senders to truthfully report their information. First, we provide an algorithm that efficiently computes an optimal \emph{incentive compatible} (IC) mechanism. Then, we focus on the \emph{online} problem in which the receiver sequentially interacts in an unknown game, with the objective of minimizing both the \emph{cumulative regret} w.r.t. the optimal IC mechanism and the \emph{cumulative violation} of the incentive compatibility constraints. We investigate two different online scenarios, \emph{i.e.}, the \emph{full feedback} and \emph{bandit feedback} settings. For the full feedback problem, we propose an algorithm that guarantees $\tilde{\mathcal O}(\sqrt T)$ regret and violation, while for the bandit feedback setting we present an algorithm that attains $\tilde{\mathcal O}(T^{\alpha})$ regret and $\tilde{\mathcal O}(T^{1-\alpha/2})$ violation for any $\alpha\in[1/2, 1]$. Finally, we complement our results by providing a tight lower bound.
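As a worked reading of the bandit feedback exponents stated above (plain arithmetic on those exponents, not an additional result), the regret exponent $\alpha$ and the violation exponent $1-\alpha/2$ move in opposite directions as $\alpha$ ranges over $[1/2, 1]$, and they coincide where $\alpha = 1-\alpha/2$, i.e., at $\alpha = 2/3$:
\[
\begin{array}{c|c|c}
\alpha & \text{regret} & \text{violation} \\ \hline
1/2 & \tilde{\mathcal O}(T^{1/2}) & \tilde{\mathcal O}(T^{3/4}) \\
2/3 & \tilde{\mathcal O}(T^{2/3}) & \tilde{\mathcal O}(T^{2/3}) \\
1 & \tilde{\mathcal O}(T) & \tilde{\mathcal O}(T^{1/2})
\end{array}
\]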