We study the incentivized information acquisition problem, where a principal hires an agent to gather information on her behalf. The problem is modeled as a Stackelberg game between the principal and the agent: the principal announces a scoring rule that specifies the payment, and the agent then chooses an effort level that maximizes her own profit and reports the information. We study the online setting of this problem from the principal's perspective, i.e., designing the optimal scoring rule by repeatedly interacting with the strategic agent. We design a provably sample-efficient algorithm that tailors the UCB algorithm (Auer et al., 2002) to our model and achieves a sublinear $T^{2/3}$-regret after $T$ iterations. Our algorithm features a delicate estimation procedure for the optimal profit of the principal, and a conservative correction scheme that ensures the desired agent actions are incentivized. Furthermore, a key feature of our regret bound is that it is independent of the number of states of the environment.
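For orientation, the following is a minimal sketch of the standard UCB1 index rule of Auer et al. (2002), the building block that the algorithm above adapts. It is not the paper's algorithm: it omits the scoring-rule design, the profit-estimation procedure, and the conservative correction scheme. The names `ucb1` and `pull` are illustrative assumptions, and rewards are assumed to lie in [0, 1].

```python
import math


def ucb1(pull, n_arms, horizon):
    """Standard UCB1 (not the paper's method).

    `pull(arm)` is a hypothetical callback returning a reward in [0, 1].
    Returns the per-arm pull counts and empirical mean rewards.
    """
    counts = [0] * n_arms
    means = [0.0] * n_arms
    for t in range(1, horizon + 1):
        if t <= n_arms:
            # Play every arm once so each index is well defined.
            arm = t - 1
        else:
            # Optimism: empirical mean plus a confidence bonus that
            # shrinks as an arm is pulled more often.
            arm = max(
                range(n_arms),
                key=lambda a: means[a] + math.sqrt(2 * math.log(t) / counts[a]),
            )
        reward = pull(arm)
        counts[arm] += 1
        # Incremental update of the empirical mean.
        means[arm] += (reward - means[arm]) / counts[arm]
    return counts, means
```

With two arms whose rewards are fixed at 0.2 and 0.8, calling `ucb1(lambda a: [0.2, 0.8][a], 2, 1000)` concentrates most pulls on the second arm, since the bonus on the weaker arm eventually cannot cover the gap.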