A decision-maker (DM) repeatedly makes choices under uncertainty in a bandit environment, where only the realization of the chosen arm is observed. A competing agent, the adviser (AD), repeatedly provides recommendations, but the realization of a recommended arm is unobserved unless the recommendation coincides with the DM's choice. Both agents possess partial information about the arms' realizations. The central question is whether, in the long run, an outside observer can identify which agent is more informed based solely on the observed decisions, recommendations, and arm realizations. A test selects one of the agents based on the observed data. We focus primarily on the class of scoring tests, which assign a numerical score to each observation and select the agent according to the average score. We study strategic agents whose objective is to be selected by the test. For simultaneous arm choices, we show that there exists a scoring test that successfully identifies the more-informed agent. For sequential arm choices, however, no such scoring test exists. Finally, we explore the tension between identifying the more-informed agent and maximizing welfare: a DM whose objective is to pass the test may not make welfare-maximizing decisions. In a binary-arm environment, we show that no scoring test can simultaneously identify the more-informed agent and achieve more than half of the welfare attained by welfare-maximizing decisions.
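To make the notion of a scoring test concrete, the following is a minimal sketch in Python of the general shape described above: each observation receives a numerical score, and the test selects an agent from the average score. The names `Observation`, `scoring_test`, and `placeholder_score`, the zero threshold, and the sign convention (higher average favors the DM) are all illustrative assumptions. In particular, `placeholder_score` is a hypothetical rule chosen only to make the example run; it is not the scoring test constructed in the paper.

```python
from dataclasses import dataclass
from typing import Callable, Sequence


@dataclass(frozen=True)
class Observation:
    """One period of observed data (field names are assumptions)."""
    dm_choice: int          # arm chosen by the DM
    ad_recommendation: int  # arm recommended by the AD
    realization: float      # realized payoff of the DM's chosen arm


def scoring_test(
    observations: Sequence[Observation],
    score: Callable[[Observation], float],
    threshold: float = 0.0,
) -> str:
    """Select "DM" or "AD" by comparing the average score to a threshold.

    Assumed convention: an average at or above the threshold selects the DM.
    """
    if not observations:
        raise ValueError("at least one observation is required")
    average = sum(score(obs) for obs in observations) / len(observations)
    return "DM" if average >= threshold else "AD"


def placeholder_score(obs: Observation) -> float:
    """Hypothetical scoring rule, NOT the paper's construction.

    When the agents agree, the period is scored as uninformative (0).
    When they disagree, the DM is credited or penalized according to the
    realization of the chosen arm, centered at 0.5 for payoffs in [0, 1].
    """
    if obs.dm_choice == obs.ad_recommendation:
        return 0.0
    return obs.realization - 0.5


history = [
    Observation(dm_choice=0, ad_recommendation=1, realization=1.0),
    Observation(dm_choice=1, ad_recommendation=1, realization=0.0),
    Observation(dm_choice=1, ad_recommendation=0, realization=1.0),
]
selected = scoring_test(history, placeholder_score)
```

Note that the score depends only on what the outside observer sees: the realization of the AD's recommended arm enters only when it coincides with the DM's choice, which is the informational constraint the abstract emphasizes.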