We study online Bayesian persuasion problems in which an informed sender repeatedly faces a receiver with the goal of influencing their behavior through the provision of payoff-relevant information. Previous works assume that the sender has knowledge of either the prior distribution over states of nature or the receiver's utilities, or both. We relax these unrealistic assumptions by considering settings in which the sender knows nothing about either the prior or the receiver. We design an algorithm that achieves sublinear regret with respect to an optimal signaling scheme, and we also provide a collection of lower bounds showing that the guarantees of this algorithm are tight. Our algorithm works by searching a suitable space of signaling schemes in order to learn the receiver's best responses. To do this, we leverage a non-standard representation of signaling schemes that allows us to cleverly overcome the challenge of knowing nothing about the prior over states of nature and the receiver's utilities. Finally, our results also allow us to derive lower and upper bounds on the sample complexity of learning signaling schemes in a related Bayesian persuasion PAC-learning problem.