We study a repeated information design setting in which the receiver, who is also the decision-maker, updates beliefs in a systematically biased way. Specifically, the receiver's distorted posterior in our model is a convex combination of the prior and the Bayesian posterior, governed by a fixed but unknown parameter. Over repeated interactions, the sender chooses persuasive signaling schemes, observes only the receiver's realized actions, and seeks to minimize regret relative to a full-information oracle that knows the receiver's biased updating rule. We propose a safe exploration algorithm that learns the receiver's bias while maintaining high persuasion value. The algorithm exploits the asymmetric cost of probing: conservative probes incur only a local loss, whereas overly aggressive probes may forfeit the persuasive opportunity entirely. For general finite state and action spaces and arbitrary bounded utilities, our method achieves $O(\log\log T)$ regret. A matching $\Omega(\log\log T)$ lower bound shows that this rate is optimal. We further discuss the implications for receiver welfare, as well as extensions to a jointly unknown prior and bias and to contextual settings with time-varying priors and utilities.
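To make the biased updating rule concrete, the following is a minimal sketch of a distorted posterior formed as a convex combination of the prior and the Bayesian posterior. The function names, the parameter name `gamma`, and the convention that `gamma` is the weight on the Bayesian posterior (so `gamma = 1` recovers Bayesian updating and `gamma = 0` leaves the prior unchanged) are assumptions made for illustration; the abstract does not fix a parameterization.

```python
from typing import List, Sequence


def bayes_posterior(prior: Sequence[float], likelihood: Sequence[float]) -> List[float]:
    """Standard Bayesian posterior over states, given P(signal | state) per state."""
    joint = [p * l for p, l in zip(prior, likelihood)]
    total = sum(joint)
    if total <= 0:
        raise ValueError("signal has zero probability under the prior")
    return [j / total for j in joint]


def biased_posterior(
    prior: Sequence[float], likelihood: Sequence[float], gamma: float
) -> List[float]:
    """Convex combination of the prior and the Bayesian posterior.

    Assumption (hypothetical convention): gamma in [0, 1] is the weight
    placed on the Bayesian posterior; 1 - gamma is the weight on the prior.
    """
    if not 0.0 <= gamma <= 1.0:
        raise ValueError("gamma must lie in [0, 1]")
    bayes = bayes_posterior(prior, likelihood)
    return [(1.0 - gamma) * p + gamma * b for p, b in zip(prior, bayes)]


# Two states with a uniform prior; the signal is three times as likely
# in state 0 as in state 1.
example = biased_posterior([0.5, 0.5], [0.9, 0.3], gamma=0.5)
```

Under this convention, a receiver with a smaller `gamma` moves less toward the Bayesian posterior for the same signal, which is why a signaling scheme calibrated for a Bayesian receiver may fail to change the action of a biased one.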