Classical Bayesian persuasion studies how a sender influences receivers through carefully designed signaling policies within a single strategic interaction. In many real-world environments, such interactions are repeated across multiple games, creating opportunities to exploit structural similarity across tasks. In this work, we introduce Meta-Persuasion algorithms and establish the first theoretical results for both full-feedback and bandit-feedback settings in the Online Bayesian Persuasion (OBP) and Markov Persuasion Process (MPP) frameworks. We show that the proposed algorithms achieve provably sharper regret rates under natural notions of task similarity, improving upon the best-known convergence rates for both OBP and MPP. At the same time, they recover the standard single-game guarantees when the sequence of games is chosen arbitrarily. Finally, we complement our theoretical analysis with numerical experiments that highlight these regret improvements and the benefits of meta-learning in repeated persuasion environments.