Markov Persuasion Processes with Endogenous Agent Beliefs

We consider a dynamic Bayesian persuasion setting where a single long-lived sender persuades a stream of ``short-lived'' agents (receivers) by sharing information about a payoff-relevant state. The state transitions are Markovian, and the sender seeks to maximize the long-run average reward by committing to a (possibly history-dependent) signaling mechanism. While most previous studies of Markov persuasion consider exogenous agent beliefs that are independent of the chain, we study a more natural variant with endogenous agent beliefs that depend on the chain's realized history. A key challenge in analyzing such settings is to model the agents' partial knowledge of the history. We analyze a Markov persuasion process (MPP) under various information models that differ in the amount of information the receivers have about the history of the process. Specifically, we formulate a general partial-information model where each receiver observes the history with an $\ell$-period lag. Our technical contributions start with analyzing two benchmark models: the full-history information model and the no-history information model. We establish an ordering of the sender's payoff as a function of the informativeness of the agents' information model (with no-history as the least informative), and develop efficient algorithms to compute optimal solutions for these two benchmarks. For general $\ell$, we present the technical challenges in finding an optimal signaling mechanism, where even determining the right dependency on the history becomes difficult. To bypass these difficulties, we use a robustness framework to design a ``simple'' \emph{history-independent} signaling mechanism that approximately achieves the optimal payoff when $\ell$ is reasonably large.
