Markov Persuasion Processes with Endogenous Agent Beliefs

We consider a dynamic Bayesian persuasion setting where a single long-lived sender persuades a stream of ``short-lived'' agents (receivers) by sharing information about a payoff-relevant state. The state transitions are Markovian, and the sender seeks to maximize the long-run average reward by committing to a (possibly history-dependent) signaling mechanism. While most previous studies of Markov persuasion consider exogenous agent beliefs that are independent of the chain, we study a more natural variant with endogenous agent beliefs that depend on the chain's realized history. A key challenge in analyzing such settings is to model the agents' partial knowledge of the history. We analyze a Markov persuasion process (MPP) under various information models that differ in the amount of information the receivers have about the history of the process. Specifically, we formulate a general partial-information model where each receiver observes the history with an $\ell$-period lag. Our technical contributions start with analyzing two benchmark models, i.e., the full-history information model and the no-history information model. We establish an ordering of the sender's payoff as a function of the informativeness of the agents' information model (with no-history as the least informative), and develop efficient algorithms to compute optimal solutions for these two benchmarks. For general $\ell$, we present the technical challenges in finding an optimal signaling mechanism, where even determining the right dependency on the history becomes difficult. To bypass these difficulties, we use a robustness framework to design a ``simple'' \emph{history-independent} signaling mechanism that approximately achieves the optimal payoff when $\ell$ is reasonably large.
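As a rough illustration of why a large lag $\ell$ may favor history-independent mechanisms, the sketch below computes a receiver's belief about the current state after observing the state $\ell$ periods ago, and compares it with the stationary distribution. It rests on assumptions that are not taken from the text: the chain is ergodic, its transitions do not depend on signals or receiver actions, and the receiver observes only the lagged state. The transition matrix `P` and the helper names are hypothetical.

```python
import numpy as np

def lagged_belief(P, observed_state, lag):
    """Belief over the current state, given the state seen `lag` periods ago.

    Assumption (illustrative only): transitions follow P regardless of
    signals and actions, so the belief is a row of the lag-step matrix P^lag.
    """
    return np.linalg.matrix_power(P, lag)[observed_state]

def stationary_distribution(P):
    """Solve pi P = pi with sum(pi) = 1 (assumes an ergodic chain)."""
    n = P.shape[0]
    # Stack the balance equations with the normalization constraint.
    A = np.vstack([P.T - np.eye(n), np.ones(n)])
    b = np.zeros(n + 1)
    b[-1] = 1.0
    pi, *_ = np.linalg.lstsq(A, b, rcond=None)
    return pi

def total_variation(p, q):
    """Total variation distance between two distributions."""
    return 0.5 * np.abs(p - q).sum()

# Hypothetical two-state chain, used only for illustration.
P = np.array([[0.9, 0.1],
              [0.3, 0.7]])
pi = stationary_distribution(P)

# Distance between the lagged belief and the stationary distribution.
gaps = [total_variation(lagged_belief(P, 0, lag), pi) for lag in (0, 1, 5, 20)]
```

Under these assumptions, the entries of `gaps` shrink geometrically in the lag, so a receiver with a long lag holds nearly the same belief whatever was observed. This is only an intuition for the large-$\ell$ regime and not the robustness argument itself.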
