We introduce the Coarse Payoff-Assessment Learning (CPAL) model, which captures reinforcement learning by boundedly rational decision-makers who focus on the aggregate outcomes of choosing among exogenously defined clusters of alternatives (similarity classes), rather than evaluating each alternative individually. Analyzing a smooth approximation of the model, we show that the learning dynamics exhibit steady states corresponding to smooth Valuation Equilibria (Jehiel and Samet, 2007). We demonstrate the existence of multiple equilibria in decision trees with generic payoffs and establish the local asymptotic stability of pure equilibria when they exist. Conversely, when trivial choices featuring alternatives within the same similarity class yield sufficiently high payoffs, a unique mixed equilibrium emerges, characterized by indifference between similarity classes, even under acute sensitivity to payoff differences. Finally, we prove that this unique mixed equilibrium is globally asymptotically stable under the CPAL dynamics.