We study the evolution of information in interactive decision making through the lens of a stochastic multi-armed bandit problem. Focusing on a fundamental example in which a unique optimal arm outperforms the rest by a fixed margin, we characterize the optimal success probability and the mutual information over time. Our findings reveal distinct growth phases in mutual information (initially linear, then quadratic, and finally linear again), highlighting curious behavioral differences between interactive and non-interactive environments. In particular, we show that the optimal success probability and the mutual information can be decoupled: achieving optimal learning does not necessarily require maximizing information gain. These findings shed new light on the intricate interplay between information and learning in interactive decision making.
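The abstract does not fix a reward model, so the following is only a minimal sketch of the "unique optimal arm with a fixed margin" example under our own assumptions: Gaussian rewards with known standard deviation, a uniform prior over which arm is optimal, and a *non-interactive* policy that pulls every arm the same number of times, fixed in advance. Under these assumptions it estimates, by Monte Carlo, the success probability of the Bayes-optimal guess of the best arm and the mutual information between the optimal arm and the observations. The function name `non_interactive_bandit_stats` and its parameters are hypothetical. The sketch does not model the interactive setting or reproduce the paper's phase results.

```python
import math
import random


def non_interactive_bandit_stats(num_arms, margin, sigma, pulls_per_arm, trials=20000, seed=0):
    """Monte Carlo estimate of (success probability, mutual information in nats).

    Hypothetical setup (an assumption, not the paper's model):
      - the optimal arm A* is uniform over `num_arms` arms;
      - arm A* has mean `margin`, every other arm has mean 0;
      - rewards are Gaussian with standard deviation `sigma`;
      - each arm is pulled `pulls_per_arm` times, chosen in advance (non-interactive).
    """
    rng = random.Random(seed)
    total_success = 0.0
    total_entropy = 0.0
    for _ in range(trials):
        best = rng.randrange(num_arms)
        # Per-arm reward sums: a sufficient statistic when all pull counts are equal.
        sums = []
        for arm in range(num_arms):
            mean = margin if arm == best else 0.0
            sums.append(sum(rng.gauss(mean, sigma) for _ in range(pulls_per_arm)))
        # With equal pull counts the posterior over A* is proportional to
        # exp(margin * S_i / sigma^2); the -n * margin^2 / (2 sigma^2) term cancels.
        logits = [margin * s / sigma**2 for s in sums]
        top = max(logits)
        weights = [math.exp(l - top) for l in logits]  # shift for numerical stability
        z = sum(weights)
        posterior = [w / z for w in weights]
        # The MAP guess is correct with conditional probability max(posterior).
        total_success += max(posterior)
        total_entropy -= sum(p * math.log(p) for p in posterior if p > 0.0)
    success = total_success / trials
    # I(A*; observations) = H(A*) - E[H(A* | observations)], with H(A*) = log(num_arms).
    mutual_info = math.log(num_arms) - total_entropy / trials
    return success, mutual_info


if __name__ == "__main__":
    for n in (0, 1, 5, 20):
        s, mi = non_interactive_bandit_stats(num_arms=4, margin=1.0, sigma=1.0, pulls_per_arm=n, trials=2000)
        print(f"pulls/arm={n:2d}  success={s:.3f}  I(A*;Y)={mi:.3f} nats")
```

With `pulls_per_arm=0` no information is gathered, so the estimate returns a success probability of `1/num_arms` and zero mutual information; as the pull count grows, both approach their maxima (1 and `log(num_arms)`). Averaging `max(posterior)` rather than a 0/1 indicator of a correct guess gives the same expectation with lower variance.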