Entropy maximization and free-energy minimization are general physical principles for modeling the dynamics of a wide range of physical systems. Notable examples include modeling decision-making in the brain with the free-energy principle, optimizing the accuracy-complexity trade-off when accessing hidden variables with the information bottleneck principle (Tishby et al., 2000), and navigating random environments through information maximization (Vergassola et al., 2007). Building on these principles, we propose a new class of bandit algorithms that maximize an approximation of the information about a key variable of the system. To this end, we develop an approximate, analytical, physics-based representation of the entropy that forecasts the information gain of each action, and we greedily choose the action with the largest information gain. This method performs strongly in classical bandit settings. Motivated by this empirical success, we prove its asymptotic optimality for the two-armed bandit problem with Gaussian rewards. Because it encodes the system's properties in a single global physical functional, this approach can be efficiently adapted to more complex bandit settings, which calls for further investigation of information maximization approaches to multi-armed bandit problems.
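The abstract does not spell out the entropy approximation, so the following is only a minimal sketch of the generic idea of greedy information maximization (forecast the information gain of each action, pick the largest) in the two-armed Gaussian setting. It rests on our own assumptions, none of which come from the source: the "key variable" is taken to be the identity of the better arm, rewards have known unit variance, the prior on each mean is flat, and the expected post-pull entropy is computed by brute-force numerical integration rather than by any analytical physics-based approximation. All function names are hypothetical.

```python
import math
import random

# Assumptions (illustrative only): two arms, Gaussian rewards with known sigma,
# flat prior on each arm mean, key variable = "is arm 0 the better arm?".

def normal_cdf(x: float) -> float:
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def binary_entropy(p: float) -> float:
    # Entropy (in nats) of a Bernoulli(p) variable; zero when p is 0 or 1.
    if p <= 0.0 or p >= 1.0:
        return 0.0
    return -p * math.log(p) - (1.0 - p) * math.log(1.0 - p)

def prob_arm0_best(means, counts, sigma=1.0):
    # Posterior theta_i ~ N(means[i], sigma^2 / counts[i]); returns P(theta_0 > theta_1).
    scale = sigma * math.sqrt(1.0 / counts[0] + 1.0 / counts[1])
    return normal_cdf((means[0] - means[1]) / scale)

def expected_entropy_after_pull(arm, means, counts, sigma=1.0, grid=401, zmax=6.0):
    # Predictive reward y ~ N(means[arm], sigma^2 * (1/n + 1)); average the
    # updated entropy over y with a normalised Riemann sum on a z-grid.
    n = counts[arm]
    pred_sd = sigma * math.sqrt(1.0 / n + 1.0)
    step = 2.0 * zmax / (grid - 1)
    total, weight = 0.0, 0.0
    for k in range(grid):
        z = -zmax + k * step
        w = math.exp(-0.5 * z * z)  # unnormalised standard-normal density
        y = means[arm] + pred_sd * z
        new_means, new_counts = list(means), list(counts)
        new_means[arm] = (n * means[arm] + y) / (n + 1)
        new_counts[arm] = n + 1
        total += w * binary_entropy(prob_arm0_best(new_means, new_counts, sigma))
        weight += w
    return total / weight

def information_gain(arm, means, counts, sigma=1.0):
    # Current entropy minus expected entropy after one more pull of `arm`.
    current = binary_entropy(prob_arm0_best(means, counts, sigma))
    return current - expected_entropy_after_pull(arm, means, counts, sigma)

def choose_arm(means, counts, sigma=1.0):
    # Greedy rule: pull the arm with the largest forecast information gain.
    gains = [information_gain(a, means, counts, sigma) for a in (0, 1)]
    return max((0, 1), key=lambda a: gains[a])

def run(true_means=(0.0, 0.5), horizon=200, sigma=1.0, seed=0):
    rng = random.Random(seed)
    counts, sums = [0, 0], [0.0, 0.0]
    for a in (0, 1):  # one initial pull per arm so both posteriors are proper
        sums[a] += rng.gauss(true_means[a], sigma)
        counts[a] += 1
    for _ in range(horizon - 2):
        means = [sums[a] / counts[a] for a in (0, 1)]
        a = choose_arm(means, counts, sigma)
        sums[a] += rng.gauss(true_means[a], sigma)
        counts[a] += 1
    return counts
```

Since this sketch's target variable only concerns which arm is better, the rule behaves more like a best-arm identification heuristic; it is not the authors' algorithm, and no regret or optimality guarantee is implied for it.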