In autonomous robotic decision-making under uncertainty, the tradeoff between exploiting and exploring the available options must be considered. When side information associated with the options can be used, such decision-making problems can often be formulated as contextual multi-armed bandits (CMABs). In this study, we apply active inference, which has been widely studied in neuroscience in recent years, as an alternative action selection strategy for CMABs. Unlike conventional action selection strategies, active inference allows the uncertainty of each option to be rigorously evaluated when computing the expected free energy (EFE) of the decision agent's probabilistic model, as derived from the free-energy principle. We specifically address the case where a categorical observation likelihood function is used, for which EFE values are analytically intractable. We introduce new approximation methods for computing the EFE based on variational and Laplace approximations. Results from extensive simulation studies show that, compared with other strategies, active inference generally requires far fewer iterations to identify optimal options and generally achieves lower cumulative regret, at relatively low additional computational cost.
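To make the EFE-based selection rule more concrete, here is a minimal sketch that uses the standard decomposition of EFE into a pragmatic term and an epistemic (information-gain) term, G = -E_q(o)[ln p̃(o)] - I(o; W). Several assumptions are built in and are not taken from the abstract. Each arm is assumed to have a mean-field Gaussian posterior over softmax weights W, the categorical likelihood is p(o | x, W) = softmax(W x), and `log_pref` is a hypothetical vector of log-preferences over outcomes. Most importantly, the intractable expectations are estimated here by plain Monte Carlo sampling. This is a stand-in for illustration only, not the variational or Laplace approximations the abstract proposes. The helper names (`efe_monte_carlo`, `select_arm`) are likewise hypothetical.

```python
import math
import random


def softmax(logits):
    # Numerically stable softmax over a list of logits.
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    s = sum(exps)
    return [e / s for e in exps]


def entropy(p):
    # Shannon entropy (nats) of a categorical distribution.
    return -sum(pi * math.log(pi) for pi in p if pi > 0.0)


def efe_monte_carlo(mean, std, context, log_pref, n_samples=2000, rng=None):
    """Monte Carlo EFE estimate for one arm (illustrative stand-in only).

    Assumed (hypothetical) model: mean-field Gaussian posterior over a
    C x d softmax weight matrix W, given as `mean` and `std` (C x d lists);
    categorical likelihood p(o | x, W) = softmax(W x); `log_pref` holds
    log-preferences ln p~(o) over the C outcomes.
    """
    rng = rng or random.Random(0)
    n_classes = len(mean)
    predictive = [0.0] * n_classes  # q(o) = E_q(W)[p(o | x, W)]
    cond_entropy = 0.0              # E_q(W)[H[p(o | x, W)]]
    for _ in range(n_samples):
        # Sample W ~ q(W) and compute the logits W x row by row.
        logits = [
            sum((mu + sd * rng.gauss(0.0, 1.0)) * x
                for mu, sd, x in zip(mu_row, sd_row, context))
            for mu_row, sd_row in zip(mean, std)
        ]
        p = softmax(logits)
        for c in range(n_classes):
            predictive[c] += p[c] / n_samples
        cond_entropy += entropy(p) / n_samples
    # Epistemic term: mutual information I(o; W) = H[q(o)] - E[H[p(o|W)]].
    info_gain = entropy(predictive) - cond_entropy
    # Pragmatic term: expected log-preference of the predicted outcome.
    pragmatic = sum(q * lp for q, lp in zip(predictive, log_pref))
    return -pragmatic - info_gain


def select_arm(posteriors, context, log_pref, rng=None):
    # Active-inference-style choice: pick the arm with the lowest EFE.
    rng = rng or random.Random(0)
    efes = [efe_monte_carlo(m, s, context, log_pref, rng=rng)
            for m, s in posteriors]
    best = min(range(len(efes)), key=efes.__getitem__)
    return best, efes
```

With two outcomes and a preference for outcome 1, an arm that confidently predicts outcome 1 gets a lower EFE than one that confidently predicts outcome 0. If two arms have the same predictive mean, the one whose weights are more uncertain gets a lower EFE because of its larger information gain, which is how this kind of rule builds exploration into a single score. The sampling loop is the cost that analytic approximations such as the variational and Laplace methods mentioned above are presumably intended to avoid.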