Observation-Augmented Contextual Multi-Armed Bandits for Robotic Exploration with Uncertain Semantic Data

Robotic decision-making under uncertainty requires a careful balance between exploiting known options and exploring uncertain ones. In this study, we introduce a new variant of contextual multi-armed bandits (CMABs), called observation-augmented CMABs (OA-CMABs), in which a decision-making agent can use extra outcome observations from an external information source. CMABs model the expected option outcomes as a function of context features and hidden parameters, which are inferred from previous option outcomes. In OA-CMABs, external observations are also a function of context features and thus provide additional evidence about the hidden parameters. However, if the external information source is error-prone, the resulting posterior updates can harm decision-making performance unless the possibility of errors is accounted for. To this end, we propose a robust Bayesian inference process for OA-CMABs based on the concept of probabilistic data validation. Our approach handles complex mixture model parameter priors and hybrid observation likelihoods for semantic data sources, allowing us to develop validation algorithms based on recently developed probabilistic semantic data association techniques. Furthermore, to cope more effectively with the combined sources of uncertainty in OA-CMABs, we derive a new active inference algorithm for option selection based on expected free energy minimization. This generalizes previous work on active inference for bandit-based robotic decision-making by accounting for faulty observations and non-Gaussian inference. Our approaches are demonstrated on a simulated asynchronous search site selection problem for space exploration. The results show that even when external information sources provide incorrect observations, efficient decision-making and robust parameter inference are still achieved across a wide variety of experimental conditions.
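
To make the idea of probabilistic data validation concrete, the following is a minimal toy sketch and not the inference procedure developed in this work. It assumes a discrete set of hypotheses for the hidden parameter, a single external observation, and a fault model in which a faulty source produces the observation with a fixed probability that does not depend on the hidden parameter. The function name `validated_posterior` and all numerical values are hypothetical, chosen only for illustration.

```python
def validated_posterior(prior, lik_valid, lik_fault, p_valid):
    """Posterior over discrete hypotheses given one possibly faulty observation.

    prior:     list of P(theta_k) for each hypothesis k
    lik_valid: list of P(obs | theta_k, source is valid)
    lik_fault: P(obs | source is faulty), assumed independent of theta
    p_valid:   prior probability that the external source is valid
    """
    # Joint weight of each hypothesis under the "valid" and "faulty" branches.
    valid_branch = [p_valid * p * l for p, l in zip(prior, lik_valid)]
    fault_branch = [(1.0 - p_valid) * p * lik_fault for p in prior]
    evidence = sum(valid_branch) + sum(fault_branch)

    # Marginalize over the unknown validity of the observation.
    posterior = [(v + f) / evidence for v, f in zip(valid_branch, fault_branch)]
    # Posterior probability that the observation was valid.
    prob_valid = sum(valid_branch) / evidence
    return posterior, prob_valid


# Hypothetical numbers: the observation favors hypothesis 0,
# while the prior favors hypothesis 1.
prior = [0.2, 0.8]
lik_valid = [0.9, 0.1]
posterior, prob_valid = validated_posterior(
    prior, lik_valid, lik_fault=0.5, p_valid=0.8
)

# Naive update that trusts the observation unconditionally, for comparison.
naive_weights = [p * l for p, l in zip(prior, lik_valid)]
naive = [w / sum(naive_weights) for w in naive_weights]
```

In this toy setting the validated posterior shifts toward the hypothesis favored by the observation less strongly than the naive update does, and `prob_valid` falls below `p_valid` because the observation conflicts with the prior. The mixture priors and hybrid semantic likelihoods described in the abstract are well beyond what this discrete sketch covers.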
