Causal knowledge can be used to support decision-making. This has been recognized in the causal bandits literature, where a causal (multi-armed) bandit is characterized by a causal graphical model and a target variable: the arms are interventions on the causal model, and the rewards are samples of the target variable. Causal bandits were originally studied with a focus on hard interventions. We focus instead on the case where the arms are conditional interventions. These more accurately model many real-world decision-making problems because they allow the value of the intervened variable to be chosen based on the observed values of other variables. This paper presents a graphical characterization of the minimal set of nodes guaranteed to contain the optimal conditional intervention, that is, the one that maximizes the expected reward. We then propose an efficient algorithm with time complexity $O(|V| + |E|)$, where $V$ and $E$ are the node and edge sets of the causal graph, to identify this minimal set of nodes, and we prove that both the graphical characterization and the proposed algorithm are correct. Finally, we empirically demonstrate that our algorithm significantly prunes the search space and substantially accelerates convergence when integrated into standard multi-armed bandit algorithms.
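The abstract does not spell out the graphical characterization, so the sketch below does not implement it. It is only a hedged illustration of the two ingredients mentioned above, under simplifying assumptions. The first ingredient is a linear-time graph pass. Here it uses the much weaker, standard pruning rule that only ancestors of the target $Y$ can change $Y$, even under a conditional intervention. The second ingredient is a standard UCB1 bandit run over conditional-intervention arms. The toy graph, the arm names such as `do(X:=Z)`, the reward probabilities, and the helpers `ancestors_of`, `ucb1`, and `pull` are all hypothetical.

```python
import math
import random
from collections import deque
from typing import Callable, Dict, Hashable, List, Set


def ancestors_of(target: Hashable, parents: Dict[Hashable, List[Hashable]]) -> Set[Hashable]:
    """Return the proper ancestors of `target` via a reverse BFS.

    Each node is enqueued at most once and each parent list is scanned once,
    so this runs in O(|V| + |E|). `parents` maps each node to its parents.
    """
    seen: Set[Hashable] = set()
    queue = deque([target])
    while queue:
        node = queue.popleft()
        for p in parents.get(node, []):
            if p not in seen:
                seen.add(p)
                queue.append(p)
    return seen


def ucb1(arms: List[Hashable], pull: Callable[[Hashable], float], horizon: int) -> Dict[Hashable, int]:
    """Standard UCB1: play each arm once, then the arm with the highest upper confidence bound."""
    counts = {a: 0 for a in arms}
    means = {a: 0.0 for a in arms}
    for t in range(1, horizon + 1):
        if t <= len(arms):
            arm = arms[t - 1]  # initialization round
        else:
            arm = max(arms, key=lambda a: means[a] + math.sqrt(2 * math.log(t) / counts[a]))
        reward = pull(arm)
        counts[arm] += 1
        means[arm] += (reward - means[arm]) / counts[arm]  # incremental mean update
    return counts


# Hypothetical toy causal graph: Z -> X -> Y, Z -> Y, and Y -> W.
PARENTS = {"Z": [], "X": ["Z"], "Y": ["X", "Z"], "W": ["Y"]}

# Hypothetical arms: interventions on X, each a rule X := g(Z).
# The first two are hard interventions; the last two are conditional.
POLICIES = {
    "do(X:=0)": lambda z: 0,
    "do(X:=1)": lambda z: 1,
    "do(X:=Z)": lambda z: z,
    "do(X:=1-Z)": lambda z: 1 - z,
}

rng = random.Random(0)


def pull(arm: str) -> float:
    """Sample one reward Y from the toy model under the chosen intervention."""
    z = rng.randint(0, 1)          # observe Z ~ Bernoulli(0.5)
    x = POLICIES[arm](z)           # apply the (possibly conditional) intervention
    p = 0.9 if x == z else 0.2     # toy P(Y = 1 | X, Z)
    return 1.0 if rng.random() < p else 0.0


candidates = ancestors_of("Y", PARENTS)  # contains X and Z; W is pruned
counts = ucb1(list(POLICIES), pull, horizon=2000)
```

In this toy model, the conditional rule `do(X:=Z)` has expected reward $0.9$. Either hard intervention has expected reward $0.5 \cdot 0.9 + 0.5 \cdot 0.2 = 0.55$, and `do(X:=1-Z)` has $0.2$. This gap is the kind of situation that motivates conditional arms. For brevity, the sketch enumerates arms on $X$ only, even though $Z$ also survives the ancestor pruning.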