We study the causal bandit problem: identifying a near-optimal intervention from a specified set $A$ of (possibly non-atomic) interventions over a given causal graph. An optimal intervention in $A$ is one that maximizes the expected value of a designated reward variable in the graph, and we quantify near-optimality using the standard notion of simple regret. For Bernoulli random variables and causal graphs on $N$ vertices with constant in-degree, prior work achieved a worst-case simple regret guarantee of $\widetilde{O}(N/\sqrt{T})$. The current work uses the idea of covering interventions (which are not necessarily contained in $A$) to establish a simple regret guarantee of $\widetilde{O}(\sqrt{N/T})$. Notably, and in contrast to prior work, our simple regret bound depends only on explicit parameters of the problem instance. We also go beyond prior work by establishing a simple regret guarantee for causal graphs with unobserved variables. Finally, experiments show improvements over baselines in this setting.
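To make the size of the improvement concrete, the sketch below spells out the quantities involved. The notation is an assumption for illustration and does not come from the abstract: $\mu(a)$ denotes the expected reward under intervention $a$, $\hat{a}_T \in A$ denotes the intervention returned by the algorithm, and $T$ is read as the number of rounds. Under the usual definition, simple regret is

$$
r_T = \max_{a \in A} \mu(a) - \mathbb{E}\left[\mu(\hat{a}_T)\right],
$$

and the two guarantees differ by a factor of

$$
\frac{N/\sqrt{T}}{\sqrt{N/T}} = \sqrt{N},
$$

ignoring the logarithmic factors hidden by $\widetilde{O}$. Equivalently, driving the bound below a target $\varepsilon$ requires $T$ on the order of $N^2/\varepsilon^2$ under the earlier guarantee but only $N/\varepsilon^2$ under the new one. Both are upper bounds, so this compares the guarantees and not the realized regret of any particular algorithm.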