In combinatorial causal bandits (CCB), the learning agent chooses a subset of variables to intervene on in each round and collects feedback from the observed variables, with the goal of minimizing expected regret or sample complexity. Previous works study this problem in both general causal models and binary generalized linear models (BGLMs), but all of them require prior knowledge of the causal graph structure or rely on unrealistic assumptions. This paper studies the CCB problem without the graph structure on binary general causal models and BGLMs. We first provide an exponential lower bound on the cumulative regret for the CCB problem on general causal models. To overcome the exponentially large parameter space, we then consider the CCB problem on BGLMs. We design a regret minimization algorithm for BGLMs that works even without the graph skeleton and show that it still achieves $O(\sqrt{T}\ln T)$ expected regret, as long as the causal graph satisfies a weight gap assumption. This asymptotic regret matches that of state-of-the-art algorithms that rely on the graph structure. Moreover, we propose another algorithm with $O(T^{\frac{2}{3}}\ln T)$ regret that removes the weight gap assumption.