In combinatorial causal bandits (CCB), the learning agent chooses a subset of variables to intervene on in each round and collects feedback from the observed variables, with the goal of minimizing expected regret or sample complexity. Previous works study this problem in both general causal models and binary generalized linear models (BGLMs). However, all of them require prior knowledge of the causal graph structure. This paper studies the CCB problem without the graph structure on binary general causal models and BGLMs. We first provide an exponential lower bound on the cumulative regret for the CCB problem on general causal models. To overcome the exponentially large parameter space, we then consider the CCB problem on BGLMs. We design a regret minimization algorithm for BGLMs that works even without the graph skeleton and show that it still achieves $O(\sqrt{T}\ln T)$ expected regret. This asymptotic regret matches that of state-of-the-art algorithms that rely on the graph structure. Moreover, at the cost of a weaker $O(T^{\frac{2}{3}}\ln T)$ regret, we remove the dependence on the weight gap that is hidden in the asymptotic notation. Finally, we discuss pure exploration for the CCB problem without the graph structure and give algorithms for it.