This paper considers causal bandits (CBs) for the sequential design of interventions in a causal system. The objective is to optimize a reward function by minimizing a measure of cumulative regret with respect to the best sequence of interventions in hindsight. The paper advances the results on CBs in three directions. First, the structural causal models (SCMs) are assumed to be unknown and drawn arbitrarily from a general class $\mathcal{F}$ of Lipschitz-continuous functions, whereas existing results mostly focus on (generalized) linear SCMs. Second, the interventions are assumed to be generalized soft interventions with any desired level of granularity, resulting in an infinite number of possible interventions. The existing literature, in contrast, generally adopts atomic and hard interventions. Third, we provide general upper and lower bounds on regret. The upper bounds subsume (and improve) known bounds for special cases, while the lower bounds were, in general, previously unknown. These bounds are characterized as functions of (i) the graph parameters, (ii) the eluder dimension of the space of SCMs, denoted by $\operatorname{dim}(\mathcal{F})$, and (iii) the covering number of the function space, denoted by ${\rm cn}(\mathcal{F})$. Specifically, the achievable cumulative regret over horizon $T$ is $\mathcal{O}(K d^{L-1}\sqrt{T\operatorname{dim}(\mathcal{F}) \log({\rm cn}(\mathcal{F}))})$, where $K$ is related to the Lipschitz constants, $d$ is the graph's maximum in-degree, and $L$ is the length of the longest causal path. The upper bound is further refined for special classes of SCMs (neural network, polynomial, and linear), and the corresponding lower bounds are provided.
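To make the scaling of the stated upper bound concrete, the following is a minimal sketch that evaluates the expression $K d^{L-1}\sqrt{T\operatorname{dim}(\mathcal{F}) \log({\rm cn}(\mathcal{F}))}$ while ignoring the constant hidden by $\mathcal{O}(\cdot)$. The function name `regret_bound_scale` and all numerical values are hypothetical and chosen only for illustration; they are not taken from the paper.

```python
import math


def regret_bound_scale(K, d, L, T, eluder_dim, covering_number):
    """Evaluate K * d^(L-1) * sqrt(T * dim(F) * log(cn(F))).

    This is only the expression inside the O(.) of the stated bound;
    the hidden constant is ignored, so the value is meaningful for
    comparing scalings, not as an actual regret figure.
    """
    if covering_number < 1:
        raise ValueError("covering number must be at least 1")
    return K * d ** (L - 1) * math.sqrt(T * eluder_dim * math.log(covering_number))


# Hypothetical parameter values, for illustration only.
params = dict(K=1.0, d=2, L=3, T=10_000, eluder_dim=5, covering_number=100.0)
base = regret_bound_scale(**params)

# Quadrupling the horizon T: the sqrt(T) dependence gives a ratio of about 2.
longer_horizon = regret_bound_scale(**{**params, "T": 40_000})

# Lengthening the longest causal path by one: the d^(L-1) factor
# multiplies the expression by d.
deeper_graph = regret_bound_scale(**{**params, "L": 4})

print(longer_horizon / base)
print(deeper_graph / base)
```

Under these assumptions, the expression grows as $\sqrt{T}$ in the horizon but exponentially in the longest causal path length $L$ whenever $d > 1$, which is why the graph parameters appear explicitly in the bound.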