In Causal Bayesian Optimization (CBO), an agent intervenes on an unknown structural causal model to maximize a downstream reward variable. In this paper, we consider the generalization in which other agents or external events also intervene on the system, which is key to adapting to non-stationarities such as weather changes, market forces, or adversaries. We formalize this generalization of CBO as Adversarial Causal Bayesian Optimization (ACBO) and introduce the first algorithm for ACBO with bounded regret: Causal Bayesian Optimization with Multiplicative Weights (CBO-MW). Our approach combines a classical online learning strategy with causal modeling of the rewards. To achieve this, it computes optimistic counterfactual reward estimates by propagating uncertainty through the causal graph. We derive regret bounds for CBO-MW that naturally depend on graph-related quantities. We further propose a scalable implementation for the case of combinatorial interventions and submodular rewards. Empirically, CBO-MW outperforms non-causal and non-adversarial Bayesian optimization methods on synthetic environments and on environments based on real-world data. Our experiments include a realistic demonstration of how CBO-MW can be used to learn users' demand patterns in a shared mobility system and reposition vehicles in strategic areas.
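The core loop described above (a multiplicative-weights learner fed with optimistic counterfactual reward estimates) can be illustrated with a minimal sketch. This is not the authors' CBO-MW: the causal model and the propagation of uncertainty through the graph are replaced, as an assumption, by a simple table of empirical means plus a confidence bonus for each (agent action, external action) pair, and the external intervention is drawn at random rather than chosen adaptively. The names `hedge_update`, `sample_action`, and `run_toy`, and the parameters `eta` and `beta`, are hypothetical.

```python
import math
import random


def hedge_update(weights, optimistic_rewards, eta):
    """One multiplicative-weights (Hedge) step over the agent's actions.

    optimistic_rewards holds an upper-confidence estimate for EVERY action,
    including actions not played this round (counterfactual estimates).
    """
    new = [w * math.exp(eta * r) for w, r in zip(weights, optimistic_rewards)]
    total = sum(new)
    return [w / total for w in new]


def sample_action(weights, rng):
    # Draw an action index according to the current mixed strategy.
    return rng.choices(range(len(weights)), weights=weights, k=1)[0]


def run_toy(true_reward, T=200, eta=0.1, beta=1.0, seed=0):
    """Toy loop: true_reward[a][adv] is the (hypothetical) mean reward."""
    rng = random.Random(seed)
    n_actions, n_adv = len(true_reward), len(true_reward[0])
    sums = [[0.0] * n_adv for _ in range(n_actions)]
    counts = [[0] * n_adv for _ in range(n_actions)]
    weights = [1.0 / n_actions] * n_actions
    for _ in range(T):
        a = sample_action(weights, rng)
        # External intervention; observed only after the agent acts.
        adv = rng.randrange(n_adv)
        y = true_reward[a][adv] + rng.gauss(0.0, 0.1)  # noisy reward
        sums[a][adv] += y
        counts[a][adv] += 1
        # Optimistic counterfactual estimate for every action, given the
        # observed external action (tabular stand-in for the causal model).
        ucb = []
        for b in range(n_actions):
            n = counts[b][adv]
            mean = sums[b][adv] / n if n else 0.0
            ucb.append(mean + beta / math.sqrt(1 + n))
        weights = hedge_update(weights, ucb, eta)
    return weights
```

For example, `run_toy([[1.0, 1.0], [0.0, 0.0]])` should concentrate most of the probability mass on action 0, which is better regardless of the external action. Using full-information counterfactual estimates, rather than only the reward of the played action, is what lets a multiplicative-weights learner update every action each round.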