Multi-armed bandits (MABs) and causal MABs (CMABs) are established frameworks for decision-making problems. Most prior work studies and solves individual MABs and CMABs in isolation, each for a given problem and its associated data. However, decision-makers often face multiple related problems and multi-scale observations, where joint formulations are needed to efficiently exploit problem structure and data dependencies. Transfer learning for CMABs addresses the situation where models are defined on identical variables, although their causal connections may differ. In this work, we extend transfer learning to setups involving CMABs defined on potentially different variables, with varying degrees of granularity, and related via an abstraction map. Formally, we introduce the problem of causally abstracted MABs (CAMABs), relying on the theory of causal abstraction to express a rigorous abstraction map. We propose algorithms to learn in a CAMAB and study their regret. We illustrate the limitations and strengths of our algorithms on a real-world scenario related to online advertising.