Policy learning is an important component of many real-world learning systems. A major challenge in policy learning is adapting efficiently to unseen environments or tasks. Recent work has suggested exploiting invariant conditional distributions to learn models that generalize better to unseen environments. However, assuming invariance of entire conditional distributions (which we call full invariance) may be too strong an assumption in practice. In this paper, we introduce a relaxation of full invariance called effect-invariance (e-invariance for short) and prove that it is sufficient, under suitable assumptions, for zero-shot policy generalization. We also discuss an extension that exploits e-invariance when a small sample from the test environment is available, enabling few-shot policy generalization. Our work does not assume an underlying causal graph or that the data are generated by a structural causal model; instead, we develop procedures for testing e-invariance directly from data. We present empirical results on simulated data and a mobile health intervention dataset to demonstrate the effectiveness of our approach.