Policy learning is an important component of many real-world learning systems. A major challenge in policy learning is how to adapt efficiently to unseen environments or tasks. Recently, it has been suggested to exploit invariant conditional distributions to learn models that generalize better to unseen environments. However, assuming invariance of entire conditional distributions (which we call full invariance) may be too strong an assumption in practice. In this paper, we introduce a relaxation of full invariance called effect-invariance (e-invariance for short) and prove that it is sufficient, under suitable assumptions, for zero-shot policy generalization. We also discuss an extension that exploits e-invariance when a small sample from the test environment is available, enabling few-shot policy generalization. Our work does not assume an underlying causal graph or that the data are generated by a structural causal model; instead, we develop procedures to test e-invariance directly from data. We present empirical results on simulated data and a mobile health intervention dataset to demonstrate the effectiveness of our approach.