Large-scale administrative or observational datasets are increasingly used to inform decision making. While this effort aims to ground policy in real-world evidence, challenges arise because selection bias and other forms of distribution shift often plague observational data. Previous attempts to provide robust inferences have given guarantees that depend on a user-specified amount of possible distribution shift (e.g., the maximum KL divergence between the observed and target distributions). However, decision makers often have additional knowledge about the target distribution that constrains which shifts are possible. To leverage such information, we propose a framework that enables statistical inference in the presence of distribution shifts that obey user-specified constraints, expressed as functions whose expectation is known under the target distribution. The output is a pair of high-probability bounds on the value an estimand takes on the target distribution. Hence, our method leverages domain knowledge to partially identify a wide class of estimands. We analyze the computational and statistical properties of methods to estimate these bounds, and show that our method can produce informative bounds on a variety of simulated and semisynthetic tasks.
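To make the idea of constraint-based partial identification concrete, the following is a minimal sketch and not the method proposed in the paper. It assumes the estimand is a simple mean and that the target distribution is supported on the observed sample points. It then solves two linear programs over reweightings of the sample that match the known target expectations. The function name `bound_target_mean`, its arguments, and the optional `max_ratio` cap are hypothetical, and the result is a plug-in bound that ignores sampling uncertainty, so it carries no high-probability guarantee.

```python
import numpy as np
from scipy.optimize import linprog


def bound_target_mean(y, G, c, max_ratio=None):
    """Plug-in lower/upper bounds on the target mean of y (illustrative only).

    y: (n,) outcomes observed in the source sample.
    G: (n, k) constraint functions evaluated on each sample point.
    c: (k,) known target-distribution expectations of those functions.
    max_ratio: optional cap on the density ratio (hypothetical extra knob).
    """
    y = np.asarray(y, dtype=float)
    n = len(y)
    G = np.asarray(G, dtype=float).reshape(n, -1)
    c = np.atleast_1d(np.asarray(c, dtype=float))

    # Weights w_i >= 0 stand in for target probabilities of the sample points:
    # they must sum to one and reproduce the known expectations, G^T w = c.
    A_eq = np.vstack([np.ones((1, n)), G.T])
    b_eq = np.concatenate([[1.0], c])
    upper = None if max_ratio is None else max_ratio / n
    bounds = [(0.0, upper)] * n

    results = []
    for sign in (1.0, -1.0):  # +1 minimizes the mean, -1 maximizes it
        res = linprog(sign * y, A_eq=A_eq, b_eq=b_eq, bounds=bounds, method="highs")
        if not res.success:
            raise ValueError("constraints infeasible for this sample: " + res.message)
        results.append(sign * res.fun)
    return results[0], results[1]
```

For example, with outcomes `[0, 1, 2, 3]`, a group indicator `[[0], [0], [1], [1]]`, and a known target group share of `0.5`, the sketch returns bounds of roughly 1 and 2. Adding `max_ratio=1.0` forces uniform weights and collapses both bounds to about 1.5, which illustrates how extra knowledge about the target distribution tightens the identified interval.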