A common issue when learning decision-making policies in data-rich settings is spurious correlations in the offline dataset, which can be caused by hidden confounders. Instrumental variable (IV) regression, which utilises a key unconfounded variable known as the instrument, is a standard technique for learning causal relationships between confounded action, outcome, and context variables. Most recent IV regression algorithms use a two-stage approach, in which a deep neural network (DNN) estimator learnt in the first stage is plugged directly into the second stage, where another DNN is used to estimate the causal effect. Naively plugging in the first-stage estimator can cause heavy bias in the second stage, especially when the first-stage estimator carries regularisation bias. We propose DML-IV, a non-linear IV regression method that reduces the bias in two-stage IV regression and effectively learns high-performing policies. We derive a novel learning objective to reduce bias and design the DML-IV algorithm following the double/debiased machine learning (DML) framework. The learnt DML-IV estimator comes with strong convergence rate and $O(N^{-1/2})$ suboptimality guarantees that match those available when the dataset is unconfounded. DML-IV outperforms state-of-the-art IV regression methods on IV regression benchmarks and learns high-performing policies in the presence of instruments.
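To make the two-stage idea and the plug-in bias concrete, the following is a minimal linear sketch in Python with NumPy. It is not the DML-IV algorithm and uses no neural networks: it is classical two-stage least squares on synthetic data. The data-generating process, the true effect `beta = 2.0`, the function names, and the ridge penalty `lam` are all assumptions made for illustration. The sketch compares a naive regression of outcome on action, a two-stage estimate using the instrument, and a two-stage estimate whose first stage is deliberately over-regularised.

```python
import numpy as np


def simulate(n, beta=2.0, seed=0):
    """Synthetic confounded data (hypothetical setup, not from the paper)."""
    rng = np.random.default_rng(seed)
    z = rng.normal(size=n)                    # instrument, independent of u
    u = rng.normal(size=n)                    # hidden confounder
    a = z + u + 0.5 * rng.normal(size=n)      # action depends on z and u
    y = beta * a + 2.0 * u + 0.5 * rng.normal(size=n)  # outcome
    return z, a, y


def slope(x, y, lam=0.0):
    """Slope of y on x with an intercept; lam > 0 adds a ridge penalty on the slope."""
    xc = x - x.mean()
    yc = y - y.mean()
    return float(xc @ yc / (xc @ xc + lam))


def two_stage(z, a, y, lam=0.0):
    """Stage 1: predict the action from the instrument. Stage 2: regress y on the prediction."""
    s = slope(z, a, lam=lam)
    a_hat = a.mean() + s * (z - z.mean())     # first-stage prediction, plugged in as-is
    return slope(a_hat, y)


n = 20_000
z, a, y = simulate(n)

naive = slope(a, y)                           # ignores confounding
iv = two_stage(z, a, y)                       # unregularised first stage
iv_ridge = two_stage(z, a, y, lam=float(n))   # heavily shrunk first stage

print(f"naive regression:        {naive:.3f}")
print(f"two-stage (no penalty):  {iv:.3f}")
print(f"two-stage (ridge stage): {iv_ridge:.3f}")
```

In this toy setting the naive slope is pulled away from the true effect by the confounder, the unregularised two-stage estimate lands close to it, and shrinking the first stage distorts the second-stage estimate even though the instrument is valid. That last case is a simplified analogue of the regularisation bias that the abstract says motivates the debiased objective.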