A common issue in learning decision-making policies in data-rich settings is spurious correlations in the offline dataset, which can be caused by hidden confounders. Instrumental variable (IV) regression, which utilises a key unconfounded variable known as the instrument, is a standard technique for learning causal relationships between confounded action, outcome, and context variables. Most recent IV regression algorithms use a two-stage approach, where a deep neural network (DNN) estimator learnt in the first stage is directly plugged into the second stage, in which another DNN is used to estimate the causal effect. Naively plugging in the first-stage estimator can induce heavy bias in the second stage, especially when regularisation bias is present in the first-stage estimator. We propose DML-IV, a non-linear IV regression method that reduces the bias in two-stage IV regressions and effectively learns high-performing policies. We derive a novel learning objective to reduce bias and design the DML-IV algorithm following the double/debiased machine learning (DML) framework. The learnt DML-IV estimator enjoys strong convergence-rate and $O(N^{-1/2})$ suboptimality guarantees that match those obtained when the dataset is unconfounded. DML-IV outperforms state-of-the-art IV regression methods on IV regression benchmarks and learns high-performing policies in the presence of instruments.
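To make the two-stage idea concrete, the following is a minimal linear two-stage least squares (2SLS) sketch, not the DML-IV algorithm itself: a hidden confounder biases naive regression of outcome on treatment, while regressing on the first-stage (instrument-predicted) treatment recovers the causal effect. All variable names, coefficients, and the linear setting are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 100_000
u = rng.normal(size=n)                       # hidden confounder (unobserved)
z = rng.normal(size=n)                       # instrument: affects t, not y directly
t = 0.8 * z + u + rng.normal(size=n)         # treatment, confounded by u
y = 2.0 * t - 3.0 * u + rng.normal(size=n)   # true causal effect of t on y is 2.0

# Naive OLS of y on t absorbs the confounding path through u and is biased.
naive = (t @ y) / (t @ t)

# Stage 1: regress t on z to isolate the unconfounded variation in t.
t_hat = z * ((z @ t) / (z @ z))

# Stage 2: regress y on the first-stage prediction of t.
iv = (t_hat @ y) / (t_hat @ t_hat)

print(f"naive OLS estimate: {naive:.3f}")    # far from 2.0
print(f"2SLS IV estimate:   {iv:.3f}")       # close to 2.0
```

The DNN-based two-stage methods the abstract describes replace these linear regressions with neural network estimators, which is precisely where plugging the first-stage estimate directly into the second stage can propagate regularisation bias.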