A common issue when learning decision-making policies in data-rich settings is the presence of spurious correlations in the offline dataset, which can be caused by hidden confounders. Instrumental variable (IV) regression, which utilises a key unconfounded variable known as the instrument, is a standard technique for learning causal relationships between confounded action, outcome, and context variables. Most recent IV regression algorithms use a two-stage approach, where a deep neural network (DNN) estimator learnt in the first stage is directly plugged into the second stage, in which another DNN is used to estimate the causal effect. Naively plugging in the first-stage estimator can cause heavy bias in the second stage, especially when regularisation bias is present in the first-stage estimator. We propose DML-IV, a non-linear IV regression method that reduces the bias in two-stage IV regressions and effectively learns high-performing policies. We derive a novel learning objective to reduce bias and design the DML-IV algorithm following the double/debiased machine learning (DML) framework. The learnt DML-IV estimator has a strong convergence rate and an $O(N^{-1/2})$ suboptimality guarantee, both of which match those obtained when the dataset is unconfounded. DML-IV outperforms state-of-the-art IV regression methods on IV regression benchmarks and learns high-performing policies in the presence of instruments.
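To make the two-stage idea concrete, the following is a minimal sketch of classical linear two-stage least squares on synthetic data. It is not DML-IV and uses no neural networks; it only illustrates how a hidden confounder biases a naive regression and how an instrument can remove that bias. The data-generating process, the coefficient values, and the function names (`simulate`, `ols_slope`, `two_stage_slope`) are all assumptions made for illustration.

```python
import numpy as np


def simulate(n, true_effect=2.0, seed=0):
    """Synthetic confounded data (hypothetical data-generating process)."""
    rng = np.random.default_rng(seed)
    z = rng.normal(size=n)                    # instrument: independent of the confounder
    u = rng.normal(size=n)                    # hidden confounder: never observed by the learner
    a = z + u + 0.1 * rng.normal(size=n)      # action depends on instrument and confounder
    y = true_effect * a + 3.0 * u + 0.1 * rng.normal(size=n)  # outcome
    return z, a, y


def ols_slope(x, y):
    """Slope of a one-dimensional least-squares fit of y on x (with intercept)."""
    x_c = x - x.mean()
    return float(x_c @ (y - y.mean()) / (x_c @ x_c))


def two_stage_slope(z, a, y):
    """Linear two-stage estimate of the effect of a on y using instrument z."""
    # Stage 1: predict the action from the instrument only.
    stage1_slope = ols_slope(z, a)
    a_hat = a.mean() + stage1_slope * (z - z.mean())
    # Stage 2: regress the outcome on the predicted action.
    return ols_slope(a_hat, y)


z, a, y = simulate(20_000)
naive = ols_slope(a, y)          # absorbs the confounder's effect, so it overestimates
iv = two_stage_slope(z, a, y)    # should land near the true effect of 2.0
print(f"naive OLS slope: {naive:.3f}")
print(f"two-stage slope: {iv:.3f}")
```

In this linear setting the first stage is estimated without regularisation, so plugging it into the second stage is harmless. The bias discussed in the abstract arises when the first stage is a regularised non-linear estimator such as a DNN, which this sketch does not model.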