Instrumental variable (IV) methods are used to estimate causal effects in settings with unobserved confounding, where we cannot directly experiment on the treatment variable. Instruments are variables that affect the outcome only indirectly, via the treatment variable(s). Most IV applications focus on low-dimensional treatments and crucially require at least as many instruments as treatments. This assumption is restrictive: in the natural sciences we often seek to infer causal effects of high-dimensional treatments (e.g., the effect of gene expression or microbiota on health and disease), but can run only a few experiments, each with a limited number of instruments (e.g., drugs or antibiotics). In such underspecified problems, the full treatment effect is not identifiable from a single experiment, even in the linear case. We show that one can still reliably recover the projection of the treatment effect onto the instrumented subspace, and we develop techniques to consistently combine such partial estimates obtained from different sets of instruments. We then use these combined estimators in an algorithm that, at each round of experimentation, proposes the most informative instruments so as to maximize the overall information about the full causal effect.
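To make "projection onto the instrumented subspace" and "combining partial estimates" concrete, here is a minimal NumPy sketch under stated assumptions. It is not the authors' estimator. It assumes a linear model with one hidden confounder `H`, instruments `Z` independent of `H` that affect `Y` only through the treatments `X`, and it identifies the instrumented subspace with the row space of Cov(Z, X). The simulator, the helper names `simulate`, `moments`, `min_norm_iv`, and the rule of combining experiments by stacking their moment equations are all hypothetical choices made for illustration. With k = 2 instruments and d = 4 treatments, each experiment alone yields only a minimum-norm (projected) estimate; stacking two experiments whose instruments move complementary directions recovers the full effect.

```python
import numpy as np


def simulate(rng, A, beta, n):
    """Hypothetical linear IV model with one hidden confounder H.

    A has shape (d, k): column j says how instrument j shifts the d treatments.
    Returns instruments Z (n, k), treatments X (n, d), outcome Y (n,).
    """
    d, k = A.shape
    Z = rng.normal(size=(n, k))
    H = rng.normal(size=(n, 1))  # unobserved confounder, affects X and Y
    X = Z @ A.T + H @ np.ones((1, d)) + 0.1 * rng.normal(size=(n, d))
    Y = X @ beta + 2.0 * H[:, 0] + 0.1 * rng.normal(size=n)
    return Z, X, Y


def moments(Z, X, Y):
    """Sample IV moment equations Cov(Z, X) b = Cov(Z, Y)."""
    Zc, Xc, Yc = Z - Z.mean(0), X - X.mean(0), Y - Y.mean()
    n = Z.shape[0]
    return Zc.T @ Xc / n, Zc.T @ Yc / n


def min_norm_iv(M, m):
    """Minimum-norm solution of M b = m.

    With fewer instruments than treatments (k < d) the system is
    underdetermined; the pseudoinverse keeps the component of beta in
    the row space of M and sets the orthogonal (unidentified) part to zero.
    """
    return np.linalg.pinv(M) @ m


rng = np.random.default_rng(0)
n = 200_000
beta = np.array([1.0, -2.0, 0.5, 3.0])  # true effect of d = 4 treatments

# Experiment 1: k = 2 instruments that move only treatments 0 and 1.
A1 = np.array([[1.0, 0.0], [0.0, 1.0], [0.0, 0.0], [0.0, 0.0]])
# Experiment 2: k = 2 instruments that move only treatments 2 and 3.
A2 = np.array([[0.0, 0.0], [0.0, 0.0], [1.0, 0.0], [0.0, 1.0]])

M1, m1 = moments(*simulate(rng, A1, beta, n))
M2, m2 = moments(*simulate(rng, A2, beta, n))

b1 = min_norm_iv(M1, m1)  # approx. projection of beta onto span(e0, e1)
b2 = min_norm_iv(M2, m2)  # approx. projection of beta onto span(e2, e3)

# Combine: each experiment's moment equations hold at the true beta,
# so stacking them gives one larger system; here it has full rank.
b_full = min_norm_iv(np.vstack([M1, M2]), np.concatenate([m1, m2]))

print("experiment 1:", b1.round(2))
print("experiment 2:", b2.round(2))
print("combined:    ", b_full.round(2))
```

In this setup, each single-experiment estimate should be close to the true effect on the coordinates its instruments move and close to zero elsewhere, while the stacked estimate should be close to the full `beta`. Equal-weight stacking is only the simplest way to combine experiments and is used here purely for illustration.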