Instrumental variable (IV) methods are used to estimate causal effects in settings with unobserved confounding, where we cannot directly experiment on the treatment variable. Instruments are variables that affect the outcome only indirectly, via the treatment variable(s). Most IV applications focus on low-dimensional treatments and crucially require at least as many instruments as treatments. This assumption is restrictive: in the natural sciences we often seek to infer causal effects of high-dimensional treatments (e.g., the effect of gene expression or microbiota on health and disease), but can run only a few experiments with a limited number of instruments (e.g., drugs or antibiotics). In such underspecified problems, the full treatment effect is not identifiable from a single experiment, even in the linear case. We show that one can still reliably recover the projection of the treatment effect onto the instrumented subspace, and we develop techniques to consistently combine such partial estimates from different sets of instruments. We then leverage our combined estimators in an algorithm that iteratively proposes the most informative instruments at each round of experimentation to maximize the overall information about the full causal effect.
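To make the underspecified linear case concrete, the following is a minimal simulation sketch, not the estimator developed in this work. It assumes a linear model with two instruments and four treatments, takes the "instrumented subspace" to be the row space of a hypothetical first-stage matrix `A`, and uses the minimum-norm solution of the empirical IV moment equations as a stand-in estimator. All names (`min_norm_iv`, `A`, `beta`, `gamma`) and all numeric values are illustrative assumptions.

```python
import numpy as np


def min_norm_iv(Z, X, Y):
    """Minimum-norm solution b of the empirical moment equations (Z'X/n) b = Z'Y/n.

    Assumes all variables are (approximately) zero-mean, so no centering is done.
    """
    n = Z.shape[0]
    cov_zx = Z.T @ X / n  # shape (m, d), with m < d in the underspecified case
    cov_zy = Z.T @ Y / n  # shape (m,)
    return np.linalg.pinv(cov_zx) @ cov_zy


rng = np.random.default_rng(0)
n = 200_000

# Hypothetical first stage: 2 instruments act on 4 treatments (assumed values).
A = np.array([[1.0, 1.0, 0.0, 0.0],
              [0.0, 1.0, 1.0, 0.0]])
beta = np.array([1.0, -2.0, 0.5, 3.0])    # assumed true treatment effect
gamma = np.array([0.5, -0.5, 0.5, 0.5])   # assumed confounder loadings on treatments

Z = rng.standard_normal((n, 2))           # instruments
H = rng.standard_normal(n)                # unobserved confounder
X = Z @ A + np.outer(H, gamma) + rng.standard_normal((n, 4))
Y = X @ beta + H + rng.standard_normal(n)  # H affects both X and Y

beta_hat = min_norm_iv(Z, X, Y)

# Orthogonal projection of beta onto the row space of A (the instrumented subspace here).
P = A.T @ np.linalg.solve(A @ A.T, A)
beta_proj = P @ beta

err_to_projection = np.linalg.norm(beta_hat - beta_proj)
err_to_full = np.linalg.norm(beta_hat - beta)
print("estimate:            ", np.round(beta_hat, 3))
print("projection of beta:  ", np.round(beta_proj, 3))
print("distance to projection:", err_to_projection)
print("distance to full beta: ", err_to_full)
```

In this sketch the estimate should land close to the projection of `beta` and far from `beta` itself, since the component of `beta` orthogonal to the rows of `A` is never excited by the instruments. Recovering that component would require further experiments with instruments spanning other directions.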