Statistical matching is an effective method for estimating causal effects: before estimation, treated units are paired with control units that have ``similar'' values of the confounding covariates. In this way, matching helps isolate the effect of treatment on the response from effects due to the confounding covariates. Although many software packages perform statistical matching, the algorithms and techniques used to solve statistical matching problems, especially matching without replacement, are not widely understood. In this paper, we describe in detail commonly used algorithms and techniques for solving statistical matching problems. We focus in particular on the efficiency of these algorithms as the number of observations grows large. We advocate for the further development of statistical matching methods that impose and exploit ``sparsity'' by greatly restricting the available matches for a given treated unit, as this may be critical to ensuring the scalability of matching methods as data sizes grow.
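To make the idea of sparsity concrete, the following is a minimal sketch of 1:1 matching without replacement on a single scalar covariate, in which a caliper restricts each treated unit to the few controls close to it. It is an illustration only and is not claimed to be any of the algorithms described in the paper: the function name `caliper_greedy_match`, the caliper parameter, and the toy data are all assumptions, and the greedy rule used here does not in general minimize the total matched distance.

```python
from bisect import bisect_left, bisect_right


def caliper_greedy_match(treated, controls, caliper):
    """Greedy 1:1 matching without replacement on a scalar covariate.

    treated, controls: lists of floats (hypothetical covariate values).
    caliper: largest |treated - control| difference allowed in a pair.
    Returns a dict mapping treated index -> matched control index.
    """
    # Sort the controls once so that each treated unit only inspects
    # the controls that fall inside its caliper window.
    order = sorted(range(len(controls)), key=lambda j: controls[j])
    sorted_vals = [controls[j] for j in order]

    # Build the sparse candidate list: (distance, treated idx, control idx).
    candidates = []
    for i, x in enumerate(treated):
        lo = bisect_left(sorted_vals, x - caliper)
        hi = bisect_right(sorted_vals, x + caliper)
        for pos in range(lo, hi):
            candidates.append((abs(x - sorted_vals[pos]), i, order[pos]))

    # Greedy step: accept the closest remaining pair first; every
    # treated unit and every control is used at most once.
    candidates.sort()
    matches, used_controls = {}, set()
    for _dist, i, j in candidates:
        if i not in matches and j not in used_controls:
            matches[i] = j
            used_controls.add(j)
    return matches


# Toy data (assumed for illustration only).
treated = [0.50, 0.53, 2.00]
controls = [0.49, 0.70, 0.52, 5.00]
print(caliper_greedy_match(treated, controls, caliper=0.1))
```

In this toy example the third treated unit has no control within the caliper and is left unmatched. The candidate list holds only the pairs inside the caliper rather than all treated-control pairs, which is the kind of restriction on available matches that the abstract refers to as sparsity.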