In the past decade, various exact balancing-based weighting methods have been introduced into the causal inference literature. Exact balancing alleviates the extreme-weight and model-misspecification issues that can arise when one implements inverse probability weighting. It eliminates covariate imbalance by imposing balancing constraints in an optimization problem. The optimization problem can nevertheless be infeasible when there is poor overlap between the covariate distributions in the treated and control groups or when the covariates are high-dimensional. Recently, approximate balancing was proposed as an alternative balancing framework, which resolves the feasibility issue by using inequality moment constraints instead. However, it can be difficult to select the threshold parameters when the number of constraints is large. Moreover, moment constraints may not fully capture the discrepancy between covariate distributions. In this paper, we propose Mahalanobis balancing, which approximately balances covariate distributions from a multivariate perspective. We use a quadratic constraint to control overall imbalance with a single threshold parameter, which can be tuned by a simple selection procedure. We show that the dual problem of Mahalanobis balancing is an l_2 norm-based regularized regression problem, and establish a connection to propensity score models. We further generalize Mahalanobis balancing to the high-dimensional scenario. We derive asymptotic properties and make extensive comparisons with existing balancing methods in numerical studies.
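To make the idea of a single quadratic constraint concrete, the sketch below computes a Mahalanobis-type imbalance between a weighted control-group mean and the treated-group mean, and compares it with one threshold. It is an illustration under stated assumptions, not the paper's implementation: the function name `mahalanobis_imbalance`, the use of the pooled sample covariance as the scaling matrix, the simulated data, and the threshold value `delta` are all hypothetical choices made here.

```python
import numpy as np


def mahalanobis_imbalance(X_control, weights, target_mean, cov):
    """Quadratic-form distance between the weighted control mean and a target mean.

    Hypothetical helper: returns diff' cov^{-1} diff, where diff is the
    weighted control covariate mean minus the target mean.
    """
    diff = X_control.T @ weights - target_mean
    # Solve cov @ z = diff instead of forming the inverse explicitly.
    return float(diff @ np.linalg.solve(cov, diff))


# Simulated covariates (assumption): treated units are shifted relative to controls.
rng = np.random.default_rng(0)
X_treated = rng.normal(loc=0.5, size=(40, 3))
X_control = rng.normal(loc=0.0, size=(60, 3))

# Target: the treated-group covariate mean.
target_mean = X_treated.mean(axis=0)
# Scaling matrix (assumption): covariance of the pooled sample.
cov = np.cov(np.vstack([X_treated, X_control]), rowvar=False)

# Uniform control weights as a baseline before any balancing is done.
uniform = np.full(len(X_control), 1.0 / len(X_control))

delta = 0.05  # hypothetical threshold; a single number regardless of covariate count
imbalance = mahalanobis_imbalance(X_control, uniform, target_mean, cov)
satisfies_constraint = imbalance <= delta

print(f"imbalance under uniform weights: {imbalance:.4f}")
print(f"quadratic constraint satisfied: {satisfies_constraint}")
```

A weighting method in this spirit would search for weights that keep this one quantity below `delta`, whereas moment-by-moment approximate balancing needs a separate threshold for each covariate.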