In an observational study, matching aims to create many small sets of similar treated and control units from initial samples that may differ substantially, in order to permit more credible causal inferences. The problem of constructing matched sets may be formulated as an optimization problem, but it can be challenging to specify a single objective function that adequately captures all the design considerations at work. One solution, proposed by \citet{pimentel2019optimal}, is to explore a family of matched designs that are Pareto optimal for multiple objective functions. We present an R package, \href{https://github.com/ShichaoHan/MultiObjMatch}{\texttt{MultiObjMatch}}, that implements this multi-objective matching strategy using a network flow algorithm for several common design goals: marginal balance on important covariates, size of the matched sample, and average within-pair multivariate distance. We demonstrate the package's flexibility in exploring user-defined tradeoffs of interest via two case studies: a reanalysis of the canonical National Supported Work dataset and a novel analysis of a clinical dataset to estimate the impact of diabetic kidney disease on hospitalization costs.