Semisupervised score-based matching algorithm to evaluate the effect of public health interventions

Multivariate matching algorithms "pair" similar study units in an observational study to remove potential bias and confounding effects caused by the absence of randomization. In one-to-one multivariate matching, a large number of "pairs" to be matched brings both the information of a large sample and a large number of matching tasks. A matching algorithm that is efficient and needs only comparatively limited auxiliary matching knowledge, provided by domain experts through a "training" set of paired units, is therefore of practical interest. We propose a novel one-to-one matching algorithm based on a quadratic score function $S_{\beta}(x_i,x_j)= \beta^T (x_i-x_j)(x_i-x_j)^T \beta$. The weights $\beta$, which can be interpreted as a variable importance measure, are designed to minimize the score difference between paired training units while maximizing the score difference between unpaired training units. Further, in the typical but intricate case where the training set is much smaller than the unpaired set, we propose a \underline{s}emisupervised \underline{c}ompanion \underline{o}ne-\underline{t}o-\underline{o}ne \underline{m}atching \underline{a}lgorithm (SCOTOMA) that makes the best use of the unpaired units. The proposed weight estimator is proved to be consistent when the true matching criterion is indeed the quadratic score function. When the model assumptions are violated, we demonstrate through a series of simulations that the proposed algorithm still outperforms some popular competing matching algorithms. We apply the proposed algorithm to a real-world study of the effect of in-person schooling on the community Covid-19 transmission rate, with the aim of informing policy making.
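To make the score function concrete, the following is a minimal sketch of how $S_{\beta}(x_i,x_j)$ could be evaluated, using the identity $\beta^T (x_i-x_j)(x_i-x_j)^T \beta = (\beta^T (x_i-x_j))^2$. The function names `quadratic_score` and `pairwise_scores`, the weights, and the covariate values are hypothetical and chosen only for illustration. The sketch does not cover how $\beta$ is estimated or how pairs are selected from the scores.

```python
import numpy as np

def quadratic_score(beta, x_i, x_j):
    """Return S_beta(x_i, x_j) = (beta^T (x_i - x_j))^2 for one pair of units."""
    d = np.asarray(x_i, dtype=float) - np.asarray(x_j, dtype=float)
    return float(np.dot(beta, d) ** 2)

def pairwise_scores(beta, X_a, X_b):
    """Return the matrix whose (i, j) entry is S_beta(X_a[i], X_b[j])."""
    # Project each unit onto beta once, then square the pairwise differences.
    proj_a = np.asarray(X_a, dtype=float) @ beta
    proj_b = np.asarray(X_b, dtype=float) @ beta
    return (proj_a[:, None] - proj_b[None, :]) ** 2

# Hypothetical weights and covariates (assumptions, not values from the study).
beta = np.array([1.0, 0.5])
X_a = np.array([[1.0, 2.0], [3.0, 0.0]])
X_b = np.array([[1.0, 2.0], [0.0, 0.0]])

scores = pairwise_scores(beta, X_a, X_b)
print(scores)
```

A score of zero means the two units are indistinguishable along the direction $\beta$, so smaller scores indicate better candidate matches under this criterion.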
