Matching on a low-dimensional vector of scalar covariates consists of constructing groups of individuals such that each individual in a group is within a pre-specified distance of an individual in another group. Matching in high-dimensional spaces is more challenging because the distance can be sensitive to implementation details, caliper width, and measurement error in the observations. To partially address these problems, we propose using extensive sensitivity analyses and identifying the main sources of variation and bias. We illustrate these concepts by examining the racial disparity in all-cause mortality in the US using the National Health and Nutrition Examination Survey (NHANES 2003-2006). In particular, we match African Americans to Caucasian Americans on age, gender, BMI, and objectively measured physical activity (PA). PA is measured every minute using accelerometers for up to seven days and then summarized, for each participant, as the empirical distribution of all minute-level observations. The Wasserstein metric is used to measure the distance between these participant-specific distributions.
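As a minimal sketch of the distance and caliper idea described in the abstract, the Python code below computes the 1-Wasserstein distance between two one-dimensional empirical distributions and then forms matched pairs within a caliper. The function names `wasserstein_1d` and `greedy_caliper_match`, the greedy without-replacement strategy, and the restriction to the PA distribution alone (ignoring age, gender, and BMI) are assumptions made for illustration; the abstract does not specify the matching algorithm or the order of the Wasserstein metric.

```python
import numpy as np


def wasserstein_1d(x, y):
    """1-Wasserstein distance between the empirical distributions of x and y.

    Uses the identity W1 = integral of |F_x(t) - F_y(t)| dt, evaluated
    exactly on the pooled sample points.
    """
    x = np.sort(np.asarray(x, dtype=float))
    y = np.sort(np.asarray(y, dtype=float))
    pooled = np.sort(np.concatenate([x, y]))
    widths = np.diff(pooled)
    # Empirical CDFs evaluated at every pooled point except the last one.
    cdf_x = np.searchsorted(x, pooled[:-1], side="right") / x.size
    cdf_y = np.searchsorted(y, pooled[:-1], side="right") / y.size
    return float(np.sum(np.abs(cdf_x - cdf_y) * widths))


def greedy_caliper_match(group_a, group_b, caliper):
    """Greedy matching without replacement (an illustrative assumption).

    group_a, group_b: lists of 1-D arrays, one array of minute-level
    observations per participant. Returns (i, j) index pairs whose
    Wasserstein distance does not exceed the caliper.
    """
    dist = np.array(
        [[wasserstein_1d(a, b) for b in group_b] for a in group_a]
    )
    used = set()
    pairs = []
    for i in range(dist.shape[0]):
        # Visit candidates in group_b from nearest to farthest.
        for j in np.argsort(dist[i]):
            if dist[i, j] > caliper:
                break  # all remaining candidates are farther away
            if int(j) not in used:
                used.add(int(j))
                pairs.append((i, int(j)))
                break
    return pairs
```

Because the result of a greedy pass depends on the order in which participants are visited and on the caliper value, rerunning it under different orderings and caliper widths is one simple way to probe the sensitivity the abstract refers to.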