How to Select Covariates for Imputation-Based Regression Calibration Method -- A Causal Perspective

In this paper, we identify the criteria for selecting the minimal and most efficient covariate adjustment sets for the regression calibration method developed by Carroll, Ruppert and Stefanski (CRS, 1992), which is used to correct bias due to measurement error in a continuous exposure. We use directed acyclic graphs to illustrate how subject matter knowledge can aid in the selection of such adjustment sets. Valid measurement error correction requires collecting data, in both the main study and the validation study, on (1) any common causes of the true exposure and the outcome and (2) any common causes of the measurement error and the outcome. For the CRS regression calibration method to be valid, researchers need, at a minimum, to adjust for covariate set (1) in both the measurement error model (MEM) and the outcome model, and to adjust for covariate set (2) at least in the MEM. In practice, we recommend including the minimal covariate adjustment set in both the MEM and the outcome model. In contrast with the regression calibration method developed by Rosner, Spiegelman and Willett, under the CRS method it is valid and more efficient to adjust, in the MEM only, for correlates of the true exposure or of the measurement error that are not risk factors for the outcome. We apply the proposed covariate selection approach to the Health Professionals Follow-up Study, examining the effect of fiber intake on cardiovascular disease incidence. In this application, we demonstrate potential issues with a data-driven approach to building the MEM that is agnostic to the structural assumptions. We extend the originally proposed estimators to settings where effect modification by a covariate is allowed. Finally, we caution against using the regression calibration method to calibrate true nutritional intake using biomarkers.
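The two-stage logic summarized above can be illustrated with a minimal sketch on simulated data. This is not the authors' implementation and does not use the study data: the linear outcome model, the data-generating coefficients, and all names (`simulate`, `ols`, `v`, `w`, `x`, `z`, `y`) are assumptions chosen for illustration only. Here `v` plays the role of covariate set (1), `w` plays the role of covariate set (2), the MEM is fit in a hypothetical validation study where the true exposure is observed, and the imputed exposure then replaces the error-prone measurement in the outcome model of the main study.

```python
import numpy as np

def ols(design, response):
    """Least-squares coefficients for response ~ design (design includes an intercept)."""
    coef, *_ = np.linalg.lstsq(design, response, rcond=None)
    return coef

def simulate(n, rng, beta_x):
    """Hypothetical data-generating process, for illustration only."""
    v = rng.normal(size=n)                  # set (1): common cause of true exposure and outcome
    w = rng.normal(size=n)                  # set (2): common cause of measurement error and outcome
    x = 0.8 * v + rng.normal(size=n)        # true exposure
    z = x + 0.7 * w + rng.normal(size=n)    # error-prone measurement; error depends on w
    y = beta_x * x + 0.6 * v + 0.4 * w + rng.normal(size=n)  # continuous outcome
    return v, w, x, z, y

rng = np.random.default_rng(0)
beta_x = 0.5  # assumed true exposure effect

# Main study: (z, v, w, y) are used; the true exposure is treated as unobserved.
v_m, w_m, _, z_m, y_m = simulate(20_000, rng, beta_x)
# Validation study: the true exposure x is additionally observed.
v_v, w_v, x_v, z_v, _ = simulate(2_000, rng, beta_x)

# Stage 1: fit the MEM, E[X | Z, V, W], in the validation study.
mem_design = np.column_stack([np.ones_like(z_v), z_v, v_v, w_v])
mem_coef = ols(mem_design, x_v)

# Stage 2: impute the exposure in the main study, then fit the outcome model
# adjusting for the same covariates.
main_design = np.column_stack([np.ones_like(z_m), z_m, v_m, w_m])
x_hat = main_design @ mem_coef
outcome_design = np.column_stack([np.ones_like(x_hat), x_hat, v_m, w_m])
beta_rc = ols(outcome_design, y_m)[1]

# Naive analysis for comparison: use the error-prone measurement directly.
beta_naive = ols(main_design, y_m)[1]

print(f"naive estimate:      {beta_naive:.3f}")
print(f"calibrated estimate: {beta_rc:.3f}")
```

In this simulated setting the naive coefficient is attenuated toward zero, while the calibrated coefficient should lie close to the assumed value of `beta_x`. The sketch reports point estimates only; valid standard errors would also need to account for the uncertainty in the estimated MEM, which is omitted here.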
