How to Select Covariates for Imputation-Based Regression Calibration Method -- A Causal Perspective

In this paper, we identify criteria for selecting the minimal and the most efficient covariate adjustment sets for the regression calibration method developed by Carroll, Ruppert and Stefanski (CRS, 1992), which is used to correct bias due to measurement error in a continuous exposure. We use directed acyclic graphs to illustrate how subject matter knowledge can aid in the selection of such adjustment sets. Valid measurement error correction requires collecting data, in both the main study and the validation study, on (1) any common causes of the true exposure and the outcome and (2) any common causes of the measurement error and the outcome. For the CRS regression calibration method to be valid, researchers must, at a minimum, adjust for covariate set (1) in both the measurement error model (MEM) and the outcome model, and for covariate set (2) at least in the MEM. In practice, we recommend including the minimal covariate adjustment set in both the MEM and the outcome model. Unlike under the regression calibration method developed by Rosner, Spiegelman and Willett, under the CRS method it is valid and more efficient to adjust, in the MEM only, for correlates of the true exposure or of the measurement error that are not risk factors for the outcome. We apply the proposed covariate selection approach to the Health Professionals Follow-up Study, examining the effect of fiber intake on cardiovascular disease incidence. In this application, we demonstrate potential issues with a data-driven approach to building the MEM that is agnostic to the structural assumptions. We also extend the originally proposed estimators to settings where effect modification by a covariate is allowed. Finally, we caution against using the regression calibration method to calibrate true nutritional intake using biomarkers.
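The adjustment rule for covariate set (1) can be illustrated with a small simulation. The following is a minimal sketch under stated assumptions, not the paper's analysis: it uses simulated data with linear models, a single covariate V that is a common cause of the true exposure X and the outcome Y, and classical additive error in the measured exposure Z. All variable names, coefficients, and sample sizes are hypothetical. The sketch fits the MEM in a validation study, imputes the expected true exposure in the main study, and fits the outcome model with V included in both models. It does not compute standard errors, which would need to account for the estimated MEM.

```python
import numpy as np

rng = np.random.default_rng(0)

def simulate(n):
    """Simulate (X, Z, V, Y); all coefficients are illustrative assumptions."""
    V = rng.normal(size=n)                       # common cause of exposure and outcome
    X = 0.5 * V + rng.normal(size=n)             # true exposure
    Z = X + rng.normal(size=n)                   # error-prone measurement of X
    Y = 1.0 * X + 0.8 * V + rng.normal(size=n)   # outcome; true effect of X is 1.0
    return X, Z, V, Y

def ols(columns, y):
    """Return least-squares coefficients, intercept first."""
    design = np.column_stack([np.ones(len(y))] + list(columns))
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    return coef

# Validation study: the true exposure X is observed alongside Z and V.
X_val, Z_val, V_val, _ = simulate(2_000)
# Main study: only Z, V and Y are used.
_, Z_main, V_main, Y_main = simulate(20_000)

# Step 1: measurement error model (MEM), E[X | Z, V], fitted in the validation study.
mem = ols([Z_val, V_val], X_val)

# Step 2: impute the expected true exposure for every main-study participant.
X_hat = mem[0] + mem[1] * Z_main + mem[2] * V_main

# Step 3: outcome model on the imputed exposure, adjusting for V again.
beta_rc = ols([X_hat, V_main], Y_main)[1]

# Naive analysis for comparison: use the mismeasured exposure directly.
beta_naive = ols([Z_main, V_main], Y_main)[1]

print(f"naive estimate:      {beta_naive:.3f}")
print(f"calibrated estimate: {beta_rc:.3f}")
```

In this setup the variance of X given V equals the error variance, so the naive coefficient is attenuated toward roughly half of the true effect of 1.0, while the calibrated estimate should land close to 1.0.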
