Stratification in both the design and analysis of randomized clinical trials is common. Despite features in automated randomization systems to re-confirm the stratifying variables, incorrect values of these variables may be entered. These errors are often detected during subsequent data collection and verification. Questions remain about whether to use the mis-reported initial stratification or the corrected values in subsequent analyses. It is shown that the likelihood function resulting from the design of randomized clinical trials supports the use of the corrected values. New definitions are proposed that characterize misclassification errors as "ignorable" and "non-ignorable". Ignorable errors may depend on the correct strata and any other modeled baseline covariates, but they are otherwise unrelated to potential treatment outcomes. Data management review suggests most misclassification errors are arbitrarily produced by distracted investigators, so they are ignorable or at most weakly dependent on measured and unmeasured baseline covariates. Ignorable misclassification errors may produce a small increase in standard errors, but other properties of the planned analyses are unchanged (e.g., unbiasedness, confidence interval coverage). It is shown that unbiased linear estimation in the absence of misclassification errors remains unbiased when there are non-ignorable misclassification errors, and the corresponding confidence intervals based on the corrected strata values are conservative.