Stratification is common in both the design and the analysis of randomized clinical trials. Although automated randomization systems include features that ask users to re-confirm the stratifying variables, incorrect values of these variables can still be entered. These errors are often detected during subsequent data collection and verification. It remains unclear whether subsequent analyses should use the misreported initial stratification or the corrected values. It is shown that the likelihood function implied by the design of a randomized clinical trial supports using the corrected values. New definitions are proposed that classify misclassification errors as "ignorable" or "non-ignorable". Ignorable errors may depend on the correct strata and on any other modeled baseline covariates, but they are otherwise unrelated to the potential treatment outcomes. Data management review suggests that most misclassification errors arise haphazardly from distracted investigators, so they are either ignorable or at most weakly dependent on measured and unmeasured baseline covariates. Ignorable misclassification errors may produce a small increase in standard errors, but other properties of the planned analyses, such as unbiasedness and confidence interval coverage, are unchanged. It is also shown that linear estimators that are unbiased in the absence of misclassification errors remain unbiased when there are non-ignorable misclassification errors, and that the corresponding confidence intervals based on the corrected strata values are conservative.
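The claim that analyses based on the corrected strata keep their properties under ignorable misclassification can be illustrated with a small simulation. This is a minimal sketch, not the authors' method, and every setting in it is a hypothetical assumption: two strata, permuted blocks of size 4 within the *reported* stratum, a normal outcome with made-up stratum means and treatment effect, and a 10% chance of entering the wrong stratum independently of everything else (an ignorable error). It compares a size-weighted stratified difference in means computed with the corrected strata against the same estimator computed with the reported strata.

```python
import random
import statistics

# Hypothetical simulation settings (assumptions, not taken from the paper).
TRUE_EFFECT = 1.0                 # treatment effect on the outcome
STRATUM_MEAN = {0: 0.0, 1: 3.0}   # baseline outcome mean in each true stratum
MISREPORT_PROB = 0.10             # chance the entered stratum is wrong
TRUE_KEY, REPORTED_KEY = 0, 1     # positions of the strata in each record


def simulate_trial(rng, n=200, block_size=4):
    """One trial randomized in permuted blocks within the *reported* strata.

    Each record is (true_stratum, reported_stratum, treated, outcome).
    """
    pending = {0: [], 1: []}  # unused block assignments per reported stratum
    records = []
    for _ in range(n):
        true_s = rng.randint(0, 1)
        # Ignorable error: whether the stratum is mis-entered is pure chance,
        # unrelated to the patient's potential outcomes.
        reported_s = 1 - true_s if rng.random() < MISREPORT_PROB else true_s
        if not pending[reported_s]:
            block = [1, 0] * (block_size // 2)  # balanced block, then shuffled
            rng.shuffle(block)
            pending[reported_s] = block
        treated = pending[reported_s].pop()
        outcome = (STRATUM_MEAN[true_s] + TRUE_EFFECT * treated
                   + rng.gauss(0.0, 1.0))
        records.append((true_s, reported_s, treated, outcome))
    return records


def stratified_difference(records, key):
    """Size-weighted average of within-stratum treated-minus-control means.

    `key` selects which stratum label (true or reported) defines the strata.
    """
    total, weight = 0.0, 0
    for s in (0, 1):
        stratum = [r for r in records if r[key] == s]
        treated = [r[3] for r in stratum if r[2] == 1]
        control = [r[3] for r in stratum if r[2] == 0]
        if treated and control:  # skip a stratum that is missing an arm
            diff = statistics.mean(treated) - statistics.mean(control)
            total += len(stratum) * diff
            weight += len(stratum)
    return total / weight


def run_study(reps=1000, seed=2024):
    """Repeat the trial; return mean and variance of both estimators."""
    rng = random.Random(seed)
    corrected, reported = [], []
    for _ in range(reps):
        records = simulate_trial(rng)
        corrected.append(stratified_difference(records, TRUE_KEY))
        reported.append(stratified_difference(records, REPORTED_KEY))
    return (statistics.mean(corrected), statistics.variance(corrected),
            statistics.mean(reported), statistics.variance(reported))


if __name__ == "__main__":
    m_c, v_c, m_r, v_r = run_study()
    print(f"corrected strata: mean {m_c:.3f}, variance {v_c:.4f}")
    print(f"reported strata:  mean {m_r:.3f}, variance {v_r:.4f}")
```

Under these assumptions both estimators should center on the true effect, because treatment assignment never depends on the outcomes. The corrected-strata analysis should show the smaller variance, since grouping by the true strata removes the baseline difference between strata from the within-stratum comparison.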