Prediction intervals for overdispersed binomial endpoints and their application to historical control data

In toxicology, validating the concurrent control against historical control data (HCD) has become a requirement. This validation is usually done with historical control limits (HCL), which in practice are often displayed graphically in the manner of a Shewhart control chart. In many applications, HCL are applied to dichotomous data, e.g. the number of rats with a tumor vs. the number of rats without a tumor (carcinogenicity studies) or the number of cells with a micronucleus out of a total number of cells. Dichotomous HCD may be overdispersed and can be heavily right- (or left-) skewed, which is usually not taken into account in practical applications of HCL. To overcome this problem, four different prediction intervals (two frequentist, two Bayesian) that can be applied to such data are proposed. Comprehensive Monte Carlo simulations assessing the coverage probabilities of seven different methods for HCL calculation reveal that frequentist bootstrap-calibrated prediction intervals control the type I error best. Heuristics traditionally used in control charts (e.g. the limits in Shewhart np-charts or the mean ± 2 SD), as well as the historical range, fail to control a pre-specified coverage probability. The application of HCL is demonstrated on a real-life data set containing historical controls from long-term carcinogenicity studies run on behalf of the U.S. National Toxicology Program. The proposed frequentist prediction intervals are publicly available in the R package predint, whereas R code for computing the Bayesian prediction intervals is provided via GitHub.
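To make the three heuristics named above concrete, the following minimal Python sketch computes Shewhart np-chart limits, the mean ± 2 SD, and the historical range for a set of historical control counts. It is an illustration only, not the implementation in predint: the function name `heuristic_limits`, the counts, and the group size of 50 are hypothetical, and the np-chart limits use the textbook form n·p̄ ± 3·sqrt(n·p̄·(1 − p̄)), which assumes the same group size in every study.

```python
from math import sqrt
from statistics import mean, stdev


def heuristic_limits(counts, n):
    """Return three heuristic historical control limits for event counts.

    counts: events per historical control group (hypothetical data)
    n:      animals per control group, assumed equal in every study
    """
    # Pooled event proportion across all historical control groups
    p_bar = sum(counts) / (n * len(counts))

    # Textbook Shewhart np-chart limits, truncated to the range [0, n]
    centre = n * p_bar
    half_width = 3 * sqrt(n * p_bar * (1 - p_bar))
    np_chart = (max(0.0, centre - half_width), min(float(n), centre + half_width))

    # Mean plus/minus two sample standard deviations of the raw counts
    m, s = mean(counts), stdev(counts)
    mean_2sd = (m - 2 * s, m + 2 * s)

    # Historical range: smallest and largest observed count
    hist_range = (min(counts), max(counts))

    return {"np_chart": np_chart, "mean_2sd": mean_2sd, "range": hist_range}


# Hypothetical, right-skewed tumor counts from ten control groups of 50 animals
counts = [0, 1, 0, 2, 5, 0, 1, 3, 0, 8]
limits = heuristic_limits(counts, n=50)

for name, (lower, upper) in limits.items():
    print(f"{name}: [{lower:.2f}, {upper:.2f}]")
```

For these made-up counts, the lower mean ± 2 SD limit falls below zero, which is impossible for a count and shows how a symmetric heuristic can misbehave on skewed data.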
