Prediction intervals for overdispersed Poisson data and their application in medical and pre-clinical quality control

In pre-clinical and medical quality control, it is of interest to assess the stability of the process under monitoring or to validate a current observation using historical control data. Classically, this is done by applying historical control limits (HCL), which are displayed graphically in control charts. In many applications, HCL are applied to count data, e.g. the number of revertant colonies (Ames assay) or the number of relapses per multiple sclerosis patient. Count data may be overdispersed and heavily right-skewed, and clusters may differ in size or in other baseline quantities (e.g. the number of Petri dishes per control group or the length of the monitoring time per patient). Based on the quasi-Poisson assumption or the negative-binomial distribution, we propose prediction intervals for overdispersed count data to be used as HCL. Variable baseline quantities are accounted for by offsets. Furthermore, we provide a bootstrap calibration algorithm that accounts for the skewed distribution and achieves equal tail probabilities. Comprehensive Monte Carlo simulations assessing the coverage probabilities of eight different methods for HCL calculation reveal that the bootstrap-calibrated prediction intervals control the type I error best. Heuristics traditionally used in control charts (e.g. the limits in Shewhart c- or u-charts, or the mean ± 2 SD) fail to control a pre-specified coverage probability. The application of HCL is demonstrated using data from the Ames assay and numbers of relapses of multiple sclerosis patients. The proposed prediction intervals and the bootstrap calibration algorithm are publicly available via the R package predint.
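To make the idea of a quasi-Poisson prediction interval with offsets concrete, the following Python sketch computes a simple, uncalibrated interval and, for contrast, the usual Shewhart c-chart limits. It is an illustration under stated assumptions, not the method implemented in predint: the normal-approximation formula, the flooring of the dispersion estimate at 1, the truncation of the lower limit at 0, and the function names `quasi_poisson_pi` and `c_chart_limits` are choices made here, and the bootstrap calibration step is omitted entirely.

```python
from math import sqrt
from statistics import NormalDist


def quasi_poisson_pi(counts, offsets, new_offset=1.0, alpha=0.05):
    """Uncalibrated normal-approximation prediction interval (illustrative).

    Assumes Var(y_h) = phi * n_h * lambda for historical counts y_h with
    offsets n_h, and predicts one future count with offset `new_offset`.
    Returns (lower, upper, rate, dispersion).
    """
    if len(counts) != len(offsets) or len(counts) < 2:
        raise ValueError("need at least two counts, one offset per count")
    total_offset = sum(offsets)
    rate = sum(counts) / total_offset  # pooled rate per unit of offset
    if rate <= 0:
        raise ValueError("all historical counts are zero")
    # Pearson statistic divided by its degrees of freedom estimates phi.
    pearson = sum((y - n * rate) ** 2 / (n * rate)
                  for y, n in zip(counts, offsets))
    # Assumption: floor at 1 so the interval is never narrower than Poisson.
    dispersion = max(1.0, pearson / (len(counts) - 1))
    expected = new_offset * rate
    # Variance of the future count plus variance of its estimated mean.
    se = sqrt(dispersion * expected
              + dispersion * new_offset ** 2 * rate / total_offset)
    z = NormalDist().inv_cdf(1 - alpha / 2)
    lower = max(0.0, expected - z * se)  # counts cannot be negative
    return lower, expected + z * se, rate, dispersion


def c_chart_limits(counts):
    """Shewhart c-chart limits: mean count +/- 3 * sqrt(mean count)."""
    mean = sum(counts) / len(counts)
    return max(0.0, mean - 3 * sqrt(mean)), mean + 3 * sqrt(mean)


if __name__ == "__main__":
    # Hypothetical historical control data: total colonies per control
    # group and number of Petri dishes per group (the offset).
    colonies = [36, 75, 27, 93, 51, 66]
    dishes = [3, 3, 3, 3, 3, 3]
    print(quasi_poisson_pi(colonies, dishes, new_offset=3))
    print(c_chart_limits(colonies))
```

Because the interval is symmetric around the predicted count, it cannot reflect the right-skewness of overdispersed counts; that is the shortcoming the bootstrap calibration described above is meant to address.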
