Transforming a random variable to improve its normality calls for a follow-up test of whether the transformed variable follows a normal distribution. Previous work has shown that the Anderson-Darling test for normality suffers from resubstitution bias following Box-Cox transformation and indicates normality much too often. The work reported here extends this result by adding the Shapiro-Wilk statistic and the two-parameter Box-Cox transformation; all of the resulting settings show severe bias. We also develop a recalibration to correct the bias in all four settings. The methodology was motivated by finding reference ranges in biomarker studies, where parametric analysis, possibly on a power-transformed measurand, can be much more informative than nonparametric analysis. It is illustrated with a data set on biomarkers.
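The resubstitution bias can be illustrated with a small simulation. The following is a minimal sketch, not the authors' procedure: it assumes Python with NumPy and SciPy, lognormal data (so that the log transform, Box-Cox with lambda = 0, is the true normalizing transformation), and illustrative values for the sample size, replication count, and significance level. The function name `rejection_rates` is hypothetical. It compares how often the Shapiro-Wilk test rejects normality when lambda is known with how often it rejects when lambda is estimated from the same sample that is then tested.

```python
import numpy as np
from scipy import stats


def rejection_rates(n=40, reps=2000, alpha=0.05, seed=0):
    """Estimate Shapiro-Wilk rejection rates under the null hypothesis,
    with the Box-Cox lambda either known or estimated from the sample."""
    rng = np.random.default_rng(seed)
    reject_known = 0
    reject_estimated = 0
    for _ in range(reps):
        # Lognormal data: log(x) is exactly normal, so the true lambda is 0.
        x = rng.lognormal(mean=0.0, sigma=0.5, size=n)

        # Known lambda: apply the true transformation, then test.
        _, p_known = stats.shapiro(np.log(x))

        # Estimated lambda: fit by maximum likelihood, then test the
        # transformed values from the same sample (resubstitution).
        y, _lam = stats.boxcox(x)
        _, p_estimated = stats.shapiro(y)

        reject_known += int(p_known < alpha)
        reject_estimated += int(p_estimated < alpha)
    return reject_known / reps, reject_estimated / reps


rate_known, rate_estimated = rejection_rates()
print(f"rejection rate, known lambda:     {rate_known:.3f}")
print(f"rejection rate, estimated lambda: {rate_estimated:.3f}")
```

With a known lambda the rejection rate should sit near the nominal level, while with an estimated lambda it is expected to fall below it, which is the sense in which the test indicates normality too often. A recalibration would replace the standard null distribution of the statistic with one that accounts for the estimation step; the specific recalibration developed in this work is not reproduced here.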