The author's recent research papers, "Cumulative deviation of a subpopulation from the full population" and "A graphical method of cumulative differences between two subpopulations" (both published in volume 8 of Springer's open-access "Journal of Big Data" during 2021), propose graphical methods and summary statistics, without extensively calibrating formal significance tests. The summary metrics and methods can measure the calibration of probabilistic predictions and can assess differences in responses between a subpopulation and the full population while controlling for a covariate or score by conditioning on it. Those papers construct significance tests based on the scalar summary statistics, but only sketch how to calibrate the attained significance levels (also known as "P-values") for the tests. The present article reviews and synthesizes work spanning many decades in order to detail how to calibrate the P-values. It presents computationally efficient, easily implemented numerical methods for evaluating properly calibrated P-values, together with rigorous mathematical proofs guaranteeing their accuracy, and illustrates and validates the methods with open-source software and numerical examples.