We discuss recently developed methods that quantify the stability and generalizability of statistical findings under distributional changes. In many practical problems, the data are not drawn i.i.d. from the target population. For example, unobserved sampling bias, batch effects, or unknown associations might inflate the variance compared to i.i.d. sampling. Reliable statistical inference therefore needs to account for these types of variation. We review two methods that quantify distributional stability based on a single dataset. The first method computes the sensitivity of a parameter under worst-case distributional perturbations to identify which types of shift pose a threat to external validity. The second method treats distributional shifts as random, which allows assessing average rather than worst-case robustness. Based on a stability analysis of multiple estimators on a single dataset, it integrates both sampling and distributional uncertainty into a single confidence interval.