A statistical model is said to be calibrated if its mean estimates perfectly match the true means of the underlying responses. Calibration is often not achievable in practice, as one has to deal with finite samples of noisy observations. A weaker notion is auto-calibration: a model is auto-calibrated if the expected value of the responses, given a mean estimate, equals that estimate. Testing for auto-calibration has only recently been considered in the literature, and we propose a new approach based on calibration bands. Calibration bands are a set of lower and upper bounds such that the probability that the true means lie simultaneously inside those bounds exceeds a given confidence level. Such bands were constructed by Yang and Barber (2019) for sub-Gaussian distributions. Dimitriadis et al. (2023) then introduced narrower bands for the Bernoulli distribution. We use the same idea to extend the construction to the entire exponential dispersion family, which contains, for example, the binomial, Poisson, negative binomial, gamma and normal distributions. Moreover, we show that the resulting calibration bands allow us to construct various tests for calibration and auto-calibration. Because the construction of the bands does not rely on asymptotic results, our tests can be used for any sample size.
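The three notions described in words above can be written compactly as follows. This is only a sketch: the notation is assumed for illustration and does not appear in the abstract. Here $Y$ is the response, $X$ the features, $\mu^*(X) = \mathbb{E}[Y \mid X]$ the true mean, $\hat{\mu}(X)$ the model's mean estimate, $(L_i, U_i)$ the lower and upper bounds at observation $x_i$, and $1 - \alpha$ the confidence level.

```latex
% Assumed notation (not from the source): Y response, X features,
% \mu^*(X) = E[Y | X] true mean, \hat{\mu}(X) model estimate,
% (L_i, U_i) band at observation x_i, 1 - \alpha confidence level.
\begin{align*}
\text{calibration:} \quad
  & \hat{\mu}(X) = \mu^*(X) = \mathbb{E}[Y \mid X] \quad \text{a.s.} \\
\text{auto-calibration:} \quad
  & \mathbb{E}\bigl[ Y \mid \hat{\mu}(X) \bigr] = \hat{\mu}(X) \quad \text{a.s.} \\
\text{calibration band:} \quad
  & \mathbb{P}\bigl( L_i \le \mu^*(x_i) \le U_i \ \text{for all } i = 1, \dots, n \bigr) \ge 1 - \alpha
\end{align*}
```

In this notation, auto-calibration is weaker because it follows from calibration by the tower property: if $\hat{\mu}(X) = \mathbb{E}[Y \mid X]$, then $\mathbb{E}[Y \mid \hat{\mu}(X)] = \mathbb{E}\bigl[\mathbb{E}[Y \mid X] \mid \hat{\mu}(X)\bigr] = \hat{\mu}(X)$, while the converse need not hold.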