When a probability distribution model is intended to approximate a hidden mother distribution, a useful criterion is needed for how closely the model distribution resembles the mother distribution. This study proposes a criterion that measures the Hellinger distance between discretized (quantized) samples from both distributions. The criterion has two advantages. First, unlike information criteria such as AIC, it does not require the probability density function of the model distribution, which cannot be obtained explicitly for a complicated model such as a deep learning machine. Second, it can draw a positive conclusion (i.e., that both distributions are sufficiently close) under a given threshold, whereas a statistical hypothesis test, such as the Kolmogorov-Smirnov test, cannot genuinely lead to a positive conclusion even when the hypothesis is accepted. We establish a reasonable threshold for the criterion, deduced from the Bayes error rate, and present the asymptotic bias of the estimator of the criterion. Together, these results yield a reasonable and easy-to-use criterion that can be calculated directly from two sets of samples, one drawn from each distribution.
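To make the idea of a Hellinger distance between quantized samples concrete, the following is a minimal sketch for one-dimensional data. It is not the estimator defined in the paper: the function name `quantized_hellinger`, the equal-width binning over the pooled sample range, the default bin count, and the normalization $H = \sqrt{1 - \sum_i \sqrt{p_i q_i}}$ are all assumptions made for illustration, and neither the Bayes-error-based threshold nor the asymptotic bias correction is reproduced.

```python
import numpy as np


def quantized_hellinger(x, y, bins=10):
    """Hellinger distance between two 1-D sample sets after quantization.

    Assumption: both samples are quantized onto the same equal-width bins
    spanning their pooled range; the paper may quantize differently.
    """
    x = np.asarray(x, dtype=float).ravel()
    y = np.asarray(y, dtype=float).ravel()

    # Shared bin edges so both histograms live on the same cells.
    lo = min(x.min(), y.min())
    hi = max(x.max(), y.max())
    if lo == hi:
        # Every sample takes the same value: the quantized distributions coincide.
        return 0.0
    edges = np.linspace(lo, hi, bins + 1)

    # Empirical cell probabilities of each sample set.
    p, _ = np.histogram(x, bins=edges)
    q, _ = np.histogram(y, bins=edges)
    p = p / p.sum()
    q = q / q.sum()

    # Bhattacharyya coefficient, then the Hellinger distance in [0, 1].
    bc = np.sum(np.sqrt(p * q))
    return float(np.sqrt(max(0.0, 1.0 - bc)))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    mother = rng.normal(0.0, 1.0, size=5000)  # stand-in for real data
    model = rng.normal(0.2, 1.0, size=5000)   # stand-in for model output
    print(quantized_hellinger(mother, model, bins=20))
```

Only samples from the two distributions are used, so no density function of the model is needed. The value depends on the bin count and on the sample sizes, which is why a principled threshold and a bias analysis of the kind described above matter before the number is compared against a cutoff.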