Asymptotic theory for M-estimation problems usually focuses on the asymptotic convergence of the sample descriptor, defined as the minimizer of the sample loss function. Here, we explore a related question and formulate asymptotic theory for the minimum value of the sample loss, the M-variance. Since the loss function value is always a real number, the asymptotic theory for the M-variance is comparatively simple. The M-variance often satisfies a standard central limit theorem, even in situations where the asymptotics of the descriptor are more complicated, for example in the case of smeariness, or where no asymptotic distribution can be given at all, as can happen when the descriptor space is a general metric space. We use the asymptotic results for the M-variance to formulate a hypothesis test that systematically determines, for a given sample, whether the underlying population loss function may have multiple global minima. We discuss three applications of our test to data, each of which presents a typical scenario in which non-uniqueness of descriptors may occur. These model scenarios are the mean on a non-Euclidean space, non-linear regression, and Gaussian mixture clustering.
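As a sketch of the objects involved (the notation below is illustrative, not taken from the text): for i.i.d. observations $X_1,\dots,X_n$ in a descriptor space $Q$ with loss $\rho$, the sample descriptor minimizes the sample loss, while the M-variance is the attained minimum value, a real number:

```latex
% Sample loss function and its minimizer, the sample descriptor:
\[
  F_n(q) \;=\; \frac{1}{n}\sum_{i=1}^{n} \rho(X_i, q),
  \qquad
  \widehat{q}_n \;\in\; \operatorname*{arg\,min}_{q \in Q} F_n(q).
\]
% The M-variance is the minimum value of the sample loss; being real-valued,
% it admits simpler asymptotics than the (possibly set- or manifold-valued) descriptor:
\[
  \widehat{V}_n \;=\; \min_{q \in Q} F_n(q) \;=\; F_n(\widehat{q}_n).
\]
```

Because $\widehat{V}_n$ lives in $\mathbb{R}$ regardless of the structure of $Q$, a central limit theorem for it can hold even when $\widehat{q}_n$ has no tractable limiting distribution.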