A simple way to obtain robust estimates of the "center" (or "location") and the "scatter" of a dataset is to compute the maximum likelihood estimate for a class of heavy-tailed distributions, regardless of the "true" distribution generating the data. We observe that the maximum likelihood problem for the Cauchy distributions, which have particularly heavy tails, is geodesically convex and therefore efficiently solvable (Cauchy distributions are parametrized by the upper half-plane, i.e. by the hyperbolic plane). Moreover, it has an appealing geometric meaning: the data points, which live on the boundary of the hyperbolic plane, attract the parameter with unit forces, and we seek the point where these forces are in equilibrium. This picture generalizes to several classes of heavy-tailed multivariate distributions, including, in particular, the multivariate Cauchy distributions; the hyperbolic plane is then replaced by symmetric spaces of noncompact type. Geodesic convexity yields an efficient numerical solution of the maximum likelihood problem for these distribution classes, which, thanks to the heavy tails, can be used for robust estimation of location and spread.
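To make the one-dimensional case concrete, the following is a minimal sketch of a Cauchy maximum likelihood fit for location and scale. It is not the method of this work: it uses the standard EM iteration for a Student t distribution with one degree of freedom, swapped in here because it is short and increases the likelihood at every step. The function name `cauchy_mle` and the parameters `n_iter` and `tol` are hypothetical. The pair `(mu, sigma)` with `sigma > 0` corresponds to a point in the upper half-plane.

```python
import numpy as np


def cauchy_mle(x, n_iter=500, tol=1e-10):
    """Illustrative Cauchy location/scale fit via EM (hypothetical helper).

    Assumption: the Cauchy distribution is treated as a Student t
    distribution with one degree of freedom, for which the EM weights are
    w_i = 2 / (1 + ((x_i - mu) / sigma)^2).
    """
    x = np.asarray(x, dtype=float)

    # Robust starting point: median and median absolute deviation.
    mu = np.median(x)
    sigma = np.median(np.abs(x - mu))
    if sigma == 0.0:
        sigma = 1.0  # fallback when more than half the points coincide

    for _ in range(n_iter):
        # E-step: points far from mu (relative to sigma) get small weights.
        w = 2.0 / (1.0 + ((x - mu) / sigma) ** 2)

        # M-step: weighted mean, then weighted spread around the new mean.
        mu_new = np.sum(w * x) / np.sum(w)
        sigma_new = np.sqrt(np.mean(w * (x - mu_new) ** 2))

        converged = abs(mu_new - mu) + abs(sigma_new - sigma) < tol
        mu, sigma = mu_new, sigma_new
        if converged:
            break

    return mu, sigma


if __name__ == "__main__":
    data = [1.0, 2.0, 3.0, 4.0, 5.0, 1000.0]  # one gross outlier
    mu, sigma = cauchy_mle(data)
    print("sample mean:", np.mean(data))
    print("Cauchy MLE location and scale:", mu, sigma)
```

In this sketch the outlier at 1000 receives a weight close to zero, so the fitted location stays near the bulk of the data while the sample mean is pulled far away. This is the robustness that the heavy tails provide.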