A simple way to obtain robust estimates of the "center" (or "location") and the "scatter" of a dataset is to use maximum likelihood estimation with a class of heavy-tailed distributions, regardless of the "true" distribution generating the data. We observe that the maximum likelihood problem for the Cauchy distributions, which have particularly heavy tails, is geodesically convex and therefore efficiently solvable (Cauchy distributions are parametrized by the upper half-plane, i.e. by the hyperbolic plane). Moreover, it has an appealing geometric meaning: the data points, which live on the boundary of the hyperbolic plane, attract the parameter with unit forces, and we search for the point where these forces are in equilibrium. This picture generalizes to several classes of multivariate heavy-tailed distributions, including, in particular, the multivariate Cauchy distributions; the hyperbolic plane is then replaced by symmetric spaces of noncompact type. Geodesic convexity yields an efficient numerical solution of the maximum likelihood problem for these distribution classes, which can then be used for robust estimates of location and spread, thanks to their heavy tails.
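As a rough illustration of the univariate case, the following sketch fits a Cauchy location mu and scale sigma > 0 (the point mu + i*sigma of the upper half-plane) to a sample by maximum likelihood. It is not the geodesic method described above: as an assumption made for brevity, it uses the standard EM-style reweighting iteration for a Student t distribution with one degree of freedom, which has the same stationary point. All function names (`cauchy_nll`, `cauchy_score`, `cauchy_mle`) are hypothetical, and the starting values (median and median absolute deviation) are one possible choice.

```python
import math


def cauchy_nll(xs, mu, sigma):
    """Negative log-likelihood of Cauchy(mu, sigma) for the sample xs."""
    n = len(xs)
    return (n * math.log(math.pi) - n * math.log(sigma)
            + sum(math.log((x - mu) ** 2 + sigma ** 2) for x in xs))


def cauchy_score(xs, mu, sigma):
    """Gradient of the log-likelihood with respect to (mu, sigma)."""
    d_mu = sum(2.0 * (x - mu) / ((x - mu) ** 2 + sigma ** 2) for x in xs)
    d_sigma = len(xs) / sigma - sum(
        2.0 * sigma / ((x - mu) ** 2 + sigma ** 2) for x in xs)
    return d_mu, d_sigma


def _median(values):
    s = sorted(values)
    mid = len(s) // 2
    return s[mid] if len(s) % 2 else 0.5 * (s[mid - 1] + s[mid])


def cauchy_mle(xs, tol=1e-12, max_iter=10_000):
    """Cauchy location/scale MLE via EM-style reweighting.

    Assumption: this is the usual t-distribution EM update with one degree
    of freedom, swapped in for the geodesic approach of the text.
    """
    n = len(xs)
    # Robust starting point: median and median absolute deviation.
    mu = _median(xs)
    sigma = _median([abs(x - mu) for x in xs]) or 1.0
    for _ in range(max_iter):
        # Points far from mu (relative to sigma) receive small weights.
        weights = [2.0 * sigma ** 2 / (sigma ** 2 + (x - mu) ** 2) for x in xs]
        new_mu = sum(w * x for w, x in zip(weights, xs)) / sum(weights)
        new_sigma = math.sqrt(
            sum(w * (x - new_mu) ** 2 for w, x in zip(weights, xs)) / n)
        converged = abs(new_mu - mu) + abs(new_sigma - sigma) < tol
        mu, sigma = new_mu, new_sigma
        if converged:
            break
    return mu, sigma


if __name__ == "__main__":
    sample = [-2.0, -1.0, 0.0, 1.0, 2.0, 1000.0]  # one gross outlier
    mu_hat, sigma_hat = cauchy_mle(sample)
    print("sample mean:", sum(sample) / len(sample))
    print("Cauchy MLE location and scale:", mu_hat, sigma_hat)
```

On a sample with a gross outlier, the fitted location stays close to the bulk of the data while the sample mean is pulled far away, which is the robustness effect the heavy tails are meant to provide.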