We introduce a class of regularized M-estimators of multivariate scatter and show that, like the popular spatial sign covariance matrix (SSCM), they possess high breakdown points. We also show that the SSCM can be viewed as an extreme member of this class. Unlike the SSCM, this class of estimators takes into account the shape of the contours of the data cloud when downweighting observations. We also propose a median-based cross-validation criterion for selecting the tuning parameter for this class of regularized M-estimators. This cross-validation criterion helps ensure that the resulting tuned scatter estimator is a good fit to the data and has a high breakdown point. A motivation for this new median-based criterion is that when it is optimized over all possible scatter parameters, rather than only over the tuned candidates, it results in a new affine equivariant multivariate scatter statistic with a high breakdown point.
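For readers unfamiliar with the SSCM, the following is a minimal sketch of how it is commonly computed: each centered observation is projected onto the unit sphere, and the outer products of the resulting unit vectors are averaged. The function name `sscm`, the `center` parameter, and the default of centering at the coordinate-wise median are illustrative assumptions, not taken from the paper (the spatial median is a more usual choice of center). The sketch does not implement the regularized M-estimators or the cross-validation criterion proposed here.

```python
import numpy as np


def sscm(X, center=None):
    """Spatial sign covariance matrix of the rows of X (illustrative sketch).

    X      : (n, p) array of observations.
    center : optional (p,) location vector. If omitted, the coordinate-wise
             median is used; this default is an assumption of this sketch.
    """
    X = np.asarray(X, dtype=float)
    if center is None:
        center = np.median(X, axis=0)  # assumed default, not from the paper
    Z = X - center
    norms = np.linalg.norm(Z, axis=1)
    keep = norms > 0                   # drop points that coincide with the center
    U = Z[keep] / norms[keep, None]    # spatial signs: unit-length directions
    return U.T @ U / U.shape[0]        # average of the outer products u u^T
```

Each observation enters only through its direction, so its weight depends on its Euclidean distance from the center and not on the shape of the data cloud, which is the limitation the abstract contrasts with the proposed class. Moving a single observation arbitrarily far along its own direction from a fixed center leaves the estimate unchanged, and the trace of the result is always 1.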