Invariant coordinate selection (ICS) is an unsupervised multivariate data transformation useful in many contexts, such as outlier detection or clustering. It is based on the simultaneous diagonalization of two affine equivariant and positive definite scatter matrices. Its classical implementation relies on a non-symmetric eigenvalue problem (EVP), diagonalizing one scatter matrix relative to the other. In the case of collinearity, at least one of the scatter matrices is singular and the problem cannot be solved. To address this limitation, three approaches are proposed, based on: a Moore–Penrose pseudo-inverse (GINV), a dimension reduction (DR), and a generalized singular value decomposition (GSVD). Their properties are investigated both theoretically and in different empirical applications. Overall, the extension based on the GSVD seems the most promising, even though it restricts the choice to scatter matrices that can be expressed as cross-products. In practice, some of the approaches also prove suitable in the context of high dimension low sample size (HDLSS) data.
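To make the classical construction concrete, the simultaneous diagonalization of two scatter matrices can be sketched in a few lines of NumPy/SciPy. The sketch below is a minimal illustration, not the authors' implementation: it assumes the common scatter pair S1 = Cov (regular covariance) and S2 = Cov4 (a fourth-moment based scatter), and it requires S1 to be nonsingular, i.e. exactly the well-conditioned setting that the GINV, DR, and GSVD extensions are designed to relax.

```python
import numpy as np
from scipy.linalg import eigh

def cov4(X):
    """Fourth-moment scatter Cov4: covariance reweighted by squared
    Mahalanobis distances (a standard companion scatter for ICS)."""
    n, p = X.shape
    Xc = X - X.mean(axis=0)
    S = np.cov(X, rowvar=False)
    # squared Mahalanobis distance of each centered observation
    d2 = np.sum(Xc @ np.linalg.inv(S) * Xc, axis=1)
    return (Xc * d2[:, None]).T @ Xc / ((p + 2) * n)

def ics(X):
    """Classical ICS via the generalized symmetric-definite EVP
    S2 b = rho S1 b, which simultaneously diagonalizes S1 and S2."""
    S1 = np.cov(X, rowvar=False)          # first scatter (must be nonsingular)
    S2 = cov4(X)                          # second scatter
    rho, B = eigh(S2, S1)                 # eigh normalizes so that B.T @ S1 @ B = I
    order = np.argsort(rho)[::-1]         # sort generalized eigenvalues descending
    rho, B = rho[order], B[:, order]
    Z = (X - X.mean(axis=0)) @ B          # invariant coordinates (scores)
    return rho, B, Z
```

By construction, the returned matrix B satisfies B.T @ S1 @ B = I and B.T @ S2 @ B = diag(rho); when the columns of X are collinear, `np.linalg.inv(S)` (or `eigh` with a singular S1) fails, which is precisely the limitation the proposed extensions address.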