Kernel methods are applied to many problems in pattern recognition, including subspace clustering (SC). With them, problems that are nonlinear in the input data space become linear in a mapped high-dimensional feature space. Computationally tractable nonlinear algorithms are thereby enabled through implicit mapping, by virtue of the kernel trick. However, kernelization of linear algorithms is possible only if the square of the Frobenius norm of the error term is used in the related optimization problem. That, in turn, implies a normal distribution of the error, which is not appropriate for non-Gaussian errors such as gross sparse corruptions that are modeled by the ℓ1-norm. Herein we propose, to the best of our knowledge for the first time, a robust kernel sparse SC (RKSSC) algorithm for data with gross sparse corruptions. The concept can, in principle, be applied to other SC algorithms to make them robust to this type of corruption. We validated the proposed approach on two well-known datasets, with the linear robust SSC algorithm as the baseline model. According to the Wilcoxon test, the clustering performance of the RKSSC algorithm is statistically significantly better than that of the robust SSC algorithm. MATLAB code of the proposed RKSSC algorithm is posted at https://github.com/ikopriva/RKSSC.
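To illustrate why the squared Frobenius norm admits kernelization, the following minimal Python sketch checks numerically that the self-representation error in feature space, ||Φ(X) − Φ(X)C||_F², equals tr(K − 2KC + CᵀKC), where K is the Gram matrix of kernel values. It is not the authors' RKSSC code and does not handle the ℓ1-norm case. The function names, the choice of the homogeneous quadratic kernel, and the 2-D toy setting are all assumptions made for illustration only.

```python
import math


def phi(x):
    """Explicit feature map for the kernel k(x, y) = (x . y)^2 in 2-D (illustrative choice)."""
    a, b = x
    return [a * a, math.sqrt(2.0) * a * b, b * b]


def kernel(x, y):
    """Homogeneous quadratic kernel; equals the inner product phi(x) . phi(y)."""
    return (x[0] * y[0] + x[1] * y[1]) ** 2


def matmul(A, B):
    """Plain-Python product of two square matrices given as lists of rows."""
    n = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def frob_sq_explicit(points, C):
    """||Phi(X) - Phi(X) C||_F^2 using explicit features (samples are columns)."""
    feats = [phi(p) for p in points]
    n = len(points)
    total = 0.0
    for j in range(n):                      # column j of the residual
        for d in range(len(feats[0])):      # feature dimension d
            recon = sum(feats[i][d] * C[i][j] for i in range(n))
            total += (feats[j][d] - recon) ** 2
    return total


def frob_sq_kernel(points, C):
    """Same quantity via tr(K - 2 K C + C^T K C), using kernel evaluations only."""
    n = len(points)
    K = [[kernel(points[i], points[j]) for j in range(n)] for i in range(n)]
    KC = matmul(K, C)
    tr_K = sum(K[i][i] for i in range(n))
    tr_KC = sum(KC[i][i] for i in range(n))
    # tr(C^T K C) = sum over i, j of C[i][j] * (K C)[i][j]
    tr_CKC = sum(C[i][j] * KC[i][j] for i in range(n) for j in range(n))
    return tr_K - 2.0 * tr_KC + tr_CKC
```

The kernel-only version never calls `phi`, which is the point of the kernel trick: the squared Frobenius norm expands into inner products of mapped samples. An entrywise ℓ1-norm of the same residual does not reduce to inner products in this way, which is the obstacle the abstract describes.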