A first proposal for a sparse and cellwise robust PCA method is presented. Robustness against individual outlying cells in the data matrix is achieved by replacing the squared loss applied to the approximation error with a robust loss function. Integrating a sparsity-inducing $L_1$ or elastic net penalty offers additional modeling flexibility. For the resulting challenging optimization problem, an algorithm based on Riemannian stochastic gradient descent is developed; it scales to high-dimensional data in both the number of variables and the number of observations. The resulting method is called SCRAMBLE (Sparse Cellwise Robust Algorithm for Manifold-based Learning and Estimation). Simulations show that this approach outperforms established methods under both the casewise and the cellwise robustness paradigms. Two applications from the field of tribology underline the advantages of a cellwise robust and sparse PCA method.
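To make the algorithmic idea concrete, the following is a minimal NumPy sketch of Riemannian stochastic gradient descent for a cellwise robust, $L_1$-penalized PCA objective of the form $f(V) = \frac{1}{n}\sum_{i,j} \rho\big(x_{ij} - (X V V^\top)_{ij}\big) + \lambda \|V\|_1$. It is not the SCRAMBLE implementation. The Huber loss $\rho$ with tuning constant `c`, the reconstruction $X V V^\top$ for data that are already robustly centered and scaled, the QR retraction, the plain $L_1$ subgradient, and all function names (`huber_psi`, `huber_rho`, `objective`, `euclidean_grad`, `riemannian_grad`, `qr_retraction`, `rsgd`) are illustrative assumptions. The loadings $V \in \mathbb{R}^{p \times k}$ stay on the Stiefel manifold ($V^\top V = I_k$): each mini-batch step projects the Euclidean gradient onto the tangent space at $V$ and then retracts back onto the manifold.

```python
import numpy as np


def huber_psi(r, c=1.345):
    # Derivative of the Huber loss, applied cellwise (assumed robust loss):
    # linear for small residuals, constant (+/- c) for large ones.
    return np.clip(r, -c, c)


def huber_rho(r, c=1.345):
    # Huber loss applied to every cell of the residual matrix.
    a = np.abs(r)
    return np.where(a <= c, 0.5 * r**2, c * a - 0.5 * c**2)


def objective(X, V, lam=0.05, c=1.345):
    # Average cellwise robust reconstruction error plus an L1 penalty on V.
    R = X - X @ V @ V.T
    return huber_rho(R, c).sum() / X.shape[0] + lam * np.abs(V).sum()


def euclidean_grad(Xb, V, lam=0.05, c=1.345):
    # Gradient of the mini-batch objective w.r.t. V (L1 part as a subgradient).
    R = Xb - Xb @ V @ V.T          # cellwise residuals of the rank-k reconstruction
    Psi = huber_psi(R, c)          # large residuals enter only through the bounded psi
    m = Xb.shape[0]
    G = -(Xb.T @ Psi @ V + Psi.T @ Xb @ V) / m
    return G + lam * np.sign(V)


def riemannian_grad(V, G):
    # Project G onto the tangent space of the Stiefel manifold at V.
    S = V.T @ G
    return G - V @ ((S + S.T) / 2)


def qr_retraction(Y):
    # Map a point back onto the Stiefel manifold via a thin QR decomposition,
    # fixing column signs so the result is unique.
    Q, R = np.linalg.qr(Y)
    d = np.sign(np.diag(R))
    d[d == 0] = 1.0
    return Q * d


def rsgd(X, k, lam=0.05, c=1.345, lr=0.01, epochs=20, batch=32, seed=0):
    # Riemannian SGD over shuffled mini-batches of rows (observations).
    rng = np.random.default_rng(seed)
    n, p = X.shape
    V = qr_retraction(rng.standard_normal((p, k)))
    for _ in range(epochs):
        for idx in np.array_split(rng.permutation(n), max(1, n // batch)):
            G = euclidean_grad(X[idx], V, lam, c)
            V = qr_retraction(V - lr * riemannian_grad(V, G))
    return V


if __name__ == "__main__":
    rng = np.random.default_rng(42)
    X = rng.standard_normal((500, 20))
    X[rng.random(X.shape) < 0.05] += 10.0        # contaminate ~5% of single cells
    med = np.median(X, axis=0)                   # robust centering and scaling
    mad = 1.4826 * np.median(np.abs(X - med), axis=0)
    Z = (X - med) / mad
    V = rsgd(Z, k=3)
    print(np.round(V, 2))
```

With the Huber loss, the penalty for a cell grows only linearly once its residual exceeds `c`, so a gross error in a single cell is down-weighted relative to the squared loss without discarding the rest of that observation. Plain subgradient steps on the $L_1$ term shrink loadings toward zero but rarely produce exact zeros; a proximal or thresholding step would be needed for genuinely sparse loadings. The ridge part of an elastic net penalty is constant on the Stiefel manifold ($\|V\|_F^2 = k$), so it is omitted in this sketch.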