This paper studies the computational and statistical aspects of quantile and pseudo-Huber tensor decomposition. Treating the computational and statistical sides of robust tensor decomposition jointly is challenging because of the non-smooth loss functions. We propose a projected sub-gradient descent algorithm for tensor decomposition, equipped with either the pseudo-Huber loss or the quantile loss. In the presence of both heavy-tailed noise and Huber's contamination error, we show that, with a carefully chosen step size schedule, the algorithm exhibits two-phase convergence. It converges linearly and delivers an estimator that is statistically optimal with respect to both the heavy-tailed noise and arbitrary corruptions. Our results achieve the first minimax optimal rates under Huber's contamination model for noisy tensor decomposition. Compared with existing literature, quantile tensor decomposition removes the requirement of specifying a sparsity level in advance, making it more flexible for practical use. We also demonstrate the effectiveness of our algorithms in the presence of missing values. Finally, we apply our methods to the food balance dataset and the international trade flow dataset, both of which yield intriguing findings.
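To make the two losses named in the abstract concrete, here is a minimal Python sketch of the pseudo-Huber and quantile (check) losses, their (sub)gradients, and sub-gradient descent with a geometrically decaying step size. It is an illustration only, not the paper's algorithm: it estimates a single scalar location rather than a low-rank tensor, so the projection step is omitted. The loss parameterizations are the standard textbook forms and may differ from the paper's. The function names, the parameter values (`delta`, `tau`, `step0`, `decay`, `iters`), and the synthetic data are all assumptions made for this example.

```python
import math


def pseudo_huber(r, delta=1.0):
    """Standard pseudo-Huber loss: delta^2 * (sqrt(1 + (r/delta)^2) - 1)."""
    return delta ** 2 * (math.sqrt(1.0 + (r / delta) ** 2) - 1.0)


def pseudo_huber_grad(r, delta=1.0):
    """Derivative of the pseudo-Huber loss in r; its magnitude is below delta."""
    return r / math.sqrt(1.0 + (r / delta) ** 2)


def quantile_loss(r, tau=0.5):
    """Quantile (check) loss: tau * r if r >= 0, else (tau - 1) * r."""
    return tau * r if r >= 0 else (tau - 1.0) * r


def quantile_subgrad(r, tau=0.5):
    """One valid sub-gradient of the check loss (0 is chosen at the kink)."""
    if r > 0:
        return tau
    if r < 0:
        return tau - 1.0
    return 0.0


def subgradient_descent(y, grad, step0=1.0, decay=0.97, iters=300):
    """Minimize mean(loss(y_i - theta)) over a scalar theta.

    The gradient of the objective in theta is -mean(grad(y_i - theta)),
    so a descent step adds step * mean(grad(residual)) to theta.
    The step size decays geometrically (hypothetical schedule).
    """
    theta = 0.0
    for t in range(iters):
        g = sum(grad(yi - theta) for yi in y) / len(y)
        theta += step0 * (decay ** t) * g
    return theta


# Synthetic data (assumption): true location 3.0, heavy-tailed noise taken
# from a deterministic grid of Cauchy quantiles, with every 20th observation
# replaced by a gross outlier to mimic contamination.
n = 200
true_theta = 3.0
y = [true_theta + math.tan(math.pi * ((i + 0.5) / n - 0.5)) for i in range(n)]
y = [50.0 if i % 20 == 0 else yi for i, yi in enumerate(y)]

theta_quantile = subgradient_descent(y, quantile_subgrad)
theta_huber = subgradient_descent(y, pseudo_huber_grad)
sample_mean = sum(y) / n

print("quantile loss estimate:    ", round(theta_quantile, 3))
print("pseudo-Huber loss estimate:", round(theta_huber, 3))
print("sample mean (non-robust):  ", round(sample_mean, 3))
```

Both (sub)gradients are bounded, so a single corrupted observation can move an update by only a limited amount. This is the intuition behind using these losses under heavy-tailed noise and contamination.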