In this paper, we propose a nested matrix-tensor model which extends the spiked rank-one tensor model of order three. This model is motivated by a multi-view clustering problem in which multiple noisy observations of each data point are acquired, with potentially non-uniform variances across the views. In this case, the data can be naturally represented by an order-three tensor in which the views are stacked. Given such a tensor, we consider estimating the hidden clusters by computing a best rank-one tensor approximation. To study the theoretical performance of this approach, we characterize the behavior of this best rank-one approximation in the large-dimensional regime, in terms of the alignments of the obtained component vectors with the hidden model parameter vectors. In particular, we show that our theoretical results allow us to predict the exact accuracy of the proposed clustering approach. Furthermore, numerical experiments indicate that our tensor-based approach yields better accuracy than a naive unfolding-based algorithm which ignores the underlying low-rank tensor structure. Our analysis unveils unexpected and non-trivial phase transition phenomena depending on the model parameters, ``interpolating'' between the typical behaviors observed for the spiked matrix and tensor models.
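The estimation procedure described above can be illustrated with a minimal NumPy sketch. The abstract does not state the model equations, so the data generator here is an assumption: a two-cluster signal `mu` ⊗ `y` scaled by a per-view vector `h`, plus i.i.d. Gaussian noise. All names (`make_tensor`, `rank_one_approx`, `mu_norm`, and so on) are hypothetical, and the alternating power iteration is one common heuristic for the best rank-one approximation, not necessarily the procedure used by the authors.

```python
import numpy as np

def make_tensor(p=40, n=60, m=8, mu_norm=10.0, seed=0):
    """Assumed toy model: T = mu (x) y (x) h + Gaussian noise, shape (p, n, m)."""
    rng = np.random.default_rng(seed)
    mu = rng.standard_normal(p)
    mu *= mu_norm / np.linalg.norm(mu)       # cluster mean direction, norm mu_norm
    y = rng.choice([-1.0, 1.0], size=n)      # hidden two-cluster labels
    h = rng.uniform(0.5, 1.5, size=m)        # non-uniform per-view signal scale
    h /= np.linalg.norm(h)
    signal = np.einsum("i,j,k->ijk", mu, y, h)
    return signal + rng.standard_normal((p, n, m)), y

def top_left_singular_vector(T, mode):
    """Leading left singular vector of the mode-`mode` unfolding of T."""
    unfolded = np.moveaxis(T, mode, 0).reshape(T.shape[mode], -1)
    U, _, _ = np.linalg.svd(unfolded, full_matrices=False)
    return U[:, 0]

def rank_one_approx(T, n_iter=100, tol=1e-10):
    """Alternating power iteration, initialised from the three unfoldings."""
    u, v, w = (top_left_singular_vector(T, k) for k in range(3))
    lam = 0.0
    for _ in range(n_iter):
        u = np.einsum("ijk,j,k->i", T, v, w)
        u /= np.linalg.norm(u)
        v = np.einsum("ijk,i,k->j", T, u, w)
        v /= np.linalg.norm(v)
        w = np.einsum("ijk,i,j->k", T, u, v)
        w /= np.linalg.norm(w)
        prev, lam = lam, float(np.einsum("ijk,i,j,k->", T, u, v, w))
        if abs(lam - prev) < tol:
            break
    return lam, u, v, w

def clustering_accuracy(v, y):
    """Accuracy of sign(v) against labels y, up to a global label swap."""
    acc = float(np.mean(np.sign(v) == y))
    return max(acc, 1.0 - acc)

T, y = make_tensor()
lam, u, v, w = rank_one_approx(T)
acc_tensor = clustering_accuracy(v, y)                               # tensor-based
acc_unfold = clustering_accuracy(top_left_singular_vector(T, 1), y)  # naive unfolding
print(f"tensor: {acc_tensor:.3f}  unfolding: {acc_unfold:.3f}")
```

In this sketch the cluster labels are read off the signs of the mode-2 component `v`, while the unfolding baseline uses only the leading singular vector of the flattened tensor. The signal strength chosen here is deliberately large, so both estimators should recover the labels well; comparing them near the phase transition would require sweeping `mu_norm` and averaging over seeds.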