Higher-order multiway data is ubiquitous in machine learning and statistics and often exhibits community-like structures, where each component (node) along each mode has an associated community membership. In this paper, we propose the tensor mixed-membership blockmodel, a generalization of the tensor blockmodel in which memberships need not be discrete but are instead convex combinations of latent communities. We establish the identifiability of our model and propose a computationally efficient estimation procedure that composes the higher-order orthogonal iteration (HOOI) algorithm for tensor SVD with a simplex corner-finding algorithm. We then demonstrate the consistency of our estimation procedure by providing a per-node error bound, which showcases the effect of higher-order structures on estimation accuracy. To prove our consistency result, we develop an $\ell_{2,\infty}$ tensor perturbation bound for HOOI under independent, heteroskedastic, subgaussian noise, which may be of independent interest. Our analysis uses a novel leave-one-out construction for the iterates, and our bounds depend only on spectral properties of the underlying low-rank tensor under nearly optimal signal-to-noise ratio conditions such that tensor SVD is computationally feasible. Finally, we apply our methodology to real and simulated data, demonstrating some effects not identifiable from the model with discrete community memberships.
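The two-stage estimator described above (HOOI followed by simplex corner-finding) can be illustrated with a minimal NumPy sketch. Several choices here are assumptions made for illustration and are not taken from the paper: the plain HOSVD initialization, the fixed iteration count, the use of successive projection as the corner-finding routine, and all function names (`unfold`, `mode_product`, `hooi`, `successive_projection`, `estimate_memberships`). It also assumes each mode contains at least one pure node per community.

```python
import numpy as np


def unfold(T, mode):
    """Mode-k matricization: rows index the chosen mode."""
    return np.moveaxis(T, mode, 0).reshape(T.shape[mode], -1)


def mode_product(T, M, mode):
    """Multiply tensor T by matrix M along the given mode."""
    out = np.tensordot(M, T, axes=(1, mode))  # contracted axis is replaced, new axis first
    return np.moveaxis(out, 0, mode)


def top_left_singular_vectors(A, r):
    """Leading r left singular vectors of a matrix."""
    U, _, _ = np.linalg.svd(A, full_matrices=False)
    return U[:, :r]


def hooi(T, ranks, n_iter=10):
    """Higher-order orthogonal iteration.

    Assumption: initialized with plain HOSVD and run for a fixed number
    of sweeps; the paper's initialization and stopping rule may differ.
    """
    U = [top_left_singular_vectors(unfold(T, k), r) for k, r in enumerate(ranks)]
    for _ in range(n_iter):
        for k in range(T.ndim):
            # Project every mode except k onto the current subspace estimates.
            G = T
            for j in range(T.ndim):
                if j != k:
                    G = mode_product(G, U[j].T, j)
            U[k] = top_left_singular_vectors(unfold(G, k), ranks[k])
    return U


def successive_projection(X, r):
    """Pick r rows of X that approximate the corners of the simplex they span.

    Assumption: successive projection stands in for the corner-finding step.
    """
    R = X.copy()
    idx = []
    for _ in range(r):
        j = int(np.argmax(np.sum(R**2, axis=1)))  # row with largest residual norm
        idx.append(j)
        v = R[j] / np.linalg.norm(R[j])
        R = R - np.outer(R @ v, v)  # remove the component along the chosen row
    return idx


def estimate_memberships(U, r):
    """Express each row of U as a combination of the estimated corner rows."""
    corners = successive_projection(U, r)
    Pi_hat = U @ np.linalg.inv(U[corners])
    return Pi_hat, corners
```

In the noiseless case, the rows of each singular-vector matrix lie exactly in a simplex whose corners are the pure nodes, so `estimate_memberships` recovers the membership matrix up to a permutation of its columns. With noise, the estimated rows are not guaranteed to be nonnegative or to sum to one, and some post-processing such as thresholding and renormalizing would be needed.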