Estimating Higher-Order Mixed Memberships via the $\ell_{2,\infty}$ Tensor Perturbation Bound

Higher-order multiway data is ubiquitous in machine learning and statistics and often exhibits community-like structure, in which each component (node) along each mode has an associated community membership. In this paper we propose the tensor mixed-membership blockmodel, a generalization of the tensor blockmodel in which memberships need not be discrete but are instead convex combinations of latent communities. We establish the identifiability of our model and propose a computationally efficient estimation procedure that composes the higher-order orthogonal iteration (HOOI) algorithm for tensor SVD with a simplex corner-finding algorithm. We then demonstrate the consistency of our estimation procedure by providing a per-node error bound, which showcases the effect of higher-order structure on estimation accuracy. To prove our consistency result, we develop an $\ell_{2,\infty}$ tensor perturbation bound for HOOI under independent, possibly heteroskedastic, subgaussian noise; this bound may be of independent interest. Our analysis uses a novel leave-one-out construction for the iterates, and our bounds depend only on spectral properties of the underlying low-rank tensor, under nearly optimal signal-to-noise ratio conditions such that tensor SVD is computationally feasible. Whereas other leave-one-out analyses typically focus on sequences constructed by analyzing the output of a given algorithm with a small part of the noise removed, our leave-one-out constructions use both the previous iterates and the additional tensor structure to eliminate a potential additional source of error. Finally, we apply our methodology to real and simulated data, including two flight datasets and a trade network dataset, demonstrating some effects not identifiable from the model with discrete community memberships.
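The two-stage procedure described in the abstract (HOOI followed by simplex corner finding) can be illustrated with a minimal NumPy sketch. This is not the paper's implementation. It assumes a three-mode, noiseless tensor of the form core tensor times membership matrices, a plain HOSVD initialization, and the successive projection algorithm as a stand-in for the corner-finding step. All function names (`unfold`, `hooi`, `successive_projection`, `estimate_memberships`) are hypothetical. The final error is the largest row-wise $\ell_2$ error, that is, the $\ell_{2,\infty}$ norm of the membership error after matching community labels.

```python
import itertools
import numpy as np


def unfold(T, mode):
    """Mode-`mode` matricization: rows index the chosen mode."""
    return np.moveaxis(T, mode, 0).reshape(T.shape[mode], -1)


def leading_left_singular_vectors(M, r):
    """Top-r left singular vectors of a matrix."""
    U, _, _ = np.linalg.svd(M, full_matrices=False)
    return U[:, :r]


def hooi(T, ranks, n_iter=10):
    """Higher-order orthogonal iteration, started from plain HOSVD (an assumption)."""
    U = [leading_left_singular_vectors(unfold(T, k), r) for k, r in enumerate(ranks)]
    for _ in range(n_iter):
        for k in range(T.ndim):
            G = T
            for j in range(T.ndim):
                if j != k:
                    # Project mode j onto the current subspace estimate U[j].
                    G = np.moveaxis(np.tensordot(U[j].T, G, axes=(1, j)), 0, j)
            U[k] = leading_left_singular_vectors(unfold(G, k), ranks[k])
    return U


def successive_projection(X, r):
    """Indices of r rows of X that are (approximate) corners of the row simplex."""
    R = X.astype(float).copy()
    corners = []
    for _ in range(r):
        i = int(np.argmax(np.sum(R**2, axis=1)))  # farthest remaining row
        corners.append(i)
        u = R[i] / np.linalg.norm(R[i])
        R = R - np.outer(R @ u, u)  # remove the component along that row
    return corners


def estimate_memberships(U):
    """Express each row of U as a combination of the corner rows."""
    corners = successive_projection(U, U.shape[1])
    Pi = U @ np.linalg.inv(U[corners])
    Pi = np.clip(Pi, 0.0, None)  # guard against small negative weights
    Pi = Pi / np.maximum(Pi.sum(axis=1, keepdims=True), 1e-12)
    return Pi, corners


# Synthetic example: each mode has one pure node per community plus mixed nodes.
rng = np.random.default_rng(0)
n, r = 30, 3


def random_memberships():
    return np.vstack([np.eye(r), rng.dirichlet(np.ones(r), size=n - r)])


Pi_true = [random_memberships() for _ in range(3)]
S = rng.normal(size=(r, r, r))  # hypothetical core tensor of community means
T = np.einsum("abc,ia,jb,kc->ijk", S, *Pi_true)

U = hooi(T, ranks=(r, r, r))
Pi_hat, corners = estimate_memberships(U[0])

# Communities are identified only up to relabeling, so match columns first.
err = min(
    np.linalg.norm(Pi_hat[:, list(p)] - Pi_true[0], axis=1).max()
    for p in itertools.permutations(range(r))
)
print("max per-node membership error:", err)
```

Because the example is noiseless and contains a pure node for every community, the recovered memberships should match the true ones up to floating-point error. With noisy data the per-node error would instead depend on the signal-to-noise ratio, which is the regime the perturbation bound addresses.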
