Vertex hunting (VH) is the task of estimating a simplex from noisy data points and has many applications in areas such as network and text analysis. We introduce a new variant, semi-supervised vertex hunting (SSVH), in which partial information is available in the form of barycentric coordinates for some data points, known only up to an unknown transformation. To address this problem, we develop a method that leverages properties of orthogonal projection matrices, drawing on novel insights from linear algebra. We establish theoretical error bounds for our method and demonstrate that it achieves a faster convergence rate than existing unsupervised VH algorithms. Finally, we apply SSVH to two practical settings, semi-supervised network mixed membership estimation and semi-supervised topic modeling, resulting in efficient and scalable algorithms.
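To make the terms "simplex", "barycentric coordinates", and "noisy data points" concrete, the following is a minimal sketch in two dimensions. It is not the SSVH method described above. It assumes the vertices are known and recovers exact barycentric coordinates, whereas in SSVH the vertices are the unknowns and the coordinates are available only up to an unknown transformation. The names `TRIANGLE`, `mix`, `noisy_point`, and `barycentric_2d`, as well as the Gaussian noise model, are illustrative assumptions and do not come from the paper.

```python
import random

# Hypothetical simplex: a triangle in the plane, with three vertices.
TRIANGLE = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0)]


def mix(vertices, weights):
    """Return the convex combination sum_k weights[k] * vertices[k]."""
    dim = len(vertices[0])
    return tuple(
        sum(w * v[d] for w, v in zip(weights, vertices)) for d in range(dim)
    )


def noisy_point(vertices, weights, sigma, rng):
    """Return a point inside the simplex plus Gaussian noise (assumed noise model)."""
    clean = mix(vertices, weights)
    return tuple(c + rng.gauss(0.0, sigma) for c in clean)


def barycentric_2d(point, vertices):
    """Recover the weights (w0, w1, w2) of `point` in a triangle via Cramer's rule.

    Solves point - v0 = w1 * (v1 - v0) + w2 * (v2 - v0), then sets
    w0 = 1 - w1 - w2 so that the weights sum to one.
    """
    (x0, y0), (x1, y1), (x2, y2) = vertices
    px, py = point
    det = (x1 - x0) * (y2 - y0) - (x2 - x0) * (y1 - y0)
    w1 = ((px - x0) * (y2 - y0) - (x2 - x0) * (py - y0)) / det
    w2 = ((x1 - x0) * (py - y0) - (px - x0) * (y1 - y0)) / det
    return (1.0 - w1 - w2, w1, w2)
```

For example, `barycentric_2d(mix(TRIANGLE, (0.2, 0.3, 0.5)), TRIANGLE)` returns the original weights up to floating-point error, and `noisy_point(TRIANGLE, (0.2, 0.3, 0.5), 0.05, random.Random(0))` draws one perturbed observation of the kind a vertex hunting algorithm takes as input.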