In the graph label selection problem, one is given an $n$-vertex graph and a budget $k$, and seeks to select $k$ vertices whose labels enable accurate prediction of the labels on the remaining vertices. This problem formalizes distilling a small representative set from the whole graph. We present the first $\tilde{O}(\log^{1.5} n)$-approximation algorithm for graph label selection under the standard budget constraint. Prior work either relies on resource augmentation, allowing substantially more than $k$ labeled vertices, or consists primarily of heuristics without provable guarantees. Finally, we demonstrate that practical heuristic variants of our algorithm scale to significantly larger graphs than previous methods, while essentially retaining their quality.