Identifying critical nodes in complex networks is a fundamental task in graph mining. Yet methods that address all-or-nothing coverage mechanics in a bipartite dependency network, a graph with two types of nodes in which edges connect only nodes of different types and represent dependency relationships, remain largely unexplored. We formalize the CriticalSet problem: given an arbitrary bipartite graph modeling dependencies of items on contributors, identify a set of k contributors whose removal isolates the largest number of items, that is, leaves them with no remaining contributor. We prove that this problem is NP-hard and requires maximizing a supermodular set function, for which standard forward greedy algorithms provide no approximation guarantees. We therefore model CriticalSet as a coalitional game and derive a closed-form centrality, ShapleyCov, based on the Shapley value. This measure can be interpreted as the expected number of items isolated by a contributor's departure. Leveraging these insights, we propose MinCov, a linear-time iterative peeling algorithm that explicitly accounts for connection redundancy, prioritizing contributors who uniquely support many items. Extensive experiments on synthetic and large-scale real-world datasets, including a Wikipedia graph with over 250 million edges, show that MinCov and ShapleyCov significantly outperform traditional baselines. Notably, MinCov achieves near-optimal performance, within 0.02 AUC of a Stochastic Hill Climbing metaheuristic, while remaining several orders of magnitude faster.
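To make the all-or-nothing objective concrete, the following is a minimal Python sketch on a hypothetical toy graph. It assumes the coalitional game assigns to each set of removed contributors the number of items it isolates; under that assumption, the Shapley value reduces to each item splitting one unit equally among its contributors. The abstract does not state the exact ShapleyCov formula, so the closed form below is an illustration of the idea rather than the paper's definition, and the names `deps`, `isolated`, `shapley_closed_form`, and `shapley_brute_force` are invented for this example.

```python
from fractions import Fraction
from itertools import permutations

# Hypothetical toy graph: each item maps to the contributors it depends on.
# Assumption: every item has at least one contributor.
deps = {
    "item1": {"a"},
    "item2": {"a", "b"},
    "item3": {"b", "c"},
    "item4": {"b", "c"},
}


def isolated(deps, removed):
    """Count items whose contributors are ALL in `removed` (all-or-nothing)."""
    removed = set(removed)
    return sum(1 for contributors in deps.values() if contributors <= removed)


def shapley_closed_form(deps):
    """Assumed closed form: each item gives 1/degree to each of its contributors."""
    scores = {}
    for contributors in deps.values():
        share = Fraction(1, len(contributors))
        for c in contributors:
            scores[c] = scores.get(c, Fraction(0)) + share
    return scores


def shapley_brute_force(deps):
    """Average marginal gain over all removal orders (exponential; toy graphs only)."""
    players = sorted(set().union(*deps.values()))
    totals = {p: Fraction(0) for p in players}
    orders = list(permutations(players))
    for order in orders:
        removed = set()
        for p in order:
            before = isolated(deps, removed)
            removed.add(p)
            totals[p] += isolated(deps, removed) - before
    return {p: total / len(orders) for p, total in totals.items()}


# Supermodularity in miniature: removing "c" alone isolates nothing,
# but removing "c" after "b" isolates two items at once.
gain_alone = isolated(deps, {"c"}) - isolated(deps, set())
gain_after_b = isolated(deps, {"b", "c"}) - isolated(deps, {"b"})

closed = shapley_closed_form(deps)
brute = shapley_brute_force(deps)
```

On this toy graph the marginal gain of a contributor grows as more contributors are removed, which is the behavior that leaves a forward greedy selection without a guarantee. The brute-force average over removal orders agrees with the closed form, which matches the reading of the score as an expected number of isolated items.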