Predicting node labels on a given graph is a widely studied problem with many applications, including community detection and molecular graph prediction. This paper considers predicting multiple node labeling functions on a graph simultaneously and revisits the problem from a multitask learning perspective. As a concrete example, consider overlapping community detection, where each community membership is a binary node classification task. Because of complex overlapping patterns, we find that negative transfer is prevalent when naive multitask learning is applied to multiple community detection tasks, as task relationships are highly nonlinear across different node labelings. To address this challenge, we develop an algorithm that clusters tasks into groups based on a higher-order task affinity measure. We then fit a multitask model on each task group, resulting in a boosting procedure on top of the baseline model. We estimate the higher-order affinity between two tasks as the prediction loss of one task when it is trained together with the other task and a random subset of the remaining tasks. We then apply spectral clustering to the affinity score matrix to identify task groupings. We design several speedup techniques to compute the higher-order affinity scores efficiently and show that these scores predict negative transfer more accurately than pairwise task affinities. We validate our procedure on various community detection and molecular graph prediction data sets, showing favorable results compared with existing methods. Lastly, we provide a theoretical analysis showing that, under a planted block model of tasks on graphs, our affinity scores can provably separate tasks into groups.
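The affinity-then-cluster procedure described above can be illustrated with a minimal sketch. Everything named here is an assumption made for illustration and is not taken from the paper: `toy_subset_loss` is a hypothetical stand-in for training a multitask model on a task subset, `GROUPS` is a made-up planted grouping, the affinity matrix is symmetrized by simple averaging, and a two-way split on the sign of the Fiedler vector is swapped in for general spectral clustering.

```python
import numpy as np

GROUPS = [0, 0, 0, 1, 1, 1]  # hypothetical planted grouping of six toy tasks


def toy_subset_loss(subset):
    """Stand-in for multitask training: returns {task: loss} for one subset.

    A task's loss grows with the fraction of co-trained tasks that come
    from the other planted group (a crude model of negative transfer).
    """
    losses = {}
    for i in subset:
        others = [j for j in subset if j != i]
        mismatched = sum(GROUPS[j] != GROUPS[i] for j in others)
        losses[i] = 0.2 + 0.5 * mismatched / max(len(others), 1)
    return losses


def estimate_affinity(num_tasks, subset_loss, num_subsets=200, subset_size=3, seed=0):
    """theta[i, j]: mean loss of task i over sampled subsets containing both i and j."""
    rng = np.random.default_rng(seed)
    total = np.zeros((num_tasks, num_tasks))
    count = np.zeros((num_tasks, num_tasks))
    for _ in range(num_subsets):
        picked = rng.choice(num_tasks, size=subset_size, replace=False)
        subset = [int(t) for t in picked]
        losses = subset_loss(subset)
        for i in subset:
            for j in subset:
                total[i, j] += losses[i]
                count[i, j] += 1
    # Pairs that were never sampled together keep a score of zero.
    return total / np.maximum(count, 1)


def two_way_spectral_split(theta):
    """Split tasks into two groups using the sign of the Fiedler vector."""
    sym = (theta + theta.T) / 2.0
    weights = sym.max() - sym  # lower loss -> larger edge weight
    np.fill_diagonal(weights, 0.0)
    laplacian = np.diag(weights.sum(axis=1)) - weights
    _, vecs = np.linalg.eigh(laplacian)  # eigenvalues in ascending order
    fiedler = vecs[:, 1]  # eigenvector of the second-smallest eigenvalue
    return (fiedler > 0).astype(int).tolist()


theta = estimate_affinity(len(GROUPS), toy_subset_loss)
labels = two_way_spectral_split(theta)
print(labels)
```

In this toy setting, tasks from the same planted group always incur a lower average loss when sampled together than tasks from different groups, so the split should recover the planted grouping up to a relabeling of the two groups. For more than two groups, a full spectral clustering routine would replace `two_way_spectral_split`.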