Inferring tie strengths in social networks is an essential task in social network analysis. Common approaches classify ties as weak or strong based on the strong triadic closure (STC). The STC states that if, for three nodes $A$, $B$, and $C$, there are strong ties between $A$ and $B$ and between $A$ and $C$, then there must be a (weak or strong) tie between $B$ and $C$. A variant of the STC, called STC+, allows adding a few new weak edges to obtain improved solutions. So far, most works discuss the STC or STC+ in static networks. However, modern large-scale social networks are usually highly dynamic and provide user contacts and communications as streams of edge updates. Temporal networks capture these dynamics. To apply the STC to temporal networks, we first generalize the STC and introduce a weighted version in which the STC respects empirical a priori knowledge given in the form of edge weights. Similarly, we introduce a generalized weighted version of the STC+. The weighted STC is hard to compute, and our main contribution is an efficient 2-approximation (resp. 3-approximation) streaming algorithm for the weighted STC (resp. STC+) in temporal networks. As a technical contribution, we introduce a fully dynamic $k$-approximation for the minimum weighted vertex cover problem in hypergraphs with edges of size $k$, which is a crucial component of our streaming algorithms. An empirical evaluation shows that the weighted STC leads to solutions that better capture the a priori knowledge given by the edge weights than the non-weighted STC. Moreover, we show that our streaming algorithm efficiently approximates the weighted STC in real-world large-scale social networks.
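To illustrate how the STC relates to weighted vertex cover, the following Python sketch labels edges of a small static graph. It is not the fully dynamic streaming algorithm described above; it swaps in the classical static pricing (primal-dual) method, which is a $k$-approximation for minimum weighted vertex cover when every hyperedge has at most $k$ elements. The sketch assumes that the weighted STC objective is to minimize the total weight of the edges labeled weak, and that each open wedge (two edges sharing a node whose other endpoints are not adjacent) must contain at least one weak edge. The names `wedges`, `pricing_vertex_cover`, and `satisfies_stc`, as well as the toy graph and its weights, are hypothetical.

```python
from itertools import combinations


def wedges(edges):
    """Yield each open wedge as a pair of edges (A-B, A-C) with B and C not adjacent."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    for a, nbrs in adj.items():
        for b, c in combinations(sorted(nbrs), 2):
            if c not in adj[b]:
                yield (frozenset((a, b)), frozenset((a, c)))


def pricing_vertex_cover(hyperedges, weight):
    """Static pricing method: k-approximate weighted cover for hyperedges of size <= k.

    Each uncovered hyperedge pays the smallest remaining slack among its
    elements; elements whose slack reaches zero enter the cover.
    """
    slack = dict(weight)
    cover = set()
    for h in hyperedges:
        if any(x in cover for x in h):
            continue  # already covered, nothing to pay
        pay = min(slack[x] for x in h)
        for x in h:
            slack[x] -= pay
            if slack[x] == 0:
                cover.add(x)
    return cover


def satisfies_stc(edges, weak):
    """Check that no open wedge consists of two strong edges."""
    return all(e1 in weak or e2 in weak for e1, e2 in wedges(edges))


# Toy example (assumed data): the path B - A - C - D with a priori edge weights.
edges = [("A", "B"), ("A", "C"), ("C", "D")]
weight = {frozenset(e): w for e, w in zip(edges, [5, 1, 4])}

# Here every hyperedge is a wedge with two edges, so k = 2.
weak = pricing_vertex_cover(wedges(edges), weight)
strong = set(weight) - weak
weak_weight = sum(weight[e] for e in weak)
```

On this toy path, both open wedges contain the light edge between `A` and `C`, so labeling only that edge weak satisfies the STC while the two heavier edges stay strong. Passing hyperedges with three elements to the same routine yields a 3-approximation, which matches the factor stated above for the STC+.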