Large-scale network data can pose computational challenges, be expensive to acquire, and compromise the privacy of individuals in social networks. We show that the locations and scales of latent space cluster models can be inferred from the number of connections between groups alone. We demonstrate this modelling approach using synthetic data and apply it to friendships between students collected as part of the Add Health study, eliminating the need for node-level connection data. The method thus protects the privacy of individuals and simplifies data sharing. It also offers performance advantages over node-level latent space models because the computational cost scales with the number of clusters rather than the number of nodes.
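To illustrate how group-level connection counts can carry information about cluster locations and scales, the following is a minimal sketch and not the authors' implementation. It assumes isotropic Gaussian clusters and a Gaussian connectivity kernel, P(i ~ j) = propensity * exp(-|z_i - z_j|^2 / 2); neither assumption is stated in the abstract. Under these assumptions the expected connection probability between two clusters has a closed form in their locations and scales. The function names (`block_density`, `simulate`, `aggregate_counts`) and all parameter values are hypothetical.

```python
import numpy as np


def block_density(locs, scales, propensity=1.0):
    """Expected connection probability between each pair of clusters.

    Assumes positions z ~ Normal(locs[k], scales[k]**2 * I) and the Gaussian
    kernel propensity * exp(-|z_i - z_j|**2 / 2) (illustrative assumptions).
    """
    locs = np.asarray(locs, dtype=float)
    scales = np.asarray(scales, dtype=float)
    q = locs.shape[1]  # dimension of the latent space
    # Squared distance between cluster centres, shape (K, K).
    delta2 = ((locs[:, None, :] - locs[None, :, :]) ** 2).sum(axis=-1)
    # Kernel variance (1) plus the variance of the position difference.
    var = 1.0 + scales[:, None] ** 2 + scales[None, :] ** 2
    return propensity * var ** (-q / 2) * np.exp(-delta2 / (2 * var))


def simulate(locs, scales, sizes, propensity=1.0, seed=0):
    """Draw a node-level network from the assumed model."""
    rng = np.random.default_rng(seed)
    locs = np.asarray(locs, dtype=float)
    scales = np.asarray(scales, dtype=float)
    labels = np.repeat(np.arange(len(sizes)), sizes)
    noise = rng.standard_normal((labels.size, locs.shape[1]))
    z = locs[labels] + scales[labels, None] * noise
    dist2 = ((z[:, None, :] - z[None, :, :]) ** 2).sum(axis=-1)
    prob = propensity * np.exp(-dist2 / 2)
    # Sample each unordered pair once, then symmetrise.
    upper = np.triu(rng.random(prob.shape) < prob, k=1)
    adjacency = (upper | upper.T).astype(int)
    return adjacency, labels


def aggregate_counts(adjacency, labels, num_clusters):
    """Collapse a symmetric adjacency matrix to cluster-level edge counts."""
    onehot = np.eye(num_clusters, dtype=int)[labels]
    counts = onehot.T @ adjacency @ onehot
    # Within-cluster edges appear twice in a symmetric adjacency matrix.
    counts[np.diag_indices(num_clusters)] //= 2
    return counts


# Hypothetical example: two clusters in a two-dimensional latent space.
locs = np.array([[0.0, 0.0], [2.0, 0.0]])
scales = np.array([0.5, 1.0])
sizes = np.array([200, 300])

adjacency, labels = simulate(locs, scales, sizes)
observed = aggregate_counts(adjacency, labels, len(sizes))

# Number of node pairs available within and between clusters.
pairs = np.outer(sizes, sizes)
pairs[np.diag_indices(len(sizes))] = sizes * (sizes - 1) // 2
expected = pairs * block_density(locs, scales)

print(observed)
print(expected.round(1))
```

In this sketch the expected counts depend only on the K x K matrix returned by `block_density`, so evaluating them costs the same regardless of how many nodes each cluster contains. A likelihood built on such counts would need only the aggregated matrix `observed`, not the node-level `adjacency`.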