Network-linked data, where multivariate observations are interconnected by a network, are becoming increasingly prevalent in fields such as sociology and biology. These data often exhibit inherent noise and complex relational structures, complicating conventional modeling and statistical inference. Motivated by empirical challenges in analyzing such data sets, this paper introduces a family of network subspace generalized linear models designed for analyzing noisy, network-linked data. We propose a model inference method based on subspace-constrained maximum likelihood, which emphasizes flexibility in capturing network effects and provides an inference framework that is robust to network perturbations. We establish the asymptotic distributions of the estimators under network perturbations and demonstrate the method's accuracy through extensive simulations involving random network models and deep-learning-based embedding algorithms. The proposed methodology is applied to a comprehensive analysis of a large-scale study on school conflicts, where it identifies significant social effects, offering meaningful and interpretable insights into student behaviors.