When two variables depend on the same or similar underlying network, their shared network dependence structure can lead to spurious associations. Although estimating statistical associations between two variables sampled from interconnected subjects is a common inferential goal across many fields, little research has focused on how to disentangle shared dependence so that statistical inference remains valid. We revisit two approaches from distinct fields that may address shared network dependence: the pre-whitening approach, commonly used in time series analysis to remove shared temporal dependence, and the network autocorrelation model, widely used in network analysis to examine or account for autocorrelation of the outcome variable. We demonstrate how each approach implicitly entails assumptions about how a variable of interest propagates among nodes via network ties, given the network structure. We further propose adaptations of existing pre-whitening methods to the network setting that explicitly reflect the underlying assumptions about the "level of interaction" that induces network dependence, while accounting for the unique complexities of that setting. Our simulation studies demonstrate that both approaches are effective in reducing spurious associations due to shared network dependence when their respective assumptions hold. However, the results also show that both approaches are sensitive to violations of those assumptions, underscoring the importance of correctly specifying the shared dependence structure based on available network information and prior knowledge about the interactions driving the dependence.
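As a rough illustration of the pre-whitening idea in a network setting, the following minimal sketch generates two variables with independent innovations on the same network and then filters out the shared dependence. It is not the authors' method. It assumes a simple network autocorrelation form x = rho * W x + e, a hypothetical ring-shaped network, and a dependence parameter `rho` that is known rather than estimated; the helper names `row_normalized_ring`, `simulate`, and `prewhiten` are invented for this example.

```python
import numpy as np


def row_normalized_ring(n, k=2):
    """Hypothetical network: each node is tied to k neighbours on each side of a ring."""
    W = np.zeros((n, n))
    for i in range(n):
        for d in range(1, k + 1):
            W[i, (i + d) % n] = 1.0
            W[i, (i - d) % n] = 1.0
    # Row-normalise so that each row of W sums to one.
    return W / W.sum(axis=1, keepdims=True)


def simulate(W, rho, innovations):
    """Solve x = rho * W x + e for x, i.e. x = (I - rho W)^{-1} e (assumed model)."""
    n = W.shape[0]
    return np.linalg.solve(np.eye(n) - rho * W, innovations)


def prewhiten(W, rho, x):
    """Apply the filter (I - rho W) to x, which recovers e under the assumed model."""
    return x - rho * (W @ x)


rng = np.random.default_rng(0)
n, rho = 200, 0.9  # illustrative values, not taken from the paper
W = row_normalized_ring(n)

# Independent innovations: X and Y have no true association.
e_x = rng.standard_normal(n)
e_y = rng.standard_normal(n)

# Both variables inherit dependence from the same network.
x = simulate(W, rho, e_x)
y = simulate(W, rho, e_y)

raw_corr = np.corrcoef(x, y)[0, 1]
white_corr = np.corrcoef(prewhiten(W, rho, x), prewhiten(W, rho, y))[0, 1]

print(f"correlation before pre-whitening: {raw_corr:.3f}")
print(f"correlation after pre-whitening:  {white_corr:.3f}")
```

When the filter matches the data-generating model, the pre-whitened series coincide with the independent innovations, so a standard correlation test applies to them. If `W` or `rho` is misspecified, residual dependence remains, which is the kind of sensitivity to assumption violations that the abstract describes.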