We consider two applications in which we study how the dependence structure among many variables is linked to external network data. We first study the interplay between social media connectedness and the co-evolution of the COVID-19 pandemic across US counties. We then study how the dependence between stock market returns across firms relates to similarities in economic and policy indicators derived from the text of regulatory filings. Both applications are modelled via Gaussian graphical models for which external network data are available. We develop spike-and-slab and graphical LASSO frameworks to integrate the network data, both facilitating the interpretation of the graphical model and improving inference. The goal is to detect whether the network data relate to the graphical model and, if so, to explain how. We found that counties strongly connected on Facebook are more likely to have similar COVID-19 evolution (positive partial correlations), after accounting for various factors driving the mean. We also found that the association in stock market returns depends more strongly on economic than on policy indicators. The examples show that data integration can improve interpretation, statistical accuracy, and out-of-sample prediction, in some instances using significantly sparser graphical models.
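As a brief illustration of the "positive partial correlations" mentioned above: in a Gaussian graphical model, the partial correlation between variables i and j can be read off the precision (inverse covariance) matrix Ω as -Ω_ij / sqrt(Ω_ii Ω_jj), and a zero entry corresponds to a missing edge. The following is a minimal sketch of that standard identity only, not of the spike-and-slab or graphical LASSO procedures developed in the paper; the function name `partial_correlations` and the matrix values are hypothetical and chosen purely for illustration.

```python
import numpy as np


def partial_correlations(precision):
    """Convert a precision (inverse covariance) matrix into partial correlations.

    Uses the standard identity rho_ij = -omega_ij / sqrt(omega_ii * omega_jj).
    """
    d = np.sqrt(np.diag(precision))
    pcor = -precision / np.outer(d, d)
    np.fill_diagonal(pcor, 1.0)  # by convention, the diagonal is set to 1
    return pcor


# Hypothetical 3-variable precision matrix (illustrative values, not real data).
omega = np.array([
    [2.0, -1.0, 0.0],
    [-1.0, 2.0, 0.0],
    [0.0, 0.0, 1.0],
])

pcor = partial_correlations(omega)
print(pcor)
```

In this toy example, variables 0 and 1 have a positive partial correlation (a negative off-diagonal precision entry), while variable 2 is conditionally independent of the other two, so the corresponding graph has a single edge.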