The problems of selecting partial correlation and causality graphs for count data are considered. A parameter-driven generalized linear model is used to describe the observed multivariate time series of counts. The partial correlation and causality graphs corresponding to this model capture the dependencies among the component time series of the multivariate count data. To estimate these graphs with tunable sparsity, the maximization of an appropriate likelihood function is regularized with an ℓ1-type constraint. A novel Monte Carlo expectation-maximization (MCEM) algorithm is proposed to compute this regularized maximum likelihood estimate (MLE) iteratively. Asymptotic convergence results are proved for the sequence generated by the proposed MCEM algorithm with ℓ1-type regularization. The algorithm is first tested successfully on simulated data. It is then applied to observed weekly dengue case counts from each ward of Greater Mumbai. The interdependence of the wards in the proliferation of the disease is characterized by the edges of the inferred partial correlation graph. The relative roles of the wards as sources and sinks of dengue spread, on the other hand, are quantified by the number and weights of the directed edges originating from and incident upon each ward. The estimated graphs show that some wards act as epicentres of dengue spread even though their disease counts are relatively low.
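The abstract does not spell out the estimator, so the Python sketch below only illustrates three generic ingredients it mentions, under stated assumptions; it is not the paper's MCEM algorithm. First, soft-thresholding (the proximal map of an ℓ1 penalty) shows how ℓ1-type regularization can set small coefficients exactly to zero, which is what makes the sparsity tunable. Second, it assumes, as in Gaussian graphical models, that the undirected partial correlation graph is read off a precision (inverse covariance) matrix K, where a zero off-diagonal entry means no edge. Third, it scores each node as a source or sink by the number and summed absolute weights of its outgoing and incoming directed edges. All function names and the convention that a nonzero A[i][j] denotes an edge i → j are hypothetical choices made for this illustration.

```python
import math
from typing import Dict, List, Tuple

Matrix = List[List[float]]


def soft_threshold(x: float, lam: float) -> float:
    """Proximal map of lam * |x|: shrink x toward 0, returning exactly 0 when |x| <= lam."""
    if x > lam:
        return x - lam
    if x < -lam:
        return x + lam
    return 0.0


def partial_correlation_edges(K: Matrix, tol: float = 1e-12) -> Dict[Tuple[int, int], float]:
    """Undirected edges (i, j), i < j, weighted by the partial correlation
    -K[i][j] / sqrt(K[i][i] * K[j][j]).

    Assumption (not from the source): K is a symmetric positive-definite
    precision matrix, and a zero off-diagonal entry (up to tol) means
    conditional independence, i.e. no edge.
    """
    n = len(K)
    edges = {}
    for i in range(n):
        for j in range(i + 1, n):
            if abs(K[i][j]) > tol:
                edges[(i, j)] = -K[i][j] / math.sqrt(K[i][i] * K[j][j])
    return edges


def source_sink_scores(A: Matrix, tol: float = 1e-12) -> List[Dict[str, float]]:
    """Per-node counts and summed absolute weights of outgoing/incoming edges.

    Hypothetical convention: a nonzero A[i][j] is a directed edge i -> j.
    Diagonal entries (self-loops) are ignored. For a coefficient matrix of the
    form x_t = A x_{t-1}, where A[i][j] links the past of j to i, pass the
    transpose instead.
    """
    n = len(A)
    scores = []
    for k in range(n):
        out_w = [abs(A[k][j]) for j in range(n) if j != k and abs(A[k][j]) > tol]
        in_w = [abs(A[i][k]) for i in range(n) if i != k and abs(A[i][k]) > tol]
        scores.append({
            "out_degree": len(out_w),    # number of edges leaving node k
            "in_degree": len(in_w),      # number of edges entering node k
            "out_strength": sum(out_w),  # total weight leaving node k
            "in_strength": sum(in_w),    # total weight entering node k
        })
    return scores


if __name__ == "__main__":
    # Toy precision matrix: entry (0, 2) is zero, so nodes 0 and 2 are not linked.
    K = [[2.0, -0.5, 0.0],
         [-0.5, 2.0, 0.3],
         [0.0, 0.3, 1.0]]
    print(partial_correlation_edges(K))

    # Toy directed graph: node 2 only sends, node 1 only receives.
    A = [[0.0, 0.8, 0.0],
         [0.0, 0.0, 0.0],
         [0.4, 0.2, 0.0]]
    for node, s in enumerate(source_sink_scores(A)):
        print(node, s)
```

In the toy directed graph, node 2 has out-degree 2 and in-degree 0, so it would be flagged as a pure source, while node 1 is a pure sink. A larger ℓ1 weight (the `lam` of `soft_threshold`) zeroes more entries of K or A, which removes edges from both graphs.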