Many networks, such as communication, social media, and criminal networks, have event-driven dynamics: the mean rate of events occurring at a node changes according to the occurrence of other events in the network. In particular, events at one node can increase the rate of events at other nodes, depending on the influence relationships between them. It is therefore of interest to use temporal data to uncover the directional, time-dependent influence structure of a network and to quantify the uncertainty of that estimate, even when knowledge of a physical network is lacking. Typically, methods for inferring the influence structure in networks either require knowledge of a physical network or can only infer small network structures. In this paper, we model event-driven dynamics on a network with a multidimensional Hawkes process. We then develop a novel ensemble-based filtering approach for time series of count data (i.e., data giving the number of events per unit time at each node in the network). The approach not only tracks the influence network structure over time but also approximates its uncertainty via the ensemble spread. Our method overcomes several limitations of existing methods for inferring multidimensional Hawkes processes: they are too slow to be practical for networks of more than ~50 nodes, and they can only handle timestamp data (i.e., data recording only when events occur, not the number of events at each node). Moreover, our method does not require a physical network to start with. Our method is massively parallelizable, allowing it to infer the influence structure of large networks (~10,000 nodes). We demonstrate our method on large networks using both synthetic and real-world email communication data.
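To make the setting concrete, here is a minimal sketch of filtering an influence matrix from binned counts. The abstract does not specify the model form or the filter, so everything below is an assumption for illustration and not the paper's method. The sketch uses a discrete-time Hawkes model with an exponentially decaying kernel (`decay`) and a known baseline rate `mu`. In this model node i's count per bin is Poisson with mean `(mu[i] + sum_j alpha[i][j] * h[j]) * dt`, where `h[j]` is a decaying sum of node j's past counts. The filter is a standard stochastic (perturbed-observation) ensemble Kalman filter with a random-walk forecast. Each observation updates only its own row of `alpha` (a localization choice), and entries are clipped at zero. The function names `simulate_counts`, `enkf_influence`, and `summarize` are hypothetical.

```python
import math
import random
import statistics


def poisson(lam, rng):
    """Poisson draw via Knuth's multiplication method (adequate for small lam)."""
    if lam <= 0.0:
        return 0
    threshold, k, p = math.exp(-lam), 0, rng.random()
    while p > threshold:
        k += 1
        p *= rng.random()
    return k


def simulate_counts(mu, alpha, decay, dt, steps, rng):
    """Binned counts from an assumed discrete-time multivariate Hawkes model.

    h[j] is an exponentially decaying sum of node j's past counts; node i has
    intensity mu[i] + sum_j alpha[i][j] * h[j] and N_i ~ Poisson(intensity * dt).
    """
    d = len(mu)
    h = [0.0] * d
    counts = []
    for _ in range(steps):
        lam = [mu[i] + sum(alpha[i][j] * h[j] for j in range(d)) for i in range(d)]
        n = [poisson(lam[i] * dt, rng) for i in range(d)]
        counts.append(n)
        h = [decay * h[j] + n[j] for j in range(d)]
    return counts


def enkf_influence(counts, mu, decay, dt, members, rng, jitter=0.001):
    """Track alpha with a stochastic (perturbed-observation) ensemble Kalman filter."""
    d = len(mu)
    # Prior ensemble: each member is a nonnegative d x d influence matrix.
    ens = [[[rng.uniform(0.0, 0.3) for _ in range(d)] for _ in range(d)]
           for _ in range(members)]
    h = [0.0] * d
    for n in counts:
        # Forecast: a small random walk lets alpha drift and keeps the spread alive.
        for m in ens:
            for i in range(d):
                for j in range(d):
                    m[i][j] = max(0.0, m[i][j] + rng.gauss(0.0, jitter))
        for i in range(d):
            # Localization: the count at node i depends only on row i of alpha.
            pred = [(mu[i] + sum(m[i][j] * h[j] for j in range(d))) * dt for m in ens]
            pred_mean = statistics.fmean(pred)
            var_pred = sum((p - pred_mean) ** 2 for p in pred) / (members - 1)
            r = max(pred_mean, 1e-3)  # Poisson variance ~ predicted mean (assumption)
            gains = []
            for j in range(d):
                col = [m[i][j] for m in ens]
                col_mean = statistics.fmean(col)
                cov = sum((a - col_mean) * (p - pred_mean)
                          for a, p in zip(col, pred)) / (members - 1)
                gains.append(cov / (var_pred + r))
            for m, p in zip(ens, pred):
                # Each member assimilates its own perturbed copy of the observation.
                innovation = n[i] + rng.gauss(0.0, math.sqrt(r)) - p
                for j in range(d):
                    m[i][j] = max(0.0, m[i][j] + gains[j] * innovation)
        h = [decay * h[j] + n[j] for j in range(d)]
    return ens


def summarize(ens):
    """Ensemble mean as the estimate; ensemble spread (std) as its uncertainty."""
    d = len(ens[0])
    mean = [[statistics.fmean([m[i][j] for m in ens]) for j in range(d)] for i in range(d)]
    spread = [[statistics.stdev([m[i][j] for m in ens]) for j in range(d)] for i in range(d)]
    return mean, spread


rng = random.Random(0)
mu = [0.5, 0.5, 0.5]
true_alpha = [[0.0, 0.35, 0.0], [0.0, 0.0, 0.35], [0.3, 0.0, 0.0]]  # alpha[i][j]: j -> i
counts = simulate_counts(mu, true_alpha, decay=0.5, dt=1.0, steps=3000, rng=rng)
ens = enkf_influence(counts, mu, decay=0.5, dt=1.0, members=40, rng=rng)
mean, spread = summarize(ens)
for i in range(len(mu)):
    print([f"{v:.2f}±{s:.2f}" for v, s in zip(mean[i], spread[i])])
```

In this sketch the forecast of each ensemble member is independent of the others. With per-row localization, each node's update touches only its own row of `alpha`, so the work can be split across members and nodes. This illustrates why ensemble filters lend themselves to parallelization, though it is not necessarily how the paper parallelizes its method.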