The multivariate Hawkes process (MHP) is widely used for analyzing interacting data streams, where events generate new events within their own dimension (via self-excitation) or across different dimensions (via cross-excitation). However, in certain applications, the timestamps of individual events in some dimensions are unobservable, and only event counts within intervals are known; such data are referred to as partially interval-censored. The MHP is unsuitable for handling such data because its estimation requires event timestamps. In this study, we introduce the Partial Mean Behavior Poisson (PMBP) process, a novel point process that shares parameter equivalence with the MHP and can effectively model both timestamped and interval-censored data. We demonstrate the capabilities of the PMBP process using synthetic and real-world datasets. First, we illustrate that the PMBP process can approximate MHP parameters and recover the spectral radius using synthetic event histories. Next, we assess the performance of the PMBP process in predicting YouTube popularity and find that it surpasses state-of-the-art methods. Finally, we leverage the PMBP process to gain qualitative insights from a dataset comprising daily COVID-19 case counts from multiple countries and COVID-19-related news articles. By clustering the PMBP-modeled countries, we unveil hidden interaction patterns between occurrences of COVID-19 cases and news reporting.
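To make the terms used above concrete, the following is a minimal sketch, not the PMBP estimator itself. It assumes a two-dimensional MHP with an exponential kernel, which is an illustrative choice that the abstract does not specify. All function names, parameter names, and numeric values are hypothetical. The sketch shows (i) how exact timestamps reduce to interval counts under interval censoring, (ii) how self- and cross-excitation enter the conditional intensity, and (iii) how the spectral radius of the branching matrix is computed.

```python
from bisect import bisect_left
from math import exp


def censor_to_counts(timestamps, edges):
    """Interval-censor a dimension: keep only counts per [edges[i], edges[i+1])."""
    ts = sorted(timestamps)
    return [bisect_left(ts, hi) - bisect_left(ts, lo)
            for lo, hi in zip(edges, edges[1:])]


def intensity(t, dim, history, mu, alpha, theta):
    """Conditional intensity of dimension `dim` at time t.

    Assumed kernel: alpha[dim][j] * theta * exp(-theta * (t - s)) for each
    past event s in dimension j (j == dim is self-excitation, otherwise
    cross-excitation). Evaluating this needs exact event timestamps.
    """
    lam = mu[dim]
    for j, events in enumerate(history):
        for s in events:
            if s < t:
                lam += alpha[dim][j] * theta * exp(-theta * (t - s))
    return lam


def spectral_radius_2x2(a):
    """Largest absolute eigenvalue of a 2x2 matrix (characteristic polynomial)."""
    tr = a[0][0] + a[1][1]
    det = a[0][0] * a[1][1] - a[0][1] * a[1][0]
    disc = tr * tr - 4.0 * det
    if disc >= 0:
        root = disc ** 0.5
        return max(abs((tr + root) / 2.0), abs((tr - root) / 2.0))
    # Complex-conjugate pair: both eigenvalues have modulus sqrt(det).
    return abs(det) ** 0.5


# Hypothetical parameters for illustration only.
alpha = [[0.5, 0.2], [0.1, 0.3]]   # branching matrix
mu = [0.1, 0.2]                    # background rates
history = [[1.0], []]              # one event in dimension 0 at t = 1.0

counts = censor_to_counts([0.3, 0.9, 1.2, 2.5, 2.7, 2.9], [0, 1, 2, 3])
lam = intensity(2.0, 1, history, mu, alpha, theta=1.0)
rho = spectral_radius_2x2(alpha)
```

With the exponential kernel normalized as above, `alpha` is the branching matrix, and a spectral radius below 1 corresponds to the subcritical (non-explosive) regime. The censoring helper discards exactly the timestamps that the intensity function needs as input, which is why a model fitted on timestamps cannot be applied directly to the censored dimensions.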