The multivariate Hawkes process (MHP) is widely used for analyzing data streams that interact with each other, where events generate new events within their own dimension (via self-excitation) or across different dimensions (via cross-excitation). However, in certain applications, the timestamps of individual events in some dimensions are unobservable, and only event counts within intervals are known; such data are referred to as partially interval-censored. The MHP is unsuitable for such data because its estimation requires event timestamps. In this study, we introduce the Partially Censored Multivariate Hawkes Process (PCMHP), a novel point process that shares parameter equivalence with the MHP and can effectively model both timestamped and interval-censored data. We demonstrate the capabilities of the PCMHP using synthetic and real-world datasets. First, we show on synthetic event histories that the PCMHP can approximate MHP parameters and recover the spectral radius. Next, we assess the performance of the PCMHP in predicting YouTube popularity and find that it outperforms the Hawkes Intensity Process (HIP), a popularity estimation algorithm. Compared with the fully interval-censored HIP, the PCMHP improves prediction performance by accounting for point process dimensions, particularly when cross-dimension interactions are significant. Lastly, we leverage the PCMHP to gain qualitative insights from a dataset comprising daily COVID-19 case counts from multiple countries and COVID-19-related news articles. By clustering the PCMHP-modeled countries, we unveil hidden interaction patterns between occurrences of COVID-19 cases and news reporting.
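To make the terms above concrete, the following is a minimal sketch of the standard MHP quantities the abstract refers to: the conditional intensity with self- and cross-excitation, the spectral radius of the branching matrix, and the interval-censoring of one dimension into counts. It is not an implementation of the PCMHP. The exponential kernel, the parameter values, and the names `intensity`, `spectral_radius`, and `interval_censor` are all assumptions chosen for illustration.

```python
import numpy as np

def intensity(t, events, mu, alpha, theta):
    """Conditional intensity of a d-dimensional Hawkes process at time t.

    events[j] holds the timestamps observed in dimension j.
    Assumed kernel: alpha[i, j] * theta * exp(-theta * (t - s)), which
    integrates to alpha[i, j] over s < t as t grows.
    """
    lam = np.array(mu, dtype=float)  # copy of the background rates
    for j, times in enumerate(events):
        past = np.asarray(times, dtype=float)
        past = past[past < t]
        # Each past event in dimension j excites every dimension i:
        # i == j is self-excitation, i != j is cross-excitation.
        lam += alpha[:, j] * np.sum(theta * np.exp(-theta * (t - past)))
    return lam

def spectral_radius(alpha):
    """Largest eigenvalue magnitude of the branching matrix.

    A value below 1 means the process is subcritical.
    """
    return float(np.max(np.abs(np.linalg.eigvals(alpha))))

def interval_censor(times, edges):
    """Replace timestamps by event counts per interval (interval-censoring)."""
    counts, _ = np.histogram(times, bins=edges)
    return counts

# Hypothetical two-dimensional example.
mu = np.array([0.5, 0.2])
alpha = np.array([[0.3, 0.1],
                  [0.4, 0.2]])
theta = 1.0
events = [[0.5, 1.2, 2.7], [0.9, 2.1]]

print(intensity(3.0, events, mu, alpha, theta))
print(spectral_radius(alpha))
# Dimension 1 observed only as counts over unit-length intervals:
print(interval_censor(events[1], edges=[0.0, 1.0, 2.0, 3.0]))
```

In the partially interval-censored setting described above, some dimensions would keep their timestamps (as in `events[0]`) while others would be available only in the form returned by `interval_censor`, which is why the timestamp-based MHP likelihood cannot be evaluated directly.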