Sequences of labeled events observed at irregular intervals in continuous time are ubiquitous across many fields. Temporal Point Processes (TPPs) provide a mathematical framework for modeling these sequences, enabling inferences such as predicting the arrival time of a future event and its associated label, called a mark. However, due to model misspecification or a lack of training data, these probabilistic models may approximate the true, unknown underlying process poorly, so that prediction regions extracted from them are unreliable estimates of the underlying uncertainty. This paper develops more reliable methods for uncertainty quantification in neural TPP models via the framework of conformal prediction. A primary objective is to generate a distribution-free joint prediction region for the arrival time and mark, with a finite-sample marginal coverage guarantee. A key challenge is to handle both a strictly positive, continuous response and a categorical response without distributional assumptions. We first consider a simple but overly conservative approach that combines individual prediction regions for the event arrival time and mark. We then introduce a more effective method based on bivariate highest density regions derived from the joint predictive density of the event arrival time and mark. By leveraging the dependencies between these two variables, this method excludes unlikely combinations of the two, resulting in sharper prediction regions while still attaining the pre-specified coverage level. We also explore the generation of individual univariate prediction regions for arrival times and marks through conformal regression and classification techniques. Moreover, we investigate the stronger notion of conditional coverage. Finally, through extensive experiments on both simulated and real-world datasets, we assess the validity and efficiency of these methods.
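To make the idea of a density-based joint prediction region concrete, the following is a minimal sketch of split conformal prediction over an (arrival time, mark) pair. It is an illustration under stated assumptions, not the method developed in the paper: the predictive model is a hypothetical toy density (a categorical mark with a mark-specific exponential arrival time) rather than a neural TPP, the nonconformity score is simply the negative joint density, and calibration and test events are drawn i.i.d. so that exchangeability holds by construction. All function and variable names are hypothetical.

```python
import numpy as np


def joint_density(tau, mark, probs, rates):
    """Toy joint predictive density f(tau, k) = p_k * rate_k * exp(-rate_k * tau)."""
    return probs[mark] * rates[mark] * np.exp(-rates[mark] * tau)


def conformal_threshold(scores, alpha):
    """Finite-sample-corrected (1 - alpha) quantile of the calibration scores."""
    n = len(scores)
    rank = int(np.ceil((n + 1) * (1 - alpha)))
    if rank > n:  # too few calibration points: the region must be everything
        return np.inf
    return np.sort(scores)[rank - 1]


def joint_region(q, probs, rates):
    """Region {(tau, k): -f(tau, k) <= q}, stored as an upper bound on tau per mark.

    Each exponential density is decreasing in tau, so the region for mark k is
    the interval [0, upper_k]. A mark maps to None when it is excluded entirely.
    """
    region = {}
    for k, (p, lam) in enumerate(zip(probs, rates)):
        level = -q          # density level that a pair must reach to be included
        peak = p * lam      # density of mark k at tau = 0
        if level <= 0:
            region[k] = np.inf
        elif peak >= level:
            region[k] = np.log(peak / level) / lam
        else:
            region[k] = None
    return region


def sample(n, probs, rates, rng):
    """Draw n i.i.d. (arrival time, mark) pairs from the toy process."""
    marks = rng.choice(len(probs), size=n, p=probs)
    taus = rng.exponential(1.0 / rates[marks])
    return taus, marks


rng = np.random.default_rng(0)
alpha = 0.1

# Assumed "true" process and a deliberately misspecified predictive model.
true_probs, true_rates = np.array([0.6, 0.3, 0.1]), np.array([1.0, 0.5, 2.0])
model_probs, model_rates = np.array([0.5, 0.3, 0.2]), np.array([1.5, 0.5, 1.0])

# Calibration: score each held-out event by its negative predicted density.
cal_tau, cal_mark = sample(2000, true_probs, true_rates, rng)
cal_scores = -joint_density(cal_tau, cal_mark, model_probs, model_rates)
q = conformal_threshold(cal_scores, alpha)
region = joint_region(q, model_probs, model_rates)

# Evaluation: empirical coverage on fresh events from the true process.
test_tau, test_mark = sample(5000, true_probs, true_rates, rng)
test_scores = -joint_density(test_tau, test_mark, model_probs, model_rates)
coverage = float(np.mean(test_scores <= q))
```

In this sketch, `region` maps each mark to the largest arrival time still inside the joint region, and `coverage` should be close to `1 - alpha` even though the model is misspecified, because the threshold is calibrated on held-out data rather than read off the model. Thresholding the joint density lets the region drop (time, mark) combinations the model considers unlikely, which is the intuition behind the sharper regions described above; a naive alternative would build separate time and mark regions at level `alpha / 2` each and take their product.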