Event sequences, characterized by irregular sampling intervals and a mix of categorical and numerical features, are common data structures in various real-world domains such as healthcare, finance, and user interaction logs. Despite advances in temporal data modeling techniques, there are no standardized benchmarks for evaluating their performance on event sequences. Because evaluation protocols vary from paper to paper, results are hard to compare, and reported progress in this field can be misleading. We introduce EBES, a comprehensive benchmarking tool with standardized evaluation scenarios and protocols, focusing on regression and classification problems with sequence-level targets. Our library simplifies benchmarking, dataset addition, and method integration through a unified interface. It includes a novel synthetic dataset and provides preprocessed real-world datasets, including the largest publicly available banking dataset. Our results provide an in-depth analysis of the datasets and identify some as unsuitable for model comparison. We investigate the importance of modeling temporal and sequential components, as well as the robustness and scaling properties of the models. These findings highlight potential directions for future research. Our benchmark aims to facilitate reproducible research, expediting progress and increasing real-world impact.