Next-activity prediction aims to forecast the future behavior of running process instances. Recent publications in this field predominantly employ deep learning techniques and evaluate their prediction performance on publicly available event logs. This paper presents empirical evidence that calls into question the effectiveness of these evaluation approaches. We show that all of the commonly used event logs exhibit an enormous amount of example leakage, so that rather trivial prediction approaches perform almost as well as those that leverage deep learning. We further argue that designing robust evaluations requires a deeper conceptual engagement with the topic of next-activity prediction, and specifically with the notion of generalization to new data. To guide future research, we present various prediction scenarios that require different types of generalization.
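To make the notion of example leakage concrete, the following minimal sketch measures how many test examples also occur verbatim in the training data, and compares this against a trivial lookup predictor. It rests on assumptions that the abstract does not confirm: an example is taken to be a pair of (activity prefix, next activity), leakage is taken to be the share of test examples that also appear in the training set, and the function names and toy traces are purely illustrative. The paper's own definitions and baselines may differ.

```python
from collections import Counter, defaultdict


def prefix_examples(traces):
    """Split each trace into (prefix, next_activity) pairs (assumed example format)."""
    examples = []
    for trace in traces:
        for i in range(1, len(trace)):
            examples.append((tuple(trace[:i]), trace[i]))
    return examples


def leakage_rate(train_examples, test_examples):
    """Share of test examples that also occur verbatim in the training examples."""
    if not test_examples:
        return 0.0
    seen = set(train_examples)
    return sum(example in seen for example in test_examples) / len(test_examples)


def fit_lookup_baseline(train_examples):
    """Trivial predictor: most frequent next activity per training prefix."""
    counts = defaultdict(Counter)
    for prefix, next_activity in train_examples:
        counts[prefix][next_activity] += 1
    table = {prefix: c.most_common(1)[0][0] for prefix, c in counts.items()}
    # Unseen prefixes fall back to the globally most frequent next activity.
    fallback = Counter(nxt for _, nxt in train_examples).most_common(1)[0][0]
    return lambda prefix: table.get(tuple(prefix), fallback)


# Hypothetical toy traces; real event logs would be loaded from files instead.
train_traces = [["a", "b", "c"], ["a", "b", "d"], ["a", "b", "c"]]
test_traces = [["a", "b", "c"], ["a", "e", "c"]]

train_examples = prefix_examples(train_traces)
test_examples = prefix_examples(test_traces)

leakage = leakage_rate(train_examples, test_examples)
predict = fit_lookup_baseline(train_examples)
accuracy = sum(predict(p) == nxt for p, nxt in test_examples) / len(test_examples)

print(f"leakage: {leakage:.2f}, lookup accuracy: {accuracy:.2f}")
```

In this toy setting the lookup predictor is correct exactly on the leaked examples, which illustrates why a high leakage rate lets memorization alone reach a high score. Evaluating on only the non-leaked test examples would be one way to probe generalization, though whether this matches the scenarios proposed in the paper is not stated in the abstract.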