We present a Python-based framework for event-log prediction in streaming mode, which makes predictions while the data is still being generated by a business process. The framework allows easy integration of streaming algorithms, including language models such as n-grams and LSTMs, and supports combining these predictors with ensemble methods. Using the framework, we conducted experiments on several well-known process-mining data sets and compared classical batch mode with streaming mode. Although LSTMs generally achieve the best performance in batch mode, there is often an n-gram model whose accuracy comes very close, and ensembles of basic models can even outperform LSTMs. The value of basic models relative to LSTMs becomes even more apparent in streaming mode, where LSTMs generally lack accuracy in the early stages of a prediction run, while basic methods make sensible predictions immediately.
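To make the streaming setting concrete, the following is a minimal sketch of an n-gram next-activity predictor that learns one event at a time, together with a simple majority-vote ensemble. It is not the framework's API, which the abstract does not show: the names `StreamingNGram`, `VotingEnsemble`, and `prequential_accuracy` are hypothetical, and both the test-then-train evaluation loop and the majority vote are assumptions about how such predictors might be scored and combined.

```python
from collections import Counter, defaultdict


class StreamingNGram:
    """Hypothetical next-activity predictor that learns from one event at a time."""

    def __init__(self, n):
        self.n = n                            # context = last n-1 activities of the case
        self.counts = defaultdict(Counter)    # context -> counts of observed successors
        self.history = defaultdict(list)      # case id -> activities seen so far

    def _context(self, case_id):
        k = self.n - 1
        return tuple(self.history[case_id][-k:]) if k > 0 else ()

    def predict(self, case_id):
        """Return the most frequent successor of the current context, or None."""
        seen = self.counts.get(self._context(case_id))
        return seen.most_common(1)[0][0] if seen else None

    def update(self, case_id, activity):
        """Count the observed activity for the current context, then extend the case."""
        self.counts[self._context(case_id)][activity] += 1
        self.history[case_id].append(activity)


class VotingEnsemble:
    """Hypothetical ensemble: majority vote over the members' predictions."""

    def __init__(self, models):
        self.models = models

    def predict(self, case_id):
        votes = Counter()
        for model in self.models:
            guess = model.predict(case_id)
            if guess is not None:             # members with no evidence abstain
                votes[guess] += 1
        return votes.most_common(1)[0][0] if votes else None

    def update(self, case_id, activity):
        for model in self.models:
            model.update(case_id, activity)


def prequential_accuracy(model, stream):
    """Test-then-train: predict each event before the model is allowed to see it."""
    correct = total = 0
    for case_id, activity in stream:
        if model.predict(case_id) == activity:
            correct += 1
        total += 1
        model.update(case_id, activity)
    return correct / total if total else 0.0


# Toy event stream of (case id, activity) pairs; real logs would be far larger.
stream = [("c1", "A"), ("c1", "B"), ("c1", "C"),
          ("c2", "A"), ("c2", "B"), ("c2", "C")]

print(prequential_accuracy(StreamingNGram(2), stream))
print(prequential_accuracy(VotingEnsemble([StreamingNGram(n) for n in (1, 2, 3)]), stream))
```

A count-based model of this kind needs no separate training phase: it can repeat a pattern after observing it once, which is consistent with the observation that basic methods make sensible predictions early in a run.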