Prescriptive Process Monitoring is a prominent problem in Process Mining: it consists in identifying a set of actions to recommend in order to optimise a target measure of interest, or Key Performance Indicator (KPI). A key difficulty is that, because well-crafted and human-validated explicit models are lacking, Prescriptive Process Monitoring techniques must rely only on temporally annotated (process) execution data, stored in so-called execution logs. In this paper we propose an AI-based approach that uses Reinforcement Learning (RL) to learn an optimal policy (almost) only from the observation of past executions, and that recommends the best activities to carry out in order to optimise a KPI of interest. This is achieved by first learning a Markov Decision Process for the specific KPI from data, and then using RL training to learn the optimal policy. The approach is validated on real and synthetic datasets and compared with off-policy Deep RL approaches. Its ability to match, and often outperform, Deep RL approaches contributes towards the exploitation of white-box RL techniques in scenarios where only temporal execution data are available.
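To make the two-step idea concrete, the following is a minimal sketch under stated assumptions, not the implementation evaluated in the paper. It assumes that the execution log has already been abstracted into `(state, action, reward, next_state)` tuples; the activity names, the reward values, and the helper names `learn_mdp`, `q_value_iteration` and `greedy_policy` are all hypothetical. The Markov Decision Process is estimated by simple frequency counts, and plain Q-value iteration on the estimated model stands in for the sampling-based RL training step.

```python
from collections import defaultdict


def learn_mdp(transitions):
    """Estimate P(s' | s, a) and the mean reward R(s, a) by frequency counts."""
    counts = defaultdict(lambda: defaultdict(int))
    reward_sum = defaultdict(float)
    for s, a, r, s_next in transitions:
        counts[(s, a)][s_next] += 1
        reward_sum[(s, a)] += r
    P, R = {}, {}
    for sa, nexts in counts.items():
        total = sum(nexts.values())
        P[sa] = {s2: n / total for s2, n in nexts.items()}
        R[sa] = reward_sum[sa] / total
    return P, R


def q_value_iteration(P, R, gamma=0.9, sweeps=200):
    """Synchronous Q-value iteration on the estimated model.

    This is a deterministic stand-in for RL training (an assumption of
    this sketch); gamma and sweeps are illustrative values.
    """
    Q = {sa: 0.0 for sa in P}

    def v(s):
        # States with no recorded action (e.g. terminal ones) have value 0.
        vals = [q for (s0, _), q in Q.items() if s0 == s]
        return max(vals) if vals else 0.0

    for _ in range(sweeps):
        Q = {
            sa: R[sa] + gamma * sum(p * v(s2) for s2, p in P[sa].items())
            for sa in P
        }
    return Q


def greedy_policy(Q):
    """Pick, for every state, the action with the highest Q-value."""
    best = {}
    for (s, a), q in Q.items():
        if s not in best or q > Q[(s, best[s])]:
            best[s] = a
    return best


# Hypothetical toy log: the KPI is the revenue collected when a case closes.
log = (
    [("start", "call_customer", 0, "offer_accepted")] * 3
    + [("start", "call_customer", 0, "offer_rejected")] * 1
    + [("start", "send_email", 0, "offer_accepted")] * 1
    + [("start", "send_email", 0, "offer_rejected")] * 3
    + [("offer_accepted", "close", 100, "end")] * 4
    + [("offer_rejected", "close", 0, "end")] * 4
)

P, R = learn_mdp(log)
Q = q_value_iteration(P, R)
policy = greedy_policy(Q)
print(policy["start"])
```

In this toy log, calling the customer leads to an accepted offer in three cases out of four, whereas sending an email does so in one case out of four, so the greedy policy recommends `call_customer` in state `start`. Because both the model and the Q-table are explicit dictionaries, the recommendation can be traced back to the observed frequencies, which is the sense in which such an approach is white-box.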