In recent years, trace generation has emerged as a significant challenge within the Process Mining community. Deep Learning (DL) models have been shown to reproduce the features of selected processes accurately. However, current DL generative models are limited in their ability to adapt the learned distributions so that data samples are generated according to specific conditions or attributes. This limitation matters because controlling the type of generated data is useful in many contexts: it makes it possible to focus on specific behaviours, explore infrequent patterns, or simulate alternative 'what-if' scenarios. In this work, we address this challenge by introducing a conditional model for process data generation based on a conditional variational autoencoder (CVAE). Conditional models offer control over the generation process through their input conditional variables, enabling more targeted data generation. Unlike in other domains, applying a CVAE to process mining poses specific challenges, because the data are multiperspective and the generated traces must adhere to control-flow rules while still preserving data variability. Specifically, we focus on generating process executions conditioned on control-flow and temporal features of the trace, which allows us to produce traces for specific, identified sub-processes. The generated traces are then evaluated using common metrics for generative model assessment, along with additional metrics that assess the quality of the conditional generation.