Anonymization of event logs facilitates process mining while protecting sensitive information of process stakeholders. Existing techniques, however, focus on the privatization of the control-flow. Other process perspectives, such as roles, resources, and objects, are neglected or subject to randomization, which breaks the dependencies between the perspectives. Hence, existing techniques are not suited for advanced process mining tasks, e.g., social network mining or predictive monitoring. To address this gap, we propose PMDG, a framework to ensure privacy for multi-perspective process mining through data generalization. It provides group-based privacy guarantees for an event log, while preserving the characteristic dependencies between the control-flow and further process perspectives. Unlike existing privatization techniques that rely on data suppression or noise insertion, PMDG adopts data generalization: a technique where the activities and attribute values referenced in events are generalized into more abstract ones, to obtain equivalence classes that are sufficiently large from a privacy point of view. We demonstrate empirically that PMDG outperforms state-of-the-art anonymization techniques when mining handovers and predicting outcomes.
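To make the idea of data generalization concrete, the following minimal Python sketch replaces activities and resources with more abstract values and checks whether the resulting equivalence classes reach a minimum size k. It is a toy illustration, not the PMDG algorithm: the hierarchies `ACTIVITY_PARENT` and `ROLE_PARENT`, the example log, and the event-level group-size check are all assumptions made for this example.

```python
from collections import Counter

# Hypothetical one-level generalization hierarchies (not taken from the paper).
ACTIVITY_PARENT = {
    "Blood test": "Lab work",
    "Urine test": "Lab work",
    "X-ray": "Imaging",
    "MRI": "Imaging",
}
ROLE_PARENT = {
    "Nurse A": "Nurse",
    "Nurse B": "Nurse",
    "Dr. C": "Physician",
    "Dr. D": "Physician",
}


def generalize(events):
    """Replace each (activity, resource) pair by its more abstract parent."""
    return [
        (ACTIVITY_PARENT.get(act, act), ROLE_PARENT.get(res, res))
        for act, res in events
    ]


def min_class_size(events):
    """Size of the smallest equivalence class, i.e. group of identical events."""
    return min(Counter(events).values())


def anonymize(events, k):
    """Generalize only if the raw log violates the group-size threshold k."""
    if min_class_size(events) >= k:
        return events
    return generalize(events)


# Toy event log: every (activity, resource) pair is unique before generalization.
log = [
    ("Blood test", "Nurse A"),
    ("Urine test", "Nurse B"),
    ("X-ray", "Dr. C"),
    ("MRI", "Dr. D"),
    ("Blood test", "Nurse B"),
    ("MRI", "Dr. C"),
]
result = anonymize(log, k=2)
```

In this toy log, generalization merges six distinct events into two classes of three events each, and the dependency between activity type and role (lab work performed by nurses, imaging by physicians) remains visible, whereas independent randomization of the resource column would tend to destroy it.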