Protecting personal data about individuals, such as event traces in process mining, is an inherently difficult task, since an event trace leaks information about the path in a process model that an individual has triggered. Prior anonymization methods for event traces, such as k-anonymity or event log sanitization, struggle to protect against such leakage, in particular against adversaries with sufficient background knowledge. In this work, we provide a method that tackles the challenge of summarizing sensitive event traces by learning the underlying process tree in a privacy-preserving manner. We prove, via the Differential Privacy (DP) property, that no useful inference about any personal data in an event trace can be drawn from the resulting summaries. On the technical side, we introduce a differentially private approximation of the Inductive Miner (DPIM). Experimentally, we compare our DPIM with the Inductive Miner on 14 real-world event traces by evaluating well-known metrics: fitness, precision, simplicity, and generalization. The experiments show that our DPIM not only protects personal data but also generates faithful process trees, exhibiting little utility loss relative to the Inductive Miner.