Process mining is seeing rapid adoption in industry. Consequently, privacy concerns about the sensitive and private information contained in the event data that process mining algorithms consume are becoming increasingly relevant. State-of-the-art research mainly focuses on providing privacy guarantees, e.g., differential privacy, for trace variants, which are used by the main process mining techniques, e.g., process discovery. However, privacy preservation techniques for releasing trace variants still do not fulfill all the requirements of industry-scale usage. Moreover, providing privacy guarantees when there is a high ratio of infrequent trace variants remains a challenge. In this paper, we introduce TraVaG, a new approach for releasing differentially private trace variants based on Generative Adversarial Networks (GANs), which provides industry-scale benefits and enhances the level of privacy guarantees when there is a high ratio of infrequent variants. Moreover, TraVaG overcomes shortcomings of conventional privacy preservation techniques, such as bounding the length of variants and introducing fake variants. Experimental results on real-life event data show that our approach outperforms state-of-the-art techniques in terms of privacy guarantees, plain data utility preservation, and result utility preservation.