Privacy-preserving utility mining (PPUM) aims to hide sensitive high-utility patterns while preserving the utility of the sanitized database. In practice, however, many datasets are associated with taxonomic information, which makes the identification and processing of generalized items more challenging. To address this, we investigate the cross-level privacy-preserving utility mining (CLPPUM) problem and propose a method for protecting generalized items. Based on different victim item selection strategies, we develop three CLPPUM algorithms: minimum RGISU first (Min-RF), maximum RGISU first (Max-RF), and best NSC first (Best-NSCF). Furthermore, to enable efficient victim item identification, a novel dictionary structure named GI-dic is designed to accelerate the computation of required utility metrics. Experimental results on multiple datasets demonstrate that the proposed algorithms successfully hide all sensitive cross-level high-utility itemsets without introducing artificial itemsets. The results also show that our method performs well on sparse datasets, and both Min-RF and Best-NSCF consistently outperform Max-RF. Overall, Min-RF achieves the best performance, particularly when the minimum utility threshold is low and the dataset is dense.
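The abstract does not describe GI-dic's internal layout. The sketch below shows the general idea of caching generalized-item utilities in a dictionary. It assumes the common cross-level utility mining convention: a generalized item's utility in a transaction is the sum of the utilities of its descendant items that occur there. Under that assumption, a dictionary keyed by generalized item can store per-transaction utilities once, so utility computations during victim selection do not need to re-walk the taxonomy. The taxonomy, the toy transactions, and the names `build_gi_index`, `item_utility` and `itemset_utility` are hypothetical. This is not the authors' GI-dic.

```python
from collections import defaultdict

# Hypothetical taxonomy (child -> parent). Leaves are purchasable items;
# internal nodes ("fruit", "dairy", "food") are generalized items.
TAXONOMY = {
    "apple": "fruit", "banana": "fruit",
    "milk": "dairy", "cheese": "dairy",
    "fruit": "food", "dairy": "food",
}

def ancestors(item, taxonomy):
    """Yield every generalized item above `item`, nearest first."""
    while item in taxonomy:
        item = taxonomy[item]
        yield item

def build_gi_index(transactions, taxonomy):
    """Map each generalized item to {tid: utility in that transaction}.

    Assumption: a generalized item's utility in a transaction is the sum of
    the utilities of its descendant leaf items present in that transaction.
    """
    index = defaultdict(lambda: defaultdict(int))
    for tid, items in transactions.items():
        for item, utility in items.items():
            for gi in ancestors(item, taxonomy):
                index[gi][tid] += utility
    return {gi: dict(per_tid) for gi, per_tid in index.items()}

def item_utility(item, tid, transactions, index):
    """Utility of a leaf or generalized item in one transaction (0 if absent)."""
    if item in index:  # generalized item: read the cached value
        return index[item].get(tid, 0)
    return transactions[tid].get(item, 0)  # leaf item: read the transaction

def itemset_utility(itemset, transactions, index):
    """Total utility of a cross-level itemset over the transactions containing it.

    Assumes strictly positive item utilities, so a utility of 0 means "absent".
    """
    total = 0
    for tid in transactions:
        parts = [item_utility(x, tid, transactions, index) for x in itemset]
        if all(u > 0 for u in parts):
            total += sum(parts)
    return total

# Toy database: tid -> {leaf item: utility in that transaction}.
transactions = {
    1: {"apple": 5, "milk": 4},
    2: {"banana": 3, "cheese": 6},
    3: {"apple": 2, "banana": 1},
}
index = build_gi_index(transactions, TAXONOMY)
print(itemset_utility({"fruit", "dairy"}, transactions, index))  # 18
```

The utilities are built in one pass over the database. After that, querying a generalized item costs a dictionary lookup per transaction. This matters because hiding a sensitive itemset changes leaf utilities, and the affected generalized-item utilities must be re-read many times while victim items are chosen.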