Utility-driven mining is an essential task in data science because it can provide deeper insight into the real world. High-utility sequential rule mining (HUSRM) aims to discover sequential rules with both high utility and high confidence. Because confidence serves as an evaluation metric, the rules found by algorithms such as HUSRM and US-Rule can provide reliable information for decision-making. However, in current rule-growth mining methods, the relationship between high-utility sequential rules (HUSRs) and the way they are generated remains unclear. Specifically, it is not known whether adding a new item to an existing rule increases or decreases its utility or confidence. In this paper, we therefore formulate the problem of mining HUSRs with an increasing utility ratio. To address it, we introduce a novel algorithm called SRIU, which discovers all HUSRs with an increasing utility ratio using two distinct expansion methods: left-right expansion and right-left expansion. SRIU uses the item pair estimated utility pruning strategy (IPEUP) to reduce the search space. In addition, a separate set of upper bounds and corresponding pruning strategies is introduced for each of the two expansion methods. Several optimizations further improve the efficiency of SRIU, including a bitmap representation that reduces memory consumption and a compact utility table used during mining. Finally, extensive experiments on both real-world and synthetic datasets demonstrate the effectiveness of the proposed method. To better assess the quality of the generated sequential rules, we also evaluate them with metrics such as confidence and conviction, and the results further indicate that SRIU can improve the relevance of the mining results.
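The abstract names confidence and conviction as rule-quality metrics but does not define them. As a minimal sketch, assuming the standard definitions (confidence = sup(X → Y) / sup(X), conviction = (1 − sup(Y)/|D|) / (1 − confidence)) and assuming a rule X → Y holds in a sequence when every item of X appears before every item of Y, they could be computed as follows. The function names and the toy database are hypothetical and are not taken from SRIU; utility is left out entirely.

```python
from typing import FrozenSet, List

Itemset = FrozenSet[str]
Sequence = List[Itemset]


def contains_all(seq: Sequence, items: Itemset) -> bool:
    """True if every item in `items` occurs somewhere in `seq`."""
    seen = set().union(*seq)
    return items <= seen


def rule_holds(seq: Sequence, x: Itemset, y: Itemset) -> bool:
    """True if some split point puts all of X strictly before all of Y."""
    return any(
        contains_all(seq[:k], x) and contains_all(seq[k:], y)
        for k in range(1, len(seq))
    )


def confidence(db: List[Sequence], x: Itemset, y: Itemset) -> float:
    """Fraction of sequences containing X in which the rule X -> Y holds."""
    sup_x = sum(contains_all(s, x) for s in db)
    sup_rule = sum(rule_holds(s, x, y) for s in db)
    return sup_rule / sup_x if sup_x else 0.0


def conviction(db: List[Sequence], x: Itemset, y: Itemset) -> float:
    """(1 - relative support of Y) / (1 - confidence); infinite if confidence is 1."""
    conf = confidence(db, x, y)
    rel_sup_y = sum(contains_all(s, y) for s in db) / len(db)
    if conf == 1.0:
        return float("inf")
    return (1.0 - rel_sup_y) / (1.0 - conf)


# Hypothetical toy database: each sequence is an ordered list of itemsets.
db: List[Sequence] = [
    [frozenset("a"), frozenset("b"), frozenset("c")],
    [frozenset("a"), frozenset("c")],
    [frozenset("b"), frozenset("a")],
    [frozenset("ab"), frozenset("c")],
]

print(confidence(db, frozenset("a"), frozenset("c")))
print(conviction(db, frozenset("a"), frozenset("c")))
```

A conviction of 1 indicates that the antecedent and consequent look independent under this measure, while larger values indicate a rule that is violated less often than chance would suggest.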