High-utility sequential rule mining is an important task in data mining: it seeks sequential rules that have both high utility and high confidence, which are valuable in real-world applications. However, existing high-utility sequential rule mining algorithms perform redundant utility computations, because different rules may consist of the same sequence of items. When one item sequence can form several distinct rules, additional utility calculations are required for each rule. To address this issue, this study proposes RSC, a sequential rule mining algorithm that uses confidence-guided segmentation to reduce redundant utility computation. RSC precomputes the confidence of each segmented rule from the support of the candidate subsequences. Once the segmentation point is determined, all rules with different antecedents and consequents are generated simultaneously. RSC also uses a utility-linked table to accelerate candidate sequence generation and introduces a tighter utility upper bound, called the reduced remaining utility of a sequence, to handle sequences with duplicate items. Finally, RSC was evaluated on multiple datasets, and the results demonstrate improvements over state-of-the-art approaches.
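The idea of deriving several rules from one candidate sequence by choosing a segmentation point can be illustrated with a minimal sketch. It is not the RSC implementation: it assumes totally ordered rules over single-item elements, takes support to be the number of database sequences that contain a pattern as an ordered subsequence, and takes the confidence of a rule to be the support of the whole candidate divided by the support of its antecedent. Utility, the utility-linked table, and the reduced remaining utility bound are left out, and all names (`contains`, `support`, `rules_from_candidate`) are hypothetical.

```python
from typing import List, Sequence, Tuple

Item = str
Seq = Tuple[Item, ...]
Rule = Tuple[Seq, Seq, float]  # (antecedent, consequent, confidence)


def contains(sequence: Sequence[Item], pattern: Sequence[Item]) -> bool:
    """Return True if `pattern` occurs in `sequence` as an ordered subsequence."""
    it = iter(sequence)
    # `item in it` advances the iterator, so matches must appear in order.
    return all(item in it for item in pattern)


def support(database: List[Seq], pattern: Seq) -> int:
    """Count the database sequences that contain `pattern` (assumed definition)."""
    return sum(contains(s, pattern) for s in database)


def rules_from_candidate(database: List[Seq], candidate: Seq,
                         min_conf: float) -> List[Rule]:
    """Split `candidate` at every segmentation point and keep confident rules."""
    whole = support(database, candidate)  # computed once, shared by all splits
    rules: List[Rule] = []
    if whole == 0:
        return rules
    for k in range(1, len(candidate)):
        antecedent, consequent = candidate[:k], candidate[k:]
        # The antecedent is a prefix of the candidate, so its support is >= whole > 0.
        conf = whole / support(database, antecedent)
        if conf >= min_conf:
            rules.append((antecedent, consequent, conf))
    return rules


if __name__ == "__main__":
    toy_db: List[Seq] = [("a", "b", "c"), ("a", "c"), ("a", "b", "c"), ("b", "c")]
    for rule in rules_from_candidate(toy_db, ("a", "b", "c"), min_conf=0.7):
        print(rule)
```

In this sketch the support of the full candidate is computed once and reused for every segmentation point, which mirrors, in a simplified way, the motivation of avoiding repeated work when one item sequence yields several rules.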