Guided Exploration of Sequential Rules

In pattern mining, sequential rules provide a formal framework for capturing temporal relationships and inferential dependencies between items. However, discovering them is computationally intensive. To obtain mining results efficiently and flexibly, many methods have been proposed that rely on a specific evaluation metric and retain only the rules that meet a minimum threshold. A key issue with these methods is that they generate many sequential rules that are irrelevant to users. Such rules not only incur additional computational overhead but also complicate downstream analysis. In this paper, we investigate how to efficiently discover user-centric sequential rules. We first process the original database to determine whether the target query rule is present. To prune unpromising items and avoid unnecessary expansions, we design tight and generalizable upper bounds. Building on these techniques and pruning strategies, we introduce a novel method for efficiently generating the target sequential rules. In addition, we propose corresponding mining algorithms for two common evaluation metrics: frequency and utility. We also design two rule similarity metrics to help discover the most relevant sequential rules. Extensive experiments demonstrate that our algorithms outperform state-of-the-art approaches in runtime and memory usage while discovering a concise set of sequential rules under flexible similarity settings. In summary, targeted sequential rule search can handle sequence data with personalized features, and the proposed solution addresses the challenges above for both frequency-based and utility-based mining tasks.
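The abstract relies on the frequency metric for sequential rules without defining it. The following is a minimal sketch that assumes the widely used RuleGrowth-style definition. Under that definition, a sequence contains X => Y when every item of X occurs in some prefix of the sequence and every item of Y occurs in the remaining suffix. Support is the fraction of sequences that contain the rule, and confidence divides that count by the number of sequences containing all items of X. The sketch computes support and confidence and performs the kind of pre-check that asks whether a target query rule occurs in the database at all. The function names (contains_rule, support, confidence, query_rule_present) and the toy database are hypothetical illustrations, not the paper's algorithms. The upper bounds, the utility metric, and the similarity metrics are not modeled.

```python
from typing import FrozenSet, List

# Hypothetical representation: a sequence is an ordered list of itemsets.
Itemset = FrozenSet[str]
Seq = List[Itemset]


def contains_rule(seq: Seq, x: Itemset, y: Itemset) -> bool:
    """Return True if seq contains the rule x => y.

    Assumed semantics: there is a cut point such that every item of x
    occurs before the cut and every item of y occurs after it.
    """
    seen: set = set()
    for k, itemset in enumerate(seq):
        seen |= itemset
        if x <= seen:
            # The earliest cut that completes x leaves the largest suffix,
            # so it is the only cut point that needs checking.
            rest = set().union(*seq[k + 1:])
            return y <= rest
    return False


def rule_support_count(db: List[Seq], x: Itemset, y: Itemset) -> int:
    """Number of sequences that contain x => y."""
    return sum(contains_rule(seq, x, y) for seq in db)


def itemset_support_count(db: List[Seq], x: Itemset) -> int:
    """Number of sequences whose items (in any order) include all of x."""
    return sum(x <= set().union(*seq) for seq in db)


def support(db: List[Seq], x: Itemset, y: Itemset) -> float:
    return rule_support_count(db, x, y) / len(db)


def confidence(db: List[Seq], x: Itemset, y: Itemset) -> float:
    denom = itemset_support_count(db, x)
    return rule_support_count(db, x, y) / denom if denom else 0.0


def query_rule_present(db: List[Seq], x: Itemset, y: Itemset) -> bool:
    """Pre-check: does the target query rule occur in any sequence?"""
    return any(contains_rule(seq, x, y) for seq in db)


def fs(*items: str) -> Itemset:
    return frozenset(items)


# Toy database (invented for illustration only).
db = [
    [fs("a", "b"), fs("c"), fs("f"), fs("g"), fs("e")],
    [fs("a", "d"), fs("c"), fs("b"), fs("a", "b", "e", "f")],
    [fs("a"), fs("b"), fs("f"), fs("e")],
    [fs("b"), fs("f", "g")],
]

print(support(db, fs("b"), fs("e")))             # 0.75
print(confidence(db, fs("b"), fs("e")))          # 0.75
print(query_rule_present(db, fs("g"), fs("a")))  # False
```

Because the prefix only grows and the suffix only shrinks as the cut moves right, checking the earliest cut that completes X is sufficient. This keeps the containment test linear in the length of the sequence rather than trying every cut point.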
