Association rule mining techniques can generate a large volume of sequential data when applied to transactional databases. Extracting insights from a large set of association rules is a challenging process. When examining a ruleset, the fundamental question is how to summarise and represent the mined knowledge efficiently and meaningfully. Many algorithms and strategies have been developed to address the issue of knowledge extraction; however, the effectiveness of this process can be limited by the underlying data structures. A better-suited data structure can significantly improve the speed of the knowledge extraction process. This paper proposes a novel data structure, called the Trie of rules, for storing a ruleset generated by association rule mining. The resulting data structure is a prefix tree built from pre-mined rules. It stores the rules as paths within the prefix tree so that similar rules overlay each other. Each node in the tree represents a rule whose consequent is the node itself and whose antecedent is the path from that node back to the root of the tree. The evaluation showed that the proposed representation technique is promising. It compresses a ruleset with almost no data loss and reduces the time needed for basic operations such as searching for a specific rule and sorting, which form the basis of many knowledge discovery methods. Moreover, our method demonstrated a significant improvement in traversal time, achieving an 8-fold speed-up compared to traditional data structures.
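The structure described in the abstract can be illustrated with a minimal Python sketch. This is not the authors' implementation: the names `RuleNode` and `TrieOfRules`, the `support` and `confidence` metric fields, the example items, and the alphabetical ordering of antecedent items are all assumptions made for illustration (the abstract does not state an item order; a frequency-based order would work equally well in this sketch).

```python
from dataclasses import dataclass, field
from typing import Dict, Iterator, List, Optional, Sequence, Tuple


@dataclass
class RuleNode:
    """One trie node; hypothetical layout, not taken from the paper."""
    item: Optional[str] = None                   # None marks the root
    metrics: Optional[Dict[str, float]] = None   # set only if a mined rule ends here
    children: Dict[str, "RuleNode"] = field(default_factory=dict)


class TrieOfRules:
    """Stores each rule as a path: ordered antecedent items, then the consequent."""

    def __init__(self) -> None:
        self.root = RuleNode()

    @staticmethod
    def _path(antecedent: Sequence[str], consequent: str) -> List[str]:
        # Assumption: antecedent items are ordered alphabetically so that
        # rules with overlapping antecedents share a common prefix.
        return sorted(antecedent) + [consequent]

    def insert(self, antecedent: Sequence[str], consequent: str, **metrics: float) -> None:
        node = self.root
        for item in self._path(antecedent, consequent):
            if item not in node.children:
                node.children[item] = RuleNode(item)
            node = node.children[item]
        node.metrics = dict(metrics)  # the final node is the consequent

    def search(self, antecedent: Sequence[str], consequent: str) -> Optional[Dict[str, float]]:
        node = self.root
        for item in self._path(antecedent, consequent):
            child = node.children.get(item)
            if child is None:
                return None
            node = child
        return node.metrics

    def rules(self) -> Iterator[Tuple[Tuple[str, ...], str, Dict[str, float]]]:
        # Depth-first traversal; yields (antecedent, consequent, metrics).
        stack: List[Tuple[RuleNode, List[str]]] = [(self.root, [])]
        while stack:
            node, path = stack.pop()
            if node.metrics is not None:
                yield tuple(path[:-1]), path[-1], node.metrics
            for item, child in node.children.items():
                stack.append((child, path + [item]))

    def node_count(self) -> int:
        # Number of nodes excluding the root.
        count, stack = 0, [self.root]
        while stack:
            node = stack.pop()
            count += len(node.children)
            stack.extend(node.children.values())
        return count


# Three illustrative rules with made-up metric values.
trie = TrieOfRules()
trie.insert(["bread", "butter"], "milk", support=0.2, confidence=0.8)
trie.insert(["bread"], "butter", support=0.3, confidence=0.6)
trie.insert(["bread", "butter"], "jam", support=0.1, confidence=0.4)
```

In this toy example the three rules mention eight items in total when stored as a flat list, whereas the trie holds them in four nodes because the shared prefix `bread`, `butter` is stored once. Looking up a rule costs one dictionary access per item on its path, regardless of how many rules are stored.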