Association rule mining techniques can generate a large volume of sequential data when applied to transactional databases. Extracting insights from a large set of association rules is a challenging process. When examining a ruleset, the fundamental question is how to summarise and represent the mined knowledge efficiently and meaningfully. Many algorithms and strategies have been developed to address the issue of knowledge extraction; however, the effectiveness of this process can be limited by the underlying data structures. A better data structure can substantially improve the speed of the knowledge extraction process. This paper proposes a novel data structure, called the Trie of rules, for storing a ruleset generated by association rule mining. The resulting data structure is a prefix tree built from pre-mined rules. It stores the rules as paths within the prefix tree so that similar rules overlay each other. Each node in the tree represents a rule whose consequent is the node itself and whose antecedent is the path from this node to the root of the tree. The evaluation showed that the proposed representation technique is promising. It compresses a ruleset with almost no data loss and reduces the time required for basic operations such as searching for a specific rule and sorting, which form the basis of many knowledge discovery methods. Moreover, our method demonstrated a significant improvement in traversal time, achieving an 8-fold speed-up compared to traditional data structures.
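The node-as-rule idea described in the abstract can be illustrated with a minimal Python sketch. It is not the implementation evaluated in the paper: the names `RuleNode`, `TrieOfRules`, `insert`, `find`, `rules` and `node_count` are hypothetical, the lexicographic ordering of antecedent items is an assumption made so that similar rules share prefixes, and the example rules and metric values are invented for illustration only.

```python
from dataclasses import dataclass, field
from typing import Dict, Iterable, Iterator, List, Optional, Tuple


@dataclass
class RuleNode:
    """One item on a path; hypothetical layout, not taken from the paper."""
    item: Optional[str] = None                  # None marks the root
    metrics: Optional[Dict[str, float]] = None  # set only where a mined rule ends
    children: Dict[str, "RuleNode"] = field(default_factory=dict)


class TrieOfRules:
    def __init__(self) -> None:
        self.root = RuleNode()

    def _path(self, antecedent: Iterable[str], consequent: str) -> List[str]:
        # Assumption: antecedent items are put in a fixed (lexicographic)
        # order so that rules with similar antecedents overlay each other.
        return sorted(antecedent) + [consequent]

    def insert(self, antecedent: Iterable[str], consequent: str,
               **metrics: float) -> None:
        node = self.root
        for item in self._path(antecedent, consequent):
            if item not in node.children:
                node.children[item] = RuleNode(item)
            node = node.children[item]
        node.metrics = dict(metrics)  # the final node is the consequent

    def find(self, antecedent: Iterable[str],
             consequent: str) -> Optional[Dict[str, float]]:
        node = self.root
        for item in self._path(antecedent, consequent):
            child = node.children.get(item)
            if child is None:
                return None
            node = child
        return node.metrics  # None if the path exists but is not a stored rule

    def rules(self) -> Iterator[Tuple[List[str], str, Dict[str, float]]]:
        # Depth-first traversal; the path above a node is its antecedent.
        stack: List[Tuple[RuleNode, List[str]]] = [(self.root, [])]
        while stack:
            node, path = stack.pop()
            if node.metrics is not None:
                yield path[:-1], path[-1], node.metrics
            for child in node.children.values():
                stack.append((child, path + [child.item]))

    def node_count(self) -> int:
        # Number of item nodes, excluding the root.
        count, stack = 0, list(self.root.children.values())
        while stack:
            node = stack.pop()
            count += 1
            stack.extend(node.children.values())
        return count


# Illustrative rules with made-up metric values.
trie = TrieOfRules()
trie.insert({"bread", "butter"}, "milk", support=0.2, confidence=0.8)
trie.insert({"bread", "butter"}, "jam", support=0.1, confidence=0.4)
trie.insert({"bread"}, "butter", support=0.3, confidence=0.6)
```

In this sketch the three rules occupy four nodes because the shared prefix `bread`, `butter` is stored once, and looking up a rule costs one dictionary access per item on its path.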