Owing to rapid advances in science and technology, the importance of imprecise, noisy, and uncertain data is growing rapidly. Mining patterns in uncertain databases has therefore drawn the attention of researchers. In particular, frequent sequences of items need to be discovered from these databases to extract meaningful, high-impact knowledge. In many real-world cases, weights are assigned to items and patterns as a measure of importance so that interesting sequences can be found. Hence, a weight constraint needs to be handled while mining sequential patterns. Moreover, because databases are dynamic, mining important information has become more challenging. Instead of mining patterns from scratch after each increment, incremental mining algorithms use previously mined information to update the result immediately. Several algorithms exist to mine frequent patterns and weighted sequences from incremental databases. However, these algorithms are limited to precise databases. Therefore, in this work we have developed an algorithm to mine frequent sequences in an uncertain database. Furthermore, we have proposed two new techniques for mining when the database is incremental. Extensive experiments have been conducted for performance evaluation, and the analysis shows the efficiency of our proposed framework.