Data uncertainty is inherent in many real-world applications, and its importance is growing rapidly with the development of modern technologies. Researchers have therefore paid increasing attention to mining patterns in uncertain databases, and a few recent works mine frequent uncertain sequential patterns. Despite their success, these methods generate many false-positive patterns during mining and do not maintain the discovered patterns efficiently. In this paper, we propose several theoretically tightened pruning upper bounds that substantially reduce the mining space. We introduce a novel hierarchical structure that maintains the patterns space-efficiently. We then develop a versatile framework for mining uncertain sequential patterns that also handles weight constraints effectively. Moreover, existing works do not scale to incremental uncertain databases: several incremental sequential pattern mining algorithms exist, but they are limited to precise databases. We therefore propose a new technique that adapts our framework to mine patterns when the database is incremental. Finally, we conduct extensive experiments on several real-life datasets and demonstrate the efficacy of our framework in different applications.