Predictive pattern mining is an approach for constructing prediction models when the input is structured data such as sets, graphs, or sequences. Its main idea is to use the substructures present in the data, such as subsets, subgraphs, and subsequences (referred to as patterns), as features of the model. The primary challenge is that the number of patterns grows exponentially with the complexity of the structured data. In this study, we propose the Safe Pattern Pruning (SPP) method to address this explosion in the number of patterns. We also discuss how SPP can be employed effectively throughout the entire model-building process in practical data analysis. To demonstrate the effectiveness of the proposed method, we conduct numerical experiments on regression and classification problems involving sets, graphs, and sequences.
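
To make the idea of patterns as features concrete, the following minimal sketch treats set-valued inputs and enumerates every item subset as a candidate pattern, then encodes each input as a binary vector indicating which patterns it contains. It is a brute-force illustration only and does not implement SPP; the function names `enumerate_patterns` and `pattern_features` and the toy data are assumptions introduced here.

```python
from itertools import combinations


def enumerate_patterns(transactions, max_size):
    """Collect every item subset (pattern) of size 1..max_size that occurs
    in at least one transaction. Hypothetical helper, brute force only."""
    patterns = set()
    for t in transactions:
        items = sorted(t)
        for k in range(1, max_size + 1):
            patterns.update(combinations(items, k))
    # Order by pattern size, then lexicographically, for a stable feature order.
    return sorted(patterns, key=lambda p: (len(p), p))


def pattern_features(transactions, patterns):
    """Binary feature matrix: entry is 1 if the pattern is a subset of the transaction."""
    return [[1 if set(p) <= t else 0 for p in patterns] for t in transactions]


# Toy set-valued inputs (illustrative data, not from the study).
transactions = [{"a", "b", "c"}, {"a", "c"}, {"b", "d"}]
patterns = enumerate_patterns(transactions, max_size=3)
X = pattern_features(transactions, patterns)

# A single input with n items already yields 2**n - 1 non-empty subsets.
n_patterns_10_items = len(enumerate_patterns([set(range(10))], max_size=10))
```

Each row of `X` can be passed to any standard linear model. The count for a single 10-item input shows why exhaustive enumeration becomes infeasible as the data grow more complex, which is the problem a pruning method needs to address.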