Pattern discovery plays a central role in both descriptive and predictive tasks across multiple domains. To be actionable, patterns must meet rigorous statistical significance criteria and, when target variables are present, also exhibit discriminative power. Our work addresses an underexplored problem: guiding pattern discovery by integrating statistical significance and discriminative power criteria into state-of-the-art algorithms while preserving pattern quality. We also address how the pattern quality thresholds imposed by some algorithms can be adjusted to accommodate these additional criteria. To test the proposed methodology, we select triclustering as the guiding pattern discovery task and extend well-known greedy and multi-objective optimization triclustering algorithms, $\delta$-Trimax and TriGen, which use various pattern quality criteria, such as Mean Squared Residual (MSR), Least Squared Lines (LSL), and Multi Slope Measure (MSL). Results from three case studies show that the proposed methodology discovers patterns with pronounced improvements in discriminative power and statistical significance without deteriorating pattern quality, highlighting its importance for supervised guidance of the search. Although the proposed methodology is motivated by multivariate time series data, it can be straightforwardly extended to pattern discovery tasks involving multivariate, N-way (N>3), transactional, and sequential data structures. Availability: The code is freely available at https://github.com/JupitersMight/MOF_Triclustering under the MIT license.
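As an illustration of the kind of pattern quality criterion mentioned above, the following is a minimal sketch of a Mean Squared Residual computation for a single tricluster. It assumes the additive-coherence residue commonly associated with $\delta$-Trimax, $r_{ijk} = a_{ijk} - a_{iJK} - a_{IjK} - a_{IJk} + 2a_{IJK}$; the abstract itself does not state this formula, and the function name `mean_squared_residue` is hypothetical and not taken from the linked repository.

```python
import numpy as np

def mean_squared_residue(cube):
    """MSR of a 3-D tricluster under an assumed additive-coherence model.

    `cube` is indexed as (dim0, dim1, dim2), e.g. observations x features x
    time points. A lower value means the tricluster is more coherent.
    """
    a = np.asarray(cube, dtype=float)
    # Mean of each slice along one dimension (averaging over the other two).
    mean_0 = a.mean(axis=(1, 2), keepdims=True)
    mean_1 = a.mean(axis=(0, 2), keepdims=True)
    mean_2 = a.mean(axis=(0, 1), keepdims=True)
    # Mean over the whole tricluster.
    mean_all = a.mean()
    # Residue of every element with respect to the additive model.
    residue = a - mean_0 - mean_1 - mean_2 + 2.0 * mean_all
    return float(np.mean(residue ** 2))

# A perfectly additive tricluster: a[i, j, k] = x[i] + y[j] + z[k].
x = np.array([0.0, 1.0, 2.0]).reshape(3, 1, 1)
y = np.array([10.0, 20.0]).reshape(1, 2, 1)
z = np.array([0.5, 1.5, 2.5, 3.5]).reshape(1, 1, 4)
additive = x + y + z

# The same tricluster with one element perturbed, which breaks coherence.
perturbed = additive.copy()
perturbed[0, 0, 0] += 5.0

msr_additive = mean_squared_residue(additive)
msr_perturbed = mean_squared_residue(perturbed)
```

Under this assumed definition, a perfectly additive tricluster has a residue of zero everywhere, so `msr_additive` is zero up to floating-point error, while `msr_perturbed` is strictly positive. A threshold-based algorithm would compare such a score against its quality threshold.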