Association Rule Mining (ARM) is the task of mining patterns among data features in the form of logical rules, with applications across a myriad of domains. However, high-dimensional datasets often result in an excessive number of rules, increasing execution time and negatively impacting downstream task performance. Managing this rule explosion remains a central challenge in ARM research. To address this, we introduce Aerial+, a novel neurosymbolic ARM method. Aerial+ leverages an under-complete autoencoder to create a neural representation of the data, capturing associations between features. It extracts rules from this neural representation by exploiting the model's reconstruction mechanism. Extensive evaluations on five datasets against seven baselines demonstrate that Aerial+ achieves state-of-the-art results by learning more concise, high-quality rule sets with full data coverage. When integrated into rule-based interpretable machine learning models, Aerial+ significantly reduces execution time while maintaining or improving accuracy.
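To make "high-quality rule sets with full data coverage" concrete, the sketch below computes the standard support and confidence of a rule over a small table. It is not the Aerial+ method. The toy transactions, the example rules, and the function names are all hypothetical. The definition of data coverage used here (the fraction of rows matched by at least one rule's antecedent) is an assumption, not a definition taken from the paper.

```python
# Toy transactions: each row is a set of "feature=value" items (made-up data).
transactions = [
    {"weather=sunny", "wind=low", "play=yes"},
    {"weather=sunny", "wind=high", "play=no"},
    {"weather=rainy", "wind=low", "play=yes"},
    {"weather=sunny", "wind=low", "play=yes"},
]


def support(itemset, rows):
    """Fraction of rows that contain every item in `itemset`."""
    itemset = set(itemset)
    return sum(itemset <= row for row in rows) / len(rows)


def confidence(antecedent, consequent, rows):
    """Estimate of P(consequent | antecedent); 0.0 if the antecedent never occurs."""
    ant_support = support(antecedent, rows)
    if ant_support == 0:
        return 0.0
    return support(set(antecedent) | set(consequent), rows) / ant_support


def data_coverage(rules, rows):
    """Assumed definition: fraction of rows matched by at least one rule antecedent."""
    covered = sum(any(set(ant) <= row for ant, _ in rules) for row in rows)
    return covered / len(rows)


# Hypothetical rules of the form (antecedent, consequent).
rules = [
    ({"weather=sunny", "wind=low"}, {"play=yes"}),
    ({"wind=high"}, {"play=no"}),
]

for ant, cons in rules:
    print(sorted(ant), "->", sorted(cons),
          "support:", support(ant | cons, transactions),
          "confidence:", confidence(ant, cons, transactions))
print("data coverage:", data_coverage(rules, transactions))
```

On this toy table, two rules already cover three of the four rows. A concise rule set in the abstract's sense is one that reaches high coverage and confidence with few rules, rather than enumerating every frequent combination.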