Rule set learning has recently been revisited frequently because of its interpretability. Existing methods, however, have several shortcomings. First, most existing methods impose orders among rules, either explicitly or implicitly, which makes the models less comprehensible. Second, because conflicts caused by overlaps (i.e., instances covered by multiple rules) are difficult to handle, existing methods often do not consider probabilistic rules. Third, learning classification rules for multi-class targets is understudied, as most existing methods focus on binary classification or handle multi-class classification via the "one-versus-rest" approach. To address these shortcomings, we propose TURS, for Truly Unordered Rule Sets. To resolve conflicts caused by overlapping rules, we propose a novel model that exploits the probabilistic properties of our rule sets, based on the intuition that rules should only be allowed to overlap if they have similar probabilistic outputs. We then formalize the problem of learning a TURS model based on the minimum description length (MDL) principle and develop a carefully designed heuristic algorithm. We benchmark against a wide range of rule-based methods and demonstrate that our method learns rule sets with lower model complexity and highly competitive predictive performance. In addition, we show empirically that the rules in our model are "independent" and hence truly unordered.
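To make the overlap intuition concrete, here is a minimal toy sketch of a probabilistic rule (a condition plus a class distribution) and a check that tolerates overlaps only between rules whose class distributions are close. This is not the TURS model: the abstract does not say how similarity is measured, so the use of total variation distance, the threshold `max_tv`, and all names (`ProbRule`, `overlap_is_acceptable`, `conflicting_pairs`) are hypothetical illustrations. In the actual method, this trade-off is formalized through the MDL-based learning objective rather than a fixed cutoff.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Sequence, Tuple

Instance = Dict[str, float]


@dataclass
class ProbRule:
    """Hypothetical probabilistic rule: a condition plus a class distribution."""
    name: str
    condition: Callable[[Instance], bool]
    class_probs: Sequence[float]  # assumed estimate of P(class | rule covers x)


def total_variation(p: Sequence[float], q: Sequence[float]) -> float:
    # Half the L1 distance between two discrete distributions; lies in [0, 1].
    return 0.5 * sum(abs(a - b) for a, b in zip(p, q))


def overlap_is_acceptable(r1: ProbRule, r2: ProbRule,
                          data: List[Instance], max_tv: float = 0.1) -> bool:
    # Instances covered by both rules are where a conflict could arise.
    overlap = [x for x in data if r1.condition(x) and r2.condition(x)]
    if not overlap:
        return True  # disjoint rules never conflict
    # Assumed criterion: overlapping rules must give similar class distributions.
    return total_variation(r1.class_probs, r2.class_probs) <= max_tv


def conflicting_pairs(rules: List[ProbRule], data: List[Instance],
                      max_tv: float = 0.1) -> List[Tuple[str, str]]:
    # Report every pair of rules whose overlap is not acceptable.
    return [(a.name, b.name)
            for i, a in enumerate(rules)
            for b in rules[i + 1:]
            if not overlap_is_acceptable(a, b, data, max_tv)]


data = [{"age": 25, "income": 30},
        {"age": 45, "income": 80},
        {"age": 60, "income": 40}]
rules = [
    ProbRule("age<50", lambda x: x["age"] < 50, [0.8, 0.2]),
    ProbRule("income>50", lambda x: x["income"] > 50, [0.75, 0.25]),
    ProbRule("age>=50", lambda x: x["age"] >= 50, [0.15, 0.85]),
    ProbRule("income<50", lambda x: x["income"] < 50, [0.2, 0.8]),
]
print(conflicting_pairs(rules, data))  # [('age<50', 'income<50')]
```

In this toy data, "age<50" and "income>50" overlap on one instance but predict similar distributions, so the overlap is tolerated, whereas "age<50" and "income<50" overlap while disagreeing strongly and are flagged. Such a flagged pair is exactly the kind of conflict that would make an ordered decision list or an ad hoc tie-breaking scheme necessary.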