Logical analysis of data (LAD) is a technique that yields two-class classifiers based on Boolean functions in disjunctive normal form (DNF). Although LAD algorithms employ optimization techniques, the resulting binary classifiers, or binary rules, do not overfit. We propose a theoretical justification for this absence of overfitting by estimating the Vapnik-Chervonenkis (VC) dimension of LAD models whose hypothesis sets consist of DNFs with a small number of cubic monomials. We illustrate and confirm our observations empirically.
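To make the hypothesis class concrete, here is a minimal sketch. It assumes that a "cubic monomial" means a conjunction of exactly three literals over binary features. The sketch evaluates a DNF classifier and computes the generic counting bound VC(H) ≤ log2|H| for DNFs with at most k such monomials. This is the textbook bound for finite hypothesis classes, not the estimate derived in this work. All function names are hypothetical.

```python
from math import comb

# A literal is (variable index, required value); a monomial is a tuple of literals.
Monomial = tuple[tuple[int, int], ...]


def monomial_true(m: Monomial, x: tuple[int, ...]) -> bool:
    """A monomial (conjunction of literals) holds when every literal matches x."""
    return all(x[i] == v for i, v in m)


def dnf_predict(terms: list[Monomial], x: tuple[int, ...]) -> int:
    """A DNF labels x positive (1) if at least one of its monomials holds."""
    return int(any(monomial_true(m, x) for m in terms))


def cubic_monomial_count(n: int) -> int:
    """Degree-3 monomials over n Boolean variables: pick 3 variables, then a sign for each."""
    return comb(n, 3) * 2 ** 3


def vc_upper_bound(n: int, k: int) -> int:
    """Generic bound VC(H) <= floor(log2 |H|) for DNFs with at most k cubic monomials.

    |H| is over-counted as the number of sets of at most k distinct monomials
    (different sets may define the same function), so the bound remains valid.
    """
    m = cubic_monomial_count(n)
    size = sum(comb(m, i) for i in range(k + 1))
    return size.bit_length() - 1  # exact floor(log2(size)) for a positive int


# Usage: the rule x0 AND x1 AND (NOT x2) over four binary features.
rule = [((0, 1), (1, 1), (2, 0))]
print(dnf_predict(rule, (1, 1, 0, 0)))  # 1
print(dnf_predict(rule, (1, 1, 1, 0)))  # 0
print(vc_upper_bound(10, 2))            # 18
```

The bound grows roughly like k · log2(n), which grows slowly when the number of monomials k is small. This is consistent with the intuition in the abstract that such DNF hypothesis sets have limited capacity.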