In the Naive Bayes classification model, the class conditional densities are estimated as the products of their marginal densities along the cardinal basis directions. We study the problem of obtaining an alternative basis for this factorisation, with the objective of enhancing the discriminatory power of the associated classification model. We formulate the problem as a projection pursuit to find the optimal linear projection on which to perform classification. Optimality is determined by the multinomial likelihood, within which the class probabilities are estimated using the Naive Bayes factorisation of the projected data. Projection pursuit offers the added benefits of dimension reduction and visualisation. We discuss an intuitive connection with class conditional independent components analysis, and show how this is realised visually in practical applications. The performance of the resulting classification models is investigated on a large collection of 162 publicly available benchmark data sets, in comparison with relevant alternatives. We find that the proposed approach substantially outperforms other popular probabilistic discriminant analysis models and is highly competitive with Support Vector Machines. An R package implementing the proposed approach is available from https://github.com/DavidHofmeyr/OPNB.
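To make the objective concrete, the following is a minimal Python sketch of how the multinomial log-likelihood of a Naive Bayes model could be evaluated on projected data. It is not the API of the OPNB R package, and it only evaluates the objective for a given projection matrix rather than optimising over it. The function names, the Gaussian kernel density estimator, the rule-of-thumb bandwidth and the in-sample evaluation are all assumptions made for illustration; the abstract does not specify how the marginal densities are estimated.

```python
import numpy as np


def gaussian_kde_1d(train, query, bandwidth):
    """Gaussian kernel density estimate built from `train`, evaluated at `query`."""
    diffs = (query[:, None] - train[None, :]) / bandwidth
    kernel = np.exp(-0.5 * diffs ** 2) / np.sqrt(2.0 * np.pi)
    return kernel.mean(axis=1) / bandwidth


def projected_nb_log_likelihood(X, y, V, eps=1e-300):
    """Multinomial log-likelihood of Naive Bayes fitted to the projection X @ V.

    Hypothetical helper: densities are estimated and evaluated on the same
    sample, which is a simplification made to keep the sketch short.
    """
    Z = X @ V  # projected data, one column per projection direction
    classes, class_index = np.unique(y, return_inverse=True)
    n = len(y)
    log_joint = np.empty((n, len(classes)))
    for k, c in enumerate(classes):
        Zc = Z[y == c]
        log_density = np.zeros(n)
        for j in range(Z.shape[1]):
            # Rule-of-thumb bandwidth (an assumption), guarded against zero spread.
            h = max(1.06 * Zc[:, j].std() * len(Zc) ** (-0.2), 1e-8)
            # Naive Bayes factorisation: sum of log marginals along each direction.
            log_density += np.log(gaussian_kde_1d(Zc[:, j], Z[:, j], h) + eps)
        log_joint[:, k] = np.log(len(Zc) / n) + log_density
    # Bayes' rule in log space (log-sum-exp) gives the log class posteriors.
    m = log_joint.max(axis=1, keepdims=True)
    log_norm = m + np.log(np.exp(log_joint - m).sum(axis=1, keepdims=True))
    log_post = log_joint - log_norm
    # Multinomial log-likelihood: log posterior of each observed label, summed.
    return log_post[np.arange(n), class_index].sum()


# Synthetic two-class data whose independent directions are the diagonals,
# not the coordinate axes.
rng = np.random.default_rng(0)
n_per_class = 200
R = np.array([[1.0, 1.0], [1.0, -1.0]]) / np.sqrt(2.0)  # orthogonal basis
latent = rng.normal(size=(2 * n_per_class, 2)) * np.array([3.0, 0.3])
latent[n_per_class:, 1] += 1.5  # classes differ along the second direction
X = latent @ R.T
y = np.repeat([0, 1], n_per_class)

ll_cardinal = projected_nb_log_likelihood(X, y, np.eye(2))
ll_rotated = projected_nb_log_likelihood(X, y, R)
print("cardinal basis:", ll_cardinal)
print("rotated basis: ", ll_rotated)
```

In this toy setting the rotated basis recovers directions along which each class is approximately independent, so its likelihood is higher than that of the cardinal basis. A projection pursuit would search over `V` for the projection that maximises this quantity.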