In today's data-intensive landscape, where high-dimensional datasets are increasingly common, reducing the number of input features is essential to prevent overfitting and improve model accuracy. Although dimensionality reduction has been studied extensively, most approaches apply a single universal set of features across all classes, potentially missing the unique characteristics of individual classes. This paper presents the Explainable Class-Specific Naive Bayes (XNB) classifier, which introduces two key innovations: 1) the use of Kernel Density Estimation to compute posterior probabilities, allowing for a more accurate and flexible estimation process, and 2) the selection of class-specific feature subsets, ensuring that only the variables most relevant to each class are used. Extensive empirical analysis on high-dimensional genomic datasets shows that XNB matches the classification performance of traditional Naive Bayes while drastically improving model interpretability. By isolating the most relevant features for each class, XNB not only reduces the feature set to a minimal, distinct subset per class but also provides deeper insight into how the model makes its predictions. This approach offers significant advantages in fields where both precision and explainability are critical.
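The two ideas in the abstract, KDE-based class-conditional densities and per-class feature subsets, can be sketched as follows. This is an illustrative reconstruction, not the paper's implementation: the class name `ClassSpecificKDENB`, the `class_features` argument (a mapping from class label to its selected feature indices), and the fixed-bandwidth choice are all assumptions made for the example.

```python
import numpy as np
from sklearn.neighbors import KernelDensity

class ClassSpecificKDENB:
    """Sketch of a Naive Bayes variant that estimates each per-feature
    class-conditional density with KDE and restricts each class to its
    own feature subset (illustrative, not the paper's code)."""

    def __init__(self, bandwidth=0.5):
        self.bandwidth = bandwidth

    def fit(self, X, y, class_features):
        # class_features: dict mapping class label -> list of feature indices
        # selected for that class (assumed given by some prior selection step).
        self.classes_ = np.unique(y)
        self.class_features_ = class_features
        self.log_priors_ = {}
        self.kdes_ = {}
        for c in self.classes_:
            Xc = X[y == c]
            self.log_priors_[c] = np.log(len(Xc) / len(X))
            # One univariate KDE per selected feature of class c.
            self.kdes_[c] = [
                KernelDensity(bandwidth=self.bandwidth).fit(Xc[:, [j]])
                for j in class_features[c]
            ]
        return self

    def predict(self, X):
        scores = np.empty((len(X), len(self.classes_)))
        for i, c in enumerate(self.classes_):
            # Naive independence assumption: sum the per-feature
            # log-densities over the class's own subset, plus the log prior.
            scores[:, i] = self.log_priors_[c] + sum(
                kde.score_samples(X[:, [j]])
                for kde, j in zip(self.kdes_[c], self.class_features_[c])
            )
        return self.classes_[np.argmax(scores, axis=1)]
```

Note that because each class scores samples over a different feature subset, the summed log-densities are not directly comparable likelihoods in the usual sense; how the paper reconciles this is part of its contribution, and the argmax here is only the simplest plausible decision rule.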