Generative classifiers are constructed on the basis of a joint probability distribution and are typically learned using closed-form procedures that rely on data statistics and maximize scores related to data fitting. However, these scores are not directly linked to supervised classification metrics such as the error, i.e., the expected 0-1 loss. To address this limitation, we propose a learning procedure called risk-based calibration (RC) that iteratively refines the generative classifier by adjusting its joint probability distribution according to the 0-1 loss on training samples. This is achieved by reinforcing the data statistics associated with the true classes while weakening those of the incorrect classes. As a result, the classifier progressively assigns higher probability to the correct labels, reducing its training error. Results on 20 heterogeneous datasets using both na\"ive Bayes and quadratic discriminant analysis show that RC significantly outperforms closed-form learning procedures in terms of both training error and generalization error. In this way, RC bridges the gap between traditional generative approaches and learning procedures guided by performance measures, ensuring a closer alignment with supervised classification objectives.
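To make the reinforce/weaken mechanism described above concrete, the following is a minimal sketch of an RC-style update for a categorical na\"ive Bayes classifier. The function name \texttt{rc\_naive\_bayes}, the learning rate \texttt{lr}, the iteration cap \texttt{iters}, the Laplace-smoothed initialization, and the floor used to keep counts positive are illustrative assumptions, not the exact procedure from the paper.

\begin{verbatim}
import numpy as np

def rc_naive_bayes(X, y, n_classes, n_values, iters=50, lr=1.0):
    """Sketch of risk-based calibration (RC) for categorical naive Bayes.

    The sufficient statistics (class counts and class-conditional feature
    counts) are first obtained in closed form from the data, then iteratively
    corrected: whenever a training sample is misclassified, the statistics of
    its true class are reinforced and those of the predicted class weakened.
    """
    n, d = X.shape
    # Closed-form initialization: Laplace-smoothed counts (assumed choice).
    class_counts = np.ones(n_classes)
    feat_counts = np.ones((d, n_classes, n_values))
    for i in range(n):
        class_counts[y[i]] += 1
        for j in range(d):
            feat_counts[j, y[i], X[i, j]] += 1

    def predict(x):
        # Log joint probability under the current statistics.
        log_post = np.log(class_counts / class_counts.sum())
        for j in range(d):
            probs = feat_counts[j, :, x[j]] / feat_counts[j].sum(axis=1)
            log_post += np.log(probs)
        return np.argmax(log_post)

    for _ in range(iters):
        errors = 0
        for i in range(n):
            pred = predict(X[i])
            if pred != y[i]:
                errors += 1
                # Reinforce statistics of the true class ...
                class_counts[y[i]] += lr
                for j in range(d):
                    feat_counts[j, y[i], X[i, j]] += lr
                # ... and weaken those of the wrongly predicted class,
                # keeping counts strictly positive (floor is an assumption).
                class_counts[pred] = max(class_counts[pred] - lr, 1e-3)
                for j in range(d):
                    feat_counts[j, pred, X[i, j]] = max(
                        feat_counts[j, pred, X[i, j]] - lr, 1e-3)
        if errors == 0:
            break
    return class_counts, feat_counts, predict
\end{verbatim}

Under these assumptions, each pass moves probability mass toward the true labels of misclassified samples, so the training 0-1 loss is the quantity being driven down rather than a data-fitting score.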