We introduce a general iterative procedure called risk-based calibration (RC), designed to minimize the empirical risk under the 0-1 loss (empirical error) for probabilistic classifiers. These classifiers model probability distributions, whether constructed from the joint distribution (generative) or from the conditional distribution of the class given the features (conditional). RC can be particularized to any probabilistic classifier, provided there is a learning algorithm that computes the classifier's parameters in closed form from data statistics. Guided by the 0-1 loss, RC reinforces the statistics aligned with the true class while penalizing those associated with the other classes. The proposed method has been empirically tested on 30 datasets using na\"ive Bayes, quadratic discriminant analysis, and logistic regression classifiers. RC improves on the empirical error of the original closed-form learning algorithms and, more notably, consistently outperforms the gradient descent approach for all three classifiers.
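As a concrete illustration of the update rule described above, the following is a minimal sketch of RC for a na\"ive Bayes classifier over discrete features; it is not the authors' implementation. In this sketch, the statistics are class counts and feature-value counts, the closed-form learner (`nb_learn`) is maximum likelihood with Laplace smoothing, and each iteration shifts the statistics by `lr * (onehot(y) - p(.|x))`. The learning rate `lr`, the smoothing constant `alpha`, the clipping that keeps counts positive, and the choice of returning the iterate with the lowest training 0-1 loss are all assumptions made here for illustration.

```python
import numpy as np

def nb_learn(class_counts, feat_counts, alpha=1.0):
    # Closed-form learning step (statistics -> parameters):
    # maximum likelihood with Laplace smoothing alpha (an assumption).
    log_prior = np.log((class_counts + alpha) / (class_counts + alpha).sum())
    cond = feat_counts + alpha                   # (C, d, V) pseudo-counts
    log_cond = np.log(cond / cond.sum(axis=2, keepdims=True))
    return log_prior, log_cond

def nb_posterior(X, log_prior, log_cond):
    # Class posterior p(c | x) for discrete features X[i, j] in {0, ..., V-1}.
    n, d = X.shape
    scores = np.tile(log_prior, (n, 1))          # (n, C)
    for j in range(d):
        scores += log_cond[:, j, X[:, j]].T      # add log p(x_j | c)
    scores -= scores.max(axis=1, keepdims=True)  # numerical stability
    p = np.exp(scores)
    return p / p.sum(axis=1, keepdims=True)

def risk_calibrate(X, y, n_classes, n_values, lr=0.1, iters=100):
    # RC loop sketch: reinforce the statistics of the true class and
    # penalize those of the classes the current model puts mass on.
    n, d = X.shape
    onehot = np.eye(n_classes)[y]                # (n, C)
    class_counts = onehot.sum(axis=0)            # initial sufficient statistics
    feat_counts = np.zeros((n_classes, d, n_values))
    for j in range(d):
        for v in range(n_values):
            feat_counts[:, j, v] = onehot.T @ (X[:, j] == v).astype(float)

    best = (np.inf, None, None)
    for _ in range(iters):
        log_prior, log_cond = nb_learn(class_counts, feat_counts)
        p = nb_posterior(X, log_prior, log_cond)
        err = np.mean(p.argmax(axis=1) != y)     # empirical 0-1 loss
        if err < best[0]:
            best = (err, log_prior, log_cond)
        delta = onehot - p                       # + on the true class, -p(c|x) elsewhere
        class_counts = np.maximum(class_counts + lr * delta.sum(axis=0), 1e-9)
        for j in range(d):
            for v in range(n_values):
                mask = (X[:, j] == v).astype(float)
                feat_counts[:, j, v] = np.maximum(
                    feat_counts[:, j, v] + lr * (delta.T @ mask), 1e-9)
    return best
```

The per-example term `onehot(y) - p(.|x)` is one natural reading of the reinforce/penalize rule: the true class gains weight `1 - p(y|x)`, while every other class loses exactly the probability mass the current model assigns to it, so confidently correct examples contribute almost nothing to the update.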