Quantification is the task of predicting class distributions in a dataset. It is also a growing research field in supervised machine learning, for which a large variety of algorithms has been proposed in recent years. However, a comprehensive empirical comparison of quantification methods that supports algorithm selection is not yet available. In this work, we close this research gap by conducting a thorough empirical performance comparison of 24 quantification methods on more than 40 datasets in total, considering both binary and multiclass quantification settings. We observe that no single algorithm consistently outperforms all competitors, but we identify a group of methods that performs best in the binary setting, including the threshold selection-based Median Sweep and TSMax methods, the DyS framework, and Friedman's method. For the multiclass setting, a different group of algorithms yields good performance, including the Generalized Probabilistic Adjusted Count, the readme method, the energy distance minimization method, the EM algorithm for quantification, and Friedman's method. We also find that tuning the underlying classifiers has, in most cases, only a limited impact on quantification performance. More generally, performance in multiclass quantification is inferior to that obtained in the binary setting. Our results can guide practitioners who intend to apply quantification algorithms and help researchers identify opportunities for future research.
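To make the quantification task concrete, the following minimal sketch estimates the prevalence of the positive class in an unlabeled sample. It uses two standard baselines from the quantification literature, classify-and-count and the adjusted count correction. Neither is named in the text above, and the sketch is not meant to reproduce any of the 24 compared methods. The function names, the example predictions, and the `tpr` and `fpr` values are all hypothetical and chosen for illustration only.

```python
from typing import Sequence


def classify_and_count(predictions: Sequence[int]) -> float:
    """Estimate positive-class prevalence as the fraction of items predicted positive."""
    return sum(predictions) / len(predictions)


def adjusted_count(predictions: Sequence[int], tpr: float, fpr: float) -> float:
    """Correct the classify-and-count estimate with the classifier's error rates.

    tpr and fpr are the true and false positive rates of the classifier,
    assumed here to have been estimated on held-out labeled data.
    """
    if tpr <= fpr:
        raise ValueError("tpr must be greater than fpr for the correction to be defined")
    cc = classify_and_count(predictions)
    adjusted = (cc - fpr) / (tpr - fpr)
    # Clip to the valid prevalence range [0, 1].
    return min(1.0, max(0.0, adjusted))


# Hypothetical binary predictions (1 = positive) for ten unlabeled items.
predictions = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]

cc_estimate = classify_and_count(predictions)
ac_estimate = adjusted_count(predictions, tpr=0.8, fpr=0.2)

print(f"classify-and-count: {cc_estimate:.3f}")
print(f"adjusted count:     {ac_estimate:.3f}")
```

The two estimates differ because the adjusted count accounts for the classifier's systematic errors. This is why quantification is usually treated as a task distinct from classification rather than as a by-product of it.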