Although it is a crucial question for the development of machine learning algorithms, there is still no consensus on how to compare classifiers over multiple data sets with respect to several criteria. Every comparison framework faces (at least) three fundamental challenges: the multiplicity of quality criteria, the multiplicity of data sets, and the randomness of the selection of data sets. In this paper, we add a fresh perspective to this lively debate by adopting recent developments in decision theory. Based on so-called preference systems, our framework ranks classifiers by a generalized concept of stochastic dominance, which circumvents the cumbersome, and often even self-contradictory, reliance on aggregates. Moreover, we show that generalized stochastic dominance can be operationalized by solving easy-to-handle linear programs and statistically tested using an adapted two-sample observation-randomization test. This yields a powerful framework for the statistical comparison of classifiers over multiple data sets with respect to multiple quality criteria simultaneously. We illustrate and investigate our framework in a simulation study and on a set of standard benchmark data sets.
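To give a rough feel for the kind of two-sample observation-randomization (permutation) test mentioned in the abstract, the following is a minimal sketch only. It is not the adapted test of the paper: the statistic used here is a plain difference of means on a single quality criterion, chosen as a stand-in for the linear-programming-based statistic, and the names `permutation_test` and `mean_difference` are hypothetical.

```python
import numpy as np


def mean_difference(a, b):
    """Stand-in test statistic (an assumption, not the paper's statistic)."""
    return float(np.mean(a) - np.mean(b))


def permutation_test(a, b, statistic=mean_difference, n_resamples=2000, seed=0):
    """One-sided two-sample permutation test.

    a, b: per-data-set scores of two classifiers (1-D arrays, or 2-D arrays
    with one row per data set). Returns the observed statistic and a p-value
    for the alternative that the statistic is larger than under random
    relabelling of the observations.
    """
    rng = np.random.default_rng(seed)
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    observed = statistic(a, b)

    pooled = np.concatenate([a, b])
    n_a = len(a)
    count = 0
    for _ in range(n_resamples):
        # Shuffle the pooled observations (rows) and split them again.
        perm = rng.permutation(pooled)
        if statistic(perm[:n_a], perm[n_a:]) >= observed:
            count += 1

    # The +1 correction keeps the estimated p-value strictly positive.
    p_value = (count + 1) / (n_resamples + 1)
    return observed, p_value


if __name__ == "__main__":
    # Made-up accuracies of two classifiers on ten data sets, for illustration.
    clf_a = [0.91, 0.88, 0.93, 0.90, 0.87, 0.92, 0.89, 0.94, 0.90, 0.91]
    clf_b = [0.85, 0.83, 0.86, 0.84, 0.80, 0.86, 0.82, 0.85, 0.84, 0.83]
    stat, p = permutation_test(clf_a, clf_b)
    print(f"observed statistic = {stat:.3f}, p-value = {p:.4f}")
```

Passing the statistic as a function keeps the resampling loop independent of how dominance is measured, so a multi-criteria statistic could in principle be swapped in without changing the rest of the sketch.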