In fields such as medicine and drug discovery, the ultimate goal of a classification is not to guess a class, but to choose the optimal course of action among a set of possible ones, usually not in one-to-one correspondence with the set of classes. This decision-theoretic problem requires sensible probabilities for the classes. Probabilities conditional on the features are computationally almost impossible to find in many important cases. The main idea of the present work is to calculate probabilities conditional not on the features, but on the trained classifier's output. This calculation is cheap, needs to be performed only once, and provides an output-to-probability "transducer" that can be applied to all future outputs of the classifier. In conjunction with problem-dependent utilities, the probabilities of the transducer allow us to find the optimal choice among the classes or among a set of more general decisions, by means of expected-utility maximization. This idea is demonstrated in a simplified drug-discovery problem with a highly imbalanced dataset. The transducer and utility maximization together always lead to improved results, sometimes close to the theoretical maximum, for all sets of problem-dependent utilities. The one-time-only calculation of the transducer also provides, automatically: (i) a quantification of the uncertainty about the transducer itself; (ii) the expected utility of the augmented algorithm (including its uncertainty), which can be used for algorithm selection; (iii) the possibility of using the algorithm in a "generative mode", useful if the training dataset is biased.
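The two steps described above, an output-to-probability transducer followed by expected-utility maximization, can be illustrated with a minimal sketch. It is not the method of the present work: it assumes a simple stand-in in which the classifier's scalar output is binned and the class frequencies in each bin are smoothed with a pseudo-count. The function names `fit_transducer` and `decide`, the bin edge, the toy calibration data, and the utility values are all hypothetical.

```python
import numpy as np

def fit_transducer(outputs, labels, n_classes, bin_edges, prior=1.0):
    """Estimate P(class | output bin) from calibration data.

    Assumption: a binned frequency table with pseudo-count `prior`
    stands in for whatever probability model is actually used.
    """
    bins = np.digitize(outputs, bin_edges)  # bin index in 0..len(bin_edges)
    counts = np.full((len(bin_edges) + 1, n_classes), prior, dtype=float)
    np.add.at(counts, (bins, labels), 1.0)  # accumulate one count per sample
    return counts / counts.sum(axis=1, keepdims=True)

def decide(output, table, bin_edges, utilities):
    """Return (index of the decision with maximal expected utility, class probabilities)."""
    probs = table[np.digitize(output, bin_edges)]
    expected = utilities @ probs  # utilities[d, c]: utility of decision d if class is c
    return int(np.argmax(expected)), probs

# Toy calibration data (hypothetical): classifier outputs with true classes
# 0 = inactive, 1 = active.
outputs = np.array([0.1, 0.2, 0.3, 0.4, 0.6, 0.7, 0.8, 0.9])
labels = np.array([0, 0, 0, 0, 0, 1, 0, 1])
bin_edges = [0.5]

# One-time calculation of the transducer.
table = fit_transducer(outputs, labels, n_classes=2, bin_edges=bin_edges)

# Three decisions for two classes (hypothetical utilities), so decisions and
# classes are not in one-to-one correspondence.
utilities = np.array([
    [0.0, -5.0],   # decision 0: discard the compound
    [-1.0, 2.0],   # decision 1: retest the compound
    [-4.0, 10.0],  # decision 2: pursue the compound
])

decision_high, probs_high = decide(0.7, table, bin_edges, utilities)
decision_low, probs_low = decide(0.2, table, bin_edges, utilities)
print(decision_high, probs_high)
print(decision_low, probs_low)
```

In this toy setting the table is computed once from the calibration outputs and then reused for every new output; changing the utility matrix changes the decision without refitting anything.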