The costs of the two error types a machine learning classifier can make, namely false positives and false negatives, are not equal and are application dependent. For example, in cybersecurity applications, the cost of failing to detect an attack is very different from the cost of marking a benign activity as an attack. Various design choices during machine learning model building, such as hyperparameter tuning and model selection, allow a data scientist to trade off between these two errors. However, most commonly used metrics for evaluating model quality, such as the $F_1$ score, which is defined in terms of model precision and recall, treat both errors equally, making it difficult for users to optimize for their actual costs. In this paper, we propose a new cost-aware metric, $C_{score}$, based on precision and recall, that can replace the $F_1$ score for model evaluation and selection. It includes a cost ratio that accounts for the differing costs of handling false positives and false negatives. We derive and characterize the new cost metric and compare it to the $F_1$ score. Further, we use this metric for model thresholding on five cybersecurity-related datasets across multiple cost ratios. The results show an average cost savings of 49%.
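To illustrate the general idea of cost-aware thresholding described above, the sketch below picks the decision threshold that minimizes a weighted misclassification cost, where a false negative costs `cost_ratio` times a false positive. This is a generic illustration of the technique, not the paper's actual $C_{score}$ formula (which is not reproduced here); the function names and the grid-search approach are illustrative assumptions.

```python
import numpy as np

def expected_cost(y_true, y_prob, threshold, cost_fp=1.0, cost_fn=1.0):
    """Total misclassification cost of thresholding scores at `threshold`.

    A generic weighted cost, not the paper's C_score: each false positive
    contributes `cost_fp` and each false negative contributes `cost_fn`.
    """
    y_pred = (y_prob >= threshold).astype(int)
    fp = np.sum((y_pred == 1) & (y_true == 0))  # benign flagged as attack
    fn = np.sum((y_pred == 0) & (y_true == 1))  # missed attack
    return cost_fp * fp + cost_fn * fn

def best_threshold(y_true, y_prob, cost_ratio,
                   grid=np.linspace(0.01, 0.99, 99)):
    """Grid-search the threshold that minimizes cost when a false
    negative costs `cost_ratio` times as much as a false positive."""
    costs = [expected_cost(y_true, y_prob, t, cost_fp=1.0, cost_fn=cost_ratio)
             for t in grid]
    return grid[int(np.argmin(costs))]
```

As expected, a large cost ratio (missed attacks are expensive) pushes the chosen threshold down so the classifier flags more activity, while a small ratio pushes it up.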