Predictive models may generate biased predictions when classifying imbalanced datasets: the model favors the majority class, yielding poor performance in predicting the minority class. To address this issue, balancing or resampling methods are critical data-centric AI approaches for improving prediction performance in the modeling process. However, the effectiveness of these methods has been debated in recent years. In particular, during model selection many candidate models may exhibit nearly identical predictive performance, a phenomenon known as the Rashomon effect, while still producing different predictions for the same observations. Selecting one of these models without considering predictive multiplicity -- the case in which near-equally accurate models yield conflicting predictions for some samples -- amounts to blind selection. In this paper, the impact of balancing methods on predictive multiplicity is examined through the lens of the Rashomon effect. This matters because blindly selecting a model from a set of approximately equally accurate models in data-centric AI is risky and may lead to severe problems in model selection, validation, and explanation. To tackle this matter, we conducted experiments on real datasets to observe the impact of balancing methods on predictive multiplicity through the Rashomon effect, using a newly proposed metric, obscurity, in addition to the existing metrics ambiguity and discrepancy. Our findings show that balancing methods inflate predictive multiplicity and yield varying results. To monitor the trade-off between prediction performance and predictive multiplicity, and thus conduct the modeling process responsibly, we propose using an extended version of the performance-gain plot when balancing the training data.
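To make the two existing multiplicity metrics concrete, the sketch below computes ambiguity and discrepancy over the predictions of a hypothetical Rashomon set, following their standard definitions relative to a baseline model: ambiguity is the fraction of samples on which at least one competing model disagrees with the baseline, and discrepancy is the largest fraction of predictions any single competing model flips. The prediction matrix here is illustrative toy data, not results from the paper's experiments, and the newly proposed obscurity metric is not reproduced.

```python
import numpy as np

def ambiguity(preds, baseline=0):
    """Fraction of samples where at least one model in the
    Rashomon set disagrees with the baseline model's prediction."""
    conflicts = preds != preds[baseline]      # (K, n) boolean matrix
    return conflicts.any(axis=0).mean()

def discrepancy(preds, baseline=0):
    """Maximum fraction of predictions that any single model in the
    Rashomon set flips relative to the baseline model."""
    conflicts = preds != preds[baseline]
    return conflicts.mean(axis=1).max()

# Hypothetical binary predictions of 4 near-equally accurate models
# on 10 test samples (row 0 is the baseline model).
preds = np.array([
    [0, 1, 1, 0, 0, 1, 0, 1, 0, 0],   # baseline model
    [0, 1, 1, 0, 1, 1, 0, 1, 0, 0],   # flips sample 4
    [0, 1, 0, 0, 0, 1, 0, 1, 0, 1],   # flips samples 2 and 9
    [0, 1, 1, 0, 0, 1, 0, 1, 0, 0],   # agrees everywhere
])

print(ambiguity(preds))    # 3 of 10 samples are contested -> 0.3
print(discrepancy(preds))  # worst single model flips 2 of 10 -> 0.2
```

Running the same computation before and after applying a balancing method to the training data is one way to observe the inflation of multiplicity the abstract describes.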