Imbalanced data are frequently encountered in real-world classification tasks. Previous work on imbalanced learning has mostly focused on settings where the minority class contains few samples. However, the notion of imbalance also applies to cases where the minority class contains abundant samples, which is common in industrial applications such as fraud detection in financial risk management. In this paper, we take a population-level approach to imbalanced learning by proposing a new formulation called \emph{ultra-imbalanced classification} (UIC). Under UIC, loss functions behave differently even when an infinite amount of training data is available. To understand the intrinsic difficulty of UIC problems, we borrow ideas from information theory and establish a framework for comparing loss functions through the lens of statistical information. Building on this framework, we develop a novel learning objective, termed Tunable Boosting Loss, which is provably resistant to data imbalance under UIC and empirically effective, as verified by extensive experiments on both public and industrial datasets.