Class imbalance poses new challenges for data stream classification. Many algorithms recently proposed in the literature tackle this problem using a variety of data-level, algorithm-level, and ensemble approaches. However, the field lacks standardized, agreed-upon procedures and benchmarks for evaluating these algorithms. This work proposes a standardized and comprehensive experimental framework for evaluating algorithms on a collection of diverse and challenging imbalanced data stream scenarios. The experimental study evaluates 24 state-of-the-art data stream algorithms on 515 imbalanced data streams that combine static and dynamic class imbalance ratios, instance-level difficulties, concept drift, and real-world and semi-synthetic datasets in binary and multi-class scenarios. The result is a large-scale experimental study comparing state-of-the-art classifiers in the data stream mining domain. We discuss the advantages and disadvantages of state-of-the-art classifiers in each of these scenarios and provide general recommendations to end-users for selecting the best algorithms for imbalanced data streams. Additionally, we formulate open challenges and future directions for this domain. Our experimental framework is fully reproducible and easy to extend with new methods. In this way, we propose a standardized approach to conducting experiments on imbalanced data streams that other researchers can use to create complete, trustworthy, and fair evaluations of newly proposed methods. Our experimental framework can be downloaded from https://github.com/canoalberto/imbalanced-streams.