Ensemble methods are widely used in classification because of their strong performance. Achieving high accuracy on data streams is challenging because the data distribution can change disruptively, a phenomenon known as concept drift. Greater diversity among ensemble components is known to improve prediction accuracy in such settings. However, even in a diverse ensemble, not all components contribute as expected to overall performance. This calls for a method that selects components that are both accurate and diverse. We present a novel ensemble construction and maintenance approach based on Maximal Marginal Relevance (MMR) that dynamically combines the diversity and prediction accuracy of components while structuring an ensemble. Experimental results on four real and 11 synthetic datasets show that the proposed approach (DynED) achieves a higher average mean accuracy than five state-of-the-art baselines.
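To make the selection idea concrete, the following is a minimal sketch of generic MMR-style component selection. It is not the DynED algorithm itself, which the abstract does not specify. The sketch assumes that a component's relevance is its accuracy on a recent labelled window, that the similarity between two components is the fraction of samples on which their predictions agree, and that a trade-off weight `lam` balances the two. The names `agreement`, `mmr_select`, and `lam` are hypothetical and chosen only for illustration.

```python
from typing import List, Sequence


def agreement(a: Sequence[int], b: Sequence[int]) -> float:
    """Fraction of positions at which two label sequences are equal."""
    return sum(x == y for x, y in zip(a, b)) / len(a)


def mmr_select(
    preds: Sequence[Sequence[int]],
    y_true: Sequence[int],
    k: int,
    lam: float = 0.5,
) -> List[int]:
    """Greedily pick up to k component indices using a generic MMR score.

    Assumed score: lam * accuracy(i) - (1 - lam) * max agreement with
    any component that has already been selected.
    """
    # Relevance of each component: accuracy against the true labels.
    accuracy = [agreement(p, y_true) for p in preds]
    selected: List[int] = []
    candidates = list(range(len(preds)))

    while candidates and len(selected) < k:

        def score(i: int) -> float:
            # Redundancy is 0 while nothing has been selected yet.
            redundancy = max(
                (agreement(preds[i], preds[j]) for j in selected),
                default=0.0,
            )
            return lam * accuracy[i] - (1 - lam) * redundancy

        best = max(candidates, key=score)  # ties go to the lowest index
        selected.append(best)
        candidates.remove(best)

    return selected


# Toy data: components 0 and 1 are identical; component 2 is slightly
# less accurate but disagrees with them on three samples.
y_true = [0, 1, 0, 1, 0, 1, 0, 1]
preds = [
    [0, 1, 0, 1, 0, 1, 1, 1],
    [0, 1, 0, 1, 0, 1, 1, 1],
    [1, 1, 0, 1, 0, 0, 0, 1],
]

print(mmr_select(preds, y_true, k=2, lam=0.5))  # [0, 2]
print(mmr_select(preds, y_true, k=2, lam=1.0))  # [0, 1]
```

With `lam=0.5` the duplicate component is skipped in favour of the less accurate but more diverse one. With `lam=1.0` the selection reduces to ranking by accuracy alone.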