This paper presents a novel framework for continual feature selection (CFS) in data preprocessing, aimed at open and dynamic environments in which unknown classes may emerge. CFS faces two primary challenges: discovering unknown knowledge and transferring known knowledge. To address both, the proposed CFS method combines continual learning (CL) with granular-ball computing (GBC). It constructs a granular-ball knowledge base that is used both to detect unknown classes and to transfer previously learned knowledge to subsequent feature selection. CFS consists of two stages: initial learning and open learning. The initial stage establishes a knowledge base through a multi-granularity representation of the data using granular-balls. The open stage uses this prior granular-ball knowledge to identify unknowns, then updates the knowledge base to transfer granular-ball knowledge, reinforcing old knowledge and integrating new knowledge. We further devise an optimal feature subset mechanism that adds a minimal number of new features to the existing optimal subset, which often yields better results in each period. Extensive experiments on public benchmark datasets show that our method outperforms state-of-the-art feature selection methods in both effectiveness and efficiency.
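The abstract does not specify how granular-balls are built, how unknowns are detected, or how new features are scored, so the Python sketch below illustrates only one plausible reading of the two stages and is not the authors' implementation. Every rule in it is an assumption. An impure ball is split by assigning its members to the nearest per-class mean until each ball is pure. A sample is flagged unknown when no ball covers it. The knowledge base is updated by appending balls built on newly labelled data. The feature subset is extended greedily, one feature at a time, only while a user-supplied score strictly improves. The names `build_balls`, `is_unknown`, `update_knowledge_base`, `extend_subset`, and `toy_score` are hypothetical.

```python
import math
from dataclasses import dataclass
from typing import Callable, List, Sequence, Tuple

# Hypothetical sketch of granular-ball (GB) knowledge and incremental feature
# selection under assumed rules; not the CFS authors' implementation.


@dataclass
class GranularBall:
    center: List[float]
    radius: float
    label: int


def _dist(a: Sequence[float], b: Sequence[float]) -> float:
    return math.sqrt(sum((p - q) ** 2 for p, q in zip(a, b)))


def _mean(points: List[Sequence[float]]) -> List[float]:
    return [sum(col) / len(points) for col in zip(*points)]


def build_balls(X, y, purity: float = 1.0, min_size: int = 2) -> List[GranularBall]:
    """Assumed splitting rule: split an impure ball by assigning each member
    to the nearest per-class mean, until purity or min_size is reached."""
    balls, stack = [], [list(range(len(X)))]
    while stack:
        idx = stack.pop()
        labels = [y[i] for i in idx]
        majority = max(set(labels), key=labels.count)
        center = _mean([X[i] for i in idx])
        parts: List[List[int]] = []
        if labels.count(majority) / len(idx) < purity and len(idx) > min_size:
            seeds = {c: _mean([X[i] for i in idx if y[i] == c]) for c in set(labels)}
            groups = {c: [] for c in seeds}
            for i in idx:
                groups[min(seeds, key=lambda c: _dist(X[i], seeds[c]))].append(i)
            parts = [g for g in groups.values() if g]
        if len(parts) > 1:
            stack.extend(parts)  # each part is strictly smaller, so this terminates
        else:
            radius = max(_dist(X[i], center) for i in idx)
            balls.append(GranularBall(center, radius, majority))
    return balls


def is_unknown(x, balls: List[GranularBall], slack: float = 0.0) -> bool:
    """Assumed rule: a sample covered by no ball (radius + slack) is unknown."""
    return all(_dist(x, b.center) > b.radius + slack for b in balls)


def update_knowledge_base(balls, X_new, y_new) -> List[GranularBall]:
    """Assumed update: keep the old balls and append balls built on new data."""
    return balls + build_balls(X_new, y_new)


def extend_subset(current: List[int], candidates: List[int],
                  score: Callable[[List[int]], float]) -> Tuple[List[int], float]:
    """Greedily add one feature at a time to the existing subset, stopping as
    soon as no remaining candidate strictly improves score(subset)."""
    subset = list(current)
    best = score(subset)
    remaining = [f for f in candidates if f not in subset]
    while remaining:
        f_best = max(remaining, key=lambda f: score(subset + [f]))
        new_score = score(subset + [f_best])
        if new_score <= best:
            break
        subset.append(f_best)
        best = new_score
        remaining.remove(f_best)
    return subset, best


# Usage on toy data (hypothetical).
X = [[0, 0], [0, 1], [1, 0], [10, 10], [10, 11], [11, 10]]
y = [0, 0, 0, 1, 1, 1]
kb = build_balls(X, y)                  # two pure balls, one per cluster
print(is_unknown([5, 5], kb))           # True: outside both balls
kb = update_knowledge_base(kb, [[5, 5], [5, 6]], [2, 2])
print(is_unknown([5, 5.5], kb))         # False: covered by the new ball


def toy_score(s: List[int]) -> float:
    """Hypothetical score: reward features 1 and 3, penalize subset size."""
    return len(set(s) & {1, 3}) - 0.1 * len(s)


print(extend_subset([1], [0, 1, 2, 3], toy_score)[0])  # [1, 3]
```

The greedy extension only approximates "a minimal number of new features"; it keeps the existing subset fixed and stops at the first step that brings no strict gain, which matches the incremental spirit of the abstract but carries no optimality guarantee.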