Few-shot class-incremental learning is crucial for developing scalable and adaptive intelligent systems, as it enables models to acquire new classes from minimal annotated data while safeguarding previously accumulated knowledge. Nonetheless, existing methods handle continuous data streams in a centralized manner, limiting their applicability in scenarios that prioritize data privacy and security. To address this limitation, this paper introduces federated few-shot class-incremental learning, a decentralized machine learning paradigm tailored to progressively learn new classes from scarce data distributed across multiple clients. In this paradigm, clients locally update their models with new classes while preserving data privacy, and then transmit the model updates to a central server, where they are aggregated globally. However, this paradigm faces several challenges, including the difficulty of few-shot learning, catastrophic forgetting, and data heterogeneity. To address these challenges, we present a synthetic data-driven framework that leverages replay-buffer data to maintain existing knowledge and facilitate the acquisition of new knowledge. Within this framework, a noise-aware generative replay module fine-tunes local models on a balanced mix of new and replay data, while generating synthetic data of new classes to further expand the replay buffer for future tasks. Furthermore, a class-specific weighted aggregation strategy tackles data heterogeneity by adaptively aggregating class-specific parameters based on local models' performance on synthetic data, enabling effective global model optimization without direct access to client data. Comprehensive experiments on three widely used datasets demonstrate the effectiveness and superiority of the introduced framework.
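To make the class-specific weighted aggregation concrete, the following is a minimal sketch of how a server might combine per-class classifier parameters, weighting each client by its model's accuracy on server-side synthetic data of that class. The function name, the dict-based parameter layout, and the use of per-class accuracy as the weighting score are illustrative assumptions, not the paper's exact implementation.

```python
import numpy as np

def class_weighted_aggregate(client_params, client_class_acc):
    """Aggregate per-class parameters across clients (illustrative sketch).

    client_params: list of dicts, one per client, mapping class id ->
        parameter vector (np.ndarray) for that class.
    client_class_acc: list of dicts, one per client, mapping class id ->
        that client's accuracy on synthetic data of the class. Accuracy on
        synthetic data serves as a proxy score, since the server never
        accesses raw client data.
    Returns a dict mapping class id -> aggregated parameter vector.
    """
    classes = set().union(*(p.keys() for p in client_params))
    global_params = {}
    for c in classes:
        vecs, scores = [], []
        for params, accs in zip(client_params, client_class_acc):
            if c in params:  # only clients that hold this class contribute
                vecs.append(params[c])
                scores.append(accs.get(c, 0.0))
        scores = np.asarray(scores, dtype=float)
        # Normalize accuracies into aggregation weights; fall back to
        # uniform averaging if every score is zero.
        if scores.sum() > 0:
            weights = scores / scores.sum()
        else:
            weights = np.full(len(vecs), 1.0 / len(vecs))
        global_params[c] = sum(w * v for w, v in zip(weights, np.stack(vecs)))
    return global_params
```

Because the weights are computed per class rather than per client, a client that sees many samples of one class but few of another can dominate aggregation only where its model is actually strong, which is the intended remedy for heterogeneous class distributions.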