In real-world scenarios, extensive manual annotation for continual learning is impractical due to prohibitive costs. Although prior work, influenced by large-scale webly supervised training, suggests leveraging web-scraped data for continual learning, this poses challenges such as data imbalance, usage restrictions, and privacy concerns. To address the risks of continual webly supervised training, we present an online continual learning framework, Generative Name-only Continual Learning (G-NoCL). The proposed G-NoCL uses a set of generators G along with the learner. When encountering new concepts (i.e., classes), G-NoCL employs a novel sample-complexity-guided data ensembling technique, the DIverSity and COmplexity enhancing ensemBlER (DISCOBER), to optimally sample training data from the generated data. Through extensive experimentation, we demonstrate that DISCOBER outperforms naive generator ensembling, web-supervised data, and manually annotated data on G-NoCL online CL benchmarks, covering both In-Distribution (ID) and Out-of-Distribution (OOD) generalization evaluations.
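The idea of scoring generated samples by diversity and complexity and then selecting a subset for training can be sketched as follows. This is a minimal illustration, not the paper's actual DISCOBER algorithm: the generator pools are simulated as Gaussian feature clouds, and the diversity proxy (distance to the pooled mean) and complexity proxy (per-sample feature variance) are hypothetical stand-ins for the scoring functions a real implementation would use.

```python
import numpy as np

rng = np.random.default_rng(0)

def generate(generator_id, concept, n=20, dim=8):
    # Stand-in for a text-to-image generator prompted with a class name:
    # each (generator, concept) pair yields feature vectors drawn from
    # its own distribution. Purely illustrative.
    shift = (generator_id * 7 + len(concept)) % 5
    scale = 1.0 + 0.2 * generator_id
    return rng.normal(loc=shift, scale=scale, size=(n, dim))

def select_by_diversity_and_complexity(pools, k):
    """Pick k samples from the union of generator pools, scoring each
    sample by a diversity proxy (distance to the pooled mean) plus a
    complexity proxy (per-sample feature variance). Hypothetical scoring,
    used only to illustrate complexity-guided ensembling."""
    union = np.vstack(pools)
    center = union.mean(axis=0)
    diversity = np.linalg.norm(union - center, axis=1)
    complexity = union.var(axis=1)
    score = diversity + complexity
    top = np.argsort(score)[::-1][:k]  # indices of the k highest scores
    return union[top]

# Three generators produce candidate data for one new concept;
# a balanced training subset is then drawn from the ensemble.
pools = [generate(g, "zebra") for g in range(3)]
subset = select_by_diversity_and_complexity(pools, k=10)
print(subset.shape)  # (10, 8)
```

In a real pipeline the feature vectors would come from the learner's encoder, and the complexity signal could instead be the learner's per-sample loss, but the selection skeleton stays the same.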