Online learning methods often rely on supervised data. However, under data distribution shifts, as in continual learning (CL), where continuously arriving online data streams introduce new concepts (e.g., classes), real-time manual annotation is impractical because its cost and latency hinder real-time adaptation. To alleviate this, the 'name-only' setup has been proposed, which requires only the names of concepts rather than supervised samples. A recent approach tackles this setup by supplementing data with web-scraped images, but such data often suffers from imbalance, noise, and copyright issues. To overcome the limitations of both human supervision and webly supervision, we propose GenOL, which uses generative models for name-only training. However, naive application of generative models yields generated data with limited diversity. We therefore enhance (i) intra-diversity, the diversity of images generated by a single model, by proposing a prompt generation method that produces diverse text prompts for text-to-image models, and (ii) inter-diversity, the diversity of images generated by multiple generative models, by introducing an ensemble strategy that selects minimally overlapping samples. We empirically validate that GenOL outperforms prior art, and even a model trained with fully supervised data, by large margins on various tasks, including image recognition and multi-modal visual reasoning.
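The abstract does not spell out how "minimally overlapping" samples are chosen across generators. As a rough illustration only, the sketch below pools image embeddings from several generative models and greedily keeps the candidate farthest from everything already selected (farthest-point selection under cosine distance). The function names `cosine_distance` and `select_diverse`, the embedding inputs, and the greedy criterion are all assumptions made for this example and are not taken from GenOL.

```python
import math
from typing import Dict, List, Sequence, Tuple

Vector = Sequence[float]


def cosine_distance(a: Vector, b: Vector) -> float:
    """Return 1 - cosine similarity; 1.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 1.0
    return 1.0 - dot / (norm_a * norm_b)


def select_diverse(
    pools: Dict[str, List[Vector]], budget: int
) -> List[Tuple[str, int]]:
    """Greedy farthest-point selection over embeddings pooled from all models.

    `pools` maps a (hypothetical) generator name to the embeddings of the
    images it produced. Returns (generator name, index) pairs in pick order.
    """
    candidates = [
        (name, idx, vec)
        for name, vecs in pools.items()
        for idx, vec in enumerate(vecs)
    ]
    if budget <= 0 or not candidates:
        return []

    # Seed with the first candidate; track each candidate's distance to the
    # nearest already-selected sample.
    selected = [0]
    min_dist = [cosine_distance(c[2], candidates[0][2]) for c in candidates]
    min_dist[0] = -1.0  # mark as taken so it is never picked again

    while len(selected) < min(budget, len(candidates)):
        # Pick the candidate that overlaps least with the current selection.
        nxt = max(range(len(candidates)), key=lambda j: min_dist[j])
        selected.append(nxt)
        for j, cand in enumerate(candidates):
            d = cosine_distance(cand[2], candidates[nxt][2])
            min_dist[j] = min(min_dist[j], d)
        min_dist[nxt] = -1.0

    return [(candidates[k][0], candidates[k][1]) for k in selected]


if __name__ == "__main__":
    # Toy 2-D embeddings standing in for real image features.
    toy_pools = {
        "model_a": [[1.0, 0.0], [0.99, 0.1]],
        "model_b": [[0.0, 1.0], [1.0, 0.01]],
    }
    print(select_diverse(toy_pools, budget=2))
```

In this toy run the second pick comes from `model_b`, because its first embedding is orthogonal to the seed, while the near-duplicate embeddings are skipped. A real pipeline would replace the toy vectors with features from an image encoder and would likely need a more scalable selection rule.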