The dominant paradigm for building a machine learning model is to iterate over a dataset repeatedly until convergence. Such an approach is non-incremental, as it assumes access to all images of all categories at once. For many applications, however, non-incremental learning is unrealistic. Researchers therefore study incremental learning, where a learner must adapt to an incoming stream of data with a varying distribution while avoiding forgetting past knowledge. Significant progress has been made; however, the vast majority of works focus on the fully supervised setting, which makes these algorithms label-hungry and thus limits their real-life deployment. In this paper, we make the first attempt to survey the recently growing interest in label-efficient incremental learning. We identify three subdivisions that reduce labeling effort, namely semi-supervised, few-shot, and self-supervised learning. Finally, we identify novel directions that can further enhance label efficiency and improve the scalability of incremental learning. Project website: https://github.com/kilickaya/label-efficient-il