Active learning seeks to achieve strong performance with fewer training samples. It does this by iteratively asking an oracle to label newly selected samples in a human-in-the-loop manner. This technique has gained popularity due to its broad applicability, yet surveys of it, especially of deep learning-based active learning (DAL), remain scarce. Therefore, we conduct a comprehensive survey of DAL. First, we describe how the reviewed papers were collected and filtered. Second, we formally define the DAL task and summarize the most influential baselines and widely used datasets. Third, we provide a systematic taxonomy of DAL methods from five perspectives (annotation types, query strategies, deep model architectures, learning paradigms, and training processes) and analyze their strengths and weaknesses. Fourth, we summarize the main applications of DAL in areas such as Natural Language Processing (NLP), Computer Vision (CV), and Data Mining (DM). Finally, we discuss challenges and perspectives based on a detailed analysis of current studies. This work aims to serve as a useful and quick guide for researchers in overcoming difficulties in DAL. We hope that this survey will spur further progress in this burgeoning field.
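The iterative query-and-label loop described in the abstract can be illustrated with a minimal pool-based sketch. Everything named here is an assumption made for illustration and does not come from the survey: a NumPy logistic regression stands in for the deep model, the query strategy is plain least-confidence sampling, and the oracle is simulated by a lookup into hidden ground-truth labels. The function names (`train_logreg`, `predict_proba`, `active_learning_loop`) and the parameters `rounds` and `batch` are hypothetical.

```python
import numpy as np


def train_logreg(X, y, steps=200, lr=0.5):
    """Fit a logistic regression by gradient descent (stand-in for a deep model)."""
    w = np.zeros(X.shape[1])
    b = 0.0
    for _ in range(steps):
        p = 1.0 / (1.0 + np.exp(-(X @ w + b)))
        grad = p - y  # gradient of the log loss w.r.t. the logits
        w -= lr * (X.T @ grad) / len(y)
        b -= lr * grad.mean()
    return w, b


def predict_proba(X, w, b):
    """Predicted probability of class 1 for each row of X."""
    return 1.0 / (1.0 + np.exp(-(X @ w + b)))


def active_learning_loop(X_pool, oracle, init_idx, rounds=5, batch=5):
    """Pool-based loop: train, score the unlabeled pool, query the oracle, repeat."""
    labeled = [int(i) for i in init_idx]
    labels = {i: oracle(i) for i in labeled}
    for _ in range(rounds):
        y = np.array([labels[i] for i in labeled], dtype=float)
        w, b = train_logreg(X_pool[labeled], y)
        unlabeled = np.setdiff1d(np.arange(len(X_pool)), labeled)
        if len(unlabeled) == 0:
            break
        # Least-confidence scoring: probabilities nearest 0.5 are most uncertain.
        p = predict_proba(X_pool[unlabeled], w, b)
        most_uncertain = np.argsort(np.abs(p - 0.5))[:batch]
        for i in unlabeled[most_uncertain]:
            labels[int(i)] = oracle(int(i))  # ask the (simulated) annotator
            labeled.append(int(i))
    # Retrain once more so the returned model uses every acquired label.
    y = np.array([labels[i] for i in labeled], dtype=float)
    w, b = train_logreg(X_pool[labeled], y)
    return w, b, labeled


# Toy usage: two Gaussian blobs; the hidden labels play the role of the oracle.
rng = np.random.default_rng(0)
X_pool = np.vstack([rng.normal(-2.0, 1.0, (100, 2)), rng.normal(2.0, 1.0, (100, 2))])
y_true = np.array([0] * 100 + [1] * 100)
init_idx = [0, 1, 2, 3, 4, 100, 101, 102, 103, 104]
w, b, labeled = active_learning_loop(X_pool, lambda i: int(y_true[i]), init_idx)
```

In this toy run only 35 of the 200 pool samples are ever labeled (10 initial plus 5 rounds of 5 queries). A real DAL pipeline would replace the logistic regression with a deep network and could use any other query strategy in place of least-confidence sampling.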