Modern AI algorithms require labeled data. In the real world, most data are unlabeled, and labeling them is costly. This is particularly true in areas that require special skills, such as the reading of radiology images by physicians. To use experts' time most efficiently for data labeling, one promising approach is human-in-the-loop active learning. In this work, we propose a novel active learning framework with significant potential for application in modern AI systems. Unlike traditional active learning methods, which focus only on determining which data point should be labeled, our framework also introduces a new perspective on incorporating different query schemes. We propose a model that integrates the information from different types of queries. Based on this model, our active learning framework can automatically determine how the next question is queried. We further incorporate a data-driven exploration and exploitation framework into our active learning method. This method can be embedded in numerous active learning algorithms. In simulations on five real-world datasets, including a highly complex real image task, our proposed active learning framework achieves higher accuracy and lower loss than other methods.
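As a rough illustration of the general idea of trading off exploration and exploitation when choosing the next point to label, the following minimal sketch runs pool-based active learning on a toy one-dimensional threshold problem. It is not the framework proposed in this work: it does not choose among query types, and it uses a fixed epsilon-greedy rule instead of a data-driven one. All names (`estimate_threshold`, `select_query`, `active_learning`, `oracle`, `epsilon`) are hypothetical and introduced only for this example.

```python
import random


def estimate_threshold(labeled):
    """Midpoint between the largest x labeled 0 and the smallest x labeled 1.

    Assumes inputs lie in [0, 1]; falls back to 0.0 / 1.0 when a class is unseen.
    """
    zeros = [x for x, y in labeled if y == 0]
    ones = [x for x, y in labeled if y == 1]
    lo = max(zeros) if zeros else 0.0
    hi = min(ones) if ones else 1.0
    return (lo + hi) / 2.0


def select_query(pool, labeled, epsilon, rng):
    """Epsilon-greedy choice of the next point to label.

    With probability epsilon, explore (uniformly random unlabeled point);
    otherwise exploit (the point closest to the current threshold estimate,
    i.e. the most uncertain one under this toy model).
    """
    if rng.random() < epsilon:
        return rng.choice(pool)
    t = estimate_threshold(labeled)
    return min(pool, key=lambda x: abs(x - t))


def active_learning(pool, oracle, budget, epsilon=0.1, seed=0):
    """Query the oracle (the human expert in the loop) at most `budget` times."""
    rng = random.Random(seed)
    pool = list(pool)
    labeled = []
    for _ in range(min(budget, len(pool))):
        x = select_query(pool, labeled, epsilon, rng)
        pool.remove(x)
        labeled.append((x, oracle(x)))
    return estimate_threshold(labeled), labeled


if __name__ == "__main__":
    points = [i / 100 for i in range(101)]
    expert = lambda x: int(x >= 0.37)  # hypothetical labeling rule
    threshold, queries = active_learning(points, expert, budget=15, epsilon=0.1)
    print(f"estimated threshold: {threshold:.3f} after {len(queries)} queries")
```

With `epsilon=0.0` the loop reduces to pure uncertainty sampling, which on this toy problem behaves like a bisection search. A larger `epsilon` spends more of the labeling budget on random points, which matters on real data where the current model's uncertainty estimate may be misleading.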