Modern AI algorithms require labeled data, yet in the real world the majority of data are unlabeled. Labeling data is costly, particularly in areas that require specialized skills, such as the reading of radiology images by physicians. Human-in-the-loop active learning is one promising approach to making the most efficient use of experts' time for data labeling. In this work, we propose a novel active learning framework with significant potential for application in modern AI systems. Traditional active learning methods focus only on determining which data point should be labeled. Our framework also introduces an innovative perspective: incorporating different query schemes. We propose a model that integrates the information obtained from different types of queries. Based on this model, our active learning framework can automatically determine how the next question should be queried. We further incorporate a data-driven exploration-exploitation framework into our active learning method, and this method can be embedded in numerous active learning algorithms. In simulations on five real-world datasets, including a highly complex real-image task, our proposed active learning framework achieves higher accuracy and lower loss than competing methods.