Deep learning (DL) is revolutionizing scientific computing. Because simulations and experiments are usually expensive, labeled data are scarce, and active learning has been identified as a promising way to reduce this data gap. However, the deep active learning (DAL) literature is currently dominated by image classification problems and pool-based methods, which are not directly transferable to scientific computing, where most problems are regression problems with no predefined 'pool' of unlabeled data. Here, for the first time, we investigate the robustness of DAL methods for scientific computing problems using ten state-of-the-art DAL methods and eight benchmark problems. We show that, to our surprise, the majority of the DAL methods are not robust, even compared to random sampling, when the ideal pool size is unknown. We further analyze the effectiveness and robustness of DAL methods and suggest that diversity is necessary for a robust DAL method for scientific computing problems.