Active learning (AL) remains relatively unexplored for LiDAR perception tasks on autonomous driving datasets. In this study, we evaluate Bayesian active learning methods applied to dataset distillation, or core subset selection: finding a subset that yields near-equivalent performance to the full dataset. We also study the effect of applying data augmentation (DA) within Bayesian AL-based dataset distillation. We perform these experiments on the full Semantic-KITTI dataset, extending our earlier work, which used only one quarter of the same dataset. We find that adding DA and BALD has a negative impact on labeling efficiency, and thus on the capacity to distill datasets. We demonstrate key issues in designing a functional AL framework and conclude with a review of challenges in real-world active learning.