Data subsampling has become widely recognized as a tool to overcome computational and economic bottlenecks in analyzing massive datasets. We contribute to the development of adaptive design for estimation of finite population characteristics, using active learning and adaptive importance sampling. We propose an active sampling strategy that iterates between estimation and data collection with optimal subsamples, guided by machine learning predictions on as-yet-unseen data. The method is illustrated on virtual simulation-based safety assessment of advanced driver assistance systems, where it demonstrates substantial performance improvements over traditional sampling methods.
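The loop described above (estimate, learn, resample) can be sketched in a few lines. The following is a minimal illustration under stated assumptions, not the authors' implementation: the function name `active_sampling_mean`, the linear surrogate model, the with-replacement batches, and the choice of sampling probabilities proportional to sqrt(prediction^2 + residual variance) are all hypothetical choices made for this example. Each batch yields a Hansen-Hurwitz estimate of the population mean, which is unbiased provided every item keeps a positive sampling probability, and the batch estimates are averaged.

```python
import numpy as np

def active_sampling_mean(x, evaluate, n_iter=10, batch_size=20, seed=0):
    """Estimate the finite-population mean of y_i = evaluate(i), i = 0..N-1.

    Illustrative sketch only: surrogate model and sampling scheme are assumptions.
    """
    rng = np.random.default_rng(seed)
    N = len(x)
    X = np.column_stack([np.ones(N), x])  # design matrix for a linear surrogate
    p = np.full(N, 1.0 / N)               # first batch: uniform probabilities
    idx_seen, y_seen, batch_estimates = [], [], []

    for _ in range(n_iter):
        # Data collection: draw a batch with replacement from the current scheme.
        idx = rng.choice(N, size=batch_size, replace=True, p=p)
        y = np.array([evaluate(i) for i in idx])

        # Estimation: importance-weighted (Hansen-Hurwitz) estimate of the mean.
        batch_estimates.append(np.mean(y / (N * p[idx])))

        # Learning: refit the surrogate on all labelled data, predict everywhere.
        idx_seen.extend(idx)
        y_seen.extend(y)
        y_arr = np.array(y_seen)
        beta, *_ = np.linalg.lstsq(X[idx_seen], y_arr, rcond=None)
        pred = X @ beta
        resid_sd = np.std(y_arr - X[idx_seen] @ beta)

        # Next sampling scheme: favour items predicted to contribute most.
        # The small constant keeps every probability strictly positive.
        score = np.sqrt(pred**2 + resid_sd**2) + 1e-12
        p = score / score.sum()

    return float(np.mean(batch_estimates))

# Usage on a synthetic population (made-up data, for illustration only).
demo_rng = np.random.default_rng(1)
x_demo = np.linspace(0.0, 1.0, 500)
y_demo = 2.0 + 3.0 * x_demo + demo_rng.normal(0.0, 0.1, size=500)
estimate = active_sampling_mean(x_demo, lambda i: y_demo[i])
```

In this sketch only 200 of the 500 items are ever evaluated. In a real application `evaluate` would stand for the expensive step, such as running one virtual simulation, and the surrogate could be any predictive model.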