High-performance GPU-accelerated particle filter methods are critical for object detection applications ranging from autonomous driving and robot localization to time-series prediction. In this work, we investigate the design, development, and optimization of particle filters that use half precision on CUDA cores, and we compare their performance and accuracy against single- and double-precision baselines on NVIDIA V100, A100, A40, and T4 GPUs. To mitigate numerical instability and precision loss, we introduce algorithmic changes in the particle filters. Using half precision improves performance by 1.5-2x over the single-precision baseline and by 2.5-4.6x over the double-precision baseline, at the cost of a relatively small loss of accuracy.
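As an illustration of the kind of numerical instability half precision can introduce, the following minimal sketch emulates IEEE 754 binary16 rounding on the CPU and normalizes a small set of particle log-weights. It is not the method used in this work, and the abstract does not specify which algorithmic changes were made. The helper names `to_half`, `normalize_naive`, and `normalize_shifted` are hypothetical, and subtracting the maximum log-weight before exponentiating is a generic stabilization technique assumed here only for illustration.

```python
import math
import struct


def to_half(x: float) -> float:
    """Round a Python float to the nearest IEEE 754 binary16 value (CPU emulation)."""
    return struct.unpack("<e", struct.pack("<e", x))[0]


def normalize_naive(log_w):
    """Exponentiate log-weights directly, rounding each result to half precision."""
    w = [to_half(math.exp(lw)) for lw in log_w]
    # Simplification: the sum is accumulated in double and rounded once.
    total = to_half(sum(w))
    if total == 0.0:
        # Every weight underflowed to zero, so normalization is undefined.
        return [float("nan")] * len(w)
    return [to_half(x / total) for x in w]


def normalize_shifted(log_w):
    """Subtract the maximum log-weight first so the largest weight is exactly 1."""
    m = max(log_w)
    w = [to_half(math.exp(to_half(lw - m))) for lw in log_w]
    total = to_half(sum(w))  # at least 1.0, so the division is safe
    return [to_half(x / total) for x in w]


if __name__ == "__main__":
    # Hypothetical log-likelihoods; exp(-20) is below the smallest binary16 subnormal.
    log_weights = [-20.0, -21.0, -22.0]
    print("naive:  ", normalize_naive(log_weights))
    print("shifted:", normalize_shifted(log_weights))
```

In this sketch the naive variant loses every weight to underflow, while the shifted variant keeps the relative ordering of the particles and produces weights that sum to approximately one.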