Accurate 3D person detection is critical for safety in applications such as robotics, industrial monitoring, and surveillance. This work presents a systematic evaluation of 3D person detection using camera-only, LiDAR-only, and camera-LiDAR fusion approaches. While most existing research focuses on autonomous driving, we explore detection performance and robustness in diverse indoor and outdoor scenes using the JRDB dataset. We compare three representative models, BEVDepth (camera), PointPillars (LiDAR), and DAL (camera-LiDAR fusion), and analyze their behavior under varying levels of occlusion and distance. Our results show that the fusion-based approach consistently outperforms the single-modality models, particularly in challenging scenarios. We further investigate robustness against sensor corruptions and misalignments, revealing that while DAL offers improved resilience, it remains sensitive to sensor misalignment and certain LiDAR-based corruptions. In contrast, the camera-based BEVDepth model shows the lowest performance and is most affected by occlusion, distance, and noise. Our findings highlight the importance of sensor fusion for enhanced 3D person detection, while also underscoring the need for continued research to address the vulnerabilities inherent in these systems.