MODIPHY: Multimodal Obscured Detection for IoT using PHantom Convolution-Enabled Faster YOLO

Low-light conditions and occluded scenes impede object detection in real-world Internet of Things (IoT) applications such as autonomous vehicles and security systems. While advanced machine learning models strive for accuracy, their computational demands clash with the limits of resource-constrained devices and hamper real-time performance. In this work, we tackle this challenge by introducing "YOLO Phantom", one of the smallest YOLO models ever conceived. YOLO Phantom uses the novel Phantom Convolution block to achieve accuracy comparable to the latest YOLOv8n model while reducing both parameter count and model size by 43% and cutting Giga Floating Point Operations (GFLOPs) by 19%. YOLO Phantom leverages transfer learning on our multimodal RGB-infrared dataset to address low-light and occlusion issues, giving it robust vision under adverse conditions. Its real-world efficacy is demonstrated on an IoT platform equipped with advanced low-light and RGB cameras and connected to an AWS-based notification endpoint for efficient real-time object detection. Benchmarks show frames-per-second (FPS) gains of 17% for thermal detection and 14% for RGB detection over the baseline YOLOv8n model. As a contribution to the community, both the code and the multimodal dataset are available on GitHub.
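To make the relative figures above concrete, the following minimal Python sketch shows how a percentage reduction or gain translates into absolute numbers. The baseline values are assumptions for illustration only and do not come from this abstract: roughly 3.2 million parameters and 8.7 GFLOPs are the figures commonly published for YOLOv8n, and the 10 FPS baseline is purely hypothetical. The helper names `apply_reduction` and `apply_gain` are likewise made up for this sketch, and the printed values are implied estimates, not results reported by the authors.

```python
def apply_reduction(baseline: float, percent: float) -> float:
    """Return the value left after reducing `baseline` by `percent` percent."""
    return baseline * (1.0 - percent / 100.0)


def apply_gain(baseline: float, percent: float) -> float:
    """Return the value obtained after increasing `baseline` by `percent` percent."""
    return baseline * (1.0 + percent / 100.0)


# Assumed baselines (NOT from the abstract; illustrative only).
BASELINE_PARAMS_M = 3.2   # assumed YOLOv8n parameter count, in millions
BASELINE_GFLOPS = 8.7     # assumed YOLOv8n GFLOPs
BASELINE_FPS = 10.0       # hypothetical baseline FPS on an edge device

if __name__ == "__main__":
    # Percentages below are the ones stated in the abstract.
    print(f"implied parameters (M): {apply_reduction(BASELINE_PARAMS_M, 43):.2f}")
    print(f"implied GFLOPs:         {apply_reduction(BASELINE_GFLOPS, 19):.2f}")
    print(f"implied thermal FPS:    {apply_gain(BASELINE_FPS, 17):.2f}")
    print(f"implied RGB FPS:        {apply_gain(BASELINE_FPS, 14):.2f}")
```

The same percentage applies to parameter count and model size because, at a fixed numeric precision, the size of the stored weights scales roughly linearly with the number of parameters.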


