Conventional cameras capture image irradiance on a sensor and convert it to RGB images using an image signal processor (ISP). The images can then be used for photography or visual computing tasks in a variety of applications, such as public safety surveillance and autonomous driving. Because RAW images contain all the captured information, one can argue that converting RAW to RGB with an ISP is unnecessary for visual computing. In this paper, we propose a novel $\rho$-Vision framework that performs high-level semantic understanding and low-level compression directly on RAW images, without the ISP subsystem that has been used for decades. Considering the scarcity of available RAW image datasets, we first develop an unpaired CycleR2R network based on unsupervised CycleGAN to train modular unrolled ISP and inverse ISP (invISP) models using unpaired RAW and RGB images. We can then flexibly generate simulated RAW images (simRAW) from any existing RGB image dataset and finetune different models originally trained for the RGB domain to process real-world camera RAW images. We demonstrate object detection and image compression in the RAW domain using a RAW-domain YOLOv3 and a RAW image compressor (RIC) on snapshots from various cameras. Quantitative results reveal that RAW-domain task inference provides better detection accuracy and compression than RGB-domain processing. Furthermore, the proposed $\rho$-Vision generalizes across various camera sensors and different task-specific models. Because it eliminates the ISP, the proposed $\rho$-Vision also offers potential reductions in computation and processing time.
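To make the idea of generating simRAW from RGB data concrete, the following is a minimal sketch of a hand-crafted inverse ISP. It is not the learned CycleR2R invISP described above; it only illustrates the kind of mapping such a model approximates. The function names, the power-law gamma of 2.2, the white-balance gains, and the RGGB Bayer layout are all illustrative assumptions rather than values taken from the paper.

```python
# Hand-crafted stand-in for an inverse ISP; NOT the learned CycleR2R model.
# Gamma, white-balance gains, and the RGGB layout are illustrative assumptions.

def inverse_gamma(v, gamma=2.2):
    """Undo a simple power-law tone curve: display-referred -> linear."""
    return v ** gamma


def rgb_to_sim_raw(rgb, wb_gains=(2.0, 1.0, 1.5), gamma=2.2):
    """Convert an H x W image of (r, g, b) floats in [0, 1] into a
    single-channel RGGB Bayer mosaic of linear values in [0, 1]."""
    raw = []
    for y, row in enumerate(rgb):
        raw_row = []
        for x, pixel in enumerate(row):
            # RGGB pattern: even rows are R,G,R,G...; odd rows are G,B,G,B...
            if y % 2 == 0:
                c = 0 if x % 2 == 0 else 1
            else:
                c = 1 if x % 2 == 0 else 2
            linear = inverse_gamma(pixel[c], gamma)
            # Undo white balance by dividing out the per-channel gain.
            value = linear / wb_gains[c]
            raw_row.append(min(1.0, max(0.0, value)))
        raw.append(raw_row)
    return raw


if __name__ == "__main__":
    # A flat mid-gray 4x4 RGB image as a toy input.
    gray = [[(0.5, 0.5, 0.5)] * 4 for _ in range(4)]
    sim_raw = rgb_to_sim_raw(gray)
    print(len(sim_raw), len(sim_raw[0]))
```

A fixed pipeline like this ignores sensor-specific noise, color response, and tone mapping, which is the gap that training ISP and invISP models on unpaired RAW and RGB images is meant to close.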