ISP Distillation

Many images captured today are "observed" only by machines and never by humans, e.g., in autonomous systems. High-level machine vision models, such as those for object recognition or semantic segmentation, assume that images have been transformed into some canonical image space by the camera Image Signal Processor (ISP). However, the camera ISP is optimized to produce visually pleasing images for human observers, not for machines. One may therefore skip the ISP, saving its compute time, and apply vision models directly to RAW images. Yet it has been shown that training such models directly on RAW images results in a performance drop. To mitigate this drop, we use a dataset of paired RAW and RGB images, which can be easily acquired with no human labeling. We then use knowledge distillation to train a model that is applied directly to the RAW data, such that its predictions for RAW images are aligned with the predictions of an off-the-shelf pre-trained model for the corresponding processed RGB images. Our experiments show that, for object classification and semantic segmentation, our performance on RAW images is significantly better than that of models trained on labeled RAW images. It also reasonably matches the predictions of a pre-trained model on processed RGB images, while saving the ISP compute overhead.
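
The alignment objective described above can be illustrated with a minimal sketch. The abstract does not state which loss is used, so the temperature-softened KL divergence below is an assumption (a common choice in knowledge distillation), and the names `log_softmax` and `distillation_loss` are hypothetical. The teacher logits stand in for the output of a pre-trained model on a processed RGB image, and the student logits for the output of the model applied to the paired RAW image; no ground-truth label appears anywhere in the loss.

```python
import math


def log_softmax(logits, temperature=1.0):
    """Numerically stable log-softmax of a list of logits."""
    scaled = [z / temperature for z in logits]
    m = max(scaled)
    log_sum_exp = m + math.log(sum(math.exp(z - m) for z in scaled))
    return [z - log_sum_exp for z in scaled]


def distillation_loss(student_logits_raw, teacher_logits_rgb, temperature=1.0):
    """KL(teacher || student) between softened class distributions.

    Assumption: the abstract only says the predictions are "aligned";
    KL divergence is one possible way to measure that alignment.
    """
    log_p = log_softmax(teacher_logits_rgb, temperature)  # teacher on RGB
    log_q = log_softmax(student_logits_raw, temperature)  # student on RAW
    return sum(math.exp(lp) * (lp - lq) for lp, lq in zip(log_p, log_q))


# Hypothetical logits for one RAW/RGB image pair (three classes).
teacher = [2.0, 0.5, -1.0]
student = [0.0, 0.0, 0.0]
loss_untrained = distillation_loss(student, teacher)
loss_matched = distillation_loss(teacher, teacher)
```

In this sketch the loss is zero when the student reproduces the teacher's distribution and positive otherwise, so minimizing it over many unlabeled RAW/RGB pairs would pull the RAW model's predictions toward those of the RGB model.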

