Images fed to a deep neural network have typically undergone several handcrafted image signal processing (ISP) operations, all of which have been optimized to produce visually pleasing images. In this work, we investigate the hypothesis that the intermediate representation of visually pleasing images is sub-optimal for downstream computer vision tasks compared to the RAW image representation. We suggest that the ISP operations should instead be optimized for the end task by learning their parameters jointly during training. We extend previous work on this topic and propose a new learnable operation that enables an object detector to outperform both previous approaches and detectors trained on traditional RGB images. Experiments on the open PASCALRAW dataset empirically confirm our hypothesis.
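To make the idea of learning ISP parameters jointly with the end task concrete, the following is a minimal sketch and not the operation proposed in this work. It assumes a single learnable gamma correction as a stand-in ISP operation and a one-weight linear "task model" as a stand-in for the detector; the function names, the toy data, and the hyperparameters are all hypothetical. Both parameters are updated from the gradient of the same task loss, which is the essence of joint optimization.

```python
import math


def mse(pixels, targets, gamma, w):
    """Task loss: mean squared error of the toy pipeline's predictions."""
    n = len(pixels)
    return sum((w * x ** gamma - t) ** 2 for x, t in zip(pixels, targets)) / n


def train_jointly(pixels, targets, gamma=1.0, w=1.0, lr=0.1, steps=5000):
    """Gradient descent on the task loss w.r.t. both parameters.

    gamma: parameter of the stand-in ISP operation, y = x ** gamma
    w:     parameter of the stand-in task model, prediction = w * y
    Pixel intensities are assumed to be normalized to the open interval (0, 1).
    """
    n = len(pixels)
    for _ in range(steps):
        grad_gamma = 0.0
        grad_w = 0.0
        for x, t in zip(pixels, targets):
            y = x ** gamma                  # output of the learnable ISP op
            err = w * y - t                 # residual of the task model
            grad_w += 2.0 * err * y / n
            # Chain rule through the ISP op: d(x**gamma)/d(gamma) = y * ln(x)
            grad_gamma += 2.0 * err * w * y * math.log(x) / n
        gamma -= lr * grad_gamma
        w -= lr * grad_w
    return gamma, w


if __name__ == "__main__":
    # Hypothetical RAW-like intensities and targets generated with known
    # parameters, so that recovery of those parameters can be inspected.
    raw = [0.05, 0.1, 0.2, 0.4, 0.6, 0.8, 0.95]
    target = [2.0 * x ** 0.45 for x in raw]
    gamma, w = train_jointly(raw, target)
    print(f"gamma={gamma:.3f}, w={w:.3f}, loss={mse(raw, target, gamma, w):.2e}")
```

In a realistic setting the task model would be an object detector and the gradients would come from automatic differentiation rather than hand-derived formulas; the sketch only illustrates that the ISP parameter receives its training signal from the end-task loss rather than from a perceptual quality criterion.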