The success of deep learning is frequently attributed to the ability to train all parameters of a network end-to-end on a specific application. Yet several design choices at the camera level, including the pixel layout of the sensor, are treated as predefined and fixed. High-resolution, regular pixel layouts are considered the most generic choice in computer vision and graphics, and they treat all regions of an image as equally important. While several works have considered non-uniform (e.g., hexagonal or foveated) pixel layouts in hardware and image processing, the layout has not yet been integrated into the end-to-end learning paradigm. In this work, we present the first truly end-to-end trained imaging pipeline that optimizes the size and distribution of pixels on the imaging sensor jointly with the parameters of a given neural network on a specific task. We derive an analytic, differentiable approach for the sensor layout parameterization that allows for task-specific, locally varying pixel resolutions. We present two pixel layout parameterization functions, rectangular and curvilinear grid shapes, both of which retain a regular topology. We provide a drop-in module that approximates sensor simulation from existing high-resolution images, which directly connects our method to existing deep learning models. We show that network predictions benefit from learnable pixel layouts on two downstream tasks: classification and semantic segmentation.
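The abstract does not spell out the layout parameterization, so the following is only a rough sketch of how a rectangular, non-uniform pixel grid could be simulated from an existing high-resolution image. Everything in it is an assumption made for illustration: the function names `pixel_edges`, `overlap_weights`, and `simulate_sensor` are hypothetical, the softmax-plus-cumulative-sum mapping from free parameters to pixel edges is a stand-in and not the parameterization derived in the paper, and NumPy is used for readability, so no gradients are computed here. Because each sensor pixel is an area-weighted average whose weights are piecewise linear in the pixel edges, the same arithmetic written in an autodiff framework would be differentiable almost everywhere with respect to the layout parameters.

```python
import numpy as np

def pixel_edges(logits):
    """Map unconstrained parameters to increasing pixel edges on [0, 1].

    Assumption: a softmax yields positive pixel widths that sum to 1.
    """
    widths = np.exp(logits - np.max(logits))
    widths = widths / widths.sum()
    return np.concatenate(([0.0], np.cumsum(widths)))

def overlap_weights(edges, n_src):
    """Build an (n_sensor, n_src) matrix of area-overlap weights.

    Row i holds, for each source pixel, the fraction of sensor pixel i
    that the source pixel covers, so every row sums to 1.
    """
    src = np.linspace(0.0, 1.0, n_src + 1)  # edges of the uniform source grid
    lo = np.maximum(edges[:-1, None], src[None, :-1])
    hi = np.minimum(edges[1:, None], src[None, 1:])
    overlap = np.clip(hi - lo, 0.0, None)   # zero where intervals do not meet
    return overlap / (edges[1:] - edges[:-1])[:, None]

def simulate_sensor(image, logits_y, logits_x):
    """Average a 2D high-resolution image over a separable rectangular layout."""
    wy = overlap_weights(pixel_edges(logits_y), image.shape[0])
    wx = overlap_weights(pixel_edges(logits_x), image.shape[1])
    return wy @ image @ wx.T

# Usage: a 4 x 3 sensor whose rightmost column is twice as wide as the others.
high_res = np.arange(64, dtype=float).reshape(8, 8)
sensor = simulate_sensor(high_res, np.zeros(4), np.log(np.array([1.0, 1.0, 2.0])))
```

With all-zero parameters the layout is uniform and the sketch reduces to plain average pooling. Unequal parameters shrink some pixels and enlarge others while the grid topology stays regular, which is the property the rectangular layout in the abstract is described as retaining.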