We present DySample, an ultra-lightweight and effective dynamic upsampler. Recent kernel-based dynamic upsamplers such as CARAFE, FADE, and SAPA deliver impressive performance gains, but they add substantial computational overhead, mostly from the time-consuming dynamic convolution and the additional sub-network used to generate dynamic kernels. Further, FADE and SAPA require high-resolution feature guidance, which limits their application scenarios. To address these concerns, we bypass dynamic convolution and formulate upsampling from the perspective of point sampling, which is more resource-efficient and can be easily implemented with the standard built-in function in PyTorch. We first showcase a naive design and then demonstrate, step by step, how to strengthen its upsampling behavior towards our new upsampler, DySample. Compared with former kernel-based dynamic upsamplers, DySample requires no customized CUDA package and has far fewer parameters and FLOPs, a smaller GPU memory footprint, and lower latency. Besides being lightweight, DySample outperforms other upsamplers across five dense prediction tasks: semantic segmentation, object detection, instance segmentation, panoptic segmentation, and monocular depth estimation. Code is available at https://github.com/tiny-smart/dysample.
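To make the point-sampling view concrete, the following is a minimal sketch and not the released DySample implementation. It assumes the PyTorch built-in in question is `torch.nn.functional.grid_sample`, and it takes the per-point offsets as an input because the abstract does not describe how they are generated. The function name `point_sample_upsample` and its arguments are hypothetical.

```python
import torch
import torch.nn.functional as F


def point_sample_upsample(x, offset, scale=2):
    """Upsample x of shape (B, C, H, W) by sampling it at sub-pixel points.

    offset has shape (B, 2, scale*H, scale*W) and holds hypothetical
    per-point (x, y) shifts, measured in input-pixel units.
    """
    _, _, h, w = x.shape
    out_h, out_w = scale * h, scale * w

    # Base positions: centers of the output pixels in input-pixel coordinates.
    ys = (torch.arange(out_h, dtype=x.dtype, device=x.device) + 0.5) / scale - 0.5
    xs = (torch.arange(out_w, dtype=x.dtype, device=x.device) + 0.5) / scale - 0.5
    grid_y, grid_x = torch.meshgrid(ys, xs, indexing="ij")
    base = torch.stack((grid_x, grid_y), dim=0).unsqueeze(0)  # (1, 2, out_h, out_w)

    # Shift each sampling point by its offset.
    coords = base + offset  # (B, 2, out_h, out_w)

    # Map pixel coordinates to the [-1, 1] range used by grid_sample
    # with align_corners=False.
    size = torch.tensor([w, h], dtype=x.dtype, device=x.device).view(1, 2, 1, 1)
    grid = ((2 * coords + 1) / size - 1).permute(0, 2, 3, 1)  # (B, out_h, out_w, 2)

    return F.grid_sample(
        x, grid, mode="bilinear", padding_mode="border", align_corners=False
    )
```

With all offsets set to zero, this sketch reduces to ordinary bilinear upsampling; a learned, content-dependent offset is what would make the sampling dynamic. No dynamic convolution or custom CUDA kernel is involved.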