We study how to improve the efficiency of segmentation transformers by spending different amounts of computation on different parts of the image. Our method, PAUMER, does this by pausing computation for patches deemed to need no further computation before the final decoder. We use the entropy of predictions computed from intermediate activations as the pausing criterion, and find that it aligns well with the semantics of the image. A unique advantage of our method is that a single network trained with the proposed strategy can be adapted at inference to various run-time requirements simply by modulating its pausing parameters. On two standard segmentation datasets, Cityscapes and ADE20K, our method achieves about $50\%$ higher throughput with an mIoU drop of about $0.65\%$ and $4.6\%$, respectively.
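To make the entropy-based pausing criterion concrete, the following is a minimal sketch and not the PAUMER implementation. It assumes per-patch class logits are available from some intermediate prediction head, and the function names `patch_entropy` and `select_active_patches` as well as the `threshold` parameter are hypothetical, introduced only for illustration. Patches whose predictive entropy falls at or below the threshold are treated as confident and paused; the rest continue through the remaining layers.

```python
import numpy as np


def patch_entropy(logits):
    """Shannon entropy (in nats) of the softmax distribution of each patch.

    logits: array of shape (num_patches, num_classes), assumed to come from
    an intermediate prediction head (an assumption of this sketch).
    """
    # Subtract the row-wise maximum for a numerically stable log-softmax.
    shifted = logits - logits.max(axis=-1, keepdims=True)
    log_probs = shifted - np.log(np.exp(shifted).sum(axis=-1, keepdims=True))
    probs = np.exp(log_probs)
    return -(probs * log_probs).sum(axis=-1)


def select_active_patches(logits, threshold):
    """Boolean mask that is True for patches that should keep computing.

    `threshold` is a hypothetical pausing parameter: patches with entropy
    at or below it are considered confident and are paused.
    """
    return patch_entropy(logits) > threshold


# Toy example with three patches and three classes.
logits = np.array([
    [8.0, 0.0, 0.0],  # one dominant class: low entropy
    [0.0, 0.0, 0.0],  # uniform prediction: entropy equals ln(3)
    [0.5, 0.4, 0.0],  # nearly uniform: high entropy
])
active = select_active_patches(logits, threshold=0.5)
print(active)  # the first patch is paused, the other two stay active
```

In this sketch, raising `threshold` at inference pauses more patches and lowers the compute cost, which is one way to picture how a single trained network could be adapted to different run-time budgets.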