PolyMaX: General Dense Prediction with Mask Transformer

Xuan Yang, Liangzhe Yuan, Kimberly Wilber, Astuti Sharma, Xiuye Gu, Siyuan Qiao, Stephanie Debats, Huisheng Wang, Hartwig Adam, Mikhail Sirotenko, Liang-Chieh Chen

From arXiv, WACV 2024

Dense prediction tasks, such as semantic segmentation, depth estimation, and surface normal prediction, can be easily formulated as per-pixel classification (discrete outputs) or regression (continuous outputs). This per-pixel prediction paradigm has remained popular due to the prevalence of fully convolutional networks. However, on the recent frontier of segmentation tasks, the community has been witnessing a paradigm shift from per-pixel prediction to cluster-prediction with the emergence of transformer architectures, particularly mask transformers, which directly predict a label for a mask instead of a pixel. Despite this shift, methods based on the per-pixel prediction paradigm still dominate the benchmarks for the other dense prediction tasks that require continuous outputs, such as depth estimation and surface normal prediction. Motivated by the success of DORN and AdaBins in depth estimation, achieved by discretizing the continuous output space, we propose to generalize the cluster-prediction based method to general dense prediction tasks. This allows us to unify dense prediction tasks with the mask transformer framework. Remarkably, the resulting model, PolyMaX, demonstrates state-of-the-art performance on three benchmarks of the NYUD-v2 dataset. We hope our simple yet effective design can inspire more research on exploiting mask transformers for more dense prediction tasks. Code and model will be made available.
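The abstract gives no implementation details, so the following is only a minimal sketch of the general idea of discretizing a continuous output space: each pixel receives a probability distribution over K clusters, and its continuous value is taken as the probability-weighted sum of per-cluster values. The helper names (`softmax`, `predict_from_clusters`), the cluster values, and the logits are all hypothetical and chosen for illustration; this is not the PolyMaX, DORN, or AdaBins code.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def predict_from_clusters(pixel_logits, cluster_values):
    """Hypothetical helper: map per-pixel cluster logits to continuous values.

    pixel_logits   -- one list of K logits per pixel
    cluster_values -- K representative values (e.g. depths), one per cluster
    """
    outputs = []
    for logits in pixel_logits:
        if len(logits) != len(cluster_values):
            raise ValueError("each pixel needs one logit per cluster")
        probs = softmax(logits)
        # Continuous output = expectation of the cluster values.
        outputs.append(sum(p * c for p, c in zip(probs, cluster_values)))
    return outputs


# Assumed toy data: three depth clusters and two pixels.
cluster_values = [1.0, 3.0, 8.0]
pixel_logits = [
    [0.0, 0.0, 0.0],   # uniform assignment: result is the mean of the cluster values
    [20.0, 0.0, 0.0],  # strong assignment to the first cluster
]
depths = predict_from_clusters(pixel_logits, cluster_values)
print(depths)
```

With this formulation, the per-pixel output stays continuous even though the model only predicts a discrete set of clusters, which is why a classification-style head can in principle serve regression tasks such as depth estimation.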

