We present Decomposer, a semi-supervised reconstruction model that decomposes distorted image sequences into their fundamental building blocks: the original image and the applied augmentations, i.e., shadows, lighting, and occlusions. To this end, we use the SIDAR dataset, which provides a large number of distorted image sequences; each sequence contains images in which shadows, lighting, and occlusions have been applied to an undistorted version. Each distortion alters the original signal in a different way, e.g., as additive or multiplicative noise. We propose a transformer-based model that explicitly learns this decomposition. The sequential model uses 3D Swin-Transformers for spatio-temporal encoding and 3D U-Nets as prediction heads for the individual parts of the decomposition. We demonstrate that separately pre-training the model on weakly supervised pseudo labels steers its optimization under our ambiguous problem definition and teaches it to differentiate between the different image distortions.
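To make the additive and multiplicative distortions concrete, the following is a minimal sketch of a forward model that the decomposition would have to invert. The abstract does not state the exact image-formation model, so the function name `compose`, the order of operations, the alpha-blended occlusion, and the [0, 1] value range are all assumptions made for illustration.

```python
import numpy as np

def compose(image, shadow, light, occluder, mask):
    """Hypothetical forward model (an assumption, not taken from the paper).

    image    : undistorted image, values in [0, 1]
    shadow   : multiplicative attenuation in [0, 1] (1 = no shadow)
    light    : additive illumination term (0 = no extra light)
    occluder : appearance of the occluding object
    mask     : occlusion opacity in [0, 1] (0 = not occluded)
    """
    shaded = image * shadow                       # multiplicative distortion
    lit = shaded + light                          # additive distortion
    out = (1.0 - mask) * lit + mask * occluder    # occluder blended over the scene
    return np.clip(out, 0.0, 1.0)

# Small example on a constant 2x2 grayscale image.
image = np.full((2, 2), 0.5)
ones = np.ones((2, 2))
zeros = np.zeros((2, 2))

identity = compose(image, ones, zeros, zeros, zeros)       # no distortion applied
shadowed = compose(image, 0.5 * ones, zeros, zeros, zeros)  # shadow only
lit_up = compose(image, ones, 0.2 * ones, zeros, zeros)     # light only
occluded = compose(image, ones, zeros, ones, ones)          # fully occluded
```

Under this assumed model, different combinations of factors can yield the same distorted image, which is one way to see why the problem definition is ambiguous and why additional guidance such as pseudo labels is useful.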