Supervised training of deep neural networks on pairs of clean images and noisy measurements achieves state-of-the-art performance for many image reconstruction tasks, but such training pairs are difficult to collect. Self-supervised methods enable training on noisy measurements alone, without clean images. In this work, we investigate the cost of self-supervised training in terms of sample complexity for a class of self-supervised methods that enable the computation of unbiased estimates of the gradients of the supervised loss, including noise2noise methods. We show analytically that a model trained with such a self-supervised method is as good as the same model trained in a supervised fashion, but that self-supervised training requires more examples than supervised training. We then study self-supervised denoising and accelerated MRI empirically and characterize the cost of self-supervised training in terms of the number of additional samples required. We find that the performance gap between self-supervised and supervised training vanishes as the number of training examples grows, at a problem-dependent rate, as predicted by our theory.
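To illustrate what an unbiased estimate of the supervised gradient means for noise2noise, the following is a minimal sketch under assumptions that are ours, not the paper's: a scalar signal, a one-parameter denoiser f(y) = w * y, Gaussian noise, and a squared-error loss. The function name `gradient_samples` and all parameter values are hypothetical and chosen only for illustration.

```python
import numpy as np


def gradient_samples(n=200_000, sigma=0.5, w=0.3, seed=0):
    """Per-example gradients of the supervised and noise2noise losses w.r.t. w.

    Toy setup (an assumption for illustration): scalar signal x,
    denoiser f(y) = w * y, squared-error loss.
    """
    rng = np.random.default_rng(seed)
    x = rng.standard_normal(n)                # clean signals
    y = x + sigma * rng.standard_normal(n)    # noisy input to the denoiser
    y2 = x + sigma * rng.standard_normal(n)   # second, independent noisy copy

    # d/dw (w*y - target)^2 = 2 * (w*y - target) * y
    g_sup = 2.0 * (w * y - x) * y             # target is the clean signal
    g_n2n = 2.0 * (w * y - y2) * y            # target is the noisy copy
    return g_sup, g_n2n


if __name__ == "__main__":
    g_sup, g_n2n = gradient_samples()
    print("mean gradient: supervised", g_sup.mean(), "noise2noise", g_n2n.mean())
    print("gradient variance: supervised", g_sup.var(), "noise2noise", g_n2n.var())
```

In this toy setting the two mean gradients agree up to Monte Carlo error, because the noise in the second copy has zero mean and is independent of the input. The noise2noise gradient has the larger variance, which is consistent with the abstract's claim that self-supervised training needs more examples to reach the same performance.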