Supervised convolutional neural networks (CNNs) are widely used to solve imaging inverse problems, achieving state-of-the-art performance in numerous applications. However, despite their empirical success, these methods are poorly understood from a theoretical perspective and are often treated as black boxes. To bridge this gap, we analyze trained neural networks through the lens of the Minimum Mean Square Error (MMSE) estimator, incorporating functional constraints that capture two fundamental inductive biases of CNNs: translation equivariance and locality via finite receptive fields. Under the empirical training distribution, we derive an analytic, interpretable, and tractable formula for this constrained variant, termed Local-Equivariant MMSE (LE-MMSE). Through extensive numerical experiments across various inverse problems (denoising, inpainting, deconvolution), datasets (FFHQ, CIFAR-10, FashionMNIST), and architectures (U-Net, ResNet, PatchMLP), we demonstrate that our theory matches the neural networks' outputs (PSNR $\gtrsim 25$\,dB). Furthermore, we provide insights into the differences between \emph{physics-aware} and \emph{physics-agnostic} estimators, the impact of high-density regions in the training (patch) distribution, and the influence of other factors (dataset size, patch size, etc.).