Optimal Regularization for a Data Source

In optimization-based approaches to inverse problems and to statistical estimation, it is common to augment criteria that enforce data fidelity with a regularizer that promotes desired structural properties in the solution. The choice of a suitable regularizer is typically driven by a combination of prior domain information and computational considerations. Convex regularizers are attractive computationally, but they are limited in the types of structure they can promote. Nonconvex regularizers, on the other hand, are more flexible in the forms of structure they can promote and have shown strong empirical performance in some applications, but they come with the computational challenge of solving the associated optimization problems. In this paper, we seek a systematic understanding of the power and the limitations of convex regularization by investigating the following questions: Given a distribution, what is the optimal regularizer for data drawn from that distribution? What properties of a data source govern whether the optimal regularizer is convex? We address these questions for the class of regularizers specified by functionals that are continuous, positively homogeneous, and positive away from the origin. We say that a regularizer is optimal for a data distribution if the Gibbs density with energy given by the regularizer maximizes the population likelihood (or equivalently, minimizes the cross-entropy loss) over all regularizer-induced Gibbs densities. As the regularizers we consider are in one-to-one correspondence with star bodies, we leverage dual Brunn-Minkowski theory to show that a radial function derived from a data distribution is akin to a "computational sufficient statistic", as it is the key quantity for identifying optimal regularizers and for assessing the amenability of a data source to convex regularization.
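
To make the optimality criterion concrete, the following is a minimal sketch and not the paper's method. It restricts attention to scaled l_p norms, a small convex subfamily of the regularizers described above, and compares their empirical cross-entropy on synthetic data. It assumes that the normalizing constant of the Gibbs density exp(-||x||_K) on R^d equals d! vol(K), which follows from integrating over the level sets of a positively homogeneous gauge, and it uses the standard volume formula for the unit l_p ball. All function names and the synthetic Laplace data source are illustrative assumptions.

```python
import math
import random


def lp_norm(x, p):
    """l_p norm of a vector given as a list of floats (p >= 1)."""
    return sum(abs(v) ** p for v in x) ** (1.0 / p)


def lp_ball_log_volume(d, p):
    """Log-volume of the unit l_p ball in R^d:
    vol = (2 * Gamma(1 + 1/p))^d / Gamma(1 + d/p)."""
    return d * math.log(2.0 * math.gamma(1.0 + 1.0 / p)) - math.lgamma(1.0 + d / p)


def cross_entropy(samples, p, scale):
    """Empirical cross-entropy of the Gibbs density exp(-||x||_p / scale) / Z.

    Assumption: Z = d! * vol(scale * B_p^d), so
    log Z = log(d!) + d * log(scale) + log vol(B_p^d).
    """
    d = len(samples[0])
    mean_energy = sum(lp_norm(x, p) for x in samples) / (len(samples) * scale)
    log_z = math.lgamma(d + 1) + d * math.log(scale) + lp_ball_log_volume(d, p)
    return mean_energy + log_z


def best_scale(samples, p):
    """Scale minimizing the cross-entropy for fixed p.

    Setting the derivative of m/s + d*log(s) to zero gives s = m/d,
    where m is the mean l_p norm of the samples.
    """
    d = len(samples[0])
    mean_norm = sum(lp_norm(x, p) for x in samples) / len(samples)
    return mean_norm / d


def fitted_cross_entropy(samples, p):
    """Cross-entropy of the l_p Gibbs density at its best scale."""
    return cross_entropy(samples, p, best_scale(samples, p))


def sample_laplace(n, d, seed=0):
    """Hypothetical data source: n vectors with i.i.d. Laplace(0, 1) coordinates."""
    rng = random.Random(seed)
    return [[rng.expovariate(1.0) * rng.choice((-1.0, 1.0)) for _ in range(d)]
            for _ in range(n)]


if __name__ == "__main__":
    data = sample_laplace(n=5000, d=5)
    for p in (1.0, 2.0):
        print(f"p = {p}: cross-entropy = {fitted_cross_entropy(data, p):.3f}")
```

For i.i.d. Laplace coordinates, the l_1 Gibbs density with unit scale coincides with the data distribution, so within this restricted family one would expect the l_1 regularizer to attain the lower cross-entropy. The paper's setting is far more general, covering all star-body gauges rather than only l_p norms.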
