Optimal Regularization for a Data Source

In optimization-based approaches to inverse problems and statistical estimation, it is common to augment criteria that enforce data fidelity with a regularizer that promotes desired structural properties in the solution. The choice of a suitable regularizer is typically driven by a combination of prior domain knowledge and computational considerations. Convex regularizers are computationally attractive, but they are limited in the types of structure they can promote. Nonconvex regularizers, on the other hand, are more flexible in the forms of structure they can promote and have shown strong empirical performance in some applications, but the associated optimization problems are computationally challenging to solve. In this paper, we seek a systematic understanding of the power and the limitations of convex regularization by investigating the following questions: Given a distribution, what is the optimal regularizer for data drawn from that distribution? What properties of a data source govern whether the optimal regularizer is convex? We address these questions for the class of regularizers specified by functionals that are continuous, positively homogeneous, and positive away from the origin. We say that a regularizer is optimal for a data distribution if the Gibbs density with energy given by the regularizer maximizes the population likelihood (or, equivalently, minimizes the cross-entropy loss) over all regularizer-induced Gibbs densities. Because the regularizers we consider are in one-to-one correspondence with star bodies, we leverage dual Brunn-Minkowski theory to show that a radial function derived from the data distribution acts as a "computational sufficient statistic": it is the key quantity both for identifying optimal regularizers and for assessing the amenability of a data source to convex regularization.
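As a hedged illustration of the optimality criterion, assume the regularizer is the gauge ||x||_K of a star body K in R^d (positively homogeneous of degree one). Its Gibbs density is exp(-||x||_K) / (d! vol(K)), so the cross-entropy is E[||x||_K] + log(d! vol(K)). Optimizing out the scale of K leaves the scale-free quantity E[||x||_K] * vol(K)^(1/d). In polar coordinates, E[||x||_K] equals the integral over the unit sphere of rho_P(u)^(d+1) / rho_K(u), where rho_P(u) = (integral from 0 to infinity of t^d p(t u) dt)^(1/(d+1)). Hölder's inequality (a dual mixed-volume inequality) then shows that the objective is minimized exactly when rho_K is proportional to rho_P. The sketch below checks this numerically in d = 2. The function names, the Gaussian test case, and the quadrature grids are our own assumptions, not the paper's code.

```python
import numpy as np


def gaussian_density(x, cov):
    """Density of N(0, cov) evaluated at points x of shape (..., d)."""
    d = cov.shape[0]
    prec = np.linalg.inv(cov)
    quad = np.einsum("...i,ij,...j->...", x, prec, x)
    return np.exp(-0.5 * quad) / np.sqrt((2.0 * np.pi) ** d * np.linalg.det(cov))


def data_radial_function(density, thetas, t_max=20.0, n_t=2000):
    """Hypothetical helper: rho_P(u) = (int_0^inf t^d p(t u) dt)^(1/(d+1)).

    Directions u = (cos theta, sin theta) lie on the unit circle, so d = 2.
    The improper integral is truncated at t_max and evaluated with the trapezoid rule.
    """
    t = np.linspace(0.0, t_max, n_t)
    u = np.stack([np.cos(thetas), np.sin(thetas)], axis=-1)  # (m, 2) unit directions
    d = u.shape[-1]
    pts = t[None, :, None] * u[:, None, :]                  # (m, n_t, 2) points t * u
    f = t[None, :] ** d * density(pts)                      # (m, n_t) integrand
    moment = np.sum(0.5 * (f[:, 1:] + f[:, :-1]) * np.diff(t)[None, :], axis=1)
    return moment ** (1.0 / (d + 1))


def scale_free_objective(rho_K, rho_P, thetas, d=2):
    """Hypothetical helper: E[||x||_K] * vol(K)^(1/d) on a uniform angular grid.

    E[||x||_K] = int rho_P^(d+1) / rho_K du and vol(K) = (1/d) int rho_K^d du.
    """
    dtheta = thetas[1] - thetas[0]
    expected_gauge = np.sum(rho_P ** (d + 1) / rho_K) * dtheta
    volume = np.sum(rho_K ** d) * dtheta / d
    return expected_gauge * volume ** (1.0 / d)


# Illustrative data source (an assumption): an anisotropic Gaussian in d = 2.
cov = np.array([[4.0, 0.0], [0.0, 1.0]])
thetas = np.linspace(0.0, 2.0 * np.pi, 360, endpoint=False)
rho_P = data_radial_function(lambda x: gaussian_density(x, cov), thetas)

best = scale_free_objective(rho_P, rho_P, thetas)                 # K proportional to rho_P
disk = scale_free_objective(np.ones_like(thetas), rho_P, thetas)  # K = unit disk
print(best < disk)  # prints True
```

For a Gaussian N(0, Sigma), rho_P(u) is proportional to (u^T Sigma^{-1} u)^(-1/2), the radial function of an ellipse. Here rho_P at theta = 0 is about twice rho_P at theta = pi/2, so in this sketch the optimal regularizer is a convex, Mahalanobis-type norm. Rescaling rho_K leaves the objective unchanged, which reflects the scale having been optimized out.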

翻译：在基于优化的逆问题与统计估计方法中，通常会在确保数据保真度的准则基础上增加正则化项，以促进解具有所需的结构特性。正则化器的选择通常由先验领域知识和计算可行性共同驱动。凸正则化器在计算上具有吸引力，但其可促进的结构类型有限；而非凸正则化器在结构形式方面更具灵活性，并在某些应用中展现出强大的实证性能，但伴随而来的是求解相关优化问题的计算挑战。本文通过探究以下问题，系统性地理解凸正则化的能力与局限性：给定数据分布，针对该分布生成的数据，最优正则化器是什么？数据源的何种特性决定了最优正则化器是否为凸？我们针对由连续、正齐次且除原点外保持正值的函数泛函所定义的正则化器类别进行研究。若以正则化器能量定义的吉布斯密度在所有正则化诱导的吉布斯密度族中最大化总体似然（即最小化交叉熵损失），则称该正则化器对给定数据分布是最优的。由于所考虑的正则化器与星形体一一对应，我们利用对偶布鲁恩-闵可夫斯基理论表明：由数据分布导出的径向函数本质上可视为"计算充分统计量"，它是识别最优正则化器、评估数据源是否适合凸正则化的关键量。