A large class of non-smooth practical optimization problems can be written as the minimization of a sum of smooth and partly smooth functions. We consider such structured problems that also depend on a parameter vector and study the problem of differentiating their solution mapping with respect to this parameter, which has far-reaching applications in sensitivity analysis and parameter-learning problems. Under partial smoothness and other mild assumptions, we apply Implicit Differentiation (ID) and Automatic Differentiation (AD) to the fixed-point iterations of proximal splitting algorithms. We show that AD of the sequence generated by these algorithms converges to the derivative of the solution mapping, linearly under further assumptions. For a variant of automatic differentiation, which we call Fixed-Point Automatic Differentiation (FPAD), we remedy the memory-overhead problem of reverse-mode AD and, moreover, prove theoretically faster convergence. We numerically illustrate the convergence and convergence rates of AD and FPAD on Lasso and Group Lasso problems and demonstrate FPAD on prototypical image-denoising problems by learning the regularization term.
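To make the idea of differentiating the iterates of a proximal splitting algorithm concrete, the following is a minimal sketch for the Lasso, min_x 0.5||Ax − b||² + λ||x||₁, solved by ISTA (forward-backward splitting). It propagates the forward-mode derivative d_k = ∂x_k/∂λ alongside the iterates by applying the chain rule through the soft-thresholding step; this is unrolled forward-mode AD under our own simplifying assumptions, not the paper's exact FPAD algorithm, and the function name `ista_with_forward_ad` is ours. Once the active set is identified (the partial-smoothness regime), the derivative iterates converge as well.

```python
import numpy as np

def soft(z, theta):
    # Soft-thresholding: the proximal operator of theta * ||.||_1.
    return np.sign(z) * np.maximum(np.abs(z) - theta, 0.0)

def ista_with_forward_ad(A, b, lam, num_iter=3000):
    """Run ISTA on 0.5||Ax - b||^2 + lam*||x||_1 while propagating
    the forward-mode derivative d_k = d x_k / d lam alongside x_k."""
    n = A.shape[1]
    t = 1.0 / np.linalg.norm(A.T @ A, 2)   # step size 1/L, L = ||A^T A||
    x = np.zeros(n)
    d = np.zeros(n)                        # derivative of x_k w.r.t. lam
    for _ in range(num_iter):
        z = x - t * (A.T @ (A @ x - b))    # forward (gradient) step
        dz = d - t * (A.T @ (A @ d))       # its derivative w.r.t. lam
        theta = t * lam
        mask = (np.abs(z) > theta).astype(float)  # active coordinates
        x = soft(z, theta)                 # backward (proximal) step
        # Chain rule through soft(z, theta):
        # d soft/d z = mask, d soft/d theta = -sign(z)*mask, d theta/d lam = t.
        d = mask * dz - mask * np.sign(z) * t
    return x, d
```

As a sanity check, the propagated derivative can be compared against a finite-difference approximation (x(λ+ε) − x(λ))/ε for a small ε; after enough iterations the two should agree closely on a generic problem instance.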