Optimization Guarantees of Unfolded ISTA and ADMM Networks With Smooth Soft-Thresholding

Solving linear inverse problems plays a crucial role in numerous applications. Model-aware, data-driven approaches based on algorithm unfolding have gained significant attention because they address these problems effectively. The learned iterative soft-thresholding algorithm (LISTA) and the alternating direction method of multipliers compressive sensing network (ADMM-CSNet) are two widely used approaches of this kind, based on the ISTA and ADMM algorithms, respectively. In this work, we study optimization guarantees, i.e., reaching near-zero training loss as the number of learning epochs increases, for finite-layer unfolded networks such as LISTA and ADMM-CSNet with smooth soft-thresholding in an over-parameterized (OP) regime. We do so by leveraging a modified version of the Polyak-Lojasiewicz condition, denoted PL$^*$. Satisfying the PL$^*$ condition within a specific region of the loss landscape ensures the existence of a global minimum and exponential convergence from initialization under gradient-descent-based methods. We therefore provide conditions, in terms of the network width and the number of training samples, under which the PL$^*$ condition holds for these unfolded networks. To this end, we derive the spectral norm of the Hessian of these networks. We also show that the threshold on the number of training samples increases with the network width. Furthermore, we compare this threshold for unfolded networks with that of a standard fully connected feed-forward network (FFNN) with a smooth soft-thresholding nonlinearity, and prove that unfolded networks have a higher threshold than the FFNN. Consequently, one can expect unfolded networks to achieve a lower expected error than the FFNN.
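To make the objects in the abstract concrete, here is a minimal NumPy sketch, not the authors' code, that builds a small LISTA-style unfolded network and checks the PL$^*$ inequality $\|\nabla \mathcal{L}(\theta)\|^2 \ge \mu \mathcal{L}(\theta)$ at its initialization. Several choices are assumptions for illustration: the smooth soft-threshold $z + \tfrac{1}{2}\big(\sqrt{(z-\lambda)^2+\epsilon} - \sqrt{(z+\lambda)^2+\epsilon}\big)$ (one common smooth surrogate, possibly different from the paper's), weights $W_1, W_2$ tied across layers and initialized from plain ISTA, the square loss $\mathcal{L} = \tfrac{1}{2}\|F(\theta) - x^\star\|^2$, and a finite-difference Jacobian $J$ in place of autodiff. With the tangent kernel $K = JJ^\top$, we have $\nabla \mathcal{L} = J^\top r$ and $r^\top K r \ge \lambda_{\min}(K)\|r\|^2$, so the inequality holds with $\mu = 2\lambda_{\min}(K)$. All helper names (`smooth_soft_threshold`, `ista_init`, `lista_forward`, `jacobian_fd`, `pl_star_check`) are hypothetical.

```python
import numpy as np


def smooth_soft_threshold(z, lam, eps=1e-3):
    # Smooth surrogate of soft-thresholding sign(z) * max(|z| - lam, 0).
    # Assumed form (may differ from the paper's): differentiable for eps > 0,
    # and exactly equal to soft-thresholding when eps = 0.
    return z + 0.5 * (np.sqrt((z - lam) ** 2 + eps) - np.sqrt((z + lam) ** 2 + eps))


def ista_init(A, lam):
    # Plain-ISTA weights: W1 = A^T / L, W2 = I - A^T A / L, threshold lam / L,
    # where L = ||A||_2^2 is the squared spectral norm of A.
    L = np.linalg.norm(A, 2) ** 2
    n = A.shape[1]
    theta = np.concatenate([(A.T / L).ravel(), (np.eye(n) - A.T @ A / L).ravel()])
    return theta, lam / L


def lista_forward(theta, Y, n, thr, n_layers, eps=1e-3):
    # Unfolded ISTA: x_{k+1} = S(W1 y + W2 x_k), x_0 = 0, with W1, W2 tied across
    # layers (an assumption). Columns of Y are measurements; the outputs for all
    # samples are stacked into one vector F(theta).
    m = Y.shape[0]
    W1 = theta[: n * m].reshape(n, m)
    W2 = theta[n * m:].reshape(n, n)
    X = np.zeros((n, Y.shape[1]))
    for _ in range(n_layers):
        X = smooth_soft_threshold(W1 @ Y + W2 @ X, thr, eps)
    return X.ravel()


def jacobian_fd(f, theta, h=1e-6):
    # Central finite-difference approximation of the Jacobian of f at theta.
    f0 = f(theta)
    J = np.zeros((f0.size, theta.size))
    for j in range(theta.size):
        e = np.zeros_like(theta)
        e[j] = h
        J[:, j] = (f(theta + e) - f(theta - e)) / (2 * h)
    return J


def pl_star_check(f, theta, targets):
    # Square loss L = 0.5 * ||f(theta) - targets||^2, gradient J^T r,
    # and mu = 2 * lambda_min(J J^T) from the (approximate) tangent kernel.
    r = f(theta) - targets
    J = jacobian_fd(f, theta)
    grad = J.T @ r
    loss = 0.5 * (r @ r)
    mu = 2.0 * np.linalg.eigvalsh(J @ J.T)[0]  # eigvalsh sorts ascending
    return grad @ grad, mu, loss


rng = np.random.default_rng(0)
m, n, n_samples, n_layers = 3, 4, 2, 3  # 12 + 16 = 28 parameters vs. 8 outputs
A = rng.standard_normal((m, n))
X_true = rng.standard_normal((n, n_samples)) * (rng.random((n, n_samples)) < 0.5)
Y = A @ X_true
theta0, thr = ista_init(A, lam=0.1)


def F(theta):
    return lista_forward(theta, Y, n, thr, n_layers)


grad_sq, mu, loss = pl_star_check(F, theta0, X_true.ravel())
print(f"||grad L||^2 = {grad_sq:.3e}  >=  mu * L = {mu * loss:.3e}")
```

At a single point this inequality is automatic once $\lambda_{\min}(K) > 0$. In the PL$^*$ framework, the harder step is showing that $\lambda_{\min}(K)$ stays bounded away from zero over a whole region around the initialization, which is where the Hessian spectral-norm analysis and the width/sample-count conditions enter.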
