We study the problem of efficiently computing the derivative of the fixed point of a parametric non-differentiable contraction map. This problem has wide applications in machine learning, including hyperparameter optimization, meta-learning, and data poisoning attacks. We analyze two popular approaches: iterative differentiation (ITD) and approximate implicit differentiation (AID). A key challenge in the nonsmooth setting is that the chain rule no longer holds. Building upon the recent work of Bolte et al. (2022), who proved the linear convergence of non-differentiable ITD, we provide refined linear convergence rates for both ITD and AID in the deterministic case. We further introduce NSID, a new method for computing the implicit derivative when the fixed point is defined as the composition of an outer map and an inner map that is accessible only through a stochastic unbiased estimator. We establish rates for the convergence of NSID to the true derivative, encompassing the best available rates in the smooth setting. We present illustrative experiments confirming our analysis.
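As a rough illustration of the two deterministic schemes named above, the sketch below applies ITD and AID to a toy scalar problem. Everything in it is an assumption made for illustration and is not taken from the paper: the map `phi` (soft-thresholding of an affine function, a contraction in `w` with constant 0.5), the constants `a` and `b`, the choice of derivative at the kinks, and the plain fixed-point iteration used as the linear solver in AID.

```python
def soft_threshold(u, tau):
    """Soft-thresholding; non-differentiable where |u| == tau."""
    if u > tau:
        return u - tau
    if u < -tau:
        return u + tau
    return 0.0


def phi(w, lam, a=0.5, b=1.0):
    """Hypothetical parametric map; a contraction in w because |a| < 1."""
    return soft_threshold(a * w + b, lam)


def phi_partials(w, lam, a=0.5, b=1.0):
    """One selection of the partial derivatives of phi in (w, lam).

    At the kinks |a*w + b| == lam the value (0, 0) is an arbitrary choice.
    """
    u = a * w + b
    if u > lam:
        return a, -1.0
    if u < -lam:
        return a, 1.0
    return 0.0, 0.0


def itd(lam, steps=50):
    """Iterative differentiation: propagate the derivative along the iterates."""
    w, dw = 0.0, 0.0
    for _ in range(steps):
        dphi_dw, dphi_dlam = phi_partials(w, lam)  # partials at the current w
        dw = dphi_dw * dw + dphi_dlam
        w = phi(w, lam)
    return w, dw


def aid(lam, steps=50, solver_steps=50):
    """Approximate implicit differentiation: first approximate the fixed point,
    then approximately solve (1 - dphi_dw) * v = dphi_dlam at that point."""
    w = 0.0
    for _ in range(steps):
        w = phi(w, lam)
    dphi_dw, dphi_dlam = phi_partials(w, lam)
    v = 0.0
    for _ in range(solver_steps):
        v = dphi_dw * v + dphi_dlam  # fixed-point iteration as the linear solver
    return w, v


if __name__ == "__main__":
    print(itd(0.3))
    print(aid(0.3))
```

For this toy map with `lam < 1`, the fixed point is `w = 2 * (1 - lam)`, so both routines should return a derivative close to `-2`. ITD differentiates through every iterate, while AID only needs the final approximate fixed point plus a linear solve.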