We investigate the fundamental optimization question of minimizing a target function $f(x)$ whose gradients are expensive to compute or have limited availability, given access to an auxiliary side function $h(x)$ whose gradients are cheap or more readily available. This formulation captures many settings of practical relevance, such as i) re-using batches in SGD, ii) transfer learning, iii) federated learning, and iv) training with compressed models or dropout. We propose two generic new algorithms that are applicable in all these settings and prove, using only an assumption on the Hessian similarity between the target and the side function, that one can benefit from this framework.
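To make the setting concrete, the following is a minimal sketch of one way cheap side gradients can stand in for expensive target gradients. It is an illustration under stated assumptions, not the algorithms proposed in this work: the quadratics `f` and `h`, the matrices `A` and `B`, the function name `minimize_with_side_info`, and the step size are all hypothetical choices. The target gradient is evaluated once per round at a snapshot point, and the inner steps use only the gradient of $h$, shifted by the gradient difference measured at the snapshot. The Hessians `A` and `B` are chosen to be close, which is one informal reading of Hessian similarity, for example $\|\nabla^2 f(x) - \nabla^2 h(x)\| \le \delta$ for a small $\delta$.

```python
# Illustrative sketch only: NOT the algorithms proposed in the paper.
# All names and numbers below are hypothetical choices for the demo.

def mat_vec(M, v):
    """Multiply a small dense matrix (list of rows) by a vector."""
    return [sum(m_ij * v_j for m_ij, v_j in zip(row, v)) for row in M]

# Target f(x) = 0.5 x^T A x - b^T x, treated as expensive; minimizer is [1, 1].
A = [[2.0, 0.0], [0.0, 1.0]]
b = [2.0, 1.0]

# Side function h(x) = 0.5 x^T B x - c^T x, treated as cheap.
# B is close to A (similar Hessians), but h has a different minimizer.
B = [[2.2, 0.1], [0.1, 0.9]]
c = [0.5, -0.3]

def grad_f(x):
    return [g - b_i for g, b_i in zip(mat_vec(A, x), b)]

def grad_h(x):
    return [g - c_i for g, c_i in zip(mat_vec(B, x), c)]

def minimize_with_side_info(x0, rounds=10, inner_steps=20, lr=0.3):
    """Return (approximate minimizer of f, number of grad_f evaluations)."""
    x = list(x0)
    expensive_calls = 0
    for _ in range(rounds):
        snapshot = list(x)
        # One expensive target gradient per round.
        g_f = grad_f(snapshot)
        expensive_calls += 1
        # Offset between target and side gradients at the snapshot.
        correction = [gf_i - gh_i for gf_i, gh_i in zip(g_f, grad_h(snapshot))]
        for _ in range(inner_steps):
            # Cheap step: side gradient plus the fixed correction.
            g = [gh_i + c_i for gh_i, c_i in zip(grad_h(x), correction)]
            x = [x_i - lr * g_i for x_i, g_i in zip(x, g)]
    return x, expensive_calls

x_final, n_expensive = minimize_with_side_info([5.0, -3.0])
print(x_final, n_expensive)
```

In this toy example the iterates approach the minimizer of $f$ rather than that of $h$, while `grad_f` is called only once per round; how much is gained in general depends on how similar the two Hessians are.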