Hybrid Quantum-Classical Scheduling for Accelerating Neural Network Training with Newton's Gradient Descent

Optimization in deep learning is dominated by first-order gradient methods such as SGD. However, neural network training can benefit greatly from the rapid convergence of second-order optimization. Newton's GD stands out in this category: it rescales the gradient using the inverse Hessian. One of its major bottlenecks is matrix inversion, which takes $O(N^3)$ time and scales poorly. Matrix inversion can be reduced to solving a series of linear equations. Because quantum linear solver algorithms (QLSAs), which leverage quantum superposition and entanglement, can run in $\text{polylog}(N)$ time, they are a promising approach with a potential exponential speedup. Specifically, one of the most recent QLSAs has a complexity scaling of $O(d\cdot\kappa \log(N\cdot\kappa/\epsilon))$, which depends on the size $N$, condition number $\kappa$, error tolerance $\epsilon$, and quantum oracle sparsity $d$ of the matrix. This also implies that the potential exponential advantage may be hindered by certain properties of the matrix, namely $\kappa$ and $d$. We propose Q-Newton, a hybrid quantum-classical scheduler for accelerating neural network training with Newton's GD. Q-Newton uses a streamlined scheduling module that coordinates between quantum and classical linear solvers by estimating and reducing $\kappa$ and constructing $d$ for the quantum solver. Our evaluation shows that Q-Newton has the potential to significantly reduce total training time compared to commonly used optimizers such as SGD. We hypothesize a future scenario in which the gate time of quantum machines is reduced, possibly through attosecond physics. Our evaluation establishes an ambitious and promising target for the evolution of quantum computing.
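To make the quantities in the abstract concrete, here is a minimal Python sketch. It is not the Q-Newton implementation. It shows (1) a Newton step computed by solving a linear system rather than forming the inverse Hessian, and (2) a toy rule that compares the $O(N^3)$ classical cost with the $O(d\cdot\kappa \log(N\cdot\kappa/\epsilon))$ QLSA scaling. The function names `newton_step`, `classical_cost`, `qlsa_cost`, and `pick_solver` are hypothetical. The cost functions are unitless and ignore constant factors and gate times. The `damping` term is an assumed regularizer, used here only as one simple way to lower $\kappa$; the abstract does not say how Q-Newton reduces it.

```python
import math
import numpy as np


def newton_step(grad, hessian, damping=1e-3):
    """Return p solving (H + damping * I) p = grad.

    Solving the linear system avoids forming H^{-1} explicitly.
    `damping` is an assumption of this sketch, not taken from the paper.
    """
    n = hessian.shape[0]
    return np.linalg.solve(hessian + damping * np.eye(n), grad)


def classical_cost(n):
    """Unitless O(N^3) cost model for a dense classical solve."""
    return float(n) ** 3


def qlsa_cost(n, kappa, d, eps=1e-3):
    """Unitless d * kappa * log(N * kappa / eps) cost model for a QLSA."""
    return d * kappa * math.log(n * kappa / eps)


def pick_solver(n, kappa, d, eps=1e-3):
    """Toy scheduling rule: choose whichever cost model is smaller."""
    if qlsa_cost(n, kappa, d, eps) < classical_cost(n):
        return "quantum"
    return "classical"


if __name__ == "__main__":
    # Quadratic f(x) = 0.5 x^T A x - b^T x, whose Hessian is A.
    A = np.array([[3.0, 1.0], [1.0, 2.0]])
    b = np.array([1.0, 1.0])
    x = np.zeros(2)
    grad = A @ x - b
    x = x - newton_step(grad, A, damping=0.0)
    print("residual after one Newton step:", np.linalg.norm(A @ x - b))

    # np.linalg.cond computes kappa exactly, which is itself expensive;
    # it stands in here for whatever estimator a real scheduler would use.
    H = np.diag([1e-6, 1.0])
    print("kappa before damping:", np.linalg.cond(H))
    print("kappa after damping: ", np.linalg.cond(H + 1e-3 * np.eye(2)))

    print(pick_solver(n=1000, kappa=10.0, d=4))
    print(pick_solver(n=10, kappa=1e12, d=4))
```

On a quadratic objective, an undamped Newton step reaches the minimizer in a single update, which is the fast convergence the abstract refers to. The toy rule also shows why a large $\kappa$ or $d$ can erase the quantum advantage even though the dependence on $N$ is only logarithmic.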

