We introduce a novel optimization problem formulation that departs from the conventional practice of minimizing a machine learning model's loss as a black-box function. Unlike traditional formulations, the proposed approach explicitly incorporates a pre-trained model and random sketch operators, allowing both the model and the gradient to be sparsified during training. We establish insightful properties of the proposed objective function and highlight its connections to the standard formulation. Furthermore, we present several variants of the Stochastic Gradient Descent (SGD) method adapted to the new problem formulation, including SGD with general sampling, a distributed version, and SGD with variance reduction techniques. We obtain tighter convergence rates under relaxed assumptions, narrowing the gap between theoretical principles and practical applications; the analysis covers several important techniques, such as Dropout and sparse training. This work opens promising opportunities for improving the theoretical understanding of model training through a sparsification-aware optimization approach.