Computers, computation, and data play an increasingly important role in scientific research and discovery. This is reflected in part by the rise of machine learning and artificial intelligence, which have become major areas of interest not only in computer science but in many other fields of study. More generally, there is a trend toward larger, more complex, and higher-capacity models. Stochastic models, and stochastic variants of existing deterministic models, have also become important research directions in various fields. For all of these types of models, gradient-based optimization remains the dominant paradigm for model fitting, control, and more. This dissertation considers unconstrained nonlinear optimization problems, with a focus on the gradient itself, the key quantity that enables the solution of such problems. In Chapter 1, we introduce the notion of reverse differentiation, a term describing the body of techniques that enable the efficient computation of gradients. We cover relevant techniques in both the deterministic and stochastic cases, and we present a new framework for calculating the gradient of problems that involve both deterministic and stochastic elements. In Chapter 2, we analyze the properties of the gradient estimator, with a focus on those properties that are typically assumed in convergence proofs of optimization algorithms. Chapter 3 gives various examples of applying our new gradient estimator. We further explore the idea of working with piecewise continuous models, that is, models with distinct branches and if statements that determine which branch to use.