Artificial intelligence has recently experienced remarkable advances, fueled by large models, vast datasets, accelerated hardware, and, last but not least, the transformative power of differentiable programming. This new programming paradigm enables end-to-end differentiation of complex computer programs (including those with control flows and data structures), making gradient-based optimization of program parameters possible. As an emerging paradigm, differentiable programming builds upon several areas of computer science and applied mathematics, including automatic differentiation, graphical models, optimization and statistics. This book presents a comprehensive review of the fundamental concepts useful for differentiable programming. We adopt two main perspectives, that of optimization and that of probability, with clear analogies between the two. Differentiable programming is not merely the differentiation of programs, but also the thoughtful design of programs intended for differentiation. By making programs differentiable, we inherently introduce probability distributions over their execution, providing a means to quantify the uncertainty associated with program outputs.