We develop a new continuous-time stochastic gradient descent method for optimizing over the stationary distribution of stochastic differential equation (SDE) models. The algorithm continuously updates the SDE model's parameters using an estimate of the gradient of the stationary distribution. The gradient estimate is simultaneously updated using forward propagation of the SDE state derivatives, and it asymptotically converges to the direction of steepest descent. We rigorously prove convergence of the online forward propagation algorithm for linear SDE models (i.e., the multi-dimensional Ornstein-Uhlenbeck process) and present numerical results for nonlinear examples. The proof requires analysis of the fluctuations of the parameter evolution around the direction of steepest descent. Bounds on the fluctuations are challenging to obtain due to the online nature of the algorithm (e.g., the stationary distribution changes continuously as the parameters change). We prove bounds for the solutions of a new class of Poisson partial differential equations (PDEs), which are then used to analyze the parameter fluctuations in the algorithm. Our algorithm is applicable to a range of mathematical finance applications involving statistical calibration of SDE models and stochastic optimal control over long time horizons, where ergodicity of the data and of the stochastic process is a suitable modeling framework. Numerical examples explore these potential applications, including learning a neural network control for high-dimensional optimal control of SDEs and training stochastic point process models of limit order book events.
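As a rough illustration of the online forward propagation idea described above, the following Python sketch applies an Euler-Maruyama discretization to a one-dimensional Ornstein-Uhlenbeck model. Everything specific here is an assumption made for illustration and is not taken from the abstract: the model dX = (theta - X) dt + sigma dW, the quadratic objective J(theta) = (E_pi[f(X)] - beta)^2 with f(x) = x, the use of an independent copy of the process to estimate the residual, the learning rate 1/(1 + t), and all names and default values. Under these assumptions the stationary mean equals theta, so the minimizer is theta = beta.

```python
import numpy as np


def online_forward_propagation(beta=1.0, theta0=3.0, sigma=0.5,
                               dt=0.01, t_end=200.0, seed=0):
    """Illustrative sketch for a 1-D OU model (all choices are assumptions).

    Model:      dX = (theta - X) dt + sigma dW
    Objective:  J(theta) = (E_pi[f(X)] - beta)^2 with f(x) = x
    """
    rng = np.random.default_rng(seed)

    def f(x):
        return x        # observable whose stationary mean is matched to beta

    def grad_f(x):
        return 1.0      # derivative of f; constant because f is linear

    theta = theta0
    x = 0.0        # SDE state
    x_bar = 0.0    # independent copy of the state (assumed device)
    x_tilde = 0.0  # forward-propagated derivative dX/dtheta

    n_steps = int(round(t_end / dt))
    for k in range(n_steps):
        t = k * dt
        alpha = 1.0 / (1.0 + t)  # decaying learning rate (assumed schedule)
        dw, dw_bar = rng.normal(0.0, np.sqrt(dt), size=2)

        # Online estimate of dJ/dtheta built from the current states.
        grad_est = 2.0 * (f(x_bar) - beta) * grad_f(x) * x_tilde

        # Euler-Maruyama steps; every update uses start-of-step values.
        x_new = x + (theta - x) * dt + sigma * dw
        x_bar_new = x_bar + (theta - x_bar) * dt + sigma * dw_bar
        # Differentiating the drift (theta - x) in theta gives
        # d(x_tilde) = (1 - x_tilde) dt, with no noise term.
        x_tilde_new = x_tilde + (1.0 - x_tilde) * dt

        theta = theta - alpha * grad_est * dt
        x, x_bar, x_tilde = x_new, x_bar_new, x_tilde_new

    return theta


if __name__ == "__main__":
    print("estimated theta:", online_forward_propagation())
```

The parameter, the state, and the state derivative are advanced together within a single simulated path, which is what makes the scheme online: no separate long simulation is run to estimate the stationary distribution at each parameter value. With the linear f chosen here the factor grad_f(x) is constant, so the state x only matters for nonlinear observables.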