We propose a machine learning algorithm for solving finite-horizon stochastic control problems, based on a deep neural network representation of the optimal policy functions. The algorithm has three features: (1) It can solve high-dimensional (e.g., over 100 dimensions), finite-horizon, time-inhomogeneous stochastic control problems. (2) Its performance improves monotonically at each iteration, leading to good convergence properties. (3) It does not rely on the Bellman equation. To demonstrate the efficiency of the algorithm, we apply it to various finite-horizon time-inhomogeneous problems, including recursive utility optimization under a stochastic volatility model, a multi-sector stochastic growth model, and optimal control under a dynamic stochastic integration of climate and economy model with eight-dimensional state vectors and 600 time periods.
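As a rough illustration of what optimizing a time-dependent policy directly, without the Bellman equation, can look like, the following toy sketch may help. It is not the algorithm proposed here: the consumption-savings problem, the one-parameter-per-period policy (standing in for a neural network), the accept-if-better random search, and all names (`simulate_value`, `improve_policy`, `theta`) are assumptions made for illustration only. The objective is estimated by simulating whole trajectories, and a candidate policy is kept only if it scores higher on the same fixed shocks, so the recorded objective is non-decreasing by construction.

```python
import math
import random


def simulate_value(theta, shocks, w0=1.0):
    """Average total log-utility of the policy c_t = sigmoid(theta[t]) * w_t."""
    total = 0.0
    for path in shocks:
        w = w0
        for t, gross_return in enumerate(path):
            # Time-dependent consumption share, kept strictly inside (0, 1).
            frac = 1.0 / (1.0 + math.exp(-theta[t]))
            c = frac * w
            total += math.log(c)
            # Remaining wealth is invested at a random gross return.
            w = (w - c) * gross_return
        total += math.log(w)  # terminal utility of leftover wealth
    return total / len(shocks)


def improve_policy(horizon=5, n_paths=200, n_iters=300, step=0.3, seed=0):
    """Toy accept-if-better search over one policy parameter per period."""
    rng = random.Random(seed)
    # Fixed shocks (common random numbers) make policy comparisons exact.
    shocks = [[math.exp(rng.gauss(0.02, 0.1)) for _ in range(horizon)]
              for _ in range(n_paths)]
    theta = [0.0] * horizon
    history = [simulate_value(theta, shocks)]
    for _ in range(n_iters):
        # Perturb the policy parameter of one randomly chosen period.
        t = rng.randrange(horizon)
        candidate = list(theta)
        candidate[t] += rng.gauss(0.0, step)
        value = simulate_value(candidate, shocks)
        if value > history[-1]:
            # Keep the candidate only if the simulated objective improves.
            theta = candidate
            history.append(value)
        else:
            history.append(history[-1])
    return theta, history


theta, history = improve_policy()
print(f"initial objective: {history[0]:.4f}, final objective: {history[-1]:.4f}")
```

The sketch never computes a value function or solves a backward recursion. It only evaluates the simulated objective of the whole policy, which is the sense in which it avoids the Bellman equation. A realistic implementation would replace the per-period scalar with a neural network and the random search with gradient-based updates.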