We propose a machine learning algorithm for solving finite-horizon stochastic control problems, based on a deep neural network representation of the optimal policy functions. The algorithm has three features: (1) it can solve high-dimensional (e.g., over 100 dimensions), finite-horizon, time-inhomogeneous stochastic control problems; (2) its performance improves monotonically at each iteration, which leads to good convergence properties; and (3) it does not rely on the Bellman equation. To demonstrate its efficiency, we apply the algorithm to several finite-horizon time-inhomogeneous problems, including recursive utility optimization under a stochastic volatility model, a multi-sector stochastic growth model, and optimal control under a dynamic stochastic integration of climate and economy model with eight-dimensional state vectors and 600 time periods.
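To make the three features concrete, the following is a minimal illustrative sketch and not the algorithm proposed here. It assumes a toy scalar linear-quadratic problem, replaces the neural network with one linear feedback gain per period (a time-inhomogeneous policy), and obtains monotone improvement through a simple accept-only-if-better rule on fixed simulated paths. The names `simulate_reward` and `improve_policy`, the toy dynamics, and all parameter values are hypothetical. The policy is improved by direct simulation of the objective, so no Bellman equation is solved.

```python
import numpy as np

def simulate_reward(k, x0, noise, sigma=0.1):
    """Average total reward of the policy u_t = -k[t] * x_t on fixed noise paths.

    Toy dynamics (an assumption for illustration): x_{t+1} = x_t + u_t + sigma * eps_t.
    """
    x = np.full(noise.shape[0], x0, dtype=float)
    total = np.zeros_like(x)
    for t in range(len(k)):
        u = -k[t] * x
        total -= x**2 + u**2              # running reward = -(state cost + control cost)
        x = x + u + sigma * noise[:, t]
    total -= x**2                         # terminal reward
    return total.mean()

def improve_policy(T=10, n_paths=2000, sweeps=20, step=0.5, seed=0):
    """Update one period's policy parameter at a time, keeping only improving steps."""
    rng = np.random.default_rng(seed)
    noise = rng.standard_normal((n_paths, T))   # common random numbers for all evaluations
    k = np.zeros(T)                             # one gain per period
    history = [simulate_reward(k, 1.0, noise)]
    h = 1e-4                                    # finite-difference width
    for _ in range(sweeps):
        for t in range(T):
            kp, km = k.copy(), k.copy()
            kp[t] += h
            km[t] -= h
            grad = (simulate_reward(kp, 1.0, noise)
                    - simulate_reward(km, 1.0, noise)) / (2 * h)
            lr = step
            while lr > 1e-8:                    # backtrack until the objective improves
                trial = k.copy()
                trial[t] += lr * grad
                value = simulate_reward(trial, 1.0, noise)
                if value > history[-1]:         # accept only strict improvements
                    k = trial
                    history.append(value)
                    break
                lr *= 0.5
    return k, history

if __name__ == "__main__":
    gains, history = improve_policy()
    print("initial reward:", history[0])
    print("final reward:  ", history[-1])
```

Because a step is kept only when the simulated objective strictly increases, `history` is increasing by construction. This guarantee holds on the fixed sample paths only, which is a weaker statement than a monotonicity result for the true expected objective.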