We study whether recursive utility, a risk-sensitive objective from asset-pricing theory, improves reinforcement learning for portfolio allocation. Under recursive utility, the Bellman equation involves a certainty equivalent (CE) of future value that has no closed form under observed returns, so we approximate it with a $K$-sample Monte Carlo estimate. We train actor-critic agents (PPO, A2C) on the resulting value target and on an approximate advantage estimate (AAE), which generalizes the one-step Bellman residual to multiple steps with state-dependent weights. Because both quantities are defined through the critic, this formulation applies only to critic-based algorithms. On 10 chronological train/test splits of South Korean ETF data, the recursive-utility agent improves on the discounted (naive) baseline in Sharpe ratio, maximum drawdown, and cumulative return. Derivations, the world model and metrics, and full result tables are in the appendices.
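To make the Monte Carlo step concrete, the following is a minimal Python sketch of a $K$-sample certainty-equivalent value target. It assumes an additive aggregator $y = r + \beta\,\mathrm{CE}_K(V')$ and an exponential (CARA) certainty equivalent $\mathrm{CE}(V) = -\tfrac{1}{\alpha}\log \mathbb{E}[e^{-\alpha V}]$ with risk-aversion parameter $\alpha$. Both are illustrative assumptions: the abstract fixes neither the aggregator nor the utility function, and an Epstein-Zin CRRA form would additionally require strictly positive values. The names `certainty_equivalent`, `recursive_value_target`, and `resampler` are hypothetical, and the sampler simply resamples a fixed pool of next-state values instead of querying a world model.

```python
import math
import random
from typing import Callable, Sequence


def certainty_equivalent(values: Sequence[float], alpha: float) -> float:
    """Sample CE of next-state values under an assumed exponential (CARA) utility.

    CE = -(1/alpha) * log(mean(exp(-alpha * v))). With alpha == 0 this is the
    plain sample mean (risk-neutral case). Illustrative form, not the paper's.
    """
    if not values:
        raise ValueError("need at least one sample")
    if alpha == 0.0:
        return sum(values) / len(values)
    # Stable log-mean-exp: factor out the largest exponent before exponentiating.
    exps = [-alpha * v for v in values]
    m = max(exps)
    log_mean = m + math.log(sum(math.exp(e - m) for e in exps) / len(exps))
    return -log_mean / alpha


def recursive_value_target(
    reward: float,
    sample_next_value: Callable[[], float],
    k: int,
    beta: float,
    alpha: float,
) -> float:
    """Hypothetical critic target y = r + beta * CE_K(V') from k Monte Carlo draws."""
    samples = [sample_next_value() for _ in range(k)]
    return reward + beta * certainty_equivalent(samples, alpha)


def resampler(pool: Sequence[float], seed: int) -> Callable[[], float]:
    """Hypothetical sampler: draws next-state values uniformly from a fixed pool."""
    rng = random.Random(seed)
    return lambda: rng.choice(pool)


# Usage: same seed -> same K draws, so the two targets differ only by alpha.
pool = [0.98, 1.01, 1.03, 0.95]
neutral = recursive_value_target(0.0, resampler(pool, 0), k=64, beta=0.99, alpha=0.0)
averse = recursive_value_target(0.0, resampler(pool, 0), k=64, beta=0.99, alpha=2.0)
```

In this sketch, setting `alpha=0.0` turns the certainty equivalent into a sample mean, so the target reduces to the ordinary discounted target. For `alpha > 0`, Jensen's inequality makes the target no larger than the risk-neutral one on the same draws, which is where the risk adjustment enters the critic.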