Portfolio optimization tasks are sequential decision problems in which an investor's wealth is distributed across a set of assets. Allocation constraints enforce minimal or maximal investments in particular subsets of assets, for example to limit the portfolio's exposure to a certain sector because of environmental concerns. Although methods for constrained reinforcement learning (CRL) can optimize policies under allocation constraints, these general-purpose methods are observed to yield suboptimal results. In this paper, we propose a novel approach to handling allocation constraints, based on a decomposition of the constrained action space into a set of unconstrained allocation problems. In particular, we examine this approach for the case of two constraints. For example, an investor may wish to invest at least a certain percentage of the portfolio in green technologies while limiting the investment in the fossil energy sector. We show that the action space of the task is equivalent to the decomposed action space, and we introduce a new reinforcement learning (RL) approach, CAOSD, which is built on top of this decomposition. An experimental evaluation on real-world Nasdaq-100 data demonstrates that our approach consistently outperforms state-of-the-art CRL benchmarks for portfolio optimization.
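To make the two-constraint setting concrete, the following minimal sketch checks whether a vector of portfolio weights satisfies one minimum-allocation and one maximum-allocation constraint. It only illustrates the constraint set described above and is not the CAOSD decomposition itself. The function name `is_feasible`, the index lists, the threshold values, and the assumption of a long-only, fully invested portfolio (non-negative weights summing to 1) are all illustrative choices that do not come from the abstract.

```python
from typing import Sequence


def is_feasible(
    weights: Sequence[float],
    green_idx: Sequence[int],
    fossil_idx: Sequence[int],
    min_green: float,
    max_fossil: float,
    tol: float = 1e-9,
) -> bool:
    """Check a weight vector against two allocation constraints.

    Assumptions (hypothetical, for illustration only):
      * long-only and fully invested: weights >= 0 and sum(weights) == 1
      * green_idx and fossil_idx are index lists of the two asset subsets
    """
    # The weights must form a valid allocation of the whole wealth.
    on_simplex = (
        all(w >= -tol for w in weights) and abs(sum(weights) - 1.0) <= tol
    )
    # Total share of wealth invested in each constrained subset.
    green_share = sum(weights[i] for i in green_idx)
    fossil_share = sum(weights[i] for i in fossil_idx)
    return (
        on_simplex
        and green_share >= min_green - tol   # minimum-investment constraint
        and fossil_share <= max_fossil + tol  # maximum-investment constraint
    )


# Four assets: index 0 is "green", index 1 is "fossil" (made-up labels).
ok = is_feasible([0.4, 0.1, 0.3, 0.2], [0], [1], min_green=0.3, max_fossil=0.15)
bad = is_feasible([0.1, 0.4, 0.3, 0.2], [0], [1], min_green=0.3, max_fossil=0.15)
```

Here `ok` is `True` because 40% is in the green asset and 10% in the fossil asset, while `bad` is `False` because it violates both constraints. A general CRL method has to learn to stay inside this feasible set, whereas the approach in the abstract reformulates the action space so that the constraints hold by construction.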